All of my content is listed on this page.
PostgreSQL 17 claims many performance improvements. In this article, I benchmark it against PostgreSQL 16.
PostgreSQL is significantly faster than MySQL in my most recent performance benchmarks using Reserva, a custom database benchmarking tool.
In this article, I conduct original research into the performance characteristics of MariaDB and PostgreSQL.
MariaDB is significantly faster than MySQL, according to my custom benchmarking tool. I would prefer MariaDB if I had the choice.
PostgreSQL and MariaDB are the fastest mainstream open-source databases. MySQL has some catching up to do.
GCP AlloyDB is 12% to 13% faster than AWS Aurora at a similar price point. Aurora has some nice durability features, though.
An explanation of the most important database type and my opinion on which ones are worth your time and money.
An explanation of what NewSQL databases are, when you might need one, and my opinion on which ones are worth your time and money.
An explanation of what a data warehouse is and opinions on which ones are worth your time and money.
An explanation of what a data lakehouse is and my opinion on which ones are worth your time and money.
Yugabyte is an open-source, distributed, PostgreSQL-compatible database. It is an excellent option when you have outgrown PostgreSQL (or some other PostgreSQL-compatible database) and need exceptionally high throughput or fault tolerance.
Presto and its popular fork Trino are versatile tools for creating a data lakehouse. Their primary use case is running analytical queries on large amounts of data stored in other systems.
Azure’s data warehouse used to be called Azure SQL Data Warehouse, but Microsoft rolled that product into a larger collection of products called Azure Synapse Analytics.
SQL Server is an excellent relational database from Microsoft. I default to using PostgreSQL because it is also very good… and free! But if money were no object, I’d default to using SQL Server for most use cases.
Spark is a popular and versatile tool for data movement and processing. It even does a pretty good job acting as a data warehouse using a pattern called a data lakehouse.
Spanner is a distributed, PostgreSQL-compatible database from Google. It might be the most technically impressive database out there. If you have outgrown PostgreSQL (or some other PostgreSQL-compatible database) and are already on GCP, adopting Spanner should be a no-brainer.
Snowflake is my personal favorite data warehouse. They were one of the first to offer a scalable service that separates storage from compute. This architecture is outlined in a whitepaper that I recommend every new data engineer read.
Redshift is a data warehouse from Amazon. It’s a fork of PostgreSQL that writes data in a column-oriented format, making it much faster at analytical queries.
PostgreSQL is my default choice whenever I need a relational database. In fact, it is one of my favorite pieces of technology, ever.
MySQL is a battle-tested database that will be forever tied to the most popular blogging tool of all time: WordPress.
MariaDB is a fork of MySQL. In my opinion, MariaDB is a better piece of technology. However, MySQL has a larger user base and better managed service offerings.
CockroachDB is an open-source, distributed, PostgreSQL-compatible database. It is an excellent option when you have outgrown PostgreSQL (or some other PostgreSQL-compatible database) and need exceptionally high throughput or fault tolerance.
Citus is an open-source PostgreSQL extension that lets PostgreSQL shard and distribute data across multiple nodes, and also store data in a columnar format. It was recently purchased by Microsoft, but remains open-source.
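To make "shard and distribute data to multiple nodes" concrete, here is a minimal Python sketch of hash-based sharding. It is an illustration of the general idea, not Citus’s actual hashing or shard-placement logic: the shard count, the node names, the `tenant_id` distribution key, and the round-robin placement are all hypothetical choices made for this example.

```python
import zlib
from collections import defaultdict

SHARD_COUNT = 8  # hypothetical shard count, chosen for illustration only
NODES = ["worker-1", "worker-2", "worker-3"]  # hypothetical worker node names


def shard_for(distribution_key: str) -> int:
    # Deterministic hash (CRC32), so the same key always maps to the same shard.
    # Python's built-in hash() is randomized per process for strings, so it is avoided here.
    return zlib.crc32(distribution_key.encode("utf-8")) % SHARD_COUNT


def node_for_shard(shard_id: int) -> str:
    # Simple round-robin placement of shards onto nodes (an assumption, not Citus's policy).
    return NODES[shard_id % len(NODES)]


def distribute(rows):
    # Group rows by the node that owns their shard, using the hypothetical "tenant_id" key.
    placement = defaultdict(list)
    for row in rows:
        shard = shard_for(row["tenant_id"])
        placement[node_for_shard(shard)].append(row)
    return dict(placement)
```

Because every row with the same distribution key hashes to the same shard, queries filtered on that key only need to touch one node, which is the main reason sharded PostgreSQL setups care so much about choosing a good distribution column.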
BigQuery is a data warehouse from Google that allows users to store and perform analytics on structured and semi-structured data. In some situations, I prefer it over my overall #1 data warehouse, Snowflake.
Aurora is a transactional database from Amazon. Its research white paper details the innovative, cloud-native storage engine that distinguishes it from its peers. What I find exceptional about it is how durable data is once written to the database’s storage layer: it’s written 6 times!
AlloyDB is a cloud-native, PostgreSQL-compatible database from Google. I recommend using it if you need a non-distributed relational database and are already a GCP user.
Snowflake and Databricks are two well known providers of analytical data tools. Because of this, they’re frequently compared to each other. Fortunately, the choice between their product offerings is usually straightforward once you’ve defined your use case. In fact, their products complement each other well and shouldn’t really be considered competitors.
A description of an Intel mini-pc testing environment I frequently use to run Reserva database benchmarks.
Azure has several managed database services that are subtly different. This article explains how they are different, and when to use each one.
Columnar storage is optimal for analytical queries, while row storage is optimal for transactional queries. This article explores the difference.
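As a rough illustration of that difference, the Python sketch below stores the same hypothetical `orders` table in a row layout (a list of records) and a column layout (one list per field). It is a toy in-memory model under stated assumptions: real row and column stores work with on-disk pages, compression, and indexes, none of which appear here. The point is only which data each layout has to touch for an analytical aggregate versus a transactional point lookup.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical table with three fields.
    order_id: int
    customer: str
    amount: float


# Row-oriented layout: each record's fields are stored together.
rows = [
    Order(1, "alice", 20.0),
    Order(2, "bob", 35.5),
    Order(3, "alice", 12.25),
]

# Column-oriented layout: each field's values are stored together.
columns = {
    "order_id": [1, 2, 3],
    "customer": ["alice", "bob", "alice"],
    "amount": [20.0, 35.5, 12.25],
}


def total_amount_rows(table):
    # Analytical query on a row store: every full record is visited,
    # even though only the "amount" field is needed.
    return sum(order.amount for order in table)


def total_amount_columns(table):
    # Analytical query on a column store: only the "amount" column is read.
    return sum(table["amount"])


def lookup_row(table, order_id):
    # Transactional point lookup on a row store: the whole record comes back in one piece.
    for order in table:
        if order.order_id == order_id:
            return order
    return None


def lookup_columns(table, order_id):
    # Same lookup on a column store: the record has to be reassembled from every column.
    i = table["order_id"].index(order_id)
    return Order(*(table[name][i] for name in ("order_id", "customer", "amount")))


print(total_amount_columns(columns))  # 67.75
```

Both layouts return the same answers; what changes is how much unrelated data a query has to read to get them, which is why wide analytical scans favor columns and single-record reads and writes favor rows.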