Columnar VS Row Storage In Databases

Last updated: September 28 2024

Row VS column storage visualization

Whenever you hear a table or database called “row oriented”, “columnar”, or basically any terminology suggesting that it is written via rows or columns, they are specifying the way that the database stores data.

Thinking about a spinning hard drive, or an array holding an SSD’s memory addresses - both of these things are one dimensional. In this way, we must eventually write all data in one dimension. If we write it in a row oriented fashion, we write it one entire row at a time before moving on to the next row. If we write it columnwise, we write one entire column of data before moving on to the next column.

Row oriented storage

Let’s write the data in the image from the top of the article in a row-wise fashion. We simply write the data, one row at a time:

1, 2024-07-01, 5, 60, 2, 2024-07-01, 3, 80, 3, 2024-07-02, 4, 1000, 4, 2024-07-03, 5, 65

This type of storage is optimal for transactional data, such as making a purchase. Each row represents a purchase, so whenever one is made, we can just append the purchase’s data to the end.

However, what if you wanted to answer a question like “what is our top selling product ID?” The computer would have to read data from the storage medium that are very far apart. Enter columnar storage.

Column oriented storage

Now let’s try writing that same data in a columnar fashion:

1, 2, 3, 4, 5, 2024-07-1, 2024-07-1, 2024-07-2, 2024-07-3, 5, 3, 4, 5, 60, 80, 1000, 65

This type of storage is optimal for answer a question like “what is our top selling product ID?”

As you can see, the product IDs are all stored together: 5, 3, 4, 5. This means the computer does not have to search as far or load as much data into main memory to answer your query.

However, what if you wanted to record a single purchase to this table? You would have to insert data into this structure in 4 different places. You would have to append the ID directly after the other IDs, the timestamp directly after the other timestamps, etc… Because of this, columnar databases are not well suited for reading for recording small amounts of transactional data quickly.

Conclusion

Row oriented and column oriented data storage systems complement each other. Frequently, organizations will use row oriented systems (such as a transactional relational database) to record critical information about their operations. Then, later on, they will move that data to a column oriented system (such as a data warehouse) to perform analytics on that data and gain insight into their operations.