Should You Use Snowflake Or Databricks?

Last updated: September 29, 2024

Snowflake and Databricks are two well-known providers of analytical data tools. Because of this, they're frequently compared to each other. Fortunately, the choice between their product offerings is usually straightforward once you've defined your use case. In fact, their products complement each other well and shouldn't really be considered competitors.

If you want to run SQL queries on structured data, such as relational data or well-formed JSON and XML, you should use Snowflake. If you need to process large amounts of unstructured or semi-structured data, you should use Databricks.

In my mental model, Databricks is “upstream” from Snowflake. You might use Databricks to experiment with and process raw data before loading it into a Snowflake data warehouse.
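To make that “upstream” idea concrete, here is a minimal sketch in plain Python of the kind of transformation you might run in a tool like Databricks: flattening semi-structured JSON event records into clean, tabular rows ready to load into a warehouse table. The field names and cleaning rules are hypothetical; a real pipeline would do this at scale with Spark rather than a list comprehension.

```python
import json

# Hypothetical raw, semi-structured event records, as they might
# land in a data lake. Messy data is the norm upstream.
raw_events = [
    '{"user": {"id": 1, "name": "Ada"}, "event": "click", "ts": "2024-09-01T10:00:00Z"}',
    '{"user": {"id": 2, "name": "Grace"}, "event": "purchase", "ts": "2024-09-01T10:05:00Z", "amount": 42.5}',
    'not valid json at all',
]

def flatten(record: str):
    """Parse one raw record into a flat row, or None if it is unusable."""
    try:
        data = json.loads(record)
    except json.JSONDecodeError:
        return None
    return {
        "user_id": data["user"]["id"],
        "user_name": data["user"]["name"],
        "event": data["event"],
        "ts": data["ts"],
        "amount": data.get("amount", 0.0),
    }

# Keep only the rows that parsed cleanly; these are now structured
# and ready to load into a warehouse table.
rows = [row for row in (flatten(r) for r in raw_events) if row is not None]
```

Once the data looks like `rows` — flat, typed, and predictable — it belongs in Snowflake, where SQL queries and BI tools can take over.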

Snowflake

Other than its lack of a built-in ELT tool, Snowflake is an ideal data warehouse. In my opinion, if you want to run analytical queries to create business intelligence visualizations, or even serve some operational workload, you should strongly consider running them in Snowflake.

It offers many of the features you might expect from a high-quality transactional database such as PostgreSQL or SQL Server:

- ACID-compliant transactions
- Role-based access control with fine-grained permissions
- Standard SQL support, including views and materialized views
- Familiar database/schema/table organization

In addition to those features, Snowflake exhibits great performance, and the web UI has a great dashboard creation tool.

Because Snowflake was built from the ground up to be a database, all of these features are built in without any configuration required. Engineers, analysts, or anyone else who has worked with databases before should immediately understand the structure of the system and be able to get on with their work.

In summary, Snowflake was designed to be a data warehouse from the get-go, and it excels at that role.

Databricks

Databricks is built on top of Apache Spark, an open-source engine for large-scale data processing and a foundation of the data lakehouse architecture. Databricks employs many people with deep Spark expertise. Because of this, they know how to get the most out of the program.

I consider myself fortunate to have taken a class from Matei Zaharia, the current CTO of Databricks and one of the original authors of Spark. As part of an assignment, we had to alter the open-source Spark codebase.

Working with that codebase revealed the massive amount of functionality baked into that program. It can:

- Run distributed SQL queries over huge datasets (Spark SQL)
- Process real-time data with Structured Streaming
- Train machine learning models at scale (MLlib)
- Run graph computations (GraphX)
- Execute general-purpose batch jobs written in Python, Scala, Java, or R

This huge amount of functionality makes Spark one of the most useful and widely used programs in the data engineering world. You might be reminded of the saying “Jack of all trades, master of none”, however.

The fact that Spark can run SQL queries on large amounts of data is what causes people to compare Snowflake and Databricks. However, creating a data warehouse wasn’t the original purpose of Spark, and to this day still isn’t its main use case.

Conclusion

Use the right tool for the job.

If you want to process large amounts of data (for example, to create an AI dataset), experiment on that data, or use one of the novel features that is only available in Spark, you will be well served by Databricks.

If you want to create a clean data warehouse to do analytics, master data management (MDM), or some other more traditional business function, use Snowflake.

Both services are offered with pay-as-you-go pricing, so consider using both!