Trino and Presto Review

Last updated: October 3, 2024

Presto and its popular fork Trino are versatile tools for creating a data lakehouse. Their primary use case is running analytical queries on large amounts of data stored in other systems.

Trino fork

Presto was originally created as an open-source project at Facebook. Some of the original creators of the tool struck out on their own and created a rebranded fork called Trino. They also started a company called Starburst.

While the original Presto repository is still actively developed, the Trino repository is even more active.

Because of this, from here on out, I’ll be referring to the product as Trino, but just know that most of the things I say will also apply to Presto.

Federated queries

Trino itself does not implement a storage engine. Instead, it uses connectors to query other systems, pulls the results into its own memory or disk space, and then runs fast analytics on that data. This pattern is called “query federation.”

The most commonly used and most feature-rich connector is Hive. That particular connector allows Trino to read and write columnar data to and from object storage, such as S3.

However, there are many other connectors that enable some interesting use cases. For example, there are Google Sheets, PostgreSQL, and Snowflake connectors. In theory, this means you could combine data from spreadsheets (such as from your finance or sales departments) with live data from a transactional database (like PostgreSQL) before processing, cleaning, and making sense of that data in a data warehouse (like Snowflake).
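To make the spreadsheet-plus-database idea a bit more concrete, here is a rough sketch of what a federated query might look like. Trino addresses tables as catalog.schema.table, so a single SQL statement can join across connectors. Everything specific here is an assumption for illustration: the catalog names (sheets, postgres), the table names (budgets, orders), the columns, and the connection details are all hypothetical. The run helper relies on the third-party trino Python client and is only a sketch of how you might submit the query.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Build a fully qualified Trino table name: catalog.schema.table."""
    return f'"{catalog}"."{schema}"."{table}"'


def build_federated_query(sheet_table: str, pg_table: str) -> str:
    """Join a spreadsheet of budgets with live orders from PostgreSQL.

    Column names (region, budget, amount) are hypothetical. The CAST is
    there because spreadsheet values typically arrive as text.
    """
    return (
        "SELECT o.region,\n"
        "       sum(o.amount) AS actual,\n"
        "       max(CAST(b.budget AS double)) AS budget\n"
        f"FROM {pg_table} AS o\n"
        f"JOIN {sheet_table} AS b ON b.region = o.region\n"
        "GROUP BY o.region"
    )


def run(sql: str, host: str = "localhost", port: int = 8080, user: str = "analyst"):
    """Submit the query to a Trino coordinator (host, port, user are assumptions)."""
    from trino.dbapi import connect  # third-party: pip install trino

    conn = connect(host=host, port=port, user=user)
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


# Hypothetical catalogs: "sheets" (Google Sheets) and "postgres" (PostgreSQL).
budgets = qualified("sheets", "default", "budgets")
orders = qualified("postgres", "public", "orders")
query = build_federated_query(budgets, orders)

if __name__ == "__main__":
    print(query)  # pass `query` to run() if you have a Trino cluster available
```

The point is that the join itself is ordinary SQL. The only federation-specific part is the catalog prefix on each table name, which tells Trino which connector to route that part of the query to.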

Query federation is a unique capability and enables some powerful use cases.

Pros

Cons