Splitgraph has been acquired by EDB! Read the blog post.

PostgreSQL

PostgreSQL is an advanced open-source transactional database and is probably one of the first projects one thinks of when they hear the word "database".

However, PostgreSQL is different from Seafowl in terms of the use cases it's designed for.

Comparison

Syntax

Seafowl uses Apache DataFusion as its query engine and its SELECT syntax is a subset of the PostgreSQL syntax. See the syntax reference for more information on the deviations.

Notably, Seafowl doesn't support:

  • the JSON data type and JSON operators
  • certain window expressions

Analytical queries

For analytical queries (queries that have to do a lot of aggregations and summarizing), Seafowl vastly outperforms PostgreSQL. We've seen 3x-10x performance improvements with Seafowl against PostgreSQL for these kinds of queries (see more on our benchmarking page).

This is due to two factors:

  • Seafowl stores data in the Parquet format, which is columnar and compresses much better than PostgreSQL. The subset of the Taxi dataset used in our benchmarks is 11GB in PostgreSQL 14 (without indexes) and 1.3GB in Seafowl. This means that Seafowl is going to be faster at reading data from disk, since there's less data to read.
  • DataFusion is a columnar query execution engine. It's optimized for scanning through lots of data in large batches, which means that the per-row overhead of performing a scan is much smaller than in PostgreSQL.

Transactional queries

On the other hand, Seafowl is extremely poorly suited to transaction processing use cases. It's missing all features that are vital for this:

  • primary, foreign key and other constraints
  • indexes
  • transactions, ACID compliance
  • roles
  • triggers and stored procedures (though you can write a Seafowl function in WASM)

Using parquet_fdw

If you need to use PostgreSQL, a cheap way to accelerate analytical queries is using parquet_fdw. This is a PostgreSQL extension that lets it query Parquet files. More information in this article.

While this lets PostgreSQL use the faster (for analytical queries) columnar Parquet format, this doesn't change the fact that PostgreSQL has a row-oriented query execution engine. In our benchmarks, PostgreSQL with parquet_fdw outperforms plain PostgreSQL on some queries, but can still be up to 5x slower than Seafowl.

Deployment patterns

Seafowl is HTTP-first and is designed to be accessible directly from the Internet, letting your application run SQL queries on it straight from the client. This contradicts classical Web application architectures, where the database is only accessed by a trusted backend component.

In addition, Seafowl's query API allows you to use various HTTP caches (CDNs, dedicated caches, the browser cache) to deliver query results to your end users. This is different from PostgreSQL where you'd have to implement this in your backend application, potentially relying on a non-HTTP cache like Redis or Memcached.

When to use PostgreSQL over Seafowl

  • You are building a "classic" application that relies on transactional features (a digital store front, a forum, etc) and needs to perform a large volume of single-row reads and writes with transactional consistency.
  • You need some PostgreSQL-specific query features:
    • Unstructured JSON data type support
    • Geographical data support with PostGIS

When to use Seafowl over PostgreSQL

  • You are building a Web application that needs to run analytical queries (aggregations, reporting). For example, this could be a data visualization or an interactive dashboard
  • You won't be writing to the database often, or only writing data in large batches
  • You want to avoid deploying and paying for both a database and a backend
  • You want to be able to cache query results at the edge