Splitgraph has been acquired by EDB! Read the blog post.

Time travel queries

Table versioning

Modification of existing data is a core feature of Seafowl, and it effectively reduces to adding new and/or potentially replacing old partitions, followed by creating a new table version. Each table version is nothing more than an entry in our metadata store, linking it to the underlying partitions.

Any partition not touched by a write query ends up being a shared component between the latest and previous version(s). We try to optimise for partition re-use across different table versions: for INSERTs we simply create new partitions from the provided data; in the case of UPDATE/DELETE only the partitions targeted by the statement qualifier (i.e. the WHERE clause, if any) are replaced by new ones.

Querying older table versions

By default, Seafowl interprets a plain table reference in a SQL statement as pointing to the latest version for that table. However, given that all (meta)data for previous table versions is being tracked, there is also an option of referencing older table versions. In other words, one can query historical data (AKA table time travel), and do so by using the table function syntax with a single argument

SELECT * FROM my_table('2022-10-25T09:30:55+00:00')

The provided timestamp tells Seafowl to plan and execute the query using the table version that was latest at the specified point in time. The time formats accepted include RFC 3339 (as above), RFC 2822 and a frequently used YY-MM-DD HH:MM:SS Z format.

Table version bookkeeping

There are no limitations on the table time-travel queries—one can mix and match multiple table references with no version specifiers, or with ones using any of the supported timestamp formats in a single SELECT statement—apart from one. Namely, Seafowl does not allow referencing table versions prior to the oldest recorded one, and will instead error out with the message No recorded table versions for the provided timestamp.

For this reason, but also as a more general tool aimed as a catalog and aide in version selection, the system schema provides the table_versions table, that can be queried to determine the exact times when each new version was created, starting with the oldest one.

Cleaning history

Finally, if one ends up with too many table versions and/or deems them obsolete, they can easily be cleaned up using the VACUUM command. In particular, VACUUM TABLE my_table will purge all table versions except the last one, and delete any resulting orphan partitions.

Example

Assume we have just created a table

CREATE TABLE my_table (number INT, parity VARCHAR)

and we go on to add some data

INSERT INTO my_table VALUES (1, 'odd'), (2, 'even'), (3, 'odd'), (4, 'even')

This will result in the creation of two table versions, the initial (empty) one, and the one resulting from the insert

SELECT * FROM system.table_versions;
 table_schema | table_name | table_version_id |    creation_time
--------------+------------+------------------+---------------------
 public       | my_table   |                1 | 2022-10-25 11:21:16
 public       | my_table   |                2 | 2022-10-25 11:21:36
(2 rows)

Running another DML statement

DELETE FROM my_table WHERE parity = 'odd'

will result in yet another table version being created

SELECT * FROM system.table_versions;
 table_schema | table_name | table_version_id |    creation_time
--------------+------------+------------------+---------------------
 public       | my_table   |                1 | 2022-10-25 11:21:16
 public       | my_table   |                2 | 2022-10-25 11:21:36
 public       | my_table   |                3 | 2022-10-25 11:22:48
(3 rows)

Querying the latest table version (3) gives us the expected results

SELECT * FROM my_table;
 number | parity
--------+--------
      2 | even
      4 | even
(2 rows)

However, we can also go back and query the table as it was just prior to the DELETE statement (2) using the time travel syntax

SELECT * FROM my_table('2022-10-25 11:22:00 +00:00');
 number | parity
--------+--------
      1 | odd
      2 | even
      3 | odd
      4 | even
(4 rows)

Write statement time-travel

Finally, note that you can also use the time travel syntax in source queries for some applicable write statements. In particular, this means you can do

CREATE TABLE my_new_table AS SELECT * FROM my_table('2022-10-25 11:22:00 +00:00')

or

INSERT INTO my_other_table SELECT * FROM my_table('2022-10-25 11:22:48 +00:00')

Current limitations

  • Can't use SQL expressions to specify the timestamp (e.g. now() - INTERVAL '5 hours').