DDL queries

CREATE TABLE

CREATE TABLE [schema_name].table_name (
  col_1 type_1,
  ...
)

Create an empty table.
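
For example (the schema, table and column names here are purely illustrative):

CREATE TABLE public.users (
  id BIGINT,
  name TEXT,
  registered TIMESTAMP
)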

CREATE TABLE AS

CREATE TABLE [schema_name].table_name AS (
  SELECT ...
)

Create a table from the results of a SELECT query.
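
For example (table and column names illustrative):

CREATE TABLE public.recent_orders AS (
  SELECT * FROM public.orders WHERE placed_at > '2023-01-01'
)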

CREATE SCHEMA

CREATE SCHEMA schema_name

Create a schema.

DROP TABLE

DROP TABLE [schema_name].table_name

Delete a table.

If the table doesn't exist, this is a no-op (equivalent to IF EXISTS). Note that dropping a table can leave orphan partitions behind; see the VACUUM command below for cleaning these up.

DROP SCHEMA

DROP SCHEMA schema_name

Delete a schema and all tables in it. This is equivalent to DROP SCHEMA ... CASCADE in other SQL implementations.

If the schema doesn't exist, this won't do anything (equivalent to IF EXISTS).

ALTER TABLE ... RENAME TO

ALTER TABLE [schema_name].table_name RENAME TO [new_schema_name].new_table_name

Rename a table. If new_schema_name is passed, this will also move the table to a different schema. The target schema must already exist.
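
For example, to move a table into a different (pre-existing) schema while renaming it (names illustrative):

ALTER TABLE public.orders RENAME TO archive.orders_2023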

VACUUM command

VACUUM { PARTITIONS | TABLE [schema_name.]table_name | DATABASE database_name }

A custom cleanup utility to remove obsolete partitions/tables:

  • PARTITIONS: Seafowl finds all orphan partitions (i.e. ones not referenced by any table version) and tries to delete them from the object store and catalog.
  • TABLE: deletes all versions of the specified table apart from the latest one.
  • DATABASE database_name: deletes remnant directories from dropped tables.
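
For example (table and database names illustrative; PARTITIONS takes no argument):

VACUUM PARTITIONS
VACUUM TABLE public.orders
VACUUM DATABASE my_database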

CREATE FUNCTION

CREATE FUNCTION function_name AS '{
  "entrypoint": "sintau",
  "language": "wasm",
  "input_types": ["f32"],
  "return_type": "f32",
  "data": "AGFzbQEAAAABDQJgAX0BfWADfX9..."
}'

Create a user-defined function.

The function definition (the quoted JSON string after AS) must be a JSON object with the following mandatory fields:

  • entrypoint: name of the function inside the WASM bytecode to execute
  • input_types: list of argument types the entrypoint accepts
  • return_type: the return type of the function
  • data: Base64-encoded WASM bytecode

The optional fields are:

  • volatility: default volatile. One of:
    • immutable: given the same input, the function always returns the same output
    • stable: within a single query, given the same input, the function always returns the same output
    • volatile: the function's results may change between evaluations within the same query
  • language: default wasm. Only wasm is currently supported.

See the guide for more information on writing UDFs for Seafowl.

Supported types

Seafowl UDFs only support scalar integer/float types: f32, f64, i32, i64. Other types, such as strings, are currently unsupported.
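
Once created, a UDF can be invoked like any built-in scalar function. For example, assuming the definition above was registered under the name sintau:

SELECT sintau(0.25)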

CREATE EXTERNAL TABLE

CREATE EXTERNAL TABLE [ IF NOT EXISTS ] table_name [(
  col_1 type_1,
  ...
)]
STORED AS (CSV|PARQUET|TABLE|DELTATABLE)
[WITH
  [HEADER ROW]
  [DELIMITER ',']
]
[ OPTIONS ( 'option' 'value' [, ... ] ) ]
LOCATION 'location_url'

CSV/Parquet files

In this case, location_url takes the format [scheme://]url:

  • (no prefix): a file or a directory on the local filesystem.
  • http/https: HTTP/HTTPS URL.

This is similar to DataFusion's vanilla CREATE EXTERNAL TABLE statement but with the addition of the HTTP scheme to allow you to read remote files.

Querying data on HTTP requires the server to support byte fetches using the Range header. When querying data on HTTP, Seafowl will partially cache its fragments with a chunk size of 2MB and a maximum cache size of 512MB. These settings currently can't be changed.

When using the CSV format, the schema of the table is mandatory. You can also optionally pass a WITH HEADER ROW / WITH DELIMITER ',' clause to specify the format of the CSV file.
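
For example, to define a table over a CSV file served via HTTPS with a header row (URL, table schema and names illustrative):

CREATE EXTERNAL TABLE my_csv (
  id BIGINT,
  value TEXT
)
STORED AS CSV
WITH HEADER ROW
LOCATION 'https://example.com/data.csv'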

Remote tables

When an external table represents an actual table on a remote database system (the ... STORED AS TABLE ... variant), we refer to this subtype of external table as a remote table. The location_url should then be a connection string for the remote database at hand (e.g. postgres://{user}:{password}@{hostname}:{port}/{database}), while the OPTIONS clause should include a name option denoting the remote table's (potentially fully qualified) raw name. If the remote table name contains spaces or special characters, the identifier should be escaped using double quotes (e.g. "my table"). The currently supported remote databases are Postgres, MySQL and SQLite.
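
For example, a remote table backed by a PostgreSQL table (connection string and table name illustrative):

CREATE EXTERNAL TABLE pg_orders
STORED AS TABLE
OPTIONS ('name' 'public.orders')
LOCATION 'postgres://user:password@localhost:5432/mydb'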

Delta tables

The ... STORED AS DELTATABLE ... variant allows for configuring an external Delta table, stored outside of Seafowl's object store. You can point to a local Delta table by simply providing the path to the folder storing the table in the LOCATION clause. If pointing to a remote Delta table, for instance one stored on AWS S3, you'll need to provide the connection parameters using the OPTIONS clause (namely AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_REGION), as well as an S3 URL pointing to the bucket/folder of the table in the LOCATION clause.
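
For example, a Delta table stored on S3 (bucket, path and credential placeholders are illustrative):

CREATE EXTERNAL TABLE my_delta
STORED AS DELTATABLE
OPTIONS (
  'AWS_ACCESS_KEY_ID' 'your-access-key-id',
  'AWS_SECRET_ACCESS_KEY' 'your-secret-access-key',
  'AWS_REGION' 'us-east-1'
)
LOCATION 's3://my-bucket/path/to/table'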

External tables are ephemeral

Note that external tables are created in a dedicated staging schema that can't be manipulated by other Seafowl statements and doesn't get persisted. That means external tables get destroyed when a Seafowl instance gets restarted and should only be used for staging data before ingesting it using a CREATE TABLE AS statement.
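
A typical workflow is therefore to stage the data externally and then persist it with CREATE TABLE AS (URL and names illustrative; the external table lands in the staging schema):

CREATE EXTERNAL TABLE raw_data
STORED AS PARQUET
LOCATION 'https://example.com/data.parquet';
CREATE TABLE public.raw_data AS (
  SELECT * FROM staging.raw_data
);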

Notable unsupported statements and workarounds

Altering/renaming a table's columns

This includes schema changes like ALTER TABLE ... ADD COLUMN | ALTER COLUMN | RENAME COLUMN | DROP COLUMN. Seafowl currently doesn't support these.

You can emulate these by using a CREATE TABLE AS statement (which will rewrite the whole table):

CREATE TABLE new_table AS (
  SELECT
    column_1,
    column_2 AS column_2_renamed,
    CAST(column_3 AS INT) AS column_3,
    -- column_4 (deleted)
    'default_value' AS column_5
  FROM old_table
);
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;

Renaming a schema

You can't rename schemas. As a workaround, you can move all tables in your old schema to a new schema and then delete the old schema:

CREATE SCHEMA new_schema;
ALTER TABLE old_schema.old_table RENAME TO new_schema.old_table;
-- repeat for other tables in old_schema
DROP SCHEMA old_schema;