DDL queries
CREATE TABLE
CREATE TABLE [schema_name].table_name (
col_1 type_1,
...
)
Create an empty table.
CREATE TABLE AS
CREATE TABLE [schema_name].table_name AS (
SELECT ...
)
Create a table from the results of a SELECT
query.
CREATE SCHEMA
CREATE SCHEMA schema_name
Create a schema.
DROP TABLE
DROP TABLE [schema_name].table_name
Delete a table.
If the table doesn't exist, this won't do anything (equivalent to IF EXISTS
).
Note that this can lead to orphan partitions.
DROP SCHEMA
DROP SCHEMA [schema_name]
Delete a schema and all tables in it. This is equivalent to
DROP SCHEMA ... CASCADE
in other SQL implementations.
If the schema doesn't exist, this won't do anything (equivalent to IF EXISTS
).
ALTER TABLE ... RENAME TO
ALTER TABLE [schema_name].table_name RENAME TO [new_schema_name].new_table_name
Rename a table. If new_schema_name
is passed, this will also move the table to
a different schema. The target schema must already exist.
VACUUM
command
VACUUM { PARTITIONS | TABLE [schema_name.]table_name | DATABASE database_name }
A custom cleanup utility to remove obsolete partitions/tables:
PARTITIONS
: Seafowl finds all orphan partitions (i.e. ones not referenced by any table version), and tries to delete them from the object store and catalog.TABLE
: deletes all table versions apart from the latest ones for the specified table.DATABASE database_name
: delete remnant directories from dropped tables.
CREATE FUNCTION
CREATE FUNCTION function_name AS '{
"entrypoint": "sintau",
"language": "wasm",
"input_types": ["f32"],
"return_type": "f32",
"data": "AGFzbQEAAAABDQJgAX0BfWADfX9..."
}'
Create a user-defined function.
The function_definition
must be a JSON object with the following mandatory
fields:
entrypoint
: name of the function inside the WASM bytecode to executeinput_types
: list of argument types the entrypoint acceptsreturn_type
: the return type of the functiondata
: Base64-encoded WASM bytecode
The optional fields are:
volatility
: defaultvolatile
. One of:immutable
: given the same input, the function always returns the same outputstable
: within a single query, given the same input, the function always returns the same outputvolatile
: the function's results may change between evaluations within the same query
language
: defaultwasm
. Onlywasm
is currently supported.
See the guide for more information on writing UDFs for Seafowl.
Supported types
Seafowl UDFs only support scalar integer/float types: f32
, f64
, i32
,
i64
. Other types, such as strings, are currently unsupported.
CREATE EXTERNAL TABLE
CREATE EXTERNAL TABLE [ IF NOT EXISTS ] table_name [(
col_1 type_1,
...
)]
STORED AS (CSV|PARQUET|TABLE|DELTATABLE)
[WITH
[HEADER ROW]
[DELIMITER ',']
]
[ OPTIONS ( 'option' 'value' [, ... ] ) ]
LOCATION 'location_url'
CSV/Parquet files
In this case location_url
supported format is [scheme://](url)
:
- (no prefix): a file or a directory on the local filesystem.
http
/https
: HTTP/HTTPS URL.
This is similar to DataFusion's vanilla
CREATE EXTERNAL TABLE
statement
but with the addition of the HTTP scheme to allow you to read remote files.
Querying data on HTTP requires the server to support byte fetches using the Range header. When querying data on HTTP, Seafowl will partially cache its fragments with a chunk size of 2MB and a maximum cache size of 512MB. These settings currently can't be changed.
When using the CSV format, the schema of the table is mandatory. You can also
optionally pass a WITH HEADER ROW
/ WITH DELIMITER ','
clause to specify the
format of the CSV file.
Remote tables
In the case an external table represents an actual table on a remote Database
system (... STORED AS TABLE ...
variant), we refer to this subtype of external
tables as remote tables. The location_url
should
then be an adequate connection string for the remote database at hand (e.g.
postgres://{user}:{password}@{hostname}:{port}/{database}
), while the
OPTIONS
clause should include a name
option denoting the remote table's
(potentially fully qualified) raw name. If the remote table name has spaces or
special characters the identifier should be escaped using double quotes (e.g.
"my table"
). The supported remote DBs for the time being include Postgres,
MySQL and SQLite.
Delta tables
The ... STORED AS DELTATABLE ...
variant allows for configuring an external
Delta table, stored outside of Seafowl's
object store. You can point to a local Delta table by simply providing the path
to the folder storing the table with the LOCATION
clause. If pointing to a
remote Delta table, for instance ones stored on AWS S3, you'll need to provide
the connection parameters using the OPTIONS
clause (namely
AWS_ACCESS_KEY_ID
, AWS_SECRET_ACCESS_KEY
and AWS_REGION
), as well as an S3
URL pointing to the bucket/folder of the table in the LOCATION
clause.
External tables are ephemeral
Note that external tables are created in a dedicated staging
schema that can't
be manipulated by other Seafowl statements and doesn't get persisted. That means
external tables get destroyed when a Seafowl instance gets restarted and should
only be used for staging data before ingesting it using a CREATE TABLE AS
statement.
Notable unsupported statements and workarounds
Altering/renaming a table's columns
This includes schema changes like
ALTER TABLE ... ADD COLUMN | ALTER COLUMN | RENAME COLUMN | DROP COLUMN
.
Seafowl currently doesn't support these.
You can emulate these by using a CREATE TABLE AS
statement (which will rewrite
the whole table):
CREATE TABLE new_table AS (
SELECT
column_1,
column_2 AS column_2_renamed,
CAST(column_3 AS INT) AS column_3,
-- column_4 (deleted)
'default_value' AS column_5
FROM old_table
);
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;
Renaming a schema
You can't rename schemas. As a workaround, you can move all tables in your old schema to a new schema and then delete the old schema:
CREATE SCHEMA new_schema;
ALTER TABLE old_schema.old_table RENAME TO new_schema.old_table;
-- repeat for other tables in old_schema
DROP SCHEMA old_schema;