seafowl.toml configuration

Using environment variables

Seafowl supports sourcing configuration values from environment variables.

The environment variable format is: SEAFOWL__[section]__[section]__[key]=value. The key or section names are separated by a double underscore __. Dots in names must also be replaced with a double underscore.

Environment variables take precedence over the config file.

For example: SEAFOWL__FRONTEND__HTTP__WRITE_ACCESS=off is equivalent to setting the configuration parameter frontend.http.write_access=off.
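As a sketch of the mapping described above, the following seafowl.toml fragment is the config-file counterpart of setting SEAFOWL__FRONTEND__HTTP__WRITE_ACCESS=off. The section and key come straight from the example; nothing else is assumed.

```toml
# Config-file equivalent of the environment variable
#   SEAFOWL__FRONTEND__HTTP__WRITE_ACCESS=off
# Each double underscore after the SEAFOWL prefix becomes a dot
# (or a table boundary) in the TOML path frontend.http.write_access.
[frontend.http]
write_access = "off"
```

If both are present, the environment variable wins over the file value.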

`object_store` section

This section contains the configuration for the object store used by Seafowl to store data.

Select the object store by setting a type=... parameter and configure it by adding extra fields for the specific flavor.

`type = "local"`

Default. Store data files on the local filesystem.

`data_dir`

The directory to store data files in. Default ./seafowl-data.
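A minimal sketch of a local object store section, spelling out the documented default directory explicitly:

```toml
# Store data files on the local filesystem.
[object_store]
type = "local"
# Documented default; change it to keep data elsewhere.
data_dir = "./seafowl-data"
```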

`type = "memory"`

Store the data in RAM. This does not support any other parameters.

Note that when using this option, all data is lost when the process restarts. In addition, combining an in-memory catalog with a persistent object store (or vice versa) will lead to consistency issues.
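To avoid the consistency issues mentioned above, a fully ephemeral setup would pair the in-memory object store with an in-memory catalog. The following is a sketch that uses the :memory: DSN described in the catalog section; it is meant for throwaway or test instances only.

```toml
# Everything lives in RAM and is lost on restart.
[object_store]
type = "memory"

# In-memory SQLite catalog, so catalog and object store
# have the same (non-persistent) lifetime.
[catalog]
type = "sqlite"
dsn = ":memory:"
```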

`type = "s3"`

Store data files in S3-compatible object storage such as S3 itself, MinIO, Cloudflare R2 etc.

⚠️ NOTE: If you're using actual AWS S3, do not specify endpoint; specify only region.

`region`

AWS S3 region. Optional.

`access_key_id`

AWS access key ID. Required.

`secret_access_key`

AWS secret access key. Required.

`endpoint`

Service endpoint for storage, for MinIO or other S3-compatible APIs. If using S3 itself, use the region parameter instead. Optional.

Example: https://localhost:9000

`bucket`

Name of the S3 bucket. Required.
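The two S3 variants described above might look roughly like this. The region, bucket name and credentials are placeholders invented for illustration; only the parameter names and the endpoint example come from this page.

```toml
# Option A: actual AWS S3. Specify region, not endpoint.
[object_store]
type = "s3"
region = "eu-west-1"                  # example region (assumption)
access_key_id = "YOUR_ACCESS_KEY_ID"  # placeholder
secret_access_key = "YOUR_SECRET_KEY" # placeholder
bucket = "my-seafowl-bucket"          # hypothetical bucket name

# Option B: MinIO or another S3-compatible service.
# Specify endpoint instead of region. Commented out because
# a config file can contain only one [object_store] table.
# [object_store]
# type = "s3"
# endpoint = "https://localhost:9000"
# access_key_id = "YOUR_ACCESS_KEY_ID"
# secret_access_key = "YOUR_SECRET_KEY"
# bucket = "my-seafowl-bucket"
```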

`type = "gcs"`

Store data files in a GCS bucket.

`bucket`

Name of the GCS bucket. Required.

`google_application_credentials`

Path to the GCP JSON credentials file. Optional: the credentials can also be sourced from the GOOGLE_APPLICATION_CREDENTIALS environment variable or, on GCP VMs, from the metadata server.
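A sketch of a GCS object store section. The bucket name and the credentials path are hypothetical; the credentials line can be dropped when the environment variable or the metadata server supplies them.

```toml
[object_store]
type = "gcs"
bucket = "my-seafowl-bucket"  # hypothetical bucket name
# Optional; hypothetical path to a service account key file.
google_application_credentials = "/etc/seafowl/gcp-credentials.json"
```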

`object_store.cache_properties` section

This is an optional sub-section for the S3 object store. It enables caching of fetched object byte ranges and also coalesces ranges by enforcing a minimum size for each byte-range fetch.

It stores the actual contents of the cached entries in a temporary directory on the local file system.

`capacity`

Maximum size of all objects in the cache. Defaults to 512 MB.

`min_fetch_size`

Determines the minimum range size for a byte fetch request. Defaults to 2 MB.

`ttl_s`

Time-to-live for the entries in the cache. Defaults to 3 minutes.
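As a hedged sketch, the sub-section would sit next to an S3 object store like this. Only ttl_s is set, on the assumption (suggested by the _s suffix) that it is expressed in seconds; capacity and min_fetch_size are left at their defaults because this page does not state the unit their values take.

```toml
# Assumes an [object_store] table with type = "s3" is defined
# elsewhere in the same file.
[object_store.cache_properties]
# Assumption: seconds. 180 matches the documented 3-minute default.
ttl_s = 180
```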

`catalog` section

This section contains the configuration for the catalog used by Seafowl to store metadata (table names and mappings to partitions, index for partition pruning, UDF definitions etc).

Select the catalog by setting a type=... parameter and configure it by adding extra fields for the specific flavor.

`type = "sqlite"`

Default. Store the catalog in a local SQLite file.

`dsn`

Path to the SQLite file or the connection string. Default ./seafowl-data/seafowl.sqlite.

You can use :memory: here to use an in-memory SQLite database. Note that when using this option, all data is lost when the process restarts. In addition, combining an in-memory catalog with a persistent object store (or vice versa) will lead to consistency issues.

`journal_mode`

Journal mode used by SQLite. Default wal. One of delete, truncate, persist, memory, wal, off. See the SQLite documentation for more information.

journal_mode = 'delete' is required to make a Seafowl instance work against LiteFS as a leader (since LiteFS doesn't support wal).

`read_only`

Open the SQLite database in read-only mode. Using journal_mode = 'off' and read_only = true is required to make a Seafowl instance work against a LiteFS replica.
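The LiteFS replica setup described above could be sketched as follows. The DSN path is a hypothetical LiteFS mount point; the journal_mode and read_only values are the ones this page names as required for a replica.

```toml
# Read-only Seafowl instance on top of a LiteFS replica.
[catalog]
type = "sqlite"
dsn = "/litefs/seafowl.sqlite"  # hypothetical path on the LiteFS mount
journal_mode = "off"
read_only = true

# For a LiteFS leader, the page instead calls for:
# journal_mode = "delete"
```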

`type = "postgres"`

Store the catalog in a PostgreSQL database.

`dsn`

Connection URI to the PostgreSQL database, in the format postgresql://[user[:password]@][[host][:port][,...]][/dbname][?name=value[&...]]

Example: postgresql://user:secret@localhost
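A sketch of a PostgreSQL-backed catalog. The user and password are taken from the example DSN above; the port and database name are assumptions added to show the longer URI form.

```toml
[catalog]
type = "postgres"
# Port 5432 and database "seafowl" are illustrative assumptions.
dsn = "postgresql://user:secret@localhost:5432/seafowl"
```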

`frontend.http` section

This section contains the configuration for the HTTP frontend used to query Seafowl from Web applications. Omit this section to disable the HTTP frontend altogether.

`write_access`

Settings for write access to Seafowl (execution of any non-SELECT/EXPLAIN queries). This can be one of: any (anyone can write), off (disabled), or a SHA-256 hash of a password.

When Seafowl starts up without detecting a config file, it generates a password once, writes the password hash to this section and prints the actual password in the logs.

If a config file already exists and this is omitted, it defaults to off.

To generate a new password, you can use this Bash snippet:

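# Generate a random alphanumeric password (32 characters unless $1 says otherwise)
pw=$(< /dev/urandom LC_ALL=C tr -dc A-Za-z0-9 | head -c "${1:-32}")
# Hash it with SHA-256, keeping only the 64 hex digits
pw_hash=$(printf '%s' "$pw" | sha256sum - | head -c 64)
printf 'Password: %s\nHash: %s\n' "$pw" "$pw_hash"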

`read_access`

Settings for read access to Seafowl (execution of SELECT/EXPLAIN queries). This can be one of: any (anyone can read), off (disabled), or a SHA-256 hash of a password. By default, this is set to any.

The read password can be different from the write password.

`bind_host`

IP address to bind the HTTP frontend to. Default 127.0.0.1. To expose Seafowl to other machines on the network, use 0.0.0.0 here.

`bind_port`

Port for the HTTP frontend. Default 8080.

`upload_data_max_length`

Maximum size (in MB) of uploads to Seafowl's /upload endpoint. Default 2 MB. Note that Seafowl currently keeps the whole uploaded file in memory, making the upload endpoint unsuitable for memory-constrained environments.

`cache_control`

The directives to set as the Cache-Control header value for the cached GET endpoint. Optional, defaults to max-age=43200, public.
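Putting the parameters of this section together, an HTTP frontend that is reachable from other machines, readable by anyone and writable only with a password might be sketched like this. The write_access value is a placeholder that must be replaced with a real 64-character SHA-256 hex digest; the remaining values are the documented defaults, apart from bind_host.

```toml
[frontend.http]
bind_host = "0.0.0.0"  # expose to the network (default is 127.0.0.1)
bind_port = 8080
read_access = "any"
# Placeholder: replace with the SHA-256 hash of your write password.
write_access = "REPLACE_WITH_SHA256_HASH_OF_PASSWORD"
upload_data_max_length = 2  # in MB
cache_control = "max-age=43200, public"
```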

`frontend.postgres` section

This section contains the configuration for the PostgreSQL frontend used by PostgreSQL clients to query Seafowl. This endpoint doesn't support authentication or encryption and should only be used in development.

By default, this section is omitted and disabled.

`bind_host`

IP address to bind the PostgreSQL frontend to. Default 127.0.0.1. To expose Seafowl to other machines on the network, use 0.0.0.0 here.

`bind_port`

Port for the PostgreSQL frontend. Default 6432.
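Since this frontend is disabled unless the section is present, enabling it for local development could look like the following sketch, which simply spells out the documented defaults:

```toml
# Development only: no authentication or encryption.
[frontend.postgres]
bind_host = "127.0.0.1"
bind_port = 6432
```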

`misc` section

Miscellaneous Seafowl configuration.

`max_partition_size`

Maximum length (in rows) of a Parquet file (partition) to produce when writing Seafowl tables. Default 1048576 (1024x1024).

For more information on partitioning, see the learning section.

`gc_interval`

Interval (in hours) at which a cron task will run garbage collection of orphan partitions (effectively invoking VACUUM PARTITIONS).

Default is 0 (i.e. the task is not run at all).
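A sketch of the misc section that keeps the default partition size and turns on periodic garbage collection. The 24-hour interval is an arbitrary illustrative choice, not a recommendation from this page.

```toml
[misc]
max_partition_size = 1048576  # rows per partition (documented default)
gc_interval = 24              # hours; 0 (the default) disables the task
```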

`runtime` section

Various configuration settings related to executing queries.

`max_memory`

Guideline for the maximum amount of RAM (in MB) for DataFusion to use when executing queries. When an operation needs more memory than this, DataFusion spills data to disk. Note that DataFusion currently doesn't always respect this amount, so it is not a guaranteed maximum RAM cap.

Default unlimited.

`temp_dir`

Override the temporary directory used to spill files during execution when DataFusion reaches the memory limit.
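As an illustration, a runtime section that asks DataFusion to stay around 1 GB of RAM and spill to a dedicated directory might look like this. Both values are assumptions chosen for the example, and the directory path is hypothetical.

```toml
[runtime]
max_memory = 1024                # in MB; a guideline, not a hard cap
temp_dir = "/var/tmp/seafowl"    # hypothetical spill directory
```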
