Uploading a CSV / Parquet file

Seafowl has an endpoint for uploading CSV and Parquet files as a standard multipart/form-data upload.

curl -v \
  -H "Authorization: Bearer 2Ux0FMpIifxS4EQVxvBhyBQl9EfZ0Cq1" \
  -F "data=@path/to/file.parquet" \
  http://localhost:8080/upload/[schema_name]/[table_name]

The /upload endpoint follows the same authorization rules and configuration as a standard write to Seafowl. See the HTTP endpoint guide for more information on configuring it and setting up a password.

Special options for CSV files

For CSV files, Seafowl will try to infer the schema of the data automatically. Because inference is inherently ambiguous, the results may not be what you want. In that case, you can make the schema explicit by passing an extra form-data parameter, schema, containing the JSON representation of the Arrow schema:

curl -v \
  -H "Authorization: Bearer 2Ux0FMpIifxS4EQVxvBhyBQl9EfZ0Cq1" \
  -F 'schema={
    "fields": [
        {
            "name": "some_number",
            "type": {"name": "int", "isSigned": true, "bitWidth": 32},
            "nullable": true,
            "children": []
        },
        {
            "name": "some_name",
            "type": {"name": "utf8"},
            "nullable": true,
            "children": []
        }
    ]
  }' \
  http://localhost:8080/upload/[schema_name]/[table_name] \
  -F "data=@path/to/file.csv"

If specifying the schema explicitly turns out to be too laborious, you can instead upload the file with the inferred schema into a staging table, then use that table as the source for a new table, casting and renaming the columns as you go:

CREATE TABLE actual_data AS
SELECT
  column_1::int AS some_number,
  column_2::varchar AS some_name
FROM staging_table;

In addition, Seafowl assumes by default that the file has a header row. If it doesn't, you'll need to say so explicitly by passing another parameter, -F "has_headers=false".
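As a minimal sketch, the headerless case can be wrapped in a small shell function. The function name upload_headerless_csv and the SEAFOWL_URL and SEAFOWL_TOKEN environment variables are assumptions made for this example, not part of Seafowl. The has_headers parameter is placed before the data part, mirroring the position of the schema parameter in the example above.

```shell
# Hypothetical helper (not part of Seafowl): uploads a CSV file that has no header row.
#   $1: path to the CSV file
#   $2: schema name
#   $3: table name
# SEAFOWL_TOKEN and SEAFOWL_URL are assumed environment variables;
# SEAFOWL_URL falls back to http://localhost:8080 when unset.
upload_headerless_csv() {
  curl -v \
    -H "Authorization: Bearer ${SEAFOWL_TOKEN}" \
    -F "has_headers=false" \
    -F "data=@$1" \
    "${SEAFOWL_URL:-http://localhost:8080}/upload/$2/$3"
}
```

With SEAFOWL_TOKEN set to your write password, calling upload_headerless_csv path/to/file.csv [schema_name] [table_name] sends the same kind of request as the curl examples above, with the extra has_headers field included.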

Alternative: `CREATE EXTERNAL TABLE`

If your file is hosted somewhere else that's accessible by Seafowl, you can create an external table and then store that table in Seafowl using CREATE TABLE AS. Note that the second statement refers to the external table through the staging schema. For example (data from here):

CREATE EXTERNAL TABLE data
  STORED AS PARQUET
  LOCATION 'https://parqueth-sample.s3.us-west-1.amazonaws.com/mainnet/transactions/dt=2021-07-01/blocks-0012738509-0012739509.parquet';
CREATE TABLE parqueth_sample AS SELECT * FROM staging.data;
