Accelerating queries with a cache / CDN

Seafowl also provides a read-only query execution endpoint that follows HTTP cache semantics. This lets you cache Seafowl query results:

  • in the user's browser
  • using an HTTP cache like Varnish
  • using a CDN like Cloudflare or Fastly

An example is available on our GitHub.

To use the endpoint, you need to put the SHA-256 hash of the query in the GET URL, as well as the actual query as an X-Seafowl-Query header:

#!/bin/bash -e
SEAFOWL_HOST = "http://localhost:8080"
query=$1
hash=$(echo -n "$query" | sha256sum | cut -f 1 -d " ")

curl -v\
 -H "X-Seafowl-Query: ${query}" \
 "$SEAFOWL_HOST/q/$hash"

Note that the header value doesn't support newlines or non-printable / non-ASCII characters, use encodeURIComponent() to encode special characters in your queries. You can also pass the query as a GET request body (though that can be silently dropped by some clients, including the fetch() API):

jq -cn --arg q "$query" '{"query": $q}' | \
  curl -XGET \
  -H "Content-Type: application/json" \
  "$SEAFOWL_HOST/q/$hash" [email protected]

Seafowl will:

  • Make sure the query matches the hash (to avoid cache poisoning and because it can't recover the original query just from the hash)
  • Run the query
  • Return the result with an ETag that changes with the version of the dataset.

For example:

curl -H "X-Seafowl-Query: SELECT 'Hello, Seafowl'" \
 "http://localhost:8080/q/7468d08fa2c3a2dc59905511dc478784a550e2d4ad67d101802a6411099d690f"

{"Utf8(\"Hello, Seafowl\")":"Hello, Seafowl"}

Using the ETag for cache invalidation

You can pass the ETag back to Seafowl manually, performing cache revalidation using HTTP conditional requests.

This is mostly unnecessary if you are using the browser cache, an HTTP cache or a CDN (those handle conditional requests for you), but can be useful to debug caching.

curl -v \
  -H "X-Seafowl-Query: ${query}" \
  -H "If-None-Match: ${current_etag}"
  "$SEAFOWL_HOST/q/$hash"

Seafowl keeps track of all versions of all tables. If none of the tables in the query have been changed, Seafowl will return a 304 Not Modified response. Otherwise, it will re-execute the query and return the new result and the new ETag.

For example:

curl -i -H "X-Seafowl-Query: SELECT 'Hello, Seafowl'" \
  -H "If-None-Match: 4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945" \
 "http://localhost:8080/q/7468d08fa2c3a2dc59905511dc478784a550e2d4ad67d101802a6411099d690f"

HTTP/1.1 304 Not Modified


curl -i -H "X-Seafowl-Query: SELECT 'Hello, Seafowl'" \
  -H "If-None-Match: etag_changed" \
 "http://localhost:8080/q/7468d08fa2c3a2dc59905511dc478784a550e2d4ad67d101802a6411099d690f"

HTTP/1.1 200 OK
content-type: application/octet-stream
etag: 4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945
...

Querying from the browser using the fetch() API

In the "default" cache mode, the fetch() call handles the ETag and the If-None-Match headers for you. This means that by default, query results will be cached in the client's browser without you having to do anything.

async function executeQueryCached(sql, host = "https://localhost:8080") {
  // Transform the query to make it a valid header
  // NOTES:
  //   - this will break legitimate newlines in your SQL (in string literals)
  //   - this won't handle characters that aren't a valid HTTP header
  const query = sql.trim().replace(/(?:\r\n|\r|\n)/g, " ");

  // NOTE: crypto.subtle.digest only available in secure contexts like
  // https or localhost. As an alternative, use something like:
  //
  //    const sha256 = require('https://cdnjs.cloudflare.com/ajax/libs/js-sha256/0.9.0/sha256.js');
  //    const hash = sha256(query)
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder("utf-8").encode(query)
  );
  const hash = [...new Uint8Array(digest)]
    .map((x) => x.toString(16).padStart(2, "0"))
    .join("");
  const response = await fetch(
    // Add an (ignored) extension to the URL to make Cloudflare treat the
    // query result as a static asset without using Page Rules.
    `${host}/q/${hash}.csv`,
    { headers: { "X-Seafowl-Query": query } }
  );
  const response_t = await response.text();
  return response_t ? response_t.trim().split("\n").map(JSON.parse) : [];
}

Putting Seafowl behind Varnish

The Varnish HTTP cache automatically handles the ETag header and sends conditional requests with If-None-Match.

See our GitHub for an example Docker Compose stack with Varnish and Seafowl that includes a sample VCL file.

Query results will then be cached automatically:

curl -v \
  -H "X-Seafowl-Query: ${query}" \
  "$VARNISH_HOST/q/$hash"

Putting Seafowl behind Cloudflare CDN

By default, Cloudflare only caches files with certain extensions and treats files without an extension as a dynamic resource.

Seafowl supports passing an optional extension to the GET request path. This extension is discarded, but can be used to trick Cloudflare into treating the resource as static and cacheable.

curl -v \
  -H "X-Seafowl-Query: ${query}" \
  "$DOMAIN_BEHIND_CF/q/$hash.csv"

Alternatively, you can add a Cloudflare page rule to cache Seafowl responses.

To verify the caching works as expected, make a request to Seafowl and note the Cloudflare response header. CF-Cache-Status: DYNAMIC means that the resource is uncacheable by Cloudflare. CF-Cache-Status: MISS or HIT means that the resource is cacheable.

Bypassing / busting the cache

You currently can't reconfigure a Seafowl instance to alter the ETag generation algorithm, change the returned cache headers (including adding different Cache-Control headers) or invalidate the HTTP cache in some other way.

You can query the uncached endpoint instead or bust the cache by adding an immaterial modification to your query. For example:

-- Comments are still included in the hash
-- so we can include a random value to bust the cache
-- abcdef12345
SELECT 1;

Note on authorization

If you disable Seafowl reads or make them password-protected by setting read = "password"/"off", this will also disable the cached GET endpoint.