Accelerating queries with a cache / CDN
Seafowl also provides a read-only query execution endpoint that follows HTTP cache semantics. This lets you cache Seafowl query results:
- in the user's browser
- using an HTTP cache like Varnish
- using a CDN like Cloudflare or Fastly
An example is available on our GitHub.
Having in mind that the URL is the main factor contributing to the HTTP response cache key, Seafowl provides a couple of ways to use this endpoint:
Query in the URL path
Simply put the URL-encoded query as the final path parameter:
$ curl "http://localhost:8080/q/$query"
This is the recommended approach, chiefly because it's less involved than the alternative:
Query in the body or header
This approach requires that you supply the SHA-256 hash of the query in the GET URL, and then either:
Pass the actual query as an
X-Seafowl-Query
header:#!/bin/bash -e SEAFOWL_HOST = "http://localhost:8080" query=$1 hash=$(echo -n "$query" | sha256sum | cut -f 1 -d " ") curl -v\ -H "X-Seafowl-Query: ${query}" \ "$SEAFOWL_HOST/q/$hash"
Note that the header value doesn't support newlines or non-printable / non-ASCII characters, so use
encodeURIComponent()
to encode special characters in your queries.Otherwise, you can pass the query as a GET request body (though that can be silently dropped by some clients, including the
fetch()
API):jq -cn --arg q "$query" '{"query": $q}' | \ curl -XGET \ -H "Content-Type: application/json" \ "$SEAFOWL_HOST/q/$hash" -d@-
Seafowl will then:
- (Optional) Make sure the query matches the hash if it's provided. This is done so to avoid cache poisoning and because it can't recover the original query just from the hash.
- Run the query
- Return the result with an ETag that changes with the version of the dataset.
For example, to execute SELECT 'Hello, Seafowl'
using the first method above
run:
$ curl http://localhost:8080/q/SELECT%20\'Hello,%20Seafowl\'
{"Utf8(\"Hello, Seafowl\")":"Hello, Seafowl"}
Using the ETag for cache invalidation
You can pass the ETag back to Seafowl manually, performing cache revalidation using HTTP conditional requests.
This is mostly unnecessary if you are using the browser cache, an HTTP cache or a CDN (those handle conditional requests for you), but can be useful to debug caching.
curl -v \
-H "If-None-Match: ${current_etag}" \
"$SEAFOWL_HOST/q/$query"
Seafowl keeps track of all versions of all tables. If none of the tables in the query have been changed, Seafowl will return a 304 Not Modified response. Otherwise, it will re-execute the query and return the new result and the new ETag.
For example:
curl -i -H "If-None-Match: 4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945" \
"http://localhost:8080/q/SELECT%20'Hello,%20Seafowl'"
HTTP/1.1 304 Not Modified
curl -i -H "If-None-Match: etag_changed" \
"http://localhost:8080/q/SELECT%20'Hello,%20Seafowl'"
HTTP/1.1 200 OK
content-type: application/octet-stream
etag: 4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945
...
Querying from the browser using the fetch()
API
In the
"default" cache mode,
the fetch()
call handles the ETag
and the If-None-Match
headers for you.
This means that by default, query results will be cached in the client's browser
without you having to do anything.
async function executeQueryCached(sql, host = "https://localhost:8080") {
// Transform the query to make it a valid header
// NOTES:
// - this will break legitimate newlines in your SQL (in string literals)
// - this won't handle characters that aren't a valid HTTP header
const query = sql.trim().replace(/(?:\r\n|\r|\n)/g, " ");
// NOTE: crypto.subtle.digest only available in secure contexts like
// https or localhost. As an alternative, use something like:
//
// const sha256 = require('https://cdnjs.cloudflare.com/ajax/libs/js-sha256/0.9.0/sha256.js');
// const hash = sha256(query)
const digest = await crypto.subtle.digest(
"SHA-256",
new TextEncoder("utf-8").encode(query)
);
const hash = [...new Uint8Array(digest)]
.map((x) => x.toString(16).padStart(2, "0"))
.join("");
const response = await fetch(
// Add an (ignored) extension to the URL to make Cloudflare treat the
// query result as a static asset without using Page Rules.
`${host}/q/${hash}.csv`,
{ headers: { "X-Seafowl-Query": query } }
);
const response_t = await response.text();
return response_t ? response_t.trim().split("\n").map(JSON.parse) : [];
}
Putting Seafowl behind Varnish
The Varnish HTTP cache automatically handles the
ETag
header and sends conditional requests with If-None-Match
.
See our GitHub for an example Docker Compose stack with Varnish and Seafowl that includes a sample VCL file.
Query results will then be cached automatically:
curl -v "$VARNISH_HOST/q/$query"
Putting Seafowl behind Cloudflare CDN
By default, Cloudflare only caches files with certain extensions and treats files without an extension as a dynamic resource.
Seafowl supports passing an optional extension to the GET request path. This extension is discarded, but can be used to trick Cloudflare into treating the resource as static and cacheable.
curl -v "$DOMAIN_BEHIND_CF/q/$query.csv"
Alternatively, you can add a Cloudflare page rule to cache Seafowl responses.
To verify the caching works as expected, make a request to Seafowl and note the
Cloudflare response header. CF-Cache-Status: DYNAMIC
means that the resource
is uncacheable by Cloudflare. CF-Cache-Status: MISS
or HIT
means that the
resource is cacheable.
Bypassing / busting the cache
You currently can't reconfigure a Seafowl instance to alter the ETag generation
algorithm, change the returned cache headers (including adding different
Cache-Control
headers) or invalidate the HTTP cache in some other way.
You can query the uncached endpoint instead or bust the cache by adding an immaterial modification to your query. For example:
-- Comments are still included in the hash
-- so we can include a random value to bust the cache
-- abcdef12345
SELECT 1;
Note on authorization
Disabling Seafowl reads (read = "off"
) will also disable the cached GET
endpoint. On the other hand, making the reads password-protected
(read = "my_password_hash"
) will enforce request authorization via the usual
Authorization: Bearer (my_password)
header.