
Part 4 (option 1): Caching with Varnish

Introduction

In this part of the tutorial, you'll deploy Varnish in front of Seafowl.

Disable HTTP-to-HTTPS

Varnish can only proxy to plain HTTP backends, so Seafowl's HTTP endpoint must stop redirecting to HTTPS. Edit the fly.toml file of your Seafowl deployment and delete (or comment out) force_https = true in the [[services.ports]] section:

  [[services.ports]]
    # force_https = true   <-- delete or comment this out
    handlers = ["http"]
    port = 80

Redeploy the app:

fly deploy
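
To confirm the redirect is gone, you can hit the plain-HTTP endpoint and check the response headers (this assumes your Seafowl app is at seafowl.fly.dev; replace it with your own host):

# The response should now come from Seafowl itself rather than a redirect to https://
curl -sI http://seafowl.fly.dev/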

Deploy Varnish to Fly.io

With the free tier, Fly.io gives us two VMs with 256MB of RAM each. Let's use the second one to deploy Varnish, an HTTP cache, in front of our Seafowl instance [1].

Make a new directory for your Fly project:

mkdir seafowl-varnish && cd seafowl-varnish

Create a Dockerfile for Varnish (make sure to replace seafowl.fly.dev with your own host!):

# syntax=docker/dockerfile:1

FROM varnish:7.1

USER root
RUN cat > /etc/varnish/default.vcl <<EOF
vcl 4.0;
backend default {
  # Replace with your own host!
  .host = "seafowl.fly.dev";
  .port = "80";
}

# Use a 5s TTL, which means that beyond that, Varnish will revalidate our
# query results with a conditional request. You can also use different
# TTL and Grace values.
sub vcl_backend_response {
  set beresp.ttl = 5s;
  set beresp.grace = 1h;
  if (beresp.status >= 400) {
    set beresp.ttl = 0s;
    set beresp.grace = 0s;
  }
}

# Add some debug headers to see what Varnish is doing
sub vcl_recv {
  unset req.http.x-cache;
}

sub vcl_hit {
    set req.http.x-cache = "hit";
}

sub vcl_miss {
    set req.http.x-cache = "miss";
}

sub vcl_pass {
    set req.http.x-cache = "pass";
}

sub vcl_pipe {
    set req.http.x-cache = "pipe uncacheable";
}

sub vcl_synth {
    set req.http.x-cache = "synth synth";
    set resp.http.x-cache = req.http.x-cache;
}

sub vcl_deliver {
    if (obj.uncacheable) {
        set req.http.x-cache = req.http.x-cache + " uncacheable" ;
    } else {
        set req.http.x-cache = req.http.x-cache + " cached" ;
    }
    set resp.http.x-cache = req.http.x-cache;
}
EOF
USER varnish

# Override the port to be 8080 in line with the default fly.toml
# Also give 256MB to the cache to fit in Fly.io's limits
ENTRYPOINT ["varnishd", "-F", "-f", "/etc/varnish/default.vcl", "-a", "http=:8080,HTTP", "-s", "malloc,256M", "-T", "none"]

Create and deploy your project:

$ fly launch

Scanning source code
Detected a Dockerfile app
? App Name (leave blank to use an auto-generated name): seafowl-varnish
? Select organization: Seafowl
? Select region: lhr (London, United Kingdom)
Created app seafowl-varnish in organization personal
Wrote config file fly.toml
? Would you like to setup a Postgresql database now? No
? Would you like to deploy now? Yes
Deploying seafowl-varnish
==> Validating app configuration
...
==> Creating build context
...
==> Building image with Docker
...
==> Pushing image to fly
...
==> Creating release
--> release v1 created
...
==> Monitoring deployment
 1 desired, 1 placed, 1 healthy, 0 unhealthy [health checks: 1 total]
--> v1 deployed successfully

Query Seafowl through Varnish

Use the query.sh script from the previous part and set SEAFOWL_HOST to your Varnish app's URL to query Seafowl through the cache:

export SEAFOWL_HOST=https://seafowl-varnish.fly.dev
./query.sh "SELECT name, SUM(value) AS sum FROM test_cache GROUP BY 1"

...
age: 0
accept-ranges: bytes
x-cache: miss cached

{"name":"Delta","sum":83.0}
{"name":"Kappa","sum":16.0}
{"name":"Alpha","sum":35.0}
{"name":"Beta","sum":60.0}
{"name":"Gamma","sum":10.0}

The Varnish instance that we deployed has a TTL (beresp.ttl) of 5 seconds and a grace period (beresp.grace) of 1 hour [2]. This means that once the TTL has expired, the first time we run the same query again Varnish will serve the cached response while revalidating it with Seafowl in the background using the ETag mechanism. The second time, it'll serve the revalidated result.

We no longer need to worry about ETags: Varnish handles them for us.

First time:

./query.sh "SELECT name, SUM(value) AS sum FROM test_cache GROUP BY 1"

...
age: 33
accept-ranges: bytes
x-cache: hit cached

{"name":"Delta","sum":83.0}
{"name":"Kappa","sum":16.0}
{"name":"Alpha","sum":35.0}
{"name":"Beta","sum":60.0}
{"name":"Gamma","sum":10.0}

Second time (note the small age value):

age: 3
accept-ranges: bytes
x-cache: hit cached

{"name":"Delta","sum":83.0}
{"name":"Kappa","sum":16.0}
{"name":"Alpha","sum":35.0}
{"name":"Beta","sum":60.0}
{"name":"Gamma","sum":10.0}

Test cache invalidation

Let's add another row to the dataset. Note that we can't use the cached GET endpoint to perform writes:

curl -i \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $YOUR_PASSWORD" \
    https://seafowl-varnish.fly.dev/q -d@- <<EOF
{"query": "INSERT INTO test_cache
  VALUES
    ('Epsilon', 7.4, 49)
"}
EOF
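
To double-check that the write reached Seafowl before the cache catches up, you can run the same SELECT as a POST against your Seafowl app directly, bypassing Varnish (this assumes your origin is still at seafowl.fly.dev; replace it with your own host):

curl -i \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $YOUR_PASSWORD" \
    https://seafowl.fly.dev/q \
    -d '{"query": "SELECT name, SUM(value) AS sum FROM test_cache GROUP BY 1"}'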

And run the query again. The first time, we'll get a stale response:

./query.sh "SELECT name, SUM(value) AS sum FROM test_cache GROUP BY 1"

...
age: 163
accept-ranges: bytes
x-cache: hit cached

{"name":"Alpha","sum":35.0}
{"name":"Beta","sum":60.0}
{"name":"Gamma","sum":10.0}
{"name":"Delta","sum":83.0}
{"name":"Kappa","sum":16.0}

The second time, the result will be updated:

age: 3
accept-ranges: bytes
x-cache: hit cached

{"name":"Epsilon","sum":49.0}
{"name":"Alpha","sum":35.0}
{"name":"Beta","sum":60.0}
{"name":"Gamma","sum":10.0}
{"name":"Delta","sum":83.0}
{"name":"Kappa","sum":16.0}

Next steps

It's time to upgrade from the command line to the browser. In the next part, we'll put everything we learned together to query Seafowl directly from the user's browser and build a beautiful dynamic visualization with Observable.


  1. This part of the tutorial is partially taken from Fly.io's Varnish tutorial.
  2. More on Varnish TTLs and the grace period here.