Part 4 (option 2): Delivering query results globally with Cloudflare
Introduction
In this part of the tutorial, we'll put your Seafowl deployment behind Cloudflare CDN. This will let you use Cloudflare's edge network to cache and deliver query results globally.
You need to have a domain that's managed by Cloudflare for this to work. As an alternative, you can also set up Varnish in front of Seafowl.
Add a certificate on Fly.io
Go to the Fly.io dashboard1 for your application, for example, https://fly.io/apps/seafowl.
Click on "Certificates", then "Add certificate":
Enter a hostname for your application. For example, if you have a domain
yourdomain.io
, you can use seafowl.yourdomain.io
. Then, click "Create".
We now need to verify your domain ownership with Fly.io and direct visitors to Seafowl. Click on "View" next to your certificate. You should see verification instructions.
Set up the subdomain on Cloudflare
Go to your Cloudflare dashboard:
Add the required CNAME (ownership verification) and A/AAAA records. Make sure Cloudflare proxying is disabled for now (grey cloud)!
Finish the setup
Go back to Fly.io and click "Check again" on the certificate dialog. You should see green bullet points next to "Confirm domain ownership" and "Direct visitors to application". This normally takes effect instantly, but could take several minutes:
Finally, enable proxying on Cloudflare again:
Query Seafowl through Cloudflare
Use the query.sh
script from the previous part and pass
SEAFOWL_HOST=https://seafowl.yourdomain.io
to query Seafowl:
export SEAFOWL_HOST=https://demo.seafowl.io
./query.sh "SELECT name, SUM(value) AS sum FROM test_cache GROUP BY 1"
HTTP/2 200
...
cf-cache-status: DYNAMIC
server: cloudflare
cf-ray: 74248dc99aca54e1-MAN
{"name":"Alpha","sum":35.0}
{"name":"Beta","sum":60.0}
{"name":"Gamma","sum":10.0}
{"name":"Delta","sum":83.0}
{"name":"Kappa","sum":16.0}
That's weird. The cf-cache-status: DYNAMIC
header means Cloudflare doesn't
think this resource is cacheable.
This is because Cloudflare caches responses based on their extension by default. Files without an extension aren't cached at all.
We could use Cloudflare's Page Rules to get it to cache everything on our domain. Or, we could add one of these extensions to our URL. Let's do that instead.
Edit query.sh
Change the last line in query.sh
to request a query result with a .csv
extension. Seafowl ignores the extension and doesn't use it to determine the
response format, but it will get Cloudflare to cache Seafowl responses.
- curl -i "${headers[@]}" "$SEAFOWL_HOST/q/$hash"
+ curl -i "${headers[@]}" "$SEAFOWL_HOST/q/$hash.csv"
Run the query again
./query.sh "SELECT name, SUM(value) AS sum FROM test_cache GROUP BY 1"
...
cache-control: max-age=14400
cf-cache-status: MISS
server: cloudflare
cf-ray: 7424955fa85f35fb-MAN
{"name":"Alpha","sum":35.0}
{"name":"Beta","sum":60.0}
{"name":"Gamma","sum":10.0}
{"name":"Delta","sum":83.0}
{"name":"Kappa","sum":16.0}
That's better. You'll notice Cloudflare injected a Cache-Control
header of its
own into the response. This is for the downstream clients, like your user's
browser, to know how long to keep the result in the cache for before contacting
Cloudflare again.
The second time you run the query, its results will be cached by Cloudflare2:
cf-cache-status: HIT
age: 58
{"name":"Alpha","sum":35.0}
{"name":"Beta","sum":60.0}
{"name":"Gamma","sum":10.0}
{"name":"Delta","sum":83.0}
{"name":"Kappa","sum":16.0}
Test cache invalidation
Let's add more rows to the dataset. Note that we can't use the cached GET endpoint to perform writes:
curl -i \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $YOUR_PASSWORD" \
https://demo.seafowl.io/q [email protected]<<EOF
{"query": "INSERT INTO test_cache
VALUES
('Epsilon', 7.4, 49)
"}
EOF
Sadly, the new data won't propagate to Cloudflare immediately, as Seafowl
currently doesn't send Cache-Control
headers that can force Cloudflare to
always revalidate query results. If we were to wait for a few hours and rerun
the query, we'd see this3:
cf-cache-status: INVALIDATED
age: 0
{"name":"Epsilon","sum":49.0}
{"name":"Alpha","sum":35.0}
{"name":"Beta","sum":60.0}
{"name":"Gamma","sum":10.0}
{"name":"Delta","sum":83.0}
{"name":"Kappa","sum":16.0}
Next steps
It's time to upgrade from the command line to the browser. In the next part, we'll put everything we learned together to query Seafowl directly from the user's browser and build a beautiful dynamic visualization with Observable.
- Thanks to this blog post for the step-by-step guide.↩
- This is not reliable: you might hit a different Cloudflare server and get
another
MISS
.↩ - See
the Cloudflare reference
for all possible values of
CF-Cache-Status
.↩