
Now on mkdocs-material

[Photo: a new tower under construction near my Manchester hotel, with the city skyline behind]

I started with a custom Gatsby site, then switched to Hugo (which I didn't write about). Last weekend, I switched again to mkdocs. Am I addicted to fiddling and changing stuff? Well, maybe... but each of those changes happened because of problems or concerns I had. I hope that mkdocs and mkdocs-material will be my home for a while. Pull up a seat and let's take a look at how and why I ended up here.

Irresponsible Expertise - Python Packages

Are we experts teaching safe computing? Or are we empowering the less experienced without informing them of the risks and responsibilities? I suspected the latter but had no evidence to back it up. I've run a quick experiment, as impartially as I can, to see what the evidence suggests.

Part of an exploration of supply chain security.

Why I Automated My Laptop Build

I've invested a fair bit of time over the last few years incrementally automating my laptop build. Now, I've got to a point where I can reliably wipe, rebuild, and pick up working where I left off in under thirty minutes. This post explains why I've invested that time.


Materialized UDFs in a dbt World

As part of my work on the PyPI downloads dataset, I needed a way of matching package versions to vulnerability report ranges. I didn't find a solution I trusted, so I implemented one from spec, with decent test coverage and CI/CD automation, in user-defined functions (UDFs). This post covers a novel approach to incorporating UDFs into the dbt ecosystem that's working really well for me - treating UDFs as dbt models with a custom materialization.


The BigQuery Safety Net

Last time, I said:

[BigQuery] doesn't offer a "don't bankrupt me without asking first" setting.

After further work, I find that's not true! This setting is available in the UI, just a bit tricky to find. More importantly, there's another set of controls elsewhere that you need to know about if you want to use BigQuery safely.


$1,370 Gone in Sixty Seconds

Given a historical view of the Safety DB, I need to get a historical view of PyPI downloads. The source dataset contains 274TB at time of writing, and grows every day - one naive query against that table could cost $1,370. A query over a single day can scan hundreds of GB, so I need to do some work to make it usable.
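The $1,370 figure is a straight back-of-envelope calculation from BigQuery's on-demand pricing - a sketch, assuming the roughly $5-per-TB on-demand rate that applied at the time:

```python
# Back-of-envelope cost of one naive full scan of the source table.
# Assumes BigQuery's on-demand rate at the time: ~$5 per TB scanned.
PRICE_PER_TB_USD = 5.00
TABLE_SIZE_TB = 274

full_scan_cost = TABLE_SIZE_TB * PRICE_PER_TB_USD
print(f"One full scan: ${full_scan_cost:,.0f}")  # One full scan: $1,370
```

The same arithmetic explains why a single-day query scanning hundreds of GB still costs real money, and why partitioning and clustering the table matters.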

  • Thanks to for supporting this content.


Analysing PyPI Downloads

Investigating Python package downloads with the public PyPI downloads dataset and Safety public database. This post covers how I've prepared and published the data to support this kind of analysis, including pure SQL functions to process Semver versions and constraints at scale. This is part of a broader investigation into vulnerability management and update behaviour.
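The post implements the version handling as pure SQL functions; as a rough illustration of the logic involved, here's a minimal Python sketch of matching a Semver version against a constraint range. This is a simplification of my own (numeric MAJOR.MINOR.PATCH only, no pre-release tags or build metadata), not the actual SQL implementation:

```python
import operator

# Comparison operators allowed in a constraint clause.
# Two-character operators come first so "<=" isn't misread as "<".
_OPS = [("<=", operator.le), (">=", operator.ge), ("==", operator.eq),
        ("<", operator.lt), (">", operator.gt)]

def parse(version):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))

def satisfies(version, constraint):
    """Check a version against a comma-separated range, e.g. '>=1.2.0,<2.0.0'."""
    v = parse(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        for symbol, op in _OPS:
            if clause.startswith(symbol):
                if not op(v, parse(clause[len(symbol):])):
                    return False
                break
    return True
```

So `satisfies("1.4.2", ">=1.2.0,<2.0.0")` holds while `satisfies("2.1.0", ">=1.2.0,<2.0.0")` does not. Doing this at scale in SQL, with full Semver precedence rules, is the harder part the post covers.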

  • Thanks to for supporting this content.
