Paul Brabban, Lead Consultant at Equal Experts

With experience in software development, data engineering and machine learning, I specialise in data-intensive problems and decentralised data engineering at scale. My experience extends to leading teams, technical architecture and product development. Find out more about my experience and publications in my portfolio.

Contact me to see how I can help at paul@tempered.works.

Why I Automated My Laptop Build

I’ve invested a fair bit of time over the last few years incrementally automating my laptop build. Now, I’ve got to a point where I can reliably wipe, rebuild, and pick up working where I left off in under thirty minutes. This post explains why I’ve invested that time.

Thanks to Equal Experts for supporting this content.

[Read More]

Materialized UDFs in a dbt World

As part of my work on the PyPI downloads dataset, I needed a way of matching package versions to vulnerability report ranges. I didn’t find a solution I trusted, so I implemented a solution from spec with decent test coverage and CI/CD automation in user defined functions (UDFs). This post covers a novel approach to incorporate UDFs into the dbt ecosystem that is working really well for me - treating UDFs as dbt models with custom materialization.
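To give a feel for the matching problem, here is a minimal sketch in Python rather than the SQL UDFs the post describes. It assumes plain dotted numeric versions such as `1.2.3` and comma-separated constraints such as `>=1.0,<1.2.3`; the names `parse_version` and `matches` are hypothetical, and pre-release tags and other real-world version syntax are deliberately ignored.

```python
import operator
import re

# Comparison operators this sketch understands (an assumption, not a full spec).
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}

_CLAUSE = re.compile(r"\s*(<=|>=|==|<|>)\s*([0-9.]+)\s*")


def parse_version(text):
    """Turn "1.2.3" into (1, 2, 3) so parts compare as numbers, not strings."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(left, right):
    """Right-pad the shorter tuple with zeros so "1.0" equals "1.0.0"."""
    width = max(len(left), len(right))
    return (
        left + (0,) * (width - len(left)),
        right + (0,) * (width - len(right)),
    )


def matches(version, constraint):
    """Return True if version satisfies every comma-separated clause."""
    candidate = parse_version(version)
    for clause in constraint.split(","):
        found = _CLAUSE.fullmatch(clause)
        if found is None:
            raise ValueError(f"unsupported constraint clause: {clause!r}")
        compare = _OPS[found.group(1)]
        left, right = _pad(candidate, parse_version(found.group(2)))
        if not compare(left, right):
            return False
    return True


print(matches("1.2.0", ">=1.0,<1.2.3"))
print(matches("1.10.0", "<1.9"))
```

Comparing integer tuples rather than strings is the point of the sketch: a string comparison would wrongly rank `1.10.0` below `1.9`.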

Thanks to Equal Experts for supporting this content.

[Read More]
dbt  bigquery  udf 

The BigQuery Safety Net

Last time, I said:

[BigQuery] doesn’t offer a “don’t bankrupt me without asking first” setting.

After further work, I find that’s not true! This setting is available in the UI, just a bit tricky to find. More importantly, there’s another set of controls elsewhere that you need to know about if you want to use BigQuery safely.

Thanks to Equal Experts for supporting this content.

[Read More]

$1,370 Gone in Sixty Seconds

Given a historical view of the Safety DB, I need to get a historical view of PyPI downloads. The source dataset contains 274TB at time of writing, and grows every day - one naive query against that table could cost $1,370. A query over a single day can scan hundreds of GB, so I need to do some work to make it usable.
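As a rough check on that figure, here is a minimal sketch of the arithmetic in Python. It assumes a flat on-demand rate of $5 per TB scanned, which is the rate implied by 274TB costing $1,370; the function name and the 200GB single-day figure are hypothetical, chosen only to illustrate "hundreds of GB".

```python
# Assumed on-demand rate, implied by the post's own numbers: 1370 / 274 = 5.
PRICE_PER_TB_USD = 5.0


def scan_cost_usd(terabytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimate the cost of a query that scans the given number of terabytes."""
    return terabytes_scanned * price_per_tb


# One naive query over the whole 274TB table.
full_table_cost = scan_cost_usd(274)

# A hypothetical single-day query scanning 200GB (0.2TB).
single_day_cost = scan_cost_usd(200 / 1000)

print(f"Full table scan: ${full_table_cost:,.2f}")
print(f"Single day scan: ${single_day_cost:,.2f}")
```

Even at around a dollar per day of data, repeated exploratory queries add up quickly, which is why reducing the bytes scanned matters.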

Thanks to Equal Experts for supporting this content.

[Read More]

Analysing PyPI Downloads

Investigating Python package downloads with the public PyPI downloads dataset and Safety public database. This post covers how I’ve prepared and published the data to support this kind of analysis, including pure SQL functions to process Semver versions and constraints at scale. This is part of a broader investigation into vulnerability management and update behaviour. Thanks to Equal Experts for supporting this content.

[Read More]

Consumer-Driven Contracts for SQL Data Products

dbt announced “model contracts” in the recent v1.5 release. This looks like a great feature for dbt, but it reminded me that I’ve been using contract testing with dbt for a couple of years now, inspired by Pact consumer-driven contracts, and have never talked about it. There are some differences. For example, dbt’s new feature is very dbt-centric, whereas the approach I’ve used isn’t: dbt certainly helps, but it isn’t necessary. There’s a GitHub repo to follow along with.

[Read More]

Checking your Dependencies

Following high-profile incidents like the 2017 Equifax Breach, checking your dependencies for vulnerabilities is a common practice today. We can use great tools like OWASP Dependency Check, Trivy and Snyk in our builds to raise the alarm when vulnerabilities are found.

The question that I find comes up isn’t whether we should check dependencies - but when?

[Read More]

Bashing Alpine

So this annoying and trivial little problem catches me out every so often. I am always misled by the error message! You’ll see what I mean shortly. For context, it usually happens when I’m working in Docker containers on a build.

[Read More]