Materialized UDFs in a dbt World

As part of my work on the PyPI downloads dataset, I needed a way to match package versions against vulnerability report ranges. I didn’t find an existing solution I trusted, so I implemented one from the spec as user-defined functions (UDFs), with decent test coverage and CI/CD automation. This post covers a novel approach to incorporating UDFs into the dbt ecosystem that is working really well for me: treating UDFs as dbt models with a custom materialization.

Thanks to Equal Experts for supporting this content.

[Read More]
dbt  bigquery  udf 

The BigQuery Safety Net

Last time, I said:

[BigQuery] doesn’t offer a “don’t bankrupt me without asking first” setting.

After further work, I’ve found that’s not true! The setting is available in the UI; it’s just a bit tricky to find. More importantly, there’s another set of controls elsewhere that you need to know about if you want to use BigQuery safely.

[Read More]

$1,370 Gone in Sixty Seconds

Given a historical view of the Safety DB, I need a matching historical view of PyPI downloads. The source dataset contains 274 TB at the time of writing and grows every day, so one naive query against that table could cost $1,370. Even a query over a single day can scan hundreds of GB, so I need to do some work to make it usable.
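
The arithmetic behind that figure can be sketched in a few lines. This is a rough estimate only: the $5 per TB rate is an assumption inferred from the numbers above (274 TB costing $1,370), the function name is hypothetical, and the 300 GB single-day scan is an illustrative value rather than a measurement.

```python
# Assumption: on-demand pricing of $5 per TB scanned, the rate implied by
# the figures in the text. Check current BigQuery pricing before relying on it.
ASSUMED_USD_PER_TB = 5.0


def estimate_query_cost_usd(tb_scanned: float,
                            usd_per_tb: float = ASSUMED_USD_PER_TB) -> float:
    """Estimate the on-demand cost of a query from the data it scans."""
    return tb_scanned * usd_per_tb


# One naive query over the whole 274 TB table.
full_table_cost = estimate_query_cost_usd(274)
print(f"Full table scan: ${full_table_cost:,.0f}")

# A hypothetical single-day query scanning 300 GB (0.3 TB).
single_day_cost = estimate_query_cost_usd(0.3)
print(f"Single day scan: ${single_day_cost:,.2f}")
```

Even the single-day figure adds up quickly if the query runs repeatedly, which is why reducing the bytes scanned matters.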

[Read More]

Analysing PyPI Downloads

This post investigates Python package downloads using the public PyPI downloads dataset and the Safety public database. It covers how I’ve prepared and published the data to support this kind of analysis, including pure SQL functions to process Semver versions and constraints at scale. This is part of a broader investigation into vulnerability management and update behaviour.
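
To give a flavour of what matching a version against a constraint involves, here is a minimal Python sketch. It is not the SQL implementation described in the post: the function names are hypothetical, and it assumes purely numeric dotted versions and comma-separated clauses such as ">=1.0,<1.5", ignoring pre-release tags and other parts of the spec.

```python
import operator

# Two-character operators are listed first so "<=" is not read as "<".
_OPERATORS = (
    ("<=", operator.le),
    (">=", operator.ge),
    ("==", operator.eq),
    ("<", operator.lt),
    (">", operator.gt),
)


def parse_version(text):
    """Turn '1.4.2' into (1, 4, 2). Assumes numeric parts only."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(left, right):
    """Pad the shorter tuple with zeros so 1.5 compares equal to 1.5.0."""
    width = max(len(left), len(right))
    return (left + (0,) * (width - len(left)),
            right + (0,) * (width - len(right)))


def satisfies(version, constraint):
    """Return True if the version meets every clause in the constraint."""
    parsed = parse_version(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                left, right = _pad(parsed, bound)
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError(f"Unsupported clause: {clause!r}")
    return True


print(satisfies("1.4.2", ">=1.0,<1.5"))
print(satisfies("1.5", ">=1.0,<1.5"))
```

Comparing tuples of integers avoids the classic string-comparison trap where "1.10" sorts before "1.9".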

[Read More]

Consumer-Driven Contracts for SQL Data Products

dbt announced “model contracts” in the recent v1.5 release. It looks like a great feature for dbt, and it reminded me that I’ve been using contract testing with dbt for a couple of years now, inspired by Pact consumer-driven contracts, but have never talked about it. There are some differences. For example, dbt’s new feature is very dbt-centric, whereas the approach I’ve used isn’t: dbt certainly helps, but it isn’t necessary. There’s a GitHub repo to follow along with.

[Read More]