
  • Paul Brabban, Lead Consultant at Equal Experts


    With experience in software development, data engineering and machine learning, I specialise in data-intensive problems and decentralised data engineering at scale. My experience extends to leading teams, technical architecture and product development. Find out more about my experience and publications in my portfolio.


    Contact me to see how I can help at paul@tempered.works.

Breaking change data capture with primary keys

A SQL statement that updates a row's primary key

My work on dealing with multiple tables was interrupted when I discovered a subtle scenario that leads to DMS CDC output that cannot be correctly interpreted. I was unable to find a solution, but I will update this post if new information emerges.

  • Thanks to Equal Experts for supporting this content.

Disambiguating transactions in change data capture

Example query against disambiguated view

In the CDC output, I get a row for each statement executing in the transaction. Each row reflects the state of the database when that statement is executed. How do I filter out all the transient statements to get the final state of the row when a transaction has finished?
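As a rough illustration of the question above, here is a minimal Python sketch of collapsing CDC rows to the last row captured for each primary key within each transaction. It is not the approach taken in the post. The field names `transaction_id`, `seq` and `id` are hypothetical stand-ins for whatever transaction identifier, ordering field and primary key the real CDC output provides, and the sketch assumes the primary key does not change within a transaction.

```python
from typing import Iterable


def final_row_states(cdc_rows: Iterable[dict]) -> list[dict]:
    """Keep only the last captured row per (transaction, primary key).

    Assumes each row carries hypothetical fields:
      transaction_id - identifies the transaction the statement ran in
      seq            - increases with each statement captured
      id             - the row's primary key (assumed not to change)
    """
    latest: dict[tuple, dict] = {}
    for row in cdc_rows:
        key = (row["transaction_id"], row["id"])
        current = latest.get(key)
        # A later statement in the same transaction supersedes earlier ones.
        if current is None or row["seq"] > current["seq"]:
            latest[key] = row
    # Return the surviving rows in capture order.
    return sorted(latest.values(), key=lambda r: r["seq"])


if __name__ == "__main__":
    rows = [
        {"transaction_id": "t1", "seq": 1, "id": 1, "op": "I", "name": "a"},
        {"transaction_id": "t1", "seq": 2, "id": 1, "op": "U", "name": "b"},
        {"transaction_id": "t1", "seq": 3, "id": 2, "op": "I", "name": "c"},
        {"transaction_id": "t2", "seq": 4, "id": 1, "op": "D", "name": "b"},
    ]
    for row in final_row_states(rows):
        print(row["transaction_id"], row["id"], row["op"], row["name"])
```

The transient insert of row 1 in transaction t1 is dropped in favour of the later update, while rows from other transactions and other keys are left alone.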


Handling CVE-2019-8341 for dbt and mkdocs

Safety output for CVE-2019-8341

Yesterday, Safety told me about CVE-2019-8341, a security issue affecting Jinja2. I'll walk through how I investigated and assessed the risk to my website and a dbt pipeline I operate in the public domain. I finish up with a commentary on why I think this vulnerability is real and should be fixed, and why I think we need to risk breaking potentially insecure usage to make vulnerability management manageable in the real world.


Exploring transactions in change data capture

Screenshot of a transaction containing insert, update and delete

Last time, I set up a CDC system using the AWS RDS and DMS services. Now, I'll run some operations through the source database and show what they look like in the CDC output. I'll introduce some metadata fields that are critical to figuring out what this CDC output means, and set us up to look at the specific challenges I've had interpreting it robustly enough to solve real-world problems.


Change data capture with AWS DMS


Setting up Change Data Capture from Aurora Serverless PostgreSQL to S3 via the AWS DMS service. I'll walk through the demo setup, using the venerable Northwind dataset, calling out the problems and solutions on the way. The next post in this series will show the challenges we hit trying to work with this kind of CDC data and how we dealt with them.


Handling CVE-2018-20225

GitHub Actions logs showing the update and the Safety run ignoring the CVE and succeeding

CVE-2018-20225 in all versions of pip tripped my vulnerability alerting this morning. If you're scanning for vulnerabilities using Safety, you've probably seen the same alarm. This post captures my reasoning and decision-making process to understand the risk and impact of this vulnerability and then deal with it.


dbt 1.8 breaks on update

Error traceback ending with No module named dbt.adapters.factory

Updating dbt-bigquery to the latest version, 1.8.0, fails with: No module named 'dbt.adapters.factory'. TL;DR: running pip install --force-reinstall dbt-adapters after the broken upgrade should resolve the problem. If it doesn't, delete the venv and reinstall from scratch. See my comment in dbt core issue 10135 for an explanation of the cause and why this solution works.


How I do Python data supply chain security

A photo taken whilst SCUBA diving of a thresher shark circling off a seamount in the Philippines. Credit: me

We data practitioners - data scientists, data engineers, analytics engineers, et al. - have a hard time when it comes to security. We're exposed to tools that demand we write code and deal with the messy world of programming languages and packages. We often have little choice but to drag insights out of real and sensitive data, exposing us to risks other developers can avoid, because insights don't hide in test data. Training, career paths and dev-experience efforts typically overlook data folks, depriving them of knowledge about the risks they're exposed to and how to mitigate them. Read on and I'll share what I do (and why) to protect myself, Equal Experts and my clients from the security risks lurking behind every piece of software.


Why Try Codespaces?

A photo of a crab on a night dive in the Red Sea. Credit: me

Why I've been trialling GitHub Codespaces as a more secure alternative to local development. I never expected to be pushing changes from my phone!


Fine-Grained GitHub Access Tokens with mkdocs-material-insiders

Aiming an arrow at a target as a hero image

mkdocs-material-insiders is the version of mkdocs-material with extra sponsor-only features. I wanted to use some of those features, but I didn't like the risk of GitHub classic personal access tokens. I'll describe how fine-grained access tokens, currently in beta, mitigate that risk and how I set them up for local development and CI. The solution works, because that's how I wrote and published this post!