Skip to content

Home

  • Profile photo for Paul Brabban Paul Brabban, Lead Consultant at Equal Experts


    With experience in software development, data engineering and machine learning, I specialise in data-intensive problems and decentralised data engineering at scale. My experience extends to leading teams, technical architecture and product development. Find out more about my experience and publications in my portfolio.


    Contact me to see how I can help at paul@tempered.works.

Avoiding CAST ROW in AWS Athena SQL

The Trino issue that makes row aggregations so much easier to work with

There is more than one way to build a ROW in AWS Athena and the underlying Trino engine. It turns out I was doing it the verbose, brittle and really annoying way. Row subqueries are so much better!

  • Thanks to Equal Experts logo for supporting this content.

Time travelling with change data capture

Visualisation of effects of end timestamps on a series of transactions

In the last article of the series, I ran into difficulties as I tried to "time-travel" back to earlier points in time. That's an important capability for correct functioning and reproducing results. This article shows how to use window functions (also known as analytic functions) to simplify handling of processing time and avoid the previous problems.

  • Thanks to Equal Experts logo for supporting this content.

Map over an array in BigQuery

SQL snippet for map explained in post

This walkthrough shows how I can use the functional programming techniques map and filter that I already know and love in SQL engines like BigQuery. These techniques give me a lot of processing power whilst keeping my SQL simple and relatively easy to understand. Unlike custom code, I can use the same SQL and infratructure I'd use to process ten rows to process ten billion rows in seconds.

Breaking change data capture with primary keys

A SQL statement that updates a row's primary key

My work on dealing with multiple tables was interrupted when I discovered a subtle scenario that leads to DMS CDC output that cannot be correctly interpreted. I was unable to find a solution, but I will update this post if new information emerges.

  • Thanks to Equal Experts logo for supporting this content.

Disambiguating transactions in change data capture

Example query against disambiguated view

In the CDC output, I get a row for each statement executing in the transaction. Each row reflects the state of the database when that statement is executed. How do I filter out all the transient statements to get the final state of the row when a transaction has finished?

  • Thanks to Equal Experts logo for supporting this content.

Handling CVE-2019-8341 for dbt and mkdocs

Safety output for CVE-2019-8341

Yesterday, Safety told me about CVE-2019-8341, a security issue affecting Jinja2. I'll walk through how I investigated and assessed the risk to my website and a dbt pipeline I operate in the public domain. I finish up with a commentary on why I think this vulnerability is real and should be fixed, and why I think we need to risk breaking potentially insecure usage to make vulnerability management manageable in the real world.

  • Thanks to Equal Experts logo for supporting this content.