$1,370 Gone in Sixty Seconds

Given a historical view of the Safety DB, I also need a historical view of PyPI downloads. The source dataset holds 274TB at the time of writing and grows every day, so a single naive query against that table could cost $1,370. Even a query over a single day can scan hundreds of GB, so I need to do some work to make it usable.
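To make the headline figure concrete, here is a rough sketch of the arithmetic. It assumes an on-demand price of $5 per TB scanned, which is what the 274TB and $1,370 figures imply. The function name and the 300GB single-day figure are illustrative, not taken from the dataset.

```python
# Assumption: on-demand pricing of $5 per TB scanned, inferred from
# 274 TB -> $1,370. Check current pricing before relying on this.
PRICE_PER_TB_USD = 5.0


def scan_cost_usd(terabytes_scanned: float,
                  price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the cost of a query that scans the given volume of data."""
    return terabytes_scanned * price_per_tb


# A naive query that scans the whole 274 TB table.
full_table_cost = scan_cost_usd(274)

# Hypothetical single-day query scanning 300 GB ("hundreds of GB").
one_day_cost = scan_cost_usd(300 / 1000)

print(f"Full table scan: ${full_table_cost:,.0f}")
print(f"Single day scan: ${one_day_cost:,.2f}")
```

Even the single-day case adds up quickly if it is repeated for every day of history, which is why the table needs preparing before it is practical to query.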

Thanks to Equal Experts for supporting this content.


Analysing PyPI Downloads

This post investigates Python package downloads using the public PyPI downloads dataset and the Safety public database. It covers how I’ve prepared and published the data to support this kind of analysis, including pure SQL functions to process Semver versions and constraints at scale. This is part of a broader investigation into vulnerability management and update behaviour.
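As a rough illustration of what processing versions and constraints involves, here is a minimal Python sketch. It is not the pure SQL implementation described in the post. The function names are hypothetical, the comma-separated constraint syntax is an assumption, and only plain MAJOR.MINOR.PATCH versions are handled (no pre-release or build tags).

```python
import operator
import re

# Only plain MAJOR.MINOR.PATCH is supported in this sketch.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")
# Two-character operators are listed first so ">=" is not read as ">".
_CLAUSE = re.compile(r"^(>=|<=|==|>|<)\s*(.+)$")
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> tuple:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    match = _VERSION.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported version: {text!r}")
    return tuple(int(part) for part in match.groups())


def satisfies(version: str, constraint: str) -> bool:
    """Check a version against clauses such as '>=1.0.0,<2.0.0'.

    Every comma-separated clause must hold for the result to be True.
    """
    parsed = parse_version(version)
    for clause in constraint.split(","):
        match = _CLAUSE.match(clause.strip())
        if match is None:
            raise ValueError(f"unsupported constraint clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](parsed, parse_version(bound)):
            return False
    return True


print(satisfies("1.4.2", ">=1.0.0,<2.0.0"))
print(satisfies("2.0.0", ">=1.0.0,<2.0.0"))
```

Comparing parsed tuples rather than raw strings matters because a string comparison would rank 1.10.0 below 1.9.9.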
