Writing on the Dunnhumby Engineering Blog

Dunnhumby is a retail data science company that I’ve been working with lately, and I’ve enjoyed writing a couple of articles for their Data Science and Engineering blog. The first is a slightly extended version of an article from this site, Scala Types in Scio Pipelines. The more recent one is original and covers our experience of putting together live demos of real-time streaming data processing solutions. If you’re interested, you can find it at Building Live Streaming Demos. [Read More]

Scala Types in Scio Pipelines

Data pipelines in Apache Beam have a distinctly functional flavour, whichever language you use. That’s because they can be distributed over a cluster of machines, so careful management of state and side-effects is important. Spotify’s Scio is an excellent Scala API for Beam. Scala’s functional ideas help to cut out much of the boilerplate present in the native Java API. Scio makes good use of Scala’s tuple types, in particular pairs (x, y). [Read More]
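
To give a flavour of how pairs show up, here is a minimal sketch rather than code from the article itself. It assumes scio-core and a Beam runner (such as the direct runner) are on the classpath. The object name PairExample and the sample words are made up for illustration.

```scala
import com.spotify.scio._

// Hypothetical example object; not taken from the original article.
object PairExample {
  def main(cmdlineArgs: Array[String]): Unit = {
    // ContextAndArgs itself returns a pair, which we destructure here.
    val (sc, _) = ContextAndArgs(cmdlineArgs)

    sc.parallelize(Seq("apple", "pear", "apple")) // small in-memory input
      .map(word => (word, 1L))                    // SCollection[(String, Long)]
      .reduceByKey(_ + _)                         // key-based ops become available on pairs
      .map { case (word, count) => s"$word: $count" }
      .debug()                                    // print each element to stdout

    // Assumes a recent Scio version; older releases used sc.close() instead.
    sc.run()
  }
}
```

Once an element is a pair (x, y), Scio treats x as the key, so operations such as reduceByKey apply without the explicit key-value wrapper types that the Java API requires.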