Consumer-Driven Contracts for SQL Data Products

dbt announced "model contracts" in the recent v1.5 release. This looks like a great feature for dbt, but it reminded me that I've been using contract testing with dbt for a couple of years now, inspired by Pact consumer-driven contracts, and never talked about it. There are some differences. For example, dbt's new feature is very dbt-centric, while the approach I've used isn't - dbt certainly helps, but it isn't necessary. There's a GitHub repo to follow along with.

Bashing Alpine

A bear, lying on its side facepalming
Photo by https://unsplash.com/@bradleyhowington

For those times when a script is both missing and exactly where it should be.

So this annoying and trivial little problem catches me out every so often. I am always misled by the error message! You'll see what I mean shortly. For context, it usually happens when I'm working in Docker containers on a build.

Helm Charts for Argo Workflows

Argo is a lightweight, Kubernetes-native workflow solution. Workflows are implemented as Kubernetes manifests, so Helm is a natural choice for packaging them.

Helm also supports templating values which can be really helpful - but that's where we run into a problem. Helm uses mustache-style string interpolation, and so does Argo.

Packer & Fedora Gotchas

Lately, I've been working in virtual machines to strengthen my security posture for clients. There's more to come on that, but for now I wanted to share a fix for a confusing problem I had. I was trying to install Fedora 31 as a Packer build.

Performance with Spring Boot and Gatling (Part 2)

In Part 1, we built a simple Spring Boot webapp and demonstrated a surprising performance problem. A Gatling performance test simulating different numbers of users each making a single request showed our webapp unable to keep up with 40 "users" making one request per second on my fairly powerful computer.

We eliminate a couple of potential causes in the first part of the article. If you just want to know what was causing the problem, you can go straight there.

We've already eliminated many potential culprits, so we continue using a process of elimination to figure out what's causing the problem. I shared a link to the first part and invited people to guess what the problem was.

Thread starvation was amongst the guesses, so let's take a look.

How Many Threads?

Spring Boot apps run in an embedded Tomcat server by default, so this is something we can look into. If we were, say, talking to a slow database synchronously then it would seem more likely that we might be running out of threads to service new requests as existing requests keep threads waiting for responses from the database.

As it's easy, we'll see what happens if we increase the number of threads Tomcat can use. We can check the documentation to see that the default thread pool size is 200 threads and how to increase it. We can increase that by an order of magnitude to 2000 threads in application.properties:

server.tomcat.max-threads=2000

Running our Gatling test shows that this change has adversely affected performance. We see more active users at peak and a funny dip in the actives line as the backlog is cleared. That dip, marked on the chart below, is a few requests timing out.

Requests per second and active users

That didn't help, so let's put the max thread count back to the default before we try anything else! If you're following along, I recommend using clean, as in mvn clean spring-boot:run to ensure that nothing is remembered from the previous build. You can lose a lot of time acting on misleading results otherwise!

JSON Encoding

Another unlikely candidate is the JSON encoding we're doing on our responses. Out of the box, Spring Boot uses the venerable Jackson library, but you never know, the default settings might be really inefficient. Again, it's really easy to check so let's find out. We update the controller so that instead of returning a Java object containing a String, it returns just a plain string, so from:

    class Greeting {
        public String getGreeting() {
            return "Greetings from Spring Boot!";
        }
    }

    @RequestMapping("/")
    public Greeting index() {
        return new Greeting();
    }
to:

    @RequestMapping("/")
    public String index() {
        return "Greetings from Spring Boot!";
    }

If you make that change and run the Gatling test, you see...

...drum roll...

No detectable difference from the original results we got in part 1. Not really a surprise, JSON encoding isn't the problem.

Spring Security

The next thing we can check easily is whether Spring Security is causing the problem somehow. Like the cases above, we've got well tried and tested software running with Spring Boot's sensible defaults, so yet again, it seems unlikely. Still, something's causing the poor performance, and Spring Security does lots of things, like authentication and session management. We're running out of possible causes, so let's give it a try.

The quick and easy way to check whether something that Spring Security is autoconfiguring in is causing the problem is to just omit the dependency. Let's delete spring-security-web and spring-security-config from our pom.xml. We can see there's less happening on startup, but will it handle the load better? Let's run the test.

Asciicast with no spring security on the classpath

We have a winner! You can see that the performance is improved. No active requests throughout the test - the app is keeping up easily. The charts tell the same story. Let's compare side by side to get a feel for the difference.

Response time distribution, before:

Histogram of response time distributions, with Spring Security

Response time distribution, after:

Histogram of response time distributions, without Spring Security

Instead of a spread of response times all the way up to 30 seconds, we have all responses served within a few milliseconds. Much better! On to requests per second.

Requests per Second and Active Users, before:

Plots of requests per second and active users, with Spring Security

Requests per Second and Active Users, after:

Plots of requests per second and active users, without Spring Security

To appreciate the difference, note the different scale for Active Users, the y-axis on the right, and the total time the test ran for on the x-axis. Before, the number of active users climbed faster than the request rate, indicating that requests would start timing out if the test hadn't ended. After we remove Spring Security, the number of active users is the same as the request rate throughout the test, so the app is keeping up perfectly with the load.

What's causing the problem?

Spring Security with Spring Boot automatically applies sensible security settings to an app when it's on the classpath. You take control of aspects of configuration through annotations and extending classes. As this is such a simple app, there's not too much to check. We'll take control of authentication and roughly replicate the automatic configuration with this @Configuration annotated class:

@Configuration
public class SecurityConfig extends WebSecurityConfigurerAdapter {

    @Override
    public void configure(HttpSecurity http) throws Exception {
        http.httpBasic();
    }

}

The Gatling test shows that the performance problem is back. Delete the http.httpBasic() line and the problem goes away. Something to do with authentication then. I didn't find anything in Spring Security or Spring Boot documentation to explain it.

I'm not sure how you'd figure it out if you didn't know where to start. Given a little experience with passwords and authentication you can join the remaining dots. There are two things going on that could be responsible. Let's look at password encoding.

Password Encoding

We protect passwords by 'encoding' or 'hashing' them before we store them. When the user authenticates, we encode the password they gave us and compare with our stored hash to see if the password was right.

The choice of encoding algorithm is important in this world of cloud computing, GPUs and hardware acceleration. We need an algorithm that needs a lot of CPU power to encode, so that guessing passwords in bulk is expensive for an attacker. We get a password encoder using the bcrypt algorithm by default, an algorithm that's been designed to withstand modern techniques and compute power. You can read more about bcrypt and how it helps keep your user database secure in Auth0's article and Jeff Atwood's post on Coding Horror.

See the connection yet? The choice of bcrypt makes sense for protecting the credentials we're entrusted with, but what do we do about this terrible performance?
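To get a feel for the gap between a fast hash and a deliberately expensive one, here is a minimal sketch using only the JDK. It is not what Spring Security does internally: bcrypt isn't in the JDK, so PBKDF2 stands in for it here, and the class name, salt and iteration count are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

class HashCostSketch {

    // A single salted SHA-256 pass: quick for us, and just as quick for an attacker.
    static byte[] fastHash(String password, byte[] salt) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(salt);
            return digest.digest(password.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // PBKDF2 is a stand-in for bcrypt, which the JDK does not ship.
    // The iteration count plays a similar role to bcrypt's work factor.
    static byte[] slowHash(String password, byte[] salt, int iterations) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password.toCharArray(), salt, iterations, 256);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical salt; the password is the test one from application.properties.
        byte[] salt = "illustrative-salt".getBytes(StandardCharsets.UTF_8);
        String password = "24gh39ugh0";

        long start = System.nanoTime();
        fastHash(password, salt);
        long fastMicros = (System.nanoTime() - start) / 1_000;

        start = System.nanoTime();
        slowHash(password, salt, 200_000); // assumed iteration count, not a recommendation
        long slowMicros = (System.nanoTime() - start) / 1_000;

        System.out.printf("SHA-256: %d us, PBKDF2: %d us%n", fastMicros, slowMicros);
    }
}
```

The exact timings depend on your machine and JVM warm-up, so treat them as rough. The point is only that the expensive hash costs orders of magnitude more CPU per call, and with HTTP Basic and no session that cost is paid on every request.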

Sessions

By default, the security config responds to our first authentication with a cookie containing a session ID. That is exchanged without any encoding. Sessions come with lots of problems of their own, so we'll leave that one for another day. If we'd been using a browser, or Gatling had been set up to make lots of requests as the same user, we'd have used the session ID and not seen a performance problem.

Reconfiguring our App

We'll override the password encoder to prove that it is causing the performance problem. As we're dealing with a hardcoded test password rather than real user passwords, we don't need to worry about it not being secured. We update our SecurityConfig class like this:

@Configuration
public class SecurityConfig extends WebSecurityConfigurerAdapter {

    // make the credentials in application.properties available
    @Value("${spring.security.user.name}")
    private String username;

    @Value("${spring.security.user.password}")
    private String password;

    @Override
    protected void configure(AuthenticationManagerBuilder auth) throws Exception {

        // choose a more efficient (and far weaker) hashing algorithm
        final PasswordEncoder sha256 = new StandardPasswordEncoder();

        auth.inMemoryAuthentication()
                .passwordEncoder(sha256)
                .withUser(username)
                .password(sha256.encode(password))
                .roles("USER");
    }

    @Override
    public void configure(HttpSecurity http) throws Exception {
        http.httpBasic();
    }

}
When we run our performance test one last time, we see that we have performance and we have authentication. @glenathan takes the prize for correctly guessing the cause!

How fast can it go? When I push the request rate higher, I see that this app can actually handle around 2,000 requests per second. I won't bore you with more asciinema or the charts, but you can play with the updated app yourself if you want.
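As a back-of-envelope check on those numbers, here's a small sketch of the arithmetic. It assumes the work is purely CPU-bound and spreads perfectly across cores, which is a simplification. The 4 cores and the roughly 200 ms per request are the figures observed in Part 1, and the 200 ms may well include some queueing, so treat the result as a ballpark rather than a measurement. The class and method names are made up for illustration.

```java
class CapacityEstimate {

    // Upper bound on throughput for CPU-bound work: each core can finish
    // 1 / cpuSecondsPerRequest requests per second, however many threads are waiting.
    static double maxRequestsPerSecond(int cores, double cpuSecondsPerRequest) {
        return cores / cpuSecondsPerRequest;
    }

    // The same relationship turned around: the CPU time per request
    // implied by an observed throughput.
    static double impliedCpuSecondsPerRequest(int cores, double requestsPerSecond) {
        return cores / requestsPerSecond;
    }

    public static void main(String[] args) {
        int cores = 4; // assumption: "almost a full 4 cores" were busy in Part 1

        // With the expensive password hash, at roughly 200 ms per request.
        System.out.println(maxRequestsPerSecond(cores, 0.2));

        // With the cheaper hash, at around 2,000 requests per second.
        System.out.println(impliedCpuSecondsPerRequest(cores, 2000.0));
    }
}
```

Under those assumptions the ceiling with the expensive hash comes out at about 20 requests per second, and the 2,000 requests per second figure implies something like 2 ms of CPU per request. It also suggests why raising Tomcat's thread count couldn't help: more threads don't add CPU.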

Performance with Spring Boot and Gatling (Part 1)

Just after the rest of the team had left for their Christmas holidays, my colleague and I discovered a weird performance problem with a Spring Boot application we'd just started writing. This is the story of discovering the problem and the detective work that led us to the culprit hiding in plain sight. We're going to recreate the app and the performance tests, but first I'll tell you how we got here.

Prologue

Twenty requests per second. Any more than that and response times would climb. Eventually, the load balancer's readiness checks would time out and it'd refuse to send traffic to the app, taking the service offline.

Here's what our little prototype system looked like:

Diagram showing the components parts of the prototype when we discovered the performance problem

We're building an API, rather than a website. Clients authenticate by passing a username and password in the request. Each request is a query, and the app talks to a graph database to calculate an answer, returning it as a JSON document. It's running in a Kubernetes cluster on AWS, behind an Elastic Load Balancer.

How can it be struggling to serve more than twenty requests per second? I've not used this graph database before, can it really be that slow? Nor have I used Kubernetes or Spring Boot, are they responsible? You wouldn't think there'd be enough of our code yet to perform so poorly, but our own code is always the go-to suspect.

Too Many Suspects

There are too many potential culprits here, so let's eliminate some. Can I reproduce the problem here on my machine? Yes - and I get a clue. As the test runs, I can hear the fan spinning up. Checking back on AWS for server metrics, the CPU utilisation was shooting up to 100% during the test. That removes Kubernetes, the load balancer, the network and disks from the investigation, at least for now. Memory could still be a problem, as Java's garbage collection chews up compute time when there's not enough memory.

Now we eliminate the database. We'd written a resource to return version information, which is just returning a document from memory. Running the performance test on that endpoint revealed the same terrible performance! Something to do with the Spring framework, or the Tomcat application server then - where can we go from here?

We could pull out profiler tooling to look inside the running app and see what's going on. It's been a while since I used that tooling on the JVM, and it'll produce a lot of information to interpret, so I'll leave that as a backup plan. For now, we've got an easy option that will rule out the Spring Boot framework and Tomcat application server. A "getting started" Spring Boot app won't take long to set up. We can eliminate JSON processing, configuration problems and coding errors as potential candidates, and get a benchmark for how performant the simplest Spring Boot app is with our hardware and test setup.

This is where we write some code.

The "Getting Started" App

You can find and clone the project we're talking about in this post on Github at https://github.com/brabster/performance-with-spring-boot/tree/1.0. You'll need a JDK and Maven installed to compile and run the application.

I based the "getting started" app closely on Spring Boot's documentation. It's got one endpoint at / and returns a JSON document {"greeting": "Greetings from Spring Boot!"} like this:

package hello;

import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.bind.annotation.RequestMapping;

@RestController
public class HelloController {

    class Greeting {
        public String getGreeting() {
            return "Greetings from Spring Boot!";
        }
    }

    @RequestMapping("/")
    public Greeting index() {
        return new Greeting();
    }

}

We'll use Spring Security to authenticate API clients, so we need to add the dependencies and set a default username and password. The dependencies we need to add to our Maven pom.xml file are:

<dependency>
    <groupId>org.springframework.security</groupId>
    <artifactId>spring-security-web</artifactId>
    <version>5.1.2.RELEASE</version>
</dependency>
<dependency>
    <groupId>org.springframework.security</groupId>
    <artifactId>spring-security-config</artifactId>
    <version>5.1.2.RELEASE</version>
</dependency>

To set a default username and password, we add a properties file containing two properties that override Spring Security's defaults:

spring.security.user.name=user
spring.security.user.password=24gh39ugh0

Start the app with mvn spring-boot:run and you should see something like this:

Asciinema recording of the app starting

Performance Testing with Gatling

Gatling is the tooling that gave us those original requests per second figures, so let's reproduce the setup to do our performance tests here. Gatling tests are written in Scala and can coexist with the Java code, but we need a little support in our project to run tests and get editor support for Scala.

To compile Scala code and enable Scala support (at least in IntelliJ IDEA) I used the rather neat scala-maven-plugin:

<plugin>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <version>3.4.4</version>
    <executions>
        <execution>
            <id>scala-test-compile</id>
            <phase>process-test-resources</phase>
            <goals>
                <goal>testCompile</goal>
            </goals>
        </execution>
    </executions>
</plugin>

To run Gatling from Maven and view its output, we need a dependency and a plugin:

<dependency>
    <groupId>io.gatling.highcharts</groupId>
    <artifactId>gatling-charts-highcharts</artifactId>
    <scope>test</scope>
    <version>3.0.2</version>
</dependency>
<plugin>
    <groupId>io.gatling</groupId>
    <artifactId>gatling-maven-plugin</artifactId>
    <version>3.0.1</version>
</plugin>

Now we can write a Gatling test. This is our scenario, describing the client behaviour we want to test.

setUp(myScenario.inject(
    incrementUsersPerSec(20)
      .times(5)
      .eachLevelLasting(5 seconds)
      .startingFrom(20)
  )).protocols(httpProtocol)
    .assertions(global.successfulRequests.percent.is(100))

We're starting with 20 users per second making a request to the / resource, holding that arrival rate for five seconds. They only make one request. Then we step the rate up by twenty users per second at a time until there have been five levels in total (20, 40, 60, 80 and 100 users per second), holding each for five seconds. Every request must return an HTTP 200 status code to pass the test. Simple! You'll find the rest of the test in LoadTest.scala.
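If it helps to see the load profile as plain numbers, here's a small sketch that works out the arrival rate at each level and the total number of requests the scenario sends. It's written in Java rather than Gatling's Scala DSL, and the class and method names are made up for illustration. It assumes the levels follow one another with no ramp in between, as in the scenario above.

```java
import java.util.ArrayList;
import java.util.List;

class InjectionProfile {

    // Arrival rates (users per second) for a stepped profile with the given
    // starting rate, increment and number of levels.
    static List<Integer> levels(int startingFrom, int increment, int times) {
        List<Integer> rates = new ArrayList<>();
        for (int i = 0; i < times; i++) {
            rates.add(startingFrom + i * increment);
        }
        return rates;
    }

    // Total users injected when every level lasts levelSeconds.
    // Each user makes one request, so this is also the request count.
    static int totalUsers(List<Integer> rates, int levelSeconds) {
        int total = 0;
        for (int rate : rates) {
            total += rate * levelSeconds;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> rates = levels(20, 20, 5);
        // prints "[20, 40, 60, 80, 100] -> 1500 requests"
        System.out.println(rates + " -> " + totalUsers(rates, 5) + " requests");
    }
}
```

So the whole test is only 1,500 requests spread over 25 seconds of injection, which is a useful figure to keep in mind when reading the results.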

Make sure the app is running and then run the test with mvn gatling:test.

Asciinema recording of the performance test running

When the tests run you see a progress bar being refreshed every few seconds. The ### part represents the proportion of requests that have been made and completed. The section with dashes --- is requests made but not yet completed. The numbers are just below the progress bar, active telling us how many requests have been made but not yet completed. There's a lot of those, over a thousand towards the end of the test, and this computer isn't exactly underpowered. There's our performance problem! Towards the end of the test, requests are taking over 26 seconds to complete.

If you cloned the project, you can try changing the scenario in LoadTest.scala to explore the problem. Running something like top will show you your live CPU utilisation. I can see the app using almost a full 4 cores while the test is running. To serve a short text string from memory to fewer than 100 users per second!

Gatling's Reports

Gatling saves a report of metrics and charts for each test. There's a couple that I think give us interesting insight into what just happened that we might not have seen as the test was running.

Bar chart showing 15 requests responded in around 200 milliseconds, with other requests uniformly distributed up to almost 30 seconds

The "Response Time Distribution" report tells us that the fastest few requests are served in around 200ms. So it takes at least 200ms to serve a request! Then there's a roughly uniform distribution of request times up to 30 seconds. The test only ran for around 70 seconds in total.

Next, the "Number of requests per second" chart shows more clearly that the app isn't keeping up, even with these low request rates. The number of active users (those that have made a request and not yet had a response) climbs until Gatling stops sending new requests.

Line chart showing the number of new requests per second and the number of active requests over time.

You can see the app is not quite able to keep up at 40 requests per second. As we ramp to 60 the line swings upwards as it really starts to fall behind. 45 seconds or so into the test the number of requests per second drops from 100 to zero, and the number of active requests, just over 1000 by this point, stops climbing and starts to fall as the app starts to clear its backlog.
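A rough way to reason about that climbing line is a simple backlog model: whatever arrives faster than the app can serve it waits. The sketch below is not derived from Gatling's data. It assumes the app serves a steady 20 requests per second (the figure from the prologue, not something measured on this machine), and the class and method names are hypothetical.

```java
import java.util.Arrays;

class BacklogSketch {

    // Backlog at the end of each level: requests arrive at ratesPerSec[i] and are
    // served at capacityPerSec; anything not served carries over to the next level.
    static int[] backlogAfterEachLevel(int[] ratesPerSec, int levelSeconds, int capacityPerSec) {
        int[] backlog = new int[ratesPerSec.length];
        int waiting = 0;
        for (int i = 0; i < ratesPerSec.length; i++) {
            waiting += (ratesPerSec[i] - capacityPerSec) * levelSeconds;
            waiting = Math.max(waiting, 0); // a backlog can't go negative
            backlog[i] = waiting;
        }
        return backlog;
    }

    public static void main(String[] args) {
        int[] rates = {20, 40, 60, 80, 100}; // users per second at each level
        int capacity = 20;                   // assumed service rate, requests per second

        int[] backlog = backlogAfterEachLevel(rates, 5, capacity);
        System.out.println(Arrays.toString(backlog));

        // Time to clear what's left once new requests stop arriving.
        System.out.println("seconds to drain: " + backlog[backlog.length - 1] / capacity);
    }
}
```

With those assumptions the backlog reaches 1,000 requests by the end of the last level and takes about 50 seconds to drain. That's in the same ballpark as the active request count and overall run time reported above, though the model is crude enough that this shouldn't be read as proof.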

Gatling's reports show you plenty of other interesting charts and figures. Find them in your target directory after running a test.

Next Time

That's the context, the tools and a simple codebase to get us started. In Part 2, we see how a performance problem shows itself and figure out how to resolve it.

Writing on the Dunnhumby Engineering Blog

Dunnhumby is a retail data science company that I've been working with lately. I've enjoyed writing a couple of articles for their Data Science and Engineering blog.

The first is a slightly extended version of an article on here, Scala Types in Scio Pipelines.

The more recent article is original and talks about the experiences we've had putting together streaming demos of real-time streaming data processing solutions. If you're interested, you can find that article at Building Live Streaming Demos

It was also an opportunity to try medium.com out as a technical author. I found the difficulty in inserting code snippets (you embed Codepen or Github snippets, but it's a bit inconvenient compared to just writing the code) and the lack of version control to be the main downsides. The polish and support for writing for a third-party publication were upsides.

Thanks to Dunnhumby for the opportunity to write on their blog!

Setting up this site with GatsbyJS and Netlify

No better time to grab an old Geocities-style under construction gif...

Every company needs a website, and Tempered Works is no exception! Having bought the domain names when I set the company up, I've been putting off getting a website up and running because I'm not really a front-end creative type. When I heard Jason Lengstorf talking to The Changelog about GatsbyJS, I was intrigued... so I tried it out.

Why GatsbyJS?

I think GatsbyJS is interesting when compared to other static site generators because it's based on GraphQL and React. I've never worked with React, so there's an opportunity to learn about that, but I think the GraphQL part is most interesting. The idea is that you can generate content on your site based on queries to other datasources. The queries are done at build time, so you still get a static site, with the associated benefits. Benefits like fewer security considerations (although there are still some - we'll get back to that), many options for cheap or free hosting, great reliability and the potential for super-fast page load times.

Where to Start?

Gatsby provides loads of "starters", projects you can use as a basis for your own. A quick look down the list and I settled for gatsby-starter-lumen. I felt it had a clean, professional look, and it seemed really quick on page loads. A quick gatsby new my-blog https://github.com/alxshelepenok/gatsby-starter-lumen later, and I had a basic project. If you're trying it out for yourself, check out the Gatsby docs to fill in the details I leave out.

I'm not sure whether I'll stick with the theme. Aside from the clean styling, it's the blog aspect and markdown support for posts that I like. After adding a couple of links and a company footer to the sidebar, the mobile view is mostly links and footer! It also feels unnecessarily narrow on my laptop, so code snippets are particularly hard to use without scrollbars. We'll see, hopefully it wouldn't be too difficult to switch if I decided to.

Where does GraphQL fit?

After creating a dummy blog post and gatsby develop-ing my site up on localhost:8000, I decided to add social links for linkedin and stackoverflow. Each component and each page looks up the data it needs with a GraphQL query. Where were the social links coming from?

The social links appear in the sidebar on every page, so the details are kept in the gatsby-config.js file, under siteMetadata > author. This config file is available to query, and each page does exactly that. For example, the index page uses this query. These pieces of data are then rendered in the Links component, which is used in the Sidebar component, which is itself used in almost every page.

So - to add these links, I need to:

  • add the details for my new social links to gatsby-config.js,
  • update the queries to fetch those new links,
  • update the Links component to render the new links.

Unfortunately, I need to update the query to include the new links on every page that uses the sidebar! That got tedious fast, but Gatsby and GraphQL have a solution - fragments. After defining a query fragment to fetch the author details, I swapped the fragment into every query that used the author details. Adding or removing author details can now be done in one place. Gatsby's GraphQL document is a must-read!

Why Host when you can Netlify!

Netlify was the obvious choice to host this site. It's free for a simple, single-user site like this and it knows how to deploy a Gatsby site. All I had to do was authorise access to my Github account, select the repository I wanted to deploy and wait a few seconds for the site served on a https:// URL with a randomly generated host to build and deploy. That leads us neatly to security and performance!

What About Security?

Even though this is a static site, there are still ways it could be abused. We don't have the traditional backend attack vectors because we don't have a server or a database. Bad actors could still get creative with JavaScript, iframes, and so on to compromise your computer or influence what you're seeing on this site. I used Mozilla's Observatory to scan the site that Netlify launched for me, and it got a D+ rating. Could be worse, I guess, but that's not good enough!

It's possible to influence the headers that Netlify serves. To keep things tidy, there's gatsby-plugin-netlify, a Gatsby plugin that can make the header configuration part of your Gatsby configuration. I started by adding the headers that Observatory recommended, to get an A+ rating. Then I relaxed the rules until the site worked again!

I like that approach, particularly when I'm using an open source project like Gatsby and the Lumen theme, because you essentially get a guided tour of what the site is doing that has security implications. I also caught a mistake because of these headers. I'd left a Giphy link to an image instead of using the site's local copy. The CSP headers disallowed it because they only allow images to be served from 'self' and Google Analytics.

It took about 10 commits before I was happy-ish with the headers and the site was working without any errors in the JavaScript console. The site gets a B+ right now, with the remaining issues being Content Security Policy specifications that are a little more lenient than we'd ideally like. It looks like the Gatsby team is working on dealing with those remaining issues.

The CSP headers I ended up with were quite verbose, and Gatsby's config file is JS, so I added a bit of code to make things a little more maintainable.

const cspDirectives = [
  "default-src 'self'",
  "script-src 'self' 'unsafe-inline' https://www.google-analytics.com",
  "font-src 'self' https://fonts.googleapis.com https://fonts.gstatic.com",
  "style-src 'self' 'unsafe-inline' https://fonts.googleapis.com",
  "img-src 'self' https://www.google-analytics.com"
];

const directivesToCspHeader = headers => headers.join(';');

I can now use these in the config like this:

{
  resolve: 'gatsby-plugin-netlify',
  options: {
    headers: {
      '/*': [
        'X-Frame-Options: DENY',
        'X-XSS-Protection: 1; mode=block',
        'X-Content-Type-Options: nosniff',
        `Content-Security-Policy: ${directivesToCspHeader(cspDirectives)}`,
        'Referrer-Policy: no-referrer-when-downgrade'
      ]
    }
  }
}

Here's the observatory's advice on those headers.

Mozilla's Observatory, showing the summary for the website

What about Performance?

I took a similar approach to benchmarking performance, using Google's Page Speed tool. Right now, we're getting 71% on the mobile optimisation benchmark, and 90% on the desktop benchmark. Whilst the site feels very snappy to me, there's probably work to do there when I have time, but at least I have a measurement to start from.

Google's Page Speed tool, showing the poor mobile performance for the website

Monitoring

The last thing to touch on is the boring operations stuff. How will I know if the site goes down or goes slow, particularly as I don't have any servers to alert me? My go-to tool for this kind of thing was Pingdom, but it looks like they've done away with their free tier. If I recall correctly, it used to be free to healthcheck two URLs. Now you get a 14 day trial.

We can't really complain when previously free services change their terms, but before signing up I checked whether anyone else was doing this basic health checking, and I found UptimeRobot. They have a generous free tier, so I signed up there instead and pointed them at the test site. It's been checking for three hours now and everything looks good. I can also see that the response times are between 150-250ms, which is a useful measure to have historical data on!

Uptime Robot's dashboard for availability and latency history, showing 100% availability and latency between 150-250ms

Finally... DNS and TLS Setup

The last thing to do is migrate the DNS records over to Netlify, so that https://tempered.works points to the Netlify site! I bought the domain through Hover after recommendations by Steve Gibson on the Security Now! podcast. Hover is fine, but they don't support CNAME flattening, ANAME or ALIAS records that are required by Netlify to get the full benefits of an apex domain. tempered.works is an apex domain, www.tempered.works would be a non-apex alternative. I want tempered.works to be my domain! I could move my DNS to Netlify but I'm trying just pointing A records to Netlify's load balancer for now. You may want to choose a DNS provider that supports those newer record types if you intend to host on cloud services!

Of course, now I'm using my own domain name I need a TLS certificate that matches. Netlify's got me covered - it automatically provisioned me a free Let's Encrypt certificate for my domain. It took over half an hour, but that's no problem. Once the certificate was provisioned, I got the option of forcing connections to https://, so I turned it on. Why would you want to access this site over plaintext anyway?

That's it - tempered.works is online!

Credits

  • Under construction gif courtesy of https://giphy.com/stickers/please-construction-patient-JIejyxfnKRVv2