Natural Load Testing

2012-May-07

My friend Paul Reinheimer has put together an excellent product/service that is probably of use to many of you.

The product is called Natural Load Testing, and it harnesses some of the machinery that powers the also-excellent wonderproxy and its extremely useful VPN service.

The gist is that once you've been granted an account, you can record real, practical test suites within the simple confines of your browser, and then use those recorded actions to generate huge amounts of test traffic to your application. (They're in private beta right now, but tell them I sent you, and if you're not a horrible person such as a spammer, scammer, or promoter of online timesuck virtual farming, you'll probably get in. Just kidding about that farming clause… sort of.)

In principle, this idea sounds like nothing new—you might already be familiar with Apache Bench, Siege, http_load, or other similar tools—but NLT is fundamentally different from these in several ways.

First, as I already mentioned, NLT allows you to easily record user actions for later playback. This is cool, but on its own it's little more than a convenience. What isn't immediately obvious is that in addition to the requests you're making (HTTP verbs and URLs), NLT is recording other extremely important information about your actions, too: I find HTTP headers and timing particularly interesting.
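To make the "headers and timing" point concrete, here is a minimal sketch of what one recorded step might hold and how the recorded pauses could be turned into a replay schedule. I don't know NLT's actual recording format; `RecordedRequest`, `think_time`, `replay_offsets`, and the example.com URLs are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RecordedRequest:
    """One recorded browser action (hypothetical shape, not NLT's format)."""
    method: str                      # HTTP verb, e.g. "GET" or "POST"
    url: str
    headers: dict = field(default_factory=dict)
    think_time: float = 0.0          # seconds the user paused before this request


def replay_offsets(steps):
    """Return, for each step, the number of seconds after the start of the
    session at which it should be sent so that playback keeps the user's pacing."""
    offsets = []
    elapsed = 0.0
    for step in steps:
        elapsed += step.think_time
        offsets.append(elapsed)
    return offsets


# A made-up three-step session: land on the home page, read for a few
# seconds, open the signup form, fill it in, submit.
session = [
    RecordedRequest("GET", "https://example.com/", {"Accept": "text/html"}, 0.0),
    RecordedRequest("GET", "https://example.com/signup", {"Accept": "text/html"}, 4.5),
    RecordedRequest(
        "POST",
        "https://example.com/signup",
        {"Content-Type": "application/x-www-form-urlencoded"},
        12.0,
    ),
]

schedule = replay_offsets(session)
```

The point of keeping the pauses is that a virtual user who waits twelve seconds before submitting a form puts a very different kind of load on a server than one who fires all three requests back to back.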

Next, NLT allows you to use the test recordings in a variable manner. That is, you can replace things like usernames and email addresses (and many other bits of variable content) with system-generated semi-random replacements. This allows you to test things like a full signup process, or semi-anonymous comment posting, all under load.
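Here's a rough sketch of that idea, assuming a recorded signup form whose username and email get swapped for generated values on every playback. This is not how NLT does it internally (I haven't seen that code); the field names, `VARIABLE_FIELDS`, and `randomize` are hypothetical.

```python
import random
import string
from urllib.parse import urlencode

# Form fields captured during recording (made-up values).
RECORDED_FORM = {"username": "sean", "email": "sean@example.com", "plan": "free"}

# Fields that should differ for every simulated user (an assumption for this sketch).
VARIABLE_FIELDS = {"username", "email"}


def randomize(form, rng):
    """Return a copy of the recorded form with the variable fields replaced
    by semi-random values; everything else is replayed exactly as recorded."""
    name = "user_" + "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    replacements = {"username": name, "email": f"{name}@example.com"}
    return {
        key: replacements[key] if key in VARIABLE_FIELDS else value
        for key, value in form.items()
    }


rng = random.Random(42)  # seeded so a test run can be repeated
forms = [randomize(RECORDED_FORM, rng) for _ in range(3)]
bodies = [urlencode(form) for form in forms]  # ready to send as POST bodies
```

Without something like this, the second simulated signup would fail with "username already taken", and you'd be load testing your error page instead of your signup process.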

NLT also keeps track of secondary content that your browser loads when you're recording the test cases. Things like CSS, JavaScript, images, and XHR/Ajax requests are easy to overlook when using less-intelligent tools. NLT records these requests and (optionally) inserts them into test suites alongside primary requests.

Tools like Siege and the others I've mentioned are useful when you want to know how many concurrent requests your infrastructure can sustain. This is valuable data, but on its own it often isn't very practical. Handling a Slashdotting (or whatever the modern-day equivalent of such things is called) is only part of the problem. Wouldn't you really prefer to know how many users can concurrently sign up for your app, or how many page-1-to-page-2 transitions you can handle, without bringing your servers to their knees (or alternatively: before scaling up and provisioning new machines in your cluster)?

Here's a practical example. Since before the first edition of the conference, the Brooklyn Beta site had been running on my personal (read: toy) server. Before launching this year's edition of the site, which included the announcement for Summer Camp, I got a bit nervous about the load. I wasn't so much worried about the rest of my server suffering under Brooklyn Beta's traffic, but more about the Brooklyn Beta site becoming unavailable due to overloading. This seemed like a good opportunity to give NLT a whirl.

I recorded a really simple test case by firing up NLT's proxy recorder and visiting each page, in the order and timeframe I expected real users to browse from page to page. Then we unleashed the NLT worker hounds on the pre-release version of the site (same hardware, just not on the main URL), and discovered that it wasn't doing very well under load. I then set up Varnish and put it into the request chain (we were testing mostly dynamically generated static content after all, so why not cache it?). The results were clear and definitive: Varnish made a huge difference, and NLT showed us exactly how. (We've since moved the Brooklyn Beta site to EC2, along with most of the rest of our infrastructure.)

Future Sean, here. Chart's gone. )-:

This chart shows several response times over 20 seconds with only 100 concurrent requests without Varnish, and most response times less than 20 milliseconds with 500 concurrent requests with Varnish. Conclusion: we got over a thousand times better performance with five times as many concurrent workers when Varnish was in play.
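For anyone checking my math, the "thousand times" figure is just the ratio of the two rough numbers quoted above. The sketch below uses 20 seconds and 20 milliseconds as round boundary values; they are not exact measurements, and the variable names are mine.

```python
# Rough boundary values from the chart description, in milliseconds.
slow_response_ms = 20_000   # "over 20 seconds" without Varnish
fast_response_ms = 20       # "less than 20 milliseconds" with Varnish

workers_without_varnish = 100
workers_with_varnish = 500

# How many times faster a typical response was with Varnish in the chain.
latency_ratio = slow_response_ms / fast_response_ms

# How many times more concurrent workers were hitting the site at the same time.
worker_ratio = workers_with_varnish / workers_without_varnish

print(f"{latency_ratio:.0f}x faster responses under {worker_ratio:.0f}x the concurrency")
```

Since the slow responses were over 20 seconds and the fast ones under 20 milliseconds, 1000 is the floor of the improvement rather than the exact figure.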

(Aside: I hope to blog in more detail about Varnish one day, but in the meantime, if you've got content you can cache, you should cache it. Look up how to do so with Varnish.)

If NLT sounds interesting, I encourage you to go watch the demo video and sign up. Then send Paul all kinds of bug reports and feature requests so that he can make it more awesome before he accepts the few dollars you'll be begging him to take in exchange for your use of the service.