Wednesday, August 1, 2012

Share the load

For load-testing OvationTix, we've tried a few approaches over the years. The first time around, we used HP LoadRunner, which is an enterprise-level tool with a price to match. It was pretty easy to use, and we got the data we needed, but it was too expensive to become part of our ongoing development process. Ideally, we'll load-test every release before deploying, and I don't want cost concerns to scare us into holding back good code when it's ready to go.

So we moved to JMeter, running on Amazon EC2 instances, which of course was cheaper. I set up some (admittedly clunky) Windows instances -- a controller and some generators -- and went to work. Again, we got the data we needed, but now the workflow was cumbersome. We had to launch the generators, hope they booted correctly, figure out their IPs, copy those back to the controller, and then fire up the scripts. On top of that, the test data saturated the connection between the generators and the controller. It was fair, but not great.
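To make the manual hand-off concrete, here is a minimal Python sketch of the step where the generator IPs get copied back to the controller. It only builds the command line for a non-GUI distributed JMeter run, using JMeter's standard -n, -t, -R, and -l options. The function name, the test plan file name, and the IP addresses are hypothetical, and this is not the script we actually used.

```python
import shlex


def build_jmeter_command(test_plan, generator_ips, results_file="results.jtl"):
    """Return the argument list for a non-GUI distributed JMeter run.

    All names here are illustrative; only the JMeter flags are standard.
    """
    if not generator_ips:
        raise ValueError("at least one generator IP is required")
    return [
        "jmeter",
        "-n",                            # run without the GUI
        "-t", test_plan,                 # test plan (.jmx) to execute
        "-R", ",".join(generator_ips),   # remote generators, comma-separated
        "-l", results_file,              # file where the controller logs samples
    ]


if __name__ == "__main__":
    # Hypothetical private IPs collected by hand from the EC2 console.
    ips = ["10.0.0.11", "10.0.0.12"]
    print(shlex.join(build_jmeter_command("checkout.jmx", ips)))
```

Even with a helper like this, every sample still travels from the generators back to the controller, which is where the saturation came from.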

For this year, we made it our goal to have a smoothly automated system -- still based around JMeter, which we like. First, we tried BlazeMeter. It's a JMeter PaaS, which is a really cool idea: it promises to take care of the infrastructure so we can focus on writing the tests. It's not bad at all, and I think we may use it in the future, but for now, the costs were higher than we wanted, there were too many limitations on usage (the price tiers control things like ramp-up time, max users, etc.), and the reporting wasn't as transparent as we wanted.

Finally, we found jmeter-ec2, which is a wrapper around Amazon's API that automates launching Linux micro instances, deploying resources to those instances, firing up the test, and aggregating results. It's a lightweight script that runs in a shell and eliminates the need for a dedicated controller. Instead, each generator controls its own virtual users, and only the condensed results are sent back to the shell, which makes for much less traffic between the instances (and therefore no saturation). The data collected isn't as deep as with the other approaches, but for our purposes, that's okay. We're mostly interested in simply finding out how many users we can throw at the site before it crashes. Since our plan is to take over the world, our target for concurrent users is currently 7,057,131,972. Wish us luck.
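For the curious, the "condensed results" idea can be sketched in a few lines of Python. This is not how jmeter-ec2 itself is written (it's a shell script), and the Summary fields and merge_summaries function are names I made up for illustration. The point is only that each generator can send back a handful of numbers rather than every sample, and the totals can still be combined correctly.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Condensed results from one generator (hypothetical shape)."""
    samples: int     # number of requests sent
    errors: int      # number of failed requests
    avg_ms: float    # mean response time in milliseconds
    max_ms: float    # slowest response time in milliseconds


def merge_summaries(summaries):
    """Combine per-generator summaries into one overall summary."""
    total = sum(s.samples for s in summaries)
    if total == 0:
        return Summary(0, 0, 0.0, 0.0)
    errors = sum(s.errors for s in summaries)
    # Weight each generator's average by how many samples it contributed.
    avg = sum(s.avg_ms * s.samples for s in summaries) / total
    slowest = max(s.max_ms for s in summaries)
    return Summary(total, errors, avg, slowest)


if __name__ == "__main__":
    # Made-up numbers for two generators.
    overall = merge_summaries([
        Summary(samples=100, errors=1, avg_ms=200.0, max_ms=900.0),
        Summary(samples=300, errors=3, avg_ms=400.0, max_ms=1200.0),
    ])
    print(overall)
```

The trade-off is the one mentioned above: averages and maximums merge cleanly, but anything that needs the raw samples, such as exact percentiles, is lost.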
