Friday, July 6, 2012

Cache Cachet

One of our goals for TheaterMania is to achieve infinite* scalability. I want to be deeply confident that we could handle as much load as could possibly be thrown at us, because we have infinite* scalability. Why the asterisk? Because I'm only really looking to scale reads, and only reads of non-personalized data. There are, of course, ways to scale out writes and personalized reads (e.g. for logged-in users), but the nature of the application is that those are much less essential. Besides, that would be a separate project, so let's do first things first.

So then, infinite* scalability: of course, it's about caching. The approach we've decided on is to render complete HTML pages and store them on a CDN. Any personalization can happen via AJAX calls; as long as those calls fail gracefully, the server handling dynamic content can crash and the core content of the site stays live, served by the CDN. For a lot of static content we use Amazon S3 as a sort of cheapo CDN, but it isn't really designed to serve massively parallel requests (I'm not sure what would happen if we tried), and it won't automatically pull updated content from an origin server. Fortunately, true CDNs abound, and our plan is to use one. The next step is to comparison-shop CloudFront, CloudFlare, and ??? (Akamai?). Since our needs are relatively modest (we don't need ultra-low latency or global edge servers), I'm hoping we can find one that fits our budget.
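To make the "dynamic server can crash, site stays up" property concrete, here is a toy Python model of a pull-through edge cache. It is a sketch under stated assumptions, not how CloudFront, CloudFlare, or any other CDN is actually implemented: the `EdgeCache` class, the `origin` callable, and the fixed TTL are all hypothetical. Fresh pages are served without contacting the origin; on a miss or an expired entry the edge pulls from the origin (the thing S3 won't do for us); and if the origin is down, it keeps serving the stale copy it already holds.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical origin: takes a path and returns rendered HTML,
# raising an exception when the origin server is unreachable.
Origin = Callable[[str], str]


class EdgeCache:
    """Toy pull-through edge cache (not any real CDN's implementation)."""

    def __init__(self, origin: Origin, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.origin = origin
        self.ttl = ttl_seconds
        self.clock = clock
        # path -> (html, time it was fetched from the origin)
        self.store: Dict[str, Tuple[str, float]] = {}

    def get(self, path: str) -> Optional[str]:
        now = self.clock()
        cached = self.store.get(path)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]  # fresh hit: the origin is never contacted
        try:
            html = self.origin(path)  # miss or expired: pull from the origin
        except Exception:
            # Origin is down: serve the stale copy if we have one, else None.
            return cached[0] if cached is not None else None
        self.store[path] = (html, now)
        return html


if __name__ == "__main__":
    fake_now = [0.0]
    origin_up = [True]

    def origin(path: str) -> str:
        if not origin_up[0]:
            raise ConnectionError("origin server is down")
        return f"<html>{path}</html>"

    edge = EdgeCache(origin, ttl_seconds=60, clock=lambda: fake_now[0])
    print(edge.get("/shows/hamlet"))   # pulled from the origin and cached
    origin_up[0] = False
    fake_now[0] = 120.0                # entry has expired and origin is down
    print(edge.get("/shows/hamlet"))   # still served: the stale copy
    print(edge.get("/shows/macbeth"))  # never cached, nothing to serve: None
```

With a real CDN, the origin would typically signal how long a page may be kept through HTTP headers such as `Cache-Control: public, s-maxage=...`, and serving stale content when the origin errors corresponds roughly to the `stale-if-error` extension from RFC 5861. Whether a given CDN honors that is worth checking while comparison-shopping.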

Our challenge, then, will be to make sure we really understand the CDN's cache-manipulation API. As Gautam told me, "When you cache complete pages, you have to be sure you have a very reliable cache-busting mechanism." Wise.
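One common shape for a reliable cache-busting mechanism for full pages, sketched here in Python as an assumption rather than anything we've built: record which data records each page was rendered from, and when a record changes, ask the CDN to purge exactly those pages. Everything here is hypothetical: the `PageDependencies` class, the `"show:42"`-style record keys, and the `purge` callable. With CloudFront, `purge` might wrap boto3's `create_invalidation`, but it is left abstract so the sketch runs on its own.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical: a callable that asks the CDN to evict the given paths.
PurgeFn = Callable[[List[str]], None]


class PageDependencies:
    """Track which cached pages were rendered from which data records,
    so that updating a record purges exactly the pages that show it."""

    def __init__(self, purge: PurgeFn) -> None:
        self.purge = purge
        self.pages_by_record: Dict[str, Set[str]] = defaultdict(set)

    def rendered(self, page_path: str, record_keys: Iterable[str]) -> None:
        # Call this whenever a full page is rendered for the CDN.
        for key in record_keys:
            self.pages_by_record[key].add(page_path)

    def record_changed(self, record_key: str) -> List[str]:
        # Purge every page that embedded the changed record. The mapping is
        # kept, since re-rendering a page registers it again anyway.
        paths = sorted(self.pages_by_record.get(record_key, set()))
        if paths:
            self.purge(paths)
        return paths


if __name__ == "__main__":
    deps = PageDependencies(purge=lambda paths: print("purging", paths))
    deps.rendered("/shows/hamlet", ["show:42", "venue:7"])
    deps.rendered("/", ["show:42", "show:43"])
    deps.record_changed("show:42")  # purges "/" and "/shows/hamlet"
```

The fragile part is completeness: a page that embeds a record but was never registered (a home-page listing, say) stays stale until its TTL runs out, so keeping a modest TTL as a backstop seems sensible even with explicit purging.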