Monday, August 19, 2013

The next time I launch a website

Over the past decade, I've launched a lot of websites. In that time, it's gotten easier and easier to scale them, especially content sites dominated by read queries. But many challenges remain, and until we can use seamless iframes (dammit, when?), scaling page requests will require a mix of technologies that balance breadth (pushing the same content out to lots of people, usually via edge caching on a CDN) against depth (pushing unique content out to one person, possibly via Ajax personalization of a generic page).

Traditionally, we've started by making everything dynamic -- all requests hit the app servers, and every response is unique, even when the content is largely the same. Then we layer on caching, re-engineering and refactoring as the site grows and we discover performance bottlenecks. This approach works, but it's not optimal, because it's reactive: we wait until we observe a problem before addressing it. Possibly, we wait too long.

But there may be a better way, and it doesn't require premature optimization...

A sub-optimal route

At the AWS conference, I learned that CloudFront (Amazon's CDN) will accelerate page delivery even if nothing is cached. This is due to route optimization: requests hit Amazon's edge servers, located all over the world, and then immediately enter Amazon's optimized private network, traversing fewer hops over better pipes on their way to the origin S3/EC2 sources. They don't bounce all around the 'net, trying to find their way to Virginia. (Important: this applies only if the origin is within AWS.)

So, I will configure CloudFront with no caching at all -- every request will be passed to the origin for unique resolution. Effectively, there's no CDN at all, only its network benefits. Crucially, though, the caching mechanism is in place from day one. Any engineering needed to work effectively within this infrastructure can be incorporated into the first build-out, rather than delayed until it becomes an issue.

As traffic grows, I'll dial up the cache, maybe to just 30 seconds at first. And of course, different kinds of content can be cached for different amounts of time. Plus, individual cookies can be acknowledged or ignored, so personalized requests can be passed to the origin while generic requests are cached, even at the same URI.
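To make the "start at zero, dial it up later" idea concrete, here is a minimal sketch of the origin's side of the bargain. It assumes the CDN honors the origin's `Cache-Control` header and forwards only one whitelisted cookie; the class `CachePolicy`, the content categories, and the cookie name `session_id` are all hypothetical. The real knobs live in the CloudFront distribution settings; this only models the behavior: every TTL starts at 0, each kind of content gets its own TTL, and the edge cache key ignores every cookie except the personalization one.

```python
from dataclasses import dataclass, field

# Hypothetical content categories and TTLs (seconds). Every TTL starts at 0,
# so on day one every request is sent through to the origin.
DEFAULT_TTLS = {"html": 0, "api": 0, "static": 0}

# Hypothetical cookie that marks a visitor as personalized (e.g. logged in).
SESSION_COOKIE = "session_id"


@dataclass
class CachePolicy:
    ttls: dict = field(default_factory=lambda: dict(DEFAULT_TTLS))

    def dial_up(self, content_type: str, seconds: int) -> None:
        """Change the TTL for one kind of content as traffic grows."""
        if seconds < 0:
            raise ValueError("TTL must be non-negative")
        self.ttls[content_type] = seconds

    def cache_control(self, content_type: str, cookies: dict) -> str:
        """Cache-Control header the origin attaches to its response."""
        if SESSION_COOKIE in cookies:
            # Personalized response: shared caches such as the CDN must not store it.
            return "private, no-store"
        ttl = self.ttls.get(content_type, 0)
        if ttl == 0:
            # Any stored copy must be revalidated with the origin on every request.
            return "no-cache"
        return f"public, max-age={ttl}"


def cache_key(uri: str, cookies: dict, forwarded: tuple = (SESSION_COOKIE,)) -> tuple:
    """Model of an edge cache key: the URI plus only the forwarded cookies.

    Every other cookie (analytics, tracking, ...) is ignored, so generic
    requests for the same URI share a single cache entry.
    """
    kept = tuple(sorted((name, value) for name, value in cookies.items() if name in forwarded))
    return (uri, kept)


if __name__ == "__main__":
    policy = CachePolicy()
    print(policy.cache_control("html", {}))                     # no-cache
    policy.dial_up("html", 30)
    print(policy.cache_control("html", {}))                     # public, max-age=30
    print(policy.cache_control("html", {SESSION_COOKIE: "a"}))  # private, no-store
    print(cache_key("/", {"utm": "x"}) == cache_key("/", {}))   # True
```

Because the decision is made per response, the same URI can be cached for anonymous visitors and passed straight through for personalized ones, and turning up the cache later is a one-line change rather than a refactor.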


Another good practice from day one is to plan for subdomains: as the site scales, I can expect to serve different content from different subdomains -- cdn.mysite.com, api.mysite.com, etc. It's easier to deal with this later if the code already knows about it. At first, these can all resolve to the same place. Importantly for CloudFront, I'll also need a post.mysite.com subdomain to receive form POST requests, which CloudFront won't handle (AWS: please implement a feature to forward POSTs to the origin!).
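One way to make the code "already know" about subdomains is to never hard-code a host: templates and handlers ask for a role, and a single map turns roles into hostnames. This is a minimal sketch; `HOSTS`, `url_for`, and the role names are hypothetical, with hostnames following the cdn.mysite.com / api.mysite.com examples above. On day one, DNS can point every name at the same origin.

```python
# Hypothetical role -> hostname map. The names differ from day one, but DNS
# can point all of them at the same origin until there's a reason to split.
HOSTS = {
    "www": "www.mysite.com",
    "cdn": "cdn.mysite.com",
    "api": "api.mysite.com",
    "post": "post.mysite.com",
}


def url_for(role: str, path: str, hosts: dict = HOSTS, scheme: str = "https") -> str:
    """Build an absolute URL for `path` on the subdomain that serves `role`."""
    try:
        host = hosts[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None
    return f"{scheme}://{host}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Templates ask for a role, never a hard-coded host.
    print(url_for("cdn", "/css/site.css"))   # https://cdn.mysite.com/css/site.css
    print(url_for("post", "/comments/new"))  # https://post.mysite.com/comments/new
    # Local development: every role collapses onto one host.
    local = {role: "localhost:8000" for role in HOSTS}
    print(url_for("api", "/v1/items", hosts=local, scheme="http"))  # http://localhost:8000/v1/items
```

Moving the API to its own servers, or sending form posts somewhere other than the CDN, then becomes a change to one map (or one DNS record) instead of a hunt through every template.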

[UPDATE: CloudFront now supports POSTs -- as well as PUTs and other verbs. Whoo! http://aws.amazon.com/about-aws/whats-new/2013/10/15/amazon-cloudfront-now-supports-put-post-and-other-http-methods/]

So, the next time I launch a website, I'm going to put the whole thing behind CloudFront, from day one.