Of webservers and datacenters...
dacut at kanga.org
Mon Oct 30 21:33:40 PST 2006
Matthew Dillon wrote:
> These days accessing any major web site can take multiples of seconds
> due to the complexity of the site, all the separate DNS domains that
> the client machine has to lookup to process the page, and backend
> latency (servlet startup, etc). [...] And it isn't
> because of network bandwidth issues. I think one could introduce upwards
> of 20ms of networking latency on the web server side and not notice
Yep. We're heavily into service-oriented architecture (SOA), and we
probably have one of the better implementations out there -- heck, we're
sometimes mentioned in the trade rags as the poster child for SOA. It's
still agonizingly slow, and I believe it's the main bottleneck in our
I view SOA not as a way of organizing software -- it's really a way of
organizing developers. Or, to put it another way, SOA is what you move
to when you can't or won't impose discipline upon your development teams
yet you still need to ship features. The downside is it makes your site
slow (especially as a page starts using more services). In response,
you keep shipping more gee-whiz features so folks (hopefully) won't notice.
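A rough way to see why a page slows down as it calls more services is the Python sketch below. The per-call latencies and the 1% "slow call" probability are hypothetical numbers chosen for illustration, not measurements from our site. Sequential calls add up, a concurrent fan-out waits on the slowest call, and the more calls a page makes, the more likely at least one of them lands in its slow tail.

```python
"""Sketch: page latency as a page calls more backend services.

Hypothetical assumptions (not from the post): each call has a fixed
latency, and each call independently lands in its slow tail (exceeds
its own p99) with probability 1%.
"""


def sequential_latency_ms(call_latencies_ms):
    """Calls made one after another: the page waits for the sum."""
    return sum(call_latencies_ms)


def parallel_latency_ms(call_latencies_ms):
    """Calls fanned out concurrently: the page waits for the slowest one."""
    return max(call_latencies_ms, default=0)


def prob_any_call_slow(num_calls, p_slow=0.01):
    """Probability that at least one of num_calls independent calls is slow."""
    return 1 - (1 - p_slow) ** num_calls


if __name__ == "__main__":
    calls_ms = [20, 35, 15, 50, 40]  # hypothetical per-service latencies
    print("sequential:", sequential_latency_ms(calls_ms), "ms")  # 160 ms
    print("parallel:", parallel_latency_ms(calls_ms), "ms")  # 50 ms
    for n in (1, 10, 100):
        print(n, "calls:", round(prob_any_call_slow(n), 3))
```

With those made-up numbers, five sequential calls cost 160 ms versus 50 ms fanned out, and a page making 100 calls has roughly a 63% chance of hitting at least one slow call, even though each individual call is slow only 1% of the time.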
> What is crazy is that nobody bothers to benchmark this problem.
Actually, we do. We measure almost everything and have bandwidth
dedicated to log pulling and metrics publishing. I'm on a team that
owns a set of services fronting a cluster of core databases, and our
pagers go wild whenever latencies jump up.
Alas, improving performance is not a business priority. It's more about
keeping the status quo as more gee-whiz features pop into existence.
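For the curious, a latency alarm of the sort that sets off a pager can be as simple as computing a tail percentile over a window of samples and comparing it to a threshold. This is a minimal sketch, not our actual monitoring system; the 200 ms threshold, the 99th percentile, and the function names are all assumptions.

```python
"""Sketch of a tail-latency alarm. Hypothetical: the 200 ms threshold,
the 99th percentile, and these function names are assumptions."""
import math


def percentile_nearest_rank(samples_ms, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of
    the samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def should_page(samples_ms, threshold_ms=200.0, pct=99):
    """Page the on-call engineer if the window's tail latency is too high."""
    return percentile_nearest_rank(samples_ms, pct) > threshold_ms


if __name__ == "__main__":
    window = [50.0] * 98 + [900.0] * 2  # hypothetical samples for one window
    print(percentile_nearest_rank(window, 99), should_page(window))  # 900.0 True
```

With a window of 100 samples, a single 900 ms outlier leaves the nearest-rank p99 at 50 ms, but a second one pushes it to 900 ms and trips the alarm. A percentile is used rather than an average because a handful of very slow requests barely moves the mean.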
> If I have a big web server and I am shoving out
> 500 MBits of data a second, then my main worry is going to be the cost
> of transporting that data over the internet relative to which the cost
> of the server is pretty much zip.
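To put the 500 Mbit/s figure in perspective, here is the back-of-the-envelope arithmetic as a small Python function. It assumes decimal units, a 30-day month, and traffic that really is sustained around the clock (real traffic usually isn't).

```python
"""Monthly transfer volume at a sustained bit rate.

Assumptions: decimal units (1 Mbit = 10**6 bits, 1 TB = 10**12 bytes),
a 30-day month, and a constant rate around the clock.
"""

SECONDS_PER_DAY = 86_400


def monthly_transfer_tb(mbit_per_s, days=30):
    """Terabytes moved in `days` days at a constant mbit_per_s."""
    bytes_per_s = mbit_per_s * 10**6 / 8  # 8 bits per byte
    return bytes_per_s * SECONDS_PER_DAY * days / 10**12


if __name__ == "__main__":
    print(monthly_transfer_tb(500), "TB per month")  # 162.0 TB per month
```

At a sustained 500 Mbit/s that works out to 162 TB per month.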
For us, external bandwidth is a mostly fixed cost. (Changing this
is a major undertaking -- and, yes, we've done it twice, and both times
it consumed a significant part of our engineering resources.) We did
spend a lot of effort minimizing hardware cost recently
-- we were getting crushed by teams whose approach to scaling was to
throw more machines at it instead of rewriting O(n^2) algorithms into
O(n log n). (Actually, my team recently discovered a client who was
hitting us with O(n^2) requests when they could've been doing O(1)...
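To make the two complaints above concrete, here is a hypothetical Python sketch: a pairwise O(n^2) duplicate check next to the usual sort-based O(n log n) rewrite, and a client that issues one request per pair versus one that fetches everything in a single batched call. CountingService and its get_price/get_prices methods are invented for illustration; they only count round trips and do not correspond to any real service.

```python
"""Sketch of the two rewrites: O(n^2) -> O(n log n), and O(n^2) requests
-> O(1) requests. CountingService is a hypothetical stand-in, not a real API."""


def has_duplicate_quadratic(items):
    """O(n^2): compare every pair."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_sorted(items):
    """O(n log n): after sorting, any duplicates are adjacent."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


class CountingService:
    """Hypothetical backend that records how many requests it receives."""

    def __init__(self, prices):
        self.prices = prices
        self.requests = 0

    def get_price(self, key):
        self.requests += 1  # one round trip per key
        return self.prices[key]

    def get_prices(self, keys):
        self.requests += 1  # one round trip for the whole batch
        return {k: self.prices[k] for k in keys}


def pairwise_totals_chatty(service, keys):
    """Fetches both prices for every pair: O(n^2) requests."""
    return {
        (a, b): service.get_price(a) + service.get_price(b)
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    }


def pairwise_totals_batched(service, keys):
    """Fetches everything once, then pairs locally: O(1) requests."""
    prices = service.get_prices(keys)
    return {
        (a, b): prices[a] + prices[b]
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    }


if __name__ == "__main__":
    table = {"a": 1, "b": 2, "c": 3, "d": 4}  # hypothetical data
    chatty, batched = CountingService(table), CountingService(table)
    pairwise_totals_chatty(chatty, list(table))
    pairwise_totals_batched(batched, list(table))
    print(chatty.requests, batched.requests)  # 12 1
```

With four keys, the chatty client makes 12 requests (two for each of the six pairs) while the batched client makes one, and the totals they compute are identical. The gap grows quadratically with the number of keys.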
Part of the problem with hardware is that there's an associated cost of
supplying it with electricity (don't forget generators and UPSes for
power outages), cool air, and datatechs to just build and repair the