SCALE5x: Talk summary of Admin++, what root never told you

So I'm at SCALE5x, listening to Ron Gorodetzky talk about what he learned about sysadmining for Digg and Revision3 (who try to be an "Internet television network"; in effect, they distribute loads of big files). Most of the tools he mentioned I already knew, but it was nice to get independent reviews of "hey I think this is good". Here's what I took home from his talk:

  • He really thinks highly of the OSCON 2005 talk LiveJournal's Backend (A history of scaling) (PDF).

  • He liked memcached and MogileFS.

  • Reading between the lines, I understood that Revision3 has outsourced its heavy bandwidth use -- the CDNs he mentioned by name were CacheFly (the color scheme hurts even my eyes, and real designers think I'm colorblind), BitGravity (caution: hideous Flash site) and of course Akamai.

  • He spoke about outsourcing data center operations, using things like Amazon EC2 and S3. I need to come up with a budget and time to play with EC2.

  • He stressed the importance of setting up KVMs etc. properly for the data center.

  • Set up your infrastructure and plan for scaling before you get popular, because you will be too busy to do it afterwards. That's nice; I like building things to be scalable from scratch.

  • Specific infrastructure management tools:

    • Puppet -- seems pretty much a reimplementation of cfengine

    • Bcfg2 -- smells like academentia to me

    • ISconf -- from the Bootstrapping an Infrastructure people, seems to be based on the idea of a p2p distributed cache that stores pretty much a version control history of the commands that were run.

As usual, I haven't yet seen anything that would actually seem to work in the real world, unless you give up everything you already have (like package management etc), and do things 100% their way.

His suggestion: as the tools are based on very different worldviews, look at everything and try to pick the one that matches your opinions.
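To make the ISconf idea a bit more concrete for myself, here's a toy sketch in Python of "a version control history of commands": an append-only journal, and hosts that replay whatever entries they haven't applied yet. This is only my reading of the concept, not how ISconf is actually implemented; the names Journal, Host and catch_up are made up for illustration, the commands are plain strings that never get executed, and the p2p distribution part is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Journal:
    """Append-only, ordered history of commands (hypothetical format)."""
    entries: List[str] = field(default_factory=list)

    def record(self, command: str) -> int:
        """Append a command and return the journal length after it."""
        self.entries.append(command)
        return len(self.entries)


@dataclass
class Host:
    """A machine that remembers how far into the journal it has replayed."""
    name: str
    applied: int = 0  # number of journal entries already replayed here

    def catch_up(self, journal: Journal, run: Callable[[str], None]) -> List[str]:
        """Replay, in order, every entry this host has not applied yet."""
        pending = journal.entries[self.applied:]
        for command in pending:
            run(command)       # a real tool would execute the command here
            self.applied += 1  # advance only after the command has run
        return pending


journal = Journal()
journal.record("install ntp")
journal.record("enable ntp")

executed: List[str] = []  # stand-in for actually running things
web1 = Host("web1")
web1.catch_up(journal, executed.append)

journal.record("install memcached")
web1.catch_up(journal, executed.append)  # only the new entry is replayed
```

The point of the design, as far as I can tell, is that a freshly installed host and a long-running one end up in the same state by replaying the same history in the same order, which is also why it wants you to do everything its way.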

  • One thing he wouldn't skimp on: "Don't skimp on RAM."

  • At Revision3, they use long-life server hardware and don't upgrade the servers; instead, they go for a full new deployment.
