SCALE5x: Talk summary of Admin++, what root never told you

So I'm at SCALE5x, listening to Ron Gorodetzky talk about what he learned about sysadmining for Digg and Revision3 (who try to be an "Internet television network"; in effect, they distribute loads of big files). Most of the tools he mentioned I already knew, but it was nice to get independent reviews of "hey I think this is good". Here's what I took home from his talk:

  • He really thinks highly of the OSCON 2005 talk LiveJournal's Backend (A history of scaling) (PDF).

  • He liked memcached and MogileFS.

  • Between the lines I understood Revision3 has outsourced their big bandwidth use -- the CDNs he mentioned by name were Cachefly (the color scheme hurts even my eyes and real designers think I'm colorblind), BitGravity (caution: hideous Flash site) and of course Akamai.

  • He spoke about outsourcing data center operations, using things like Amazon EC2 and S3. I need to come up with a budget and time to play with EC2.

  • He stressed the importance of setting up KVMs etc. properly for the data center.

  • Set up your infrastructure and plan for scaling before you get popular, because you will be too busy to do it afterwards. That's nice, I like building things scalable from scratch.

  • Specific infrastructure management tools:

    • Puppet -- seems pretty much a reimplementation of cfengine

    • Bcfg2 -- smells like academentia to me

    • ISconf -- from the Bootstrapping an Infrastructure people, seems to be based on the idea of a p2p distributed cache that stores pretty much a version control history of commands run.

    As usual, I haven't yet seen anything that would actually seem to work in the real world, unless you give up everything you already have (like package management etc), and do things 100% their way.

    His suggestion: as the tools are based on very different worldviews, look at everything and try to pick the one that matches your opinions.

  • One thing he wouldn't skimp on: "Don't skimp on RAM."

  • At Revision3, they use long-life server hardware and don't upgrade the servers; instead they go for a full new deployment.
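
To make the ISconf idea from the tools list a bit more concrete, here's a minimal Python sketch of how I understand it: an ordered, append-only history of commands that each host replays from wherever it left off. This is not ISconf's actual code or API. The names (Journal, Host, catch_up) are made up for illustration, there is no p2p distribution here, and the commands are only recorded, not executed.

```python
class Journal:
    """Append-only, ordered history of commands (hypothetical, not ISconf's API)."""

    def __init__(self):
        self._entries = []

    def record(self, command):
        """Append a command and return its position in the history."""
        self._entries.append(command)
        return len(self._entries) - 1

    def since(self, position):
        """Return every command recorded at or after the given position."""
        return list(self._entries[position:])


class Host:
    """A machine that replays journal entries it has not applied yet."""

    def __init__(self, name):
        self.name = name
        self.applied = []  # commands this host has already "run"

    def catch_up(self, journal):
        """Apply all pending commands in order; return how many were applied."""
        pending = journal.since(len(self.applied))
        for command in pending:
            # A real tool would execute the command here; this sketch only logs it.
            self.applied.append(command)
        return len(pending)


journal = Journal()
journal.record("apt-get install -y ntp")
journal.record("useradd deploy")

old_host = Host("web1")
old_host.catch_up(journal)

journal.record("apt-get install -y memcached")

new_host = Host("web2")     # built later, starts from an empty history
new_host.catch_up(journal)  # replays all three commands
old_host.catch_up(journal)  # replays only the one it was missing

print(old_host.applied == new_host.applied)
```

The point of the sketch is that a freshly built host and a long-running one end up with the same history, because both apply the same commands in the same order.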

2020-01-21T20:49:33-07:00, originally published 2007-02-10T15:19:00-08:00