Tv's cobweb


Better CoreOS in a VM experience

(This was all tested with CoreOS beta 991.2.0, the stable 835.13.0 fails to mount 9p volumes.)

CoreOS comes with instructions on how to run it in QEMU. After the setup, it comes down to something like

exec ./ \
    -user-data cloud-config.yaml \
    -nographic \

with cloud-config.yaml looking like


#cloud-config

hostname: mytest
users:
  - name: jdoe
    groups:
      - sudo
      - rkt
    ssh_authorized_keys:
      - "ssh-ed25519 blahblah"

I used that for many experiments, but felt it was less than ideal.

I wanted to move beyond -net user. That just means calling QEMU directly instead of going through the provided wrapper script; no big deal. But it meant I would be writing my own QEMU runner, no matter what.

I also wanted more efficient usage of my resources -- after all, the whole point of me running several virtual machines on one physical machine is to make these test setups more economical.

The provided QEMU images waste disk and bandwidth. Every VM stores two copies of the CoreOS /usr image, just like a physical machine would. Copy-on-write trickery on the initial image will not help beyond the first auto-update, as each VM independently downloads and applies the updates. This means if you run a small test cluster of say 5 VMs, you'll end up with 10 copies of CoreOS, and 5x the bandwidth usage needed.

Imitating physical computers with virtual machines is great if you're trying to learn how the CoreOS update mechanism works, but once you're at the point of wanting to just run services, it's simply not needed.

CoreOS does have a supported mode where it does not use the USR-A and USR-B partitions: PXE booting, starting a computer by requesting the software over the network. I could even skip the virtual networking and use this with QEMU by launching the kernel and initrd directly, no need for PXE itself. However, this is wasteful in another way: it holds the complete /usr partition contents in RAM, using about 180MB, once for each VM. There is also an annoying delay of 15+ seconds in VM startup, presumably related to the large initrd image, and later the kernel spends 1.2 seconds uncompressing it into a tmpfs (measured on an i5-5300U laptop).

Digging into the PXE image, I find that it actually stores the /usr contents as a squashfs -- which is a real filesystem that can be stored on block devices, as opposed to just unpacking a cpio to a tmpfs. The PXE image does what's called a "loopback mount", where a file is treated like a block device. In the PXE scenario, the file is held in RAM in a tmpfs; I can just put those bytes on a block device, and boot that!

(The Live CD seems to also hold /usr contents in tmpfs just like the PXE variant, even though it could fetch them on demand from the ISO. The squashfs image is random-access, unlike the usual cpio.gz that's used for initramfs contents. In later versions, CoreOS could switch their ISO images to use the trick I'll explain below -- at the cost of physical machines needing to spin up a CD more often than once per boot. The live CD has another downside that made me avoid it: to pass kernel parameters, I'd have to resort to kludges like creating a boot floppy image with syslinux and the right parameters on it.)

So, I set about fixing the wasted disk and bandwidth problem. Here's a story of an afternoon project.

Using the /usr image directly

Instead of holding an extra copy of the /usr image data in RAM, we can make it available as a block device, and load blocks on demand.

For that, we need the /usr squashfs image as a standalone file, not inside the cpio. It's not available as a separate download, but we can extract it from the PXE image:

gpg --verify coreos_production_pxe.vmlinuz.sig
gpg --verify coreos_production_pxe_image.cpio.gz.sig

zcat coreos_production_pxe_image.cpio.gz \
    | cpio -i --quiet --sparse --to-stdout usr.squashfs \
    > usr.squashfs

Prepare a root filesystem

We also need to prepare a disk image that will be used for storing the root filesystem. CoreOS won't boot right with a fully blank disk. If it did, I would have used qcow2 as the format, but now I need to provide some sort of structure for the root filesystem, so let's go with a raw disk image.

I might have been able to set up the right GPT partition UUIDs for the initrd to mkfs things for me, but that seemed too complicated, and I doubted it'd support my "just the root" scenario as well as their nine-partition layout.

To keep it simple, we won't bother to use partitions; the whole block device is just one filesystem.

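# Create the file empty first: chattr +C (no copy-on-write, relevant on
# btrfs) should be set on a new or empty file.
touch rootfs.img
chattr +C rootfs.img
truncate -s 4G rootfs.img
mkfs.ext4 rootfs.img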

Prepare user_data

This was previously done inside the wrapper script with a temp dir, but we'll just pass a directory as virtfs following the "config drive" convention. Let's move our previous file into the right place:

mkdir -p config/openstack/latest
mv cloud-config.yaml config/openstack/latest/user_data

Finally, run QEMU

qemu-system-x86_64 \
    -name mycoreosvm \
    -nographic \
    -machine accel=kvm -cpu host -smp 4 \
    -m 1024 \
    -net nic,vlan=0,model=virtio \
    -net user,vlan=0,hostfwd=tcp::2222-:22,hostname=mycoreosvm \
    -fsdev local,id=config,security_model=none,readonly,path=config \
    -device virtio-9p-pci,fsdev=config,mount_tag=config-2 \
    -drive if=virtio,file=usr.squashfs,format=raw,serial=usr.readonly \
    -drive if=virtio,file=rootfs.img,format=raw,discard=on,serial=rootfs \
    -kernel coreos_production_pxe.vmlinuz \
    -append 'mount.usr=/dev/disk/by-id/virtio-usr.readonly mount.usrflags=ro root=/dev/disk/by-id/virtio-rootfs rootflags=rw console=tty0 console=ttyS0 coreos.autologin'

You'll be greeted with the Linux bootstrap messages and finally

This is mycoreosvm (Linux x86_64 4.4.6-coreos) 06:14:10
SSH host key: SHA256:t+WkofIWxkARu1hezwPnS/vgTJXUcPidA3UxKr+1uGA (DSA)
SSH host key: SHA256:cT32H33EVCHSnrCRsB+I9GG7AgXQWfyjk7JFuEzAqFU (ECDSA)
SSH host key: SHA256:NFgc7BLbeyS3SslpscSSNHNzc7lXzx6vKqBmUp+5T7Q (ED25519)
SSH host key: SHA256:pK8Dknoib61FnIwMQ6u4F4FxeSMIRq9zYsrJd0N3MPY (RSA)
eth0: fe80::5054:ff:fe12:3456

mycoreosvm login: core (automatic login)

CoreOS stable (991.2.0)
Last login: Fri Apr  1 06:02:25 +0000 2016 on /dev/tty1.
Update Strategy: No Reboots
core@mycoreosvm ~ $


As usual with QEMU, press C-a x to exit.

Stay tuned for part 2, where we will make the VM even leaner.

  • coreos
  • virtualization
  • qemu
  • kvm
  • linux
  • howto

Unmarshaling a JSON array into a Go struct

Sometimes, you see a heterogeneous JSON array like

["Hello world", 10, false]

Dealing with such an array in Go can be very frustrating. A []interface{} hell is just about as painful as the map[string]interface{} hell (See my earlier article about that).

The natural way to deal with data like that in Go would be a struct like

type Notification struct {
	Message  string
	Priority uint8
	Critical bool
}

See how much more meaning we've added?

Now, you can't just json.Unmarshal an array into a struct. I'll show you how to make that work.


  • go
  • json
  • programming
  • article

Go JSON unmarshaling based on an enumerated field value

In a previous article, we talked about marshaling/unmarshaling JSON with a structure like

{
	"type": "this part tells you how to interpret the message",
	"msg": ...the actual message is here, in some kind of json...
}

Last time, we left a repetitive switch statement in the code, where each message type was unmarshaled very explicitly. This time, we'll talk about ways to clean that up.
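As a rough sketch of one possible cleanup (the full article may well do it differently): replace the switch with a table that maps each "type" value to a constructor. The message types Sound and Cowbell, the kinds table and the Decode function are all hypothetical names made up for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope mirrors the {"type": ..., "msg": ...} shape from the text.
// Msg stays as raw bytes until Type says what to decode it into.
type Envelope struct {
	Type string          `json:"type"`
	Msg  json.RawMessage `json:"msg"`
}

// Sound and Cowbell are hypothetical message types for this sketch.
type Sound struct {
	Description string `json:"description"`
}

type Cowbell struct {
	More bool `json:"more"`
}

// kinds maps each "type" value to a constructor for the matching struct.
// Supporting a new message type means adding one line here.
var kinds = map[string]func() interface{}{
	"sound":   func() interface{} { return &Sound{} },
	"cowbell": func() interface{} { return &Cowbell{} },
}

// Decode looks up the destination struct in the table instead of switching.
func Decode(data []byte) (interface{}, error) {
	var env Envelope
	if err := json.Unmarshal(data, &env); err != nil {
		return nil, err
	}
	newMsg, ok := kinds[env.Type]
	if !ok {
		return nil, fmt.Errorf("unknown message type %q", env.Type)
	}
	msg := newMsg()
	if err := json.Unmarshal(env.Msg, msg); err != nil {
		return nil, err
	}
	return msg, nil
}

func main() {
	msg, err := Decode([]byte(`{"type": "cowbell", "msg": {"more": true}}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", msg) // prints &{More:true}
}
```

The caller still needs a type assertion or type switch to use the result, but the decoding logic itself no longer grows with every new message type.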


  • go
  • json
  • programming
  • article

Dynamic JSON in Go

Go is a statically typed language. While it can represent dynamic types, making a nested map[string]interface{} duck quack leads to very ugly code. We can do better, by embracing the static nature of the language.

The need for dynamic, or more appropriately parametric, content in JSON often arises in situations where there's multiple kinds of messages being exchanged over the same communication channel. First, let's talk about message envelopes, where the JSON looks like this:

{
	"type": "this part tells you how to interpret the message",
	"msg": ...the actual message is here, as some kind of json...
}

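A minimal sketch of how such an envelope could be handled with static types, assuming two made-up message types (Sound and Cowbell are hypothetical, as is the decodeEnvelope helper): json.RawMessage delays parsing of "msg" until "type" has been read, and an explicit switch then picks the struct to decode into.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope holds the outer message; Msg is kept as undecoded bytes.
type Envelope struct {
	Type string          `json:"type"`
	Msg  json.RawMessage `json:"msg"`
}

// Sound and Cowbell are hypothetical message types for this sketch.
type Sound struct {
	Description string `json:"description"`
}

type Cowbell struct {
	More bool `json:"more"`
}

// decodeEnvelope reads the type first, then decodes msg into a real struct.
func decodeEnvelope(data []byte) (interface{}, error) {
	var env Envelope
	if err := json.Unmarshal(data, &env); err != nil {
		return nil, err
	}
	switch env.Type {
	case "sound":
		var s Sound
		if err := json.Unmarshal(env.Msg, &s); err != nil {
			return nil, err
		}
		return s, nil
	case "cowbell":
		var c Cowbell
		if err := json.Unmarshal(env.Msg, &c); err != nil {
			return nil, err
		}
		return c, nil
	default:
		return nil, fmt.Errorf("unknown message type %q", env.Type)
	}
}

func main() {
	v, err := decodeEnvelope([]byte(`{"type": "sound", "msg": {"description": "dynamite"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", v) // prints {Description:dynamite}
}
```

Nothing here ever touches a map[string]interface{}; the only dynamic part is the choice of which struct to fill.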

  • go
  • json
  • programming
  • article

Notes from SCALE12x

(Sidenote: this little blog engine had bitrotted pretty bad.. I reimplemented it with markdown, go & bootstrap, and it's much more pleasant to work with now. Time for new content!)

I spent Saturday & Sunday at the Southern California Linux Expo (SCALE), and here's my very personal report of how I experienced it.

SCALE is not your typical tech conference, it brings in very diverse groups of people. The organizers are actively trying to reach out to e.g. kids that are in that "might grow up interested in things" age. Just about every age group, techie background, and personal interest is present -- the common theme really is only Linux (and a few BSD-based vendors trying to sell their gear). Of course this means that SCALE won't ever serve my desires perfectly -- but it serves the community well, and the feel of the conference is very friendly and engaging.


First of all, I was too busy to go on Friday, and the streaming video had some sort of audio codec trouble, so I won't comment about content of the devops day. What I will say is that I'm impressed by the strength of the devops presence at SCALE. It's becoming a significant backbone of SCALE, year by year. Kudos to the organizers. And they're at it all year long -- the local ops-oriented meetups have a great community going. Heartily recommended, whether you carry a pager or not. Also see hangops.

SCALE also hosted another sub-event on Friday called Infrastructure.Next, @infranext. It looked interesting, though I fear overpresence of Red Hat and vendor agenda. I'm still waiting for slides and/or video of How to Assemble A Cutting Edge Cloud Stack With Minimal Bleeding. (The archived live streams for all three days are useless because of audio problems.)

I also missed Greg Farnum's talk on Ceph. I worked at Inktank for almost two years, and this technology is one of a kind, and a good indicator of which direction the future lies in. If you deal with >20 machines, you should definitely take time to look into Ceph.


Saturday started off with a talk about SmartOS vs Linux performance tooling (slides). There wasn't much new there this time around, but Brendan is a good speaker, and SmartOS is probably the most serious server-side alternative to Linux I'd personally consider these days, so it's good to keep tabs on what they've been working on.

My interests drew me next to the talk about Presto (slides). Takeaways:

  • batch and interactive systems have fundamentally different needs, e.g. for monitoring grace periods, how and when maintenance can be performed; they require a different ops culture.
  • Dain shared background on Facebook's internal networking challenges, and how data center power limits forced them to essentially trade off other servers for Presto servers, to avoid network bottlenecks.
  • Presto is integrating the BlinkDB research on approximate queries, e.g. <10% error for 10-100x faster queries sounds like a very good trade-off.
  • many "big data" stores don't store enough statistics about index hit rates to guide query planning

I'm sad I missed Beyond the Hypervisor (slides) due to a schedule conflict.

The OpenLDAP talk (slides) was really largely about LMDB, and that's what I came for. LMDB is a library that implements a key-value store, with an on-disk B-tree where read operations happen purely through a read-only mmap. This is a really nice architecture, pretty much as good as a btree gets -- that is, it's probably happiest with read-mostly workloads, and probably at its worst with small writes to random keys. Pretty much the opposite of LevelDB, there. I wish the benchmarks were less biased, but that seems to be the unavoidable nature of benchmarking. LMDB has a lot of the kind of mechanical sympathy that may remind you of Varnish: all aspects of caching are offloaded to the kernel, and data can be accessed in zero-copy fashion because the read-only mmap prevents accidents. For Go programmers, Bolt is a reimplementation of the design in pure Go, avoiding the Cgo function call overhead, and offering a much nicer api than the direct wrapper szferi/gomdb. My quick microbenchmarks say that, when used from Go, Bolt can be faster.

Next up was High volume metric collection, visualization and analysis. If I could take back those 20 minutes, I would.

I spent the rest of the day catching up with old friends and making new ones.


Clint Byrum is now at HP and working on TripleO, a project that aims to make OpenStack do bare-metal deploys, and then run a public-facing OpenStack on top of that. His talk was a good status report (slides), but in situations like this I always end up wanting more details.

For the next slot, I bounced between three different talks, not 100% happy with any one of them. First, Hadoop 2 (slides) was an intro to YARN et al that started off like an apologist "I swear Hadoop and Java don't really suck as much as they seem to". Mark me down as unconvinced.

Second, Configuration Management 101 was a good effort from a Chef developer to be party neutral, and talked about the common things you find in all the common CM frameworks. His references to promise theory are pretty much dead on, and in the 3 years since I fiddled momentarily with, my thoughts have gone more and more into thinking about distributed CM as an eventual consistency problem. With Juju-inspired notifications about config changes, using more gossip & vector clock style communication to update peers on e.g. services provided, this might result in something very nice. That one is definitely on the ever-growing itches to scratch list.

Third, Seven problems of Linux Containers was an OpenVZ-biased look into remaining problems. Some of it was a bit ridiculous -- who says containers must share a filesystem, just mount one for each container if you want to -- and some of it was just too OpenVZ-specific to be interesting. Still, a good topic, and OpenVZ was groundbreaking work.

For the next slot, I returned from lunch too late to fit in the packed rooms, and enjoyed breathing too much to try harder. I watched three talks, mostly from open doorways. The hotel's AC was not really keeping up anymore at this point, and only the main room was pleasant to be in.

Big Data Visualization left me wishing that 1) it wasn't fashionable to say "big data" 2) he'd have shown more visualizations 3) he'd talked about the hard parts.

ZFS 101 (slides) is interesting to me mostly to see what people think about & want from storage. Btrfs is really promising in this space, feature-wise; it still has implementation trouble like IO stalls, but the integrated snapshots and RAID are just so much more useful and usable than any combination of hardware RAID, software RAID, and LVM. Snapshots really need to be a first-class concern. So far, my troubles with Btrfs are of a magnitude completely comparable to my troubles with the combination of LVM, LVM snapshots, HW-RAID cards dying, and SW-RAID1 sometimes booting the drive that was meant to be disabled. All in all, I find the "not yet stable" argument a bit boring; there's a whole lot of code and complexity in Btrfs, but it also removes the need for a whole lot of other kinds of code and complexity. If nothing else, the ZFS/Btrfs feature set should be a design template for future efforts; I understand e.g. F2FS has a very specific design goal (think devices rather than full computers), but not supporting snapshots in a new filesystem design is a bummer.

And finally, I spent time in Jordan Sissel's fpm talk. fpm is a tool that converts various package formats into other package formats, a lot like Alien. Jordan's viewpoint on this is a frustrated admin who just wants the damn square peg to fit in the round hole, and fpm is the jigsaw & hammer that'll make that happen. I fundamentally disagree with him about the role of packaging; the whole point of packaging is destroyed if the ecosystem has too many bad packages, and the reason e.g. Debian packaging can be a lot of work is not because cramming files in an archive should be hard, but because making all that software work together and upgrade smoothly actually is a difficult problem. But Jordan is an entertaining speaker, and his point is valid; there are plenty of cases where you don't care about the quality of the resulting package. Just.. please don't distribute them, ok?


Slides for my recent talks

I just put up a bunch of slides from talks I've presented lately:


Keyboards influenced by touchscreens

By now, you have probably used a touchscreen keyboard. We've come a long way from the clumsy "kiosk" computers that brought touchscreen keyboards to the mainstream, a decade or two ago. But a classic keyboard with physical keys is still preferable for the tactile feedback we get from pressing the keys. Until touchscreens can provide that, we'll be using traditional keyboards for a while.

But how do touchscreen keyboards differ from physical keyboards, and what ideas could we copy from them to improve the user experience of traditional keyboards? Well, for one, most touchscreen keyboards these days don't do key repeat -- instead, they'll pop up a menu of alternatives, often the same base letter with various accents, diacritics and umlauts. And you slide your finger to pick one of these options.


Now, that sounds good. I for one don't know how to type the various variations of the letters on a US keyboard, and as a Finn I actually need "ä" and "ö" sometimes. I can type them in Emacs, but not in my IM application or web browser.

Here's the idea: we don't really use autorepeat on the A-Z characters. Instead, make a long press of a letter key bring up an on-screen menu with variations of the basic letter.

On a touchscreen, choosing from the menu is immediate and fairly intuitive. While using the mouse for that would be straightforward -- and probably what a first time user will try -- nobody wants to type like that. We're stuck with one finger holding down the original key. We need to make a selection from up to about 9 alternatives. Here's two easy ideas (assuming the initial highlighted alternative is the base letter):

  • make space move the highlight to the next alternative, wrapping around to the leftmost one after the last
  • make the four primary home row keys of the other hand highlight alternatives 1-4; make pressing the same home row key again act like space, above

For example, say I want to type "ä". My options are (assuming US keyboard layout):

  1. hold "a", wait till menu pops up, while still holding "a" press space 4 times until "ä" is highlighted, release "a"

  2. hold "a", wait till menu pops up, while still holding "a" press ";", release "a"
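The two selection rules can be sketched as a tiny state machine. This is only an illustration of the idea: the Menu type, the ordering of the variants of "a", and the choice of "jkl;" as the other hand's home row keys are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// homeRow lists the four primary home row keys of the right hand on a US
// layout (assumption: the left hand is busy holding the letter key).
const homeRow = "jkl;"

// Menu is the pop-up shown while a letter key is held down.
// Alternatives[0] is the base letter and starts out highlighted.
type Menu struct {
	Alternatives []rune
	Highlighted  int
}

// NewMenu builds a menu; it expects at least the base letter.
func NewMenu(alternatives ...rune) *Menu {
	return &Menu{Alternatives: alternatives}
}

// Space moves the highlight one step right, wrapping back to the base letter.
func (m *Menu) Space() {
	m.Highlighted = (m.Highlighted + 1) % len(m.Alternatives)
}

// HomeRow jumps straight to alternative 1-4; pressing the key of the
// already highlighted alternative acts like Space.
func (m *Menu) HomeRow(key rune) {
	idx := strings.IndexRune(homeRow, key) + 1
	if idx == 0 || idx >= len(m.Alternatives) {
		return // not a home row key, or no such alternative
	}
	if idx == m.Highlighted {
		m.Space()
		return
	}
	m.Highlighted = idx
}

// Release is called when the held letter key goes up; it yields the choice.
func (m *Menu) Release() rune {
	return m.Alternatives[m.Highlighted]
}

func main() {
	// Hypothetical ordering of the variants of "a"; "ä" is alternative 4.
	m := NewMenu('a', 'à', 'á', 'â', 'ä', 'å')
	m.HomeRow(';')
	fmt.Println(string(m.Release())) // prints ä
}
```

Releasing right away, without touching space or the home row, yields the base letter, so normal typing is unaffected.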

I think I'd use that more than auto-repeat. What about you?

  • keyboard
  • touchscreen
  • human-computer-interface
  • idea
  • wishlist
  • future

Deploy tools

I've been looking at the world of deployment tools lately. Outside of Puppet and Chef (and ignoring the old beards Bcfg2 & Cfengine), what other things are there?

Fabric lets you write Python functions to describe "tasks" to be run. The Python functions are run on a client machine -- for example, the sysadmin's laptop -- and each task can be directed to operate on hosts or roles (groups of hosts), over SSH. The functions can run remote commands with run("echo hello, world") and sudo("chmod u=rw,go=r /etc/passwd"). Fabric is a very useful piece of the puzzle, but doing more complex operations one shell command at a time gets frustrating. I keep wanting something that can run whole chunks of Python easily, on the target machine. Fabric also does nothing to solve the problems of e.g. multiple admins running a deploy command at the same time.

Kokki is closer to Chef in its style (and is literally "Chef" in Finnish). It's a framework for writing cookbooks, with actions like File("/etc/greeting", content="hello, world") in them. Then a configuration for a machine can invoke certain recipes. Kokki seems to be aimed fully at running things locally; that is, if you're deploying things, you'd run Kokki on the target machine. Kokki is still in fairly early development: its website and source code don't match each other at all, and many of the cookbooks no longer work with the current version. It also still inherits a bit too many non-Pythonic elements from Chef, for my tastes. Still, to this Pythonista, it looks very promising, and I will be exploring it further.

Poni ("Pony" in Finnish -- I want one too!) is another Python project that takes a very different tack. It is built on a command-line tool that lets you define your infrastructure through hierarchical collections of key-value settings; that is, you describe the whole multi-machine service with database servers, load balancers, app servers and all. You can use inheritance much like other deploy tools use cookbooks. The command-line tool seems to be meant to be used for everything; the stored data is not really meant to be edited directly. While I appreciate the polish of the command-line tool, the editor-hostility comes off a bit odd. Especially so when the getting started guide has me "uploading" template files and Python source code to Poni's internal configuration storage. Am I really supposed to have two copies of these files?

Once you have your infrastructure defined, Poni provides you two main methods to actually make changes: you can create files based on templates (that have a strong mechanism for referring to any values from the configuration, including things like sharing the database connection information between the DB server and the client config), or you can run custom functions ("control commands"). The Python functions run locally, but Poni provides a remote execution framework very similar to the one in Fabric, though at least for now it is significantly more verbose. And, to my disappointment, it doesn't really allow running full Python functions remotely either.

Somewhat confusing is the difference between the "create a file" and the "run control command" functionality. It is not quite clear how the whole is intended to orchestrate the full deployment, and the examples are both lacking and misleading. For example, right now the Puppet deployment example requires you to run a command to create some files, watch it fail, run another command to install software, then run the first command again to create the rest of the files. (Kind of weird to deploy Puppet with Poni in the first place..)

There is one thing about Poni that I am already starting to dislike. Currently, every identifier you need to refer to on the command line is given as a regexp, and commands act on all matches. This leads to high risk for operator errors: for example, the documentation itself uses $find("webshop/frontend") as an example; yet that would also match webshop/frontendforsomethingelse. I do hope the author changes his mind about regexps everywhere.
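To make the risk concrete, here is a small sketch of what unanchored regexp matching does to such identifiers. It uses Go's regexp package purely for illustration; Poni itself is Python, and the matchNodes helper and the list of node names are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchNodes returns every node whose name the pattern matches anywhere,
// i.e. unanchored search semantics as described in the text.
func matchNodes(pattern string, nodes []string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, n := range nodes {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Hypothetical node names.
	nodes := []string{
		"webshop/frontend",
		"webshop/frontendforsomethingelse",
		"webshop/database",
	}
	// The pattern as an operator would naturally type it matches too much.
	fmt.Println(matchNodes("webshop/frontend", nodes))
	// prints [webshop/frontend webshop/frontendforsomethingelse]

	// Explicit anchors are needed to hit only the intended node.
	fmt.Println(matchNodes("^webshop/frontend$", nodes))
	// prints [webshop/frontend]
}
```

The safe form requires the operator to remember the anchors every single time, which is exactly the kind of thing that gets forgotten during an incident.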

Much like Kokki, Poni is very early on in its development; its command-line tools and things like variable referencing are top notch, but the picture is very much not complete yet. But this one is definitely a project to watch.

For posterity: I filed a bunch of issues about the things I bumped into:

  • deploy
  • software
  • fabric
  • kokki
  • poni

EuroPython 2008 videos are up

Update: Well, they used to be. The video hosting site pulled a nasty stunt and removed tons of user content. And EuroPython itself has a nasty habit of not providing archived websites for previous years. This has all linkrotted. Don't you hate the internets?

My talks were:

  • europython2008
  • europython
  • python
  • talk
  • video
2014-02-05T20:01:15-08:00, originally published 2008-08-11T18:27+03:00

EuroPython 2008 wrap up

Lightning talks

EuroPython 2008 was fun. I presented two talks (My God, it's Full of Files -- Pythonic filesystem abstractions and Version Control for Du^H^HDevelopers) and one lightning talk (RST+S5 for your slides), participated in a bunch of open space sessions, listened to about 13 talks, took a bunch of pictures, but most importantly had interesting hallway conversations with interesting people.

"Buildout for Pure Python Project", Carsten Rebbien

As usual, PyPy was heavily represented, and seems to be making nice progress toward being the nice and featureful default Python implementation of the future. I especially liked the restricted execution features and the LLVM backend. The zc.buildout talk made me decide I will try to replace (part of?) one custom deploy mechanism with zc.buildout -- most likely I'll end up rewriting most of the current things as zc.buildout recipes, but hopefully some of the pre-existing recipes will be useful, and hopefully I can then later reuse the recipes I create for this setup.

Personally, I think my talks went ok. I understand videos will be available later, as soon as transcoding etc are finished. I'm anxious to see them myself, as I'm still finetuning my public speaking skills. I'm learning, though -- this year I had no trouble staying within my time slot, even when I was adjusting verbosity on the fly.

For some reason, I felt underprepared for the filesystem API talk, but ultimately people liked the idea of a consistent Pythonic filesystem API enough that we had an open space session on it, and people were enthusiastic about a sprint to prototype the API. Which is what we ended up doing, too -- I'll blog separately about the results of that.

My decentralized version control talk seemed to me to go over more smoothly; I guess that's just because I've been thinking about version control and project management a lot lately, so it was easy to talk about the topic in a relaxed way. On the other hand, it wasn't as much a call to action, and it really was overly generic, so I didn't get as strong audience participation there. We did have an interesting conversation about branch management strategies and such, though. I consciously tried to keep the talk on a generic level, as I felt a pure git talk would have alienated some listeners, but I did end up feeling restricted by that. There was some interest in a "Teach me git"-style session, but what we ended up doing was just talking one on one about getting started with git, during the sprints. Sorry if I missed any one of you -- grab me on #git to continue, or find me in future conferences ;)

Twisted Q&A session

I was requested to organize an open space session for Twisted Q&A, and that is exactly what we did. We went through a bunch of things related to asynchronous programming concepts, Deferreds, working with blocking code and libraries, database interfaces, debugging and unit testing.

I was also pulled in to another Twisted open space session, that was mostly about what greenlets are and how to use them. I tried to explain the differences between classical Deferreds, deferredGenerator/inlineCallbacks, and greenlets, to the best of my understanding. As a summary, with greenlets any function you call can co-operatively yield execution (I mean yield in the scheduling meaning, giving away your turn to run, not in the Python generator meaning -- interestingly inlineCallbacks etc actually make those be the same thing... my kernel instincts make me want to say "sleep"). Yielding in any subroutine means anything you do may end up mutating your objects -- which is the root evil behind threading we wanted to get away from. All the other mechanisms keep the top-level function in explicit control of yielding. Around that time, most people left for lunch, but about three of us stayed and talked about debugging Deferreds and network packet processing with twisted.pair and friends.

Lobby area

One of the interesting hallway conversations was about what happens when upstream web hosting listed on PyPI is failing. It seems PyPI already does some sort of mirroring, but even that might not be enough. Many companies seem to be bundling eggs of their dependencies in their installation package, which sounds like a good setup for commercial click-to-install deployment. But it would still be good to see a CPAN-style mirror network for PyPI, and at least some people seemed even motivated to donate servers and bandwidth. Personally, I'm mostly spoiled by the combination of Debian/Ubuntu and decentralized version control, and my level of paranoia is too high to automatically install unverified software from the internet anyway. My primary motivation in the conversation was to point out that PyPI already has some sort of mirroring/upload setup, and that you'd really want to specify exact versions and SHA-1 hashes of your dependencies. Optionally, you could delegate the known good hash storage to PyPI (assuming you trusted PyPI not to attack you), but that would require a full Debian-style signature chain from a trusted key, or you'd be owned by anyone capable of MITM attacks, DNS forgery, or cracking a PyPI mirror.
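The pinning idea boils down to a check like the following sketch. The verify helper is a made-up name, and the "download" is just the bytes "abc" so that the well-known SHA-1 test vector can stand in for a real pinned hash. SHA-1 is used because that is what the conversation was about; a stronger hash such as SHA-256 would slot into the same shape.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// verify reports whether data hashes to the pinned, hex-encoded SHA-1.
// It does not matter which mirror the bytes came from: a tampered or
// corrupted download simply fails the comparison.
func verify(data []byte, pinnedHex string) bool {
	sum := sha1.Sum(data)
	return hex.EncodeToString(sum[:]) == pinnedHex
}

func main() {
	// The pin would live next to the exact version number in your own
	// dependency list. This value is the SHA-1 of the bytes "abc".
	const pinned = "a9993e364706816aba3e25717850c26c9cd0d89d"

	fmt.Println(verify([]byte("abc"), pinned))  // prints true
	fmt.Println(verify([]byte("abcd"), pinned)) // prints false
}
```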

  • europython2008
  • europython
  • python
2008-07-20T14:02+03:00, originally published 2008-07-19T23:17+03:00

EuroPython 2008 talk #1: My God it's Full of Files

I published slides for my first EuroPython talk. Note the slides might not be easy to understand without the actual talk -- the video streaming at Ustream will work if the network infrastructure can take it, and a downloadable video should be available later.

My God, it's Full of Files

Pythonic filesystem abstractions: An overview of different filesystem(-like) APIs in Python and attempts for unifying them.

There's a lot of different filesystem(-like) APIs in Python. I intend to provide an overview of existing projects, their status and capabilities, and hopefully inspire you to work on improving things.


  • europython2008
  • europython
  • talk
  • python
  • filesystem
  • programming

Incremental mapreduce

So Google has their MapReduce, and the people behind CouchDB are throwing around their ideas. I spent some time thinking about incremental mapreduce around July, and it's time I type out that page full of scribbles.

First of all: I think the ideas thrown out by Damien above aren't really mapreduce, As Google Intended. The real power of mapreduce is in its inherent combination of parallelism and chainability, output of one mapreduce is input to another, each processing step can run massively in parallel with each other, etc. The proposed design is like a one-iteration retarded cousin of mapreduce.

With that bashing now done (sorry), here's what I was thinking:

The way I imagined building an incremental mapreduce mechanism, without storing the intermediate data and just recomputing chunks that are out-of-date (which would be lame), is to add one extra concept into the system: call it "demap". It will basically create "negative entries" for the old data. This is basically what Damien did by providing both the old and new data map calls, all the time, just said differently, and I think my way might make the average call a lot simpler. And I don't see any reason why my version wouldn't be parallelizable, chainable, and generally yummy.
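A rough sketch of how I read the "demap" idea, using word counting as the example job. All the names here are made up, and it leans on one assumption: the reduce step is a sum, so a negative entry exactly cancels the positive entry that the old version of a document once produced.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is one emitted key/value pair.
type Entry struct {
	Key   string
	Value int
}

// mapDoc emits +1 for every word in the document.
func mapDoc(doc string) []Entry {
	var out []Entry
	for _, w := range strings.Fields(doc) {
		out = append(out, Entry{w, 1})
	}
	return out
}

// demapDoc emits the "negative entries" for a document's old content:
// the same keys that mapDoc would emit, with the sign flipped.
func demapDoc(doc string) []Entry {
	out := mapDoc(doc)
	for i := range out {
		out[i].Value = -out[i].Value
	}
	return out
}

// reduce folds entries into the running totals, dropping keys that hit zero.
func reduce(totals map[string]int, entries []Entry) {
	for _, e := range entries {
		totals[e.Key] += e.Value
		if totals[e.Key] == 0 {
			delete(totals, e.Key)
		}
	}
}

// update applies one document change without recomputing anything else.
func update(totals map[string]int, oldDoc, newDoc string) {
	reduce(totals, demapDoc(oldDoc))
	reduce(totals, mapDoc(newDoc))
}

func main() {
	totals := map[string]int{}
	update(totals, "", "a b a")    // document created
	update(totals, "a b a", "a c") // document edited
	fmt.Println(totals)            // prints map[a:1 c:1]
}
```

Only the changed document is touched on each update, and the output is still plain key/value entries, so it could feed another stage in the same way.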


  • mapreduce
  • couchdb
  • parallel
  • cluster
  • programming
  • database
  • search

Snakepit and gitosis, things I've been working on

A brief update of things I've been working on:


Snakepit is a port of (part of) HiveDB to Python and SQLAlchemy. It will help you write database-backed applications that need to scale further than one database server, or even a master-slave setup, can take you. It's MIT licensed, that is, pretty much as free-for-all as it can be. And it's still work in progress, so don't be too harsh yet ;)

See for more.


gitosis aims to make hosting git repos easier and safer. It manages multiple repositories under one user account, using SSH keys to identify users. End users do not need shell accounts on the server, they will talk to one shared account that will not let them run arbitrary commands. gitosis is licensed under the GPL.

First real release will come as soon as I have the time to go through a couple of really minor nits. It's been self-hosting for a long time now.

See for more.

  • software
  • programming
  • git
  • database
  • scalability

KVM, the virtualization mechanism, rocks

For years now, my primary machine has been a laptop. I've been avoiding running Xen 1 because the hypervisor it injects between hardware and Linux hasn't been very friendly for power saving. Modern laptops are too often hot enough without any extra help.

Well, I'm happy to say that as of now, KVM is definitely stable enough to replace my use of Xen for test setups. I expect that shortly it will become what I recommend for server use, too. And because it lets Linux be Linux, instead of doing any oddities with the hardware, all the powersaving etc goodness still works perfectly.

So far, I've stumbled on two things:

  1. The version of KVM in Ubuntu, even gutsy, is just too old. With a bit of fiddling with the patches, KVM v46 built a perfectly working deb.

  2. The isolinux graphical bootup, used in Ubuntu, crashes KVM (and with v28, it crashes the host machine -- beware!). See the bug report. I got around that by using mini.iso, but you could always fall back to debootstrap; that was all we ever really had with Xen.

And while I have the soapbox: I have a dream. It's KVM running with SDL/VNC graphics in a window that's resizeable all the way to the virtual machine, with XRandR. Please make it happen!


So, the edited version of the patches:

And just remove from-debian-qemu/62_linux_boot_nasm.patch, it seems to have made it upstream.

  • kvm
  • xen
  • virtualization
  • xrandr
  • idea

  1. What's up with, by the way? That website is just a pile of horrible non-informative enterprise speak. Utterly useless, and that tends to alienate the techie community pretty fast. At least it's alienating me.

Rotaclock -- a unique clock where the whole wrist displays the time

(This post didn't originally get published due to technical problems with the graphics. It was written in 2007 and was rescued & inserted into the archive in 2015.)

Imagine a wristwatch with nothing but the wristband. Time flows around your wrist; parts of the wristband light up to indicate what time it is, and you can watch seconds race loops around your wrist.

The concept may need some tuning, as some of the readings are awkwardly behind your wrist. Perhaps there's a gap there that the indicators jump over, around the closing mechanism of the wrist band.


  • idea
  • gadget
2015-09-11T17:09:52-07:00, originally published 2007-06-06T15:09-08:00

Howto host git on your Linux box


This solution is obsolete. Use gitosis instead!

Gitosis is basically git-shell-enforce-directory's big brother, and an actual software project. Use it.

Updated to drop --use-separate-remote from git clone, it's the default.

Updated to add --read-only to git-shell-enforce-directory.

I've run repeatedly into cases where I want to provide services to people without really trusting them. I do not want to give them shell access. I don't want to even create separate unix user accounts for them at all. But I do want to make sure the service they use is safe against e.g. password sniffing.

Instead of trying to run the version control system over HTTPS (like Subversion's mod_dav_svn that will only work with Apache, which I don't run), I want to run things through SSH. SSH is the de facto unix tool for securing communications between machines.

Now, I said I don't want to create a unix user account for every developer using the version control system. With SSH, this means using a shared account, usually named by the service it provides: svn, git, etc. To identify different users of that account, do not give the account a password, but use SSH keys instead. To avoid giving people full shell access, use a command="..." when adding their public key to ~/.ssh/authorized_keys.
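As a small illustration of what such an authorized_keys entry looks like, here is a hedged Python sketch that assembles one. The helper name, the wrapper path and the key are all made up; the restriction options are the standard OpenSSH ones used later in this howto.

```python
# Standard OpenSSH authorized_keys options that switch off everything but the forced command.
RESTRICTIONS = [
    "no-port-forwarding",
    "no-X11-forwarding",
    "no-agent-forwarding",
    "no-pty",
]

def authorized_keys_line(command, public_key, restrictions=RESTRICTIONS):
    """Build one authorized_keys line that forces `command` for `public_key`."""
    if '"' in command or "\n" in command:
        # Keep the sketch simple: refuse anything that would need escaping.
        raise ValueError("command must not contain quotes or newlines")
    options = ['command="%s"' % command] + list(restrictions)
    return ",".join(options) + " " + public_key

# Hypothetical wrapper path and key, for illustration only.
line = authorized_keys_line(
    "/usr/local/bin/some-wrapper /srv/example",
    "ssh-rsa AAAAexample jdoe@example",
)
```

Every key gets its own line, so two developers sharing the account can still be forced into different commands or different directories.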

For Subversion, I submitted an enhancement to add --tunnel-user, to make sure the commit gets identified as the right user, and then used command="..." with the with arguments, like this (all on one line):

command="/srv/ -t
  --root /srv/
  --tunnel-user jdoe" ssh-rsa ...

Where the view directory is a bunch of symlinks to the actual repositories, allowing me to do group-based access control.

With git, the author of the changeset is recorded way before the SSH connection is opened. Without building some sort of access control in git hooks on the server, every developer can pretty much ruin the repository by overwriting branches with bogus commits. What they will not have is access outside of the repository, or a way to actually remove the old commits from the disk (unless you run git prune on the server). The distributed nature of git makes this reasonably easy to detect, and pretty much trivial to recover from. For any real trust in the code, you should look at signed tags anyway. The included wrapper allows you to have read-only users, but provides no detailed access control against developers with write access; they just won't be able to escape to the rest of the filesystem.

So, with that introduction out of the way, let's get to configuring:

  1. Install git on the server:

    sudo apt-get install git-core git-doc
  2. Create the directory structure to store the repositories and related files:

    sudo install -d -m0755 \
         /srv/ \
     /srv/ \
     /srv/ \
  3. Create the shared user account for this service:

    sudo adduser \
         --system \
     --home /srv/ \
     --no-create-home \
     --shell /bin/sh \
     --gecos 'git version control' \
     --group \
     --disabled-password \
  4. Set up a script that makes sure only relevant git commands can be run via SSH, and to limit the visible section of the filesystem to things you actually want to give access to; put this file in /usr/local/bin/git-shell-enforce-directory (download) and chmod a+x it

    #!/usr/bin/python
    # Copyright (c) 2007 Tommi Virtanen <>
    # Permission is hereby granted, free of charge, to any person
    # obtaining a copy of this software and associated documentation files
    # (the "Software"), to deal in the Software without restriction,
    # including without limitation the rights to use, copy, modify, merge,
    # publish, distribute, sublicense, and/or sell copies of the Software,
    # and to permit persons to whom the Software is furnished to do so,
    # subject to the following conditions:
    # The above copyright notice and this permission notice shall be
    # included in all copies or substantial portions of the Software.
    # Enforce git-shell to only serve repositories
    # in the given directory. The client should refer
    # to them without any directory prefix.
    # Repository names are forced to match ALLOW.
    #
    # NOTE: some lines of this listing were lost. The interpreter line
    # (Python 2) and every line marked "reconstructed" are a best guess
    # from context, not the original text.
    import sys, os, optparse, re
    def die(msg):
        print >>sys.stderr, '%s: %s' % (sys.argv[0], msg)
        sys.exit(1)  # reconstructed
    def getParser():
        parser = optparse.OptionParser(
            usage='%prog [OPTIONS] DIR',
            description='Allow restricted git operations under DIR',
            )
        parser.add_option('--read-only',  # reconstructed
                          help='disable write operations',
                          action='store_true',  # reconstructed
                          default=False,  # reconstructed
                          )
        return parser
    ALLOW_RE = re.compile("^(?P<command>git-(?:receive|upload)-pack) '[a-zA-Z][a-zA-Z0-9@._-]*(/[a-zA-Z][a-zA-Z0-9@._-]*)*'$")
    # reconstructed: the command lists that main() refers to
    COMMANDS_READONLY = ['git-upload-pack']
    COMMANDS_WRITE = ['git-receive-pack']
    def main(args):
        parser = getParser()
        (options, args) = parser.parse_args()
        try:  # reconstructed
            (path,) = args
        except ValueError:
            parser.error('Missing argument DIR.')
        os.chdir(path)  # reconstructed: repository names are relative to DIR
        cmd = os.environ.get('SSH_ORIGINAL_COMMAND', None)
        if cmd is None:
            die("Need SSH_ORIGINAL_COMMAND in environment.")
        if '\n' in cmd:
            die("Command may not contain newlines.")
        match = ALLOW_RE.match(cmd)
        if match is None:
            die("Command to run looks dangerous")
        allowed = list(COMMANDS_READONLY)
        if not options.read_only:
            allowed.extend(COMMANDS_WRITE)  # reconstructed
        if match.group('command') not in allowed:
            die("Command not allowed")
        os.execve('/usr/bin/git-shell', ['git-shell', '-c', cmd], {})
        die("Cannot execute git-shell.")
    if __name__ == '__main__':
        main(args=sys.argv[1:])  # reconstructed
  5. Create your first repository:

    cd /srv/
    sudo install -d -o git -g git -m0700 myproject.git
    sudo -H -u git env GIT_DIR=myproject.git git init

    (with git older than v1.5, use init-db instead of init)

  6. Set up an access control group and give it access to that repository:

    cd /srv/
    sudo install -d -m0755 mygroup
    cd mygroup
    sudo ln -s ../../repos/myproject.git myproject.git

    You can also use subdirectories of view/mygroup to organize the repositories hierarchically.

    Note, one SSH public key will belong to exactly one group, but if necessary you can create a separate group for each account for absolute control.

    Note, access to repository implies write access to repository, at least for now. You could make

  7. Get an SSH public key from a developer and authorize them to access the group:

    cd /srv/
    sudo vi .ssh/authorized_keys

    How the developer generates their key is out of scope here.

    Add a line like this, with the public key in it (all on one line, broken up in the middle of word to make sure there is no misunderstanding about when to use a space and when not to):

    command="/usr/local/bin/git-shell-enforce-directory /srv/exampl",no-port-forwarding,no-X11-forwar
        ding,no-agent-forwarding,no-pty ssh-rsa ...

    Or to allow only read-only access, add --read-only as an option.

  8. You can now push things to the repository with:

    git push mybranch:refs/heads/master

    Note that before the first push, your server-side repository will not contain even an initial commit, and can't really be cloned.

  9. Now the developer can clone the repository:

    git clone git@myserver:myproject.git

    or to avoid some behavior of older git that I consider confusing (needs git v1.5 or newer):

    git clone -o myserver git@myserver:myproject.git

    They will probably want to set up ssh-agent to avoid typing the passphrase all the time.
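To get a feel for what the wrapper from step 4 lets through, here is a small standalone sketch (Python 3) that reuses the same ALLOW_RE pattern and the same checks, minus the actual exec. The helper name `allowed_command` is made up for this example, and the read-only handling is my simplification of the script's allow-list.

```python
import re

# Same pattern as in git-shell-enforce-directory: one of the two git
# transport commands, followed by a single-quoted relative repository path.
ALLOW_RE = re.compile(
    r"^(?P<command>git-(?:receive|upload)-pack) "
    r"'[a-zA-Z][a-zA-Z0-9@._-]*(/[a-zA-Z][a-zA-Z0-9@._-]*)*'$"
)

def allowed_command(cmd, read_only=False):
    """Return the git command name if cmd passes the checks, else None."""
    if "\n" in cmd:
        return None
    match = ALLOW_RE.match(cmd)
    if match is None:
        return None
    command = match.group("command")
    # git-upload-pack serves fetches and clones; git-receive-pack accepts pushes.
    if read_only and command != "git-upload-pack":
        return None
    return command

ok_clone = allowed_command("git-upload-pack 'myproject.git'")
ok_push = allowed_command("git-receive-pack 'mygroup/myproject.git'")
no_dotdot = allowed_command("git-upload-pack '../repos/secret.git'")
no_absolute = allowed_command("git-upload-pack '/etc'")
no_shell = allowed_command("git-upload-pack 'a.git'; rm -rf /")
no_push_ro = allowed_command("git-receive-pack 'myproject.git'", read_only=True)
```

Every path component has to start with a letter, which is what rules out `..`, hidden files and absolute paths.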

And you're done! Good luck with your adventures with git, and welcome to the 21st century and to distributed version control systems.

  • git
  • admin
  • howto
  • ssh
2008-03-19T22:02+02:00, originally published 2007-03-23T17:43-07:00

Howto buy a used car in California

Tomorrow, if everything goes well, I will buy a used car from an individual in California. Here's a checklist of things for that, to help others in similar situations. Some things may have already been omitted because they weren't relevant for me, so you may want to independently browse the websites I'm using as sources. I'm skipping everything related to finances and haggling. I'm not covering cases where the car isn't in good condition. Also, buying from dealerships is different. Good luck.

(Updated to mention REG 262.)

  1. Find an ad on craigslist or whatever. I liked

  2. Call the seller. Don't just email, start evaluating the seller. Things to ask: (though that list is horribly extensive; I'd rather just pick the three best-looking candidates and do that thing on the spot; you'll need to doublecheck anyway to make sure the seller wasn't lying).

    Ask for:

    • the VIN of the car (usually 17 characters, on windshield)
    • full name of the seller (you'll need it anyway for the check)
  3. Check the CARFAX report for the car. Go for the $24.95 30-day option, you should always look at more than one car.

    • compare odometer, accident history etc with what seller said (better yet, just avoid accident cars, the risk isn't worth it)
    • make sure it's not an old junk car poorly repaired ("salvage" title)
  4. Check the smog check history (fails are a sign of trouble). In California, in most of the cases, the seller is required to provide a certificate of check newer than 90 days old -- it seems the information flow to website is slow enough that this latest check does not show up, or something.

  5. Check that the car hasn't suffered storm damage. (Annoying website demands cookies for a simple form. Suck.)

  6. Make an appointment and go see the car. In good sunlight, you want to see the car.

    • for some things to check, read more, especially On-the-Lot Checklist, Road and Test Checklist
    • you have to drive the car; leave a photocopy of your drivers license if needed for assurance
    • bring a car nut friend who knows what to check; play good cop bad cop (this checklist is not as good, as I am not a car nut)
    • on the lot:

      • paint chips
      • cracked windows
      • accident damage
      • signs of water damage
      • tires
      • check the VIN you were given against windshield, doors, engine, dashboard, major body parts; if they don't match you are dealing with a criminal
    • while driving:

      • brake hard, is it even
      • rev the engine, does it sound healthy
      • make plenty of starts and stops
      • go through a manual gearbox, there should be no grinding noises
      • see if the car will drive straight with your hands off the wheel
      • how does the clutch feel?
      • listen for noises throughout the test drive
    • after driving:

      • check for leaks
    • to be really careful, you should check a lot of things like

      • AC
      • windows up/down
      • radio/cd/speakers
      • etc
  7. Take the car to a mechanic of your choosing for evaluation. Alternatively, choose to trust the service records from a brand-name vendor.

  8. Check the price against Kelley Blue Book etc.

  9. Now you're in business. Only bureaucracy and doublechecking left.

  10. If you know what you're willing to pay, go to your bank and get a cashiers check. Or two alternatives, if that works out. Otherwise, you will need to go back to the bank after haggling, and without a deposit the seller may have sold the car to someone else. Or something. Tuff.

    (Some people say haggling at dealerships is easier if you show up with a cashiers check just a bit short from what they're asking.)

    Don't pay with cash, there's even less chance of getting it back than cancelling a check.

    If you happen to be a seller reading this, do the actual sale in a bank to be sure you aren't being cheated.

  11. Download and print PDF forms from DMV.

    • Bill of Sale (PDF)

    • Statement of Facts may interest you (PDF)

    • things the seller needs: Notice of Release of Liability (PDF)

  12. In California, sellers are required to provide a smog certificate. Make sure you get one. Stuff on smog checks: 1, 2, 3. Frankly, I'm still a bit confused myself when I will need to do a smog check.

  13. Check that the registration is current and that the car wasn't repurchased under the California "Lemon law". (Err, how? As I understand it should say so in the Certificate of Title paper)

  14. The seller should find the "pink slip" aka Certificate of Title

    • if said paper does not have form fields for ownership transfer and odometer reading, you need form "REG 262" from the DMV, which is printed on special paper and not available as PDF. SUCK!

    • check seller name against his drivers license

    • fill it out with both of your info and both sign it; for instructions search for "Where do I sign?" on this page

    • if a bank or something still owns a chunk of the car, their signature is also needed on the pink slip; I'd be inclined to avoid the complexity

    • fill in the odometer value, both sign (read more)

    • seller keeps the Notice of Transfer and Release of Liability part and submits to DMV.

    • buyer fills in the back of the title to transfer ownership

  15. Things to ask for before leaving

    • is a special wheel lug key needed, get it
    • are there extra keys
    • how does the car alarm work
    • on a convertible, how does the roof work
    • any "tricks" you should know
  16. Now you're ready to leave with your new car! Check that you have

    • Certificate of Title, signed by both, also odometer section

    • Bill of Sale, signed by both

    • maintenance records

    • smog certification

    • owners manuals, repair manuals

    • spare tire, jack

    (sources 1, 2)

  17. Seller has 5 days to submit Notice of Transfer and Release of Liability to DMV. Do it online.

  18. Buyer has 10 days to report ownership change to DMV and max 30 days to pay the fees. Read more: checklist, things to send to DMV, more info.

  19. Taxes?

General resources:

  • The other side of the story: seller howto. The person you are buying from should have done all of this, and this is what you can demand from him. Also, see the links and actual transaction guidance inside.

  • Used car buying tips at

  • car
  • california
  • los-angeles
  • howto
2007-02-20T23:59-08:00, originally published 2007-02-20T10:40-08:00

SCALE5x: Talk summary of the horribly named Red Hat Xen talk

More SCALE5x: Sam Folk-Williams is doing a talk called Xen Virtualization in Red Hat Enterprise Linux 5 and Fedora Core 6: An overview for System Administrators (UNGH!). And demonstrates why I hate "big" companies like Red Hat: they sent a non-technical, but well-practised, person to talk about Xen. He sounds convincing, but ended up explaining Xen domU migration without understanding the concept of shared storage. Gah.

Also note how talk summary promises live demos of Xen integration features only Red Hat has, and how the actual talk contained no such thing. If I didn't have wireless right now I'd be annoyed. Thank you SCALE5x organizers, the wifi is just great.

  • admin
  • redhat
  • xen
  • scale5x
  • conference
  • talk
  • bad
2007-02-10T15:49-08:00, originally published 2007-02-10T15:45-08:00

SCALE5x: Talk summary of the OpenWengo talk

More SCALE5x: Dave Neary is talking about OpenWengo. Note to self: Wengo = TelCo, WengoPhone = software, OpenWengo = project developing WengoPhone -- or something. At least it's not just .org for community and .com for services, even if the names are way too close to each other.

Good quote (not his, didn't catch the name):

"People don't want to buy a quarter-inch drill.
They want a quarter-inch hole!"

He recommended this blog for anyone interested in user interface design:

Choice quotes:

"Cross Platform (but sound on Linux is a disaster)"

"Surprisingly, for Microsoft, it's not SIP... pure SIP."
(talking about MSN Messenger)

They intend to implement XMPP-based transport mechanisms. Mentioned inkboard, an Inkscape extension(?) for whiteboard-style sharing of drawing over the internet.

They have games over OpenWengo (I guess XMPP?), like chess.

"Oh did I mention sound on Linux is horrible?"

Heh, we're calling audience members during the talk. From France. And it didn't work ;)

OpenWengo has cross-platform video conferencing. Wow.

  • voip
  • sip
  • im
  • software
  • scale5x
  • conference
  • talk
2007-02-19T19:42-08:00, originally published 2007-02-10T17:05-08:00

SCALE5x: Talk summary of Admin++, what root never told you

So I'm at SCALE5x, listening to Ron Gorodetzky talk about what he learned about sysadmining for Digg and Revision3 (who try to be an "Internet television network"; in effect, they distribute loads of big files). Most of the tools he mentioned I already knew, but it was nice to get independent reviews of "hey I think this is good". Here's what I took home from his talk:

  • He really thinks highly of the OSCon 2005 talk Livejournal's Backend (A history of scaling) (PDF).

  • He liked memcached and MogileFS.

  • Between the lines I understood Revision3 has outsourced their big bandwidth use -- the CDNs he mentioned by name were Cachefly (the color scheme hurts even my eyes and real designers think I'm colorblind), BitGravity (caution hideous flash site) and of course Akamai.

  • He spoke about outsourcing data center operations, using things like Amazon EC2 and S3. I need to come up with a budget and time to play with EC2.

  • He stressed the importance of setting up KVMs etc properly for the data center.

  • Set up your infrastructure and plan for scaling before you get popular, because you will be too busy to do them afterwards. That's nice, I like building things scalable from scratch.

  • Specific infrastructure management tools:

    • Puppet -- seems pretty much a reimplementation of cfengine

    • Bcfg2 -- smells like academentia to me

    • ISconf -- from the Bootstrapping an Infrastructure people, seems to be based on the idea of a p2p distributed cache that stores pretty much a version control history of the commands that were run.

As usual, I haven't yet seen anything that would actually seem to work in the real world, unless you give up everything you already have (like package management etc), and do things 100% their way.

His suggestion: as the tools are based on very different worldviews, look at everything and try to pick the one that matches your opinions.

  • One thing he wouldn't skimp on: "Don't skimp on RAM."

  • At Revision3, they use long-life server hardware and don't upgrade the servers, instead they go for a full new deployment.

  • admin
  • cluster
  • configuration-management
  • scale5x
  • conference
  • talk
  • livejournal
  • digg

IMAP over SSH Howto

Tired of managing n+1 passwords? Hate having an extra network port open on that server box? Want to have automated replication of email to your laptop in a Unix command line geek-friendly fashion?

Here's how to make OfflineIMAP synchronize mail between local and remote Maildirs.

  • on the client:

    • create an SSH key pair with no passphrase:
    $ ssh-keygen -t rsa -N '' -f ~/.ssh/imap-preauth-key
  • on the server:

    • install Binc IMAP on the server; no need to have it actually listen for network connections

    • I store my mail as ~/.Mail on the server; create a ~/.bincimap on the server and adjust to fit:

    Mailbox {
      depot = "IMAPdir",
      umask = "0077",
      path = ".Mail",
    • create a shell script ~/bin/imapd-preauth that'll start the IMAP daemon in a preauthenticated mode; note that OfflineIMAP wants a certain style of handshake bincimapd doesn't know how to do, so we fix that with sed:
    #!/bin/sh
    set -e
    bincimapd|sed --unbuffered '1s/^FAKE OK PREAUTH/* PREAUTH/'

    Make the script executable (duh).

    • authorize the previously generated SSH key to run only the above script -- add the following to ~/.ssh/authorized_keys (split here for readability, make it all one line; replace THINGS to fit):
    no-X11-forwarding,no-agent-forwarding,no-pty SSHPUBLICKEYHERE
  • on the client:

    • tell OfflineIMAP about the preauthenticated IMAP connection:
    [Account SOMETHING]
    localrepository = local-SOMETHING
    remoterepository = remote-SOMETHING
    [Repository local-SOMETHING]
    type = Maildir
    localfolders = ~/data/mail/SOMETHING
    [Repository remote-SOMETHING]
    type = IMAP
    remotehost = HOSTNAME
    preauthtunnel = env -u SSH_AUTH_SOCK ssh -q -i ~/.ssh/imap-preauth-key %(remotehost)s fake-command

That should be it! Have fun.

(And if you just broke it, feel free to give one of the halves to me.)
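For the curious, the sed expression in the imapd-preauth script only touches the first line of the server's output. Here is a rough Python equivalent of that one rewrite, as a sketch; the function name `fix_greeting` is made up, and the sample greeting text after the prefix is invented.

```python
def fix_greeting(lines, old=b"FAKE OK PREAUTH", new=b"* PREAUTH"):
    """Yield lines unchanged, except rewrite the prefix of the very first one.

    Mirrors sed '1s/^FAKE OK PREAUTH/* PREAUTH/': only line 1 is considered,
    and only if it starts with the old prefix.
    """
    for number, line in enumerate(lines):
        if number == 0 and line.startswith(old):
            line = new + line[len(old):]
        yield line

# Invented sample output; only the "FAKE OK PREAUTH" prefix comes from the howto.
server_output = [b"FAKE OK PREAUTH hello\r\n", b"FAKE OK PREAUTH again\r\n"]
fixed = list(fix_greeting(server_output))
```

Later lines pass through untouched even if they happen to start with the same text, just like with the `1` address in sed.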

  • imap
  • email
  • offline
  • software
  • offlineimap
  • ssh
  • howto

Six Word Scifi is so much fun it has to be wrong, somehow. Here are my favorites so far (yes, one of them is mine):

Back me up before I die.


Even at light speed, I wait.


All alone in his light cone.



The Phone Killer Phone

I now know what I want from my next phone. And it'll totally blow the whole phone concept out of the water.

Start with a mostly-open hardware platform like Neo1973, add Linux (OpenMoko) on top. And no need to cram in a clumsy qwerty keypad, just carry a one-hand keyboard when you care about it. Less clumsy when you don't want more than a phone, full SSH sweetness when you want it. The phone itself is purely touch-screen, and the keyboard can actually get respectable WPM with real keypress feedback. And the big part is, because the phone is actually Open, plugging all this in is not a big problem! That's just great!


I need a bag

I have a shoulder strap-style bag, manufactured and used by the German army, bought as surplus and dyed black. My 12" thinkpad fits perfectly inside of it. But the strap is sewn in place, and seems to fail every few years -- and now's the time.

The bag is really good, and I am going to get it fixed, but that doesn't mean I can't look at alternatives. So, I need something that fits a 12" thinkpad, isn't too big, preferably comes in black, and is otherwise non-attention grabbing and doesn't look like it'd contain a laptop. According to Lenovo, my laptop is about 268x211x20mm.

My options:

  • small messenger bag: 1, 2
  • hard-shell attache case (not really my style, but a 12" one in black might rock it -- though I'm not going to buy a 15" case with internal padding to make a 12" laptop not bounce around, I want a smaller case too; it'd still be plenty big for the obligatory dead tree notes and docs; for pics, see 3, 4)
  • the manliest purse ever from Maxpedition: I'd be willing to carry that (in black of course;) if it fit a 12".. And I don't think it will.
  • more classical military style: M-51 Engineers Field Bag, Urban Explorer Black Canvas Shoulder, etc.
  • something from but the style doesn't really smack me in the face with want
  • something from Timbuk2, but they really don't seem my style

None of those really work for me. I guess considering the bags I already own, I might go for something I don't have. Mmm, a smallish hard-shell attache case in black, with a good enough shoulder strap that I can sling it over my head. That might just do it. Now where do I get one?

2006-11-26T22:14-08:00, originally published 2006-11-17T11:30-08:00

A Revver command line video upload tool

Update: It seems the script had gone missing at some point. It's back.

As you may or may not have noticed, I do a bunch of stuff for Revver. I ended up writing a sort of a tutorial to the Revver API, and as I like to collect all kinds of code samples here, I thought I should crossblog it here. The original is on the Revver developer blog.

One day, I was on a slow internet connection and wanted to upload a few files. I wanted something more batch-oriented than the web-based upload, and I have a personal bias against most current Java runtimes. So I decided to use the cool new API and write a video upload client, and will walk you through what it does in this blog entry. Feel free to "just" use the tool, but hopefully this will also help you in writing your own API clients.

First of all, I wanted to write something that's usable just about everywhere. I tend to use Python, so that's what the tool is written in. The Python standard library didn't seem to be able to do HTTP POST file upload (think web forms) of large files, so I ended up using curl for that. This should work on any Linux/OS X/etc box with Python and curl installed. All you Ubuntu/Debian people just get to say sudo apt-get install curl and that's it.

So, let's dive right in. The tool is imaginatively named revver-upload-video. The first bit is the command line parser. Don't be intimidated by the length, this is pretty much boilerplate code, the actual API-using bits are really small. The full file is 155 lines, total.

There are basically three kinds of options: mandatory, optional and for developer use. Mandatory options are enforced later on, and developer options are mostly meant for playing with the staging environment and reusing upload tokens from previous, failed, uploads.

Guerrilla command line video upload tool.
import optparse, getpass, urlparse, urllib, xmlrpclib, subprocess

def getParser():
	parser = optparse.OptionParser(
		usage='%prog --title=TEXT --age-rating=NUM [OPTIONS] FILE..',
		description='Upload videos to Revver')

					  help='login name to use (default: %s)' %
					  help='read passphrase from (prompt if not given)',
					  help='MPAA age rating (mandatory)',
					  help='title for the video (mandatory)',
					  help='tags (mandatory, repeat for more tags)',

					  help='author of the video',
					  help='website for extra info')
					  help='extra credits',
					  help='a brief description',

					  help='API URL to contact (developers only)')
					  help='Upload URL to send the file to (developers only)')
					  help='use preallocated token (developers only)',

	return parser

If you've used optparse before, there's not much interesting there. It just instantiates a parser object and returns it, for the main function to use. Nothing there touches the Revver API yet.
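For readers on a current Python, here is a rough sketch of the same three kinds of options using argparse instead of optparse. It is not the tool's actual parser: only the option names that appear in this post are used, the `int` type for the age rating is an assumption, and `required=True` replaces the tool's own later checks for mandatory options.

```python
import argparse

def get_parser():
    parser = argparse.ArgumentParser(description="Upload videos to Revver")

    # Mandatory options: argparse enforces these itself via required=True.
    parser.add_argument("--title", required=True, metavar="TEXT",
                        help="title for the video (mandatory)")
    parser.add_argument("--age-rating", required=True, type=int, metavar="NUM",
                        help="MPAA age rating (mandatory)")
    parser.add_argument("--tag", action="append", required=True, metavar="KEYWORD",
                        help="tags (mandatory, repeat for more tags)")

    # Optional extras.
    parser.add_argument("--passphrase-file", metavar="FILE",
                        help="read passphrase from FILE (prompt if not given)")

    # Developer option: repeatable, so failed uploads can reuse their tokens.
    parser.add_argument("--upload-token", action="append", default=[], metavar="TOKEN",
                        help="use preallocated token (developers only)")

    parser.add_argument("files", nargs="+", metavar="FILE")
    return parser

args = get_parser().parse_args(
    ["--title", "My clip", "--age-rating", "3", "--tag", "cats", "--tag", "demo", "clip.mov"]
)
```

Dashes in option names turn into underscores on the result, so `--age-rating` ends up as `args.age_rating`, the same convention optparse uses.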

Next up, we have some utility functions. getPassphrase will read a passphrase from the file given to --passphrase-file=, or prompt the user for one. getAPI instantiates an XML-RPC client object with the login and passphrase, and caches it in options so if you call getAPI more than once, you're still only prompted for the passphrase at most once. Nothing in revver-upload-video uses that, but these are meant to be reusable functions.

def getPassphrase(filename=None):
	if filename is not None:
		# only the first line of the file is used
		f = open(filename)
		passphrase = f.readline().rstrip('\n')
		f.close()
	else:
		passphrase = getpass.getpass('Passphrase for video upload: ')

	return passphrase

def getAPI(options):
	api = getattr(options, 'api', None)
	if api is None:
		passphrase = getPassphrase(filename=options.passphrase_file)

		(scheme, netloc, path, query, fragment) = \
			urlparse.urlsplit(options.api_url)  # reconstructed; the option name is assumed
		query = urllib.urlencode([('login', options.login),
								  ('passwd', passphrase)])
		url = urlparse.urlunsplit((scheme, netloc, path, query, fragment))
		api = xmlrpclib.Server(url)
		options.api = api
	return api
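The URL juggling in getAPI is easier to see in isolation. Below is a sketch of just that step for Python 3, where `urlparse` and `urllib.urlencode` live in `urllib.parse`. The helper name `with_credentials` and the example host are made up; the `login` and `passwd` parameter names are the ones from the listing above.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode

def with_credentials(api_url, login, passphrase):
    """Return api_url with its query string replaced by the login credentials."""
    scheme, netloc, path, _old_query, fragment = urlsplit(api_url)
    # urlencode takes care of escaping, e.g. spaces become '+'.
    query = urlencode([("login", login), ("passwd", passphrase)])
    return urlunsplit((scheme, netloc, path, query, fragment))

# Hypothetical endpoint, for illustration only.
url = with_credentials("https://api.example.invalid/xml/1.0", "jdoe", "s3cret pass")
```

In Python 3 the resulting URL would be handed to `xmlrpc.client.ServerProxy`, which is what `xmlrpclib.Server` became.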

All right, now we're getting to the actual meat. getToken calls the API method video.getUploadTokens to allocate an upload token, that lets you upload a file to the Revver archive.

def getToken(api):
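	# The right-hand side was lost from the listing. The method name comes
	# from the text above; asking for a single token is an assumption.
	url, tokens = api.video.getUploadTokens(1)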
	assert len(tokens)==1
	token = tokens[0]
	return url, token

createMedia creates a new video in the archive from your uploaded file by calling video.create in the API. It also adds metadata like your website URL to the video. Finally, it returns the media id of the newly-created video.

def createMedia(options, token):
	data = {}
	if options.credits is not None:
		data['credits'] = options.credits
	if options.url is not None:
		data['url'] = options.url
	if options.description is not None:
		data['description'] = options.description
	if options.author is not None:
		data['author'] = options.author
	api = getAPI(options)
	media_id =,
	return media_id

Finally, we have the main function, and the bits that call it when you run the tool. Here, we actually parse the command line arguments, enforce the presence of the mandatory options, and bail out unless you gave it actual files to upload.

For each file given on the command line, we either use one of the tokens given to us with --upload-token=, or get one from the API with getToken. Then we join the upload URL and the token to get the place to upload the file to, and run curl as a subprocess to do the actual upload. Checking that curl worked takes 8 lines, and then we use createMedia to actually create the video. And that's it!

def main(progname, args):
	parser = getParser()
	(options, args) = parser.parse_args()

	if options.login is None:
		parser.error('You must pass --login=LOGIN')
	if options.title is None:
		parser.error('You must pass --title=TEXT')
	if not options.tag:
		parser.error('You must pass --tag=KEYWORD')
	if options.age_rating is None:
		parser.error('You must pass --age-rating=NUM')
	if not args:
		parser.error('Pass files to upload on command line')

	for filename in args:
		if options.upload_token:
			token = options.upload_token.pop(0)
			url = options.upload_url
			api = getAPI(options)
			url, token = getToken(api)
			print '%s: allocated token %s' % (progname, token)

		upload_url = urlparse.urljoin(url, token)
		retcode = subprocess.call(['curl',
								   '-F', 'file=@%s' % filename,
								   upload_url])
		if retcode < 0:
			print >>sys.stderr, '%s: upload aborted by signal %d' % (
				progname, -retcode)
			sys.exit(1)
		elif retcode > 0:
			print >>sys.stderr, '%s: upload failed with code %d' % (
				progname, retcode)
			sys.exit(1)
		print '%s: used token %s for %s' % (progname, token, filename)

		media_id = createMedia(options, token)
		print '%s: created media %r from %r' % (progname,
												media_id, filename)

if __name__ == '__main__':
	import os, sys
	main(progname=os.path.basename(sys.argv[0]), args=sys.argv[1:])
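
If you want to play with the two interesting bits of main in isolation, here is a minimal sketch in current Python 3 (the tool itself is Python 2). It only shows joining the upload URL with the token and turning a subprocess return code into a message. The helper names build_curl_command and describe_retcode, the example.invalid URL, the token and the file name are all made up for illustration and are not part of the Revver tool or API.

```python
from urllib.parse import urljoin


def build_curl_command(base_url, token, filename):
    """Build the curl argv for a multipart upload of one file."""
    # The token becomes the last path segment of the upload URL.
    upload_url = urljoin(base_url, token)
    return ["curl", "-F", "file=@%s" % filename, upload_url]


def describe_retcode(progname, retcode):
    """Return an error message for a failed subprocess, or None on success."""
    if retcode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        return "%s: upload aborted by signal %d" % (progname, -retcode)
    if retcode > 0:
        return "%s: upload failed with code %d" % (progname, retcode)
    return None


# Hypothetical values, nothing here talks to a real server.
cmd = build_curl_command("https://upload.example.invalid/up/", "abc123", "clip.mov")
print(" ".join(cmd))
print(describe_retcode("uploader", 0))
print(describe_retcode("uploader", 7))
```

Passing cmd to subprocess.call would run the actual upload; the sketch stops short of that on purpose.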

At this point, we have all we need to do the same thing as the web-based upload, or the Java upload client. And you can do something similar, by yourself. This file is copyright Revver, Inc, but licensed under the MIT license -- that means you can use it as a base for writing your software, without any real restrictions. Download the whole thing here: revver-upload-video.

2008-07-20T13:33+03:00, originally published 2006-11-13T20:27-08:00

New domain name

I had a bit of fun thinking up puntastic DNS domain names. I ended up registering, the old name will soon start redirecting there. Need to set up email too. Vanity domains are soo much fun.

For the rare non-nerd reading this, EAGAIN is the error code you get when you are doing asynchronous programming with non-blocking sockets and would block next. Err, let's just say "I write async code and it's a neat insider joke", ok?
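
For the curious, here is a small sketch of how that error shows up in practice, assuming a current Python 3 where it surfaces as BlockingIOError. The function name read_would_block is made up; the socket and errno calls are from the standard library.

```python
import errno
import socket


def read_would_block():
    """Return the errno raised when reading from an empty non-blocking socket."""
    a, b = socket.socketpair()
    try:
        a.setblocking(False)
        try:
            a.recv(1024)  # nothing has been sent, so this cannot complete now
        except BlockingIOError as e:
            return e.errno
        return None
    finally:
        a.close()
        b.close()


code = read_would_block()
# EAGAIN and EWOULDBLOCK are the same number on many systems.
print(code in (errno.EAGAIN, errno.EWOULDBLOCK))
```

Async code treats this as "nothing to do yet, try again later" rather than as a failure.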

For the nerds out there, here's a bunch of wild ideas I had while figuring out what domain name to register. Many of them are invalid (too short), and most of the good ones are already taken, but in case you need some inspiration:

  • /
  • (as in Chihuahua, our dogs..)
  • (taken)
  • (as in triple-double-w)
  • (as in clue-by-four)
  • (, ;-)
  • (type like a pirate)

And for the Finns:

  • (cat lovers)
  • (dog lovers)
  • (taken, by non-Finn)

I'm not going to say anything about the Cook Islands' subdomain for commercial entities.

  • computer
  • nerd
  • dns
2006-11-13T20:17-08:00, originally published 2006-09-30T20:03+03:00

In case your Xen domU's have networking trouble

If your domUs have networking trouble with TCP, or some other protocol that ends up needing fragmentation such as large ICMP pings, you need to read this.

If it seems TCP handshakes complete, but no data is transferred -- especially, no actual data gets sent out from the domU -- you're likely hitting a bug in how Xen interacts with TCP segmentation offload.

The bug seems to depend on the actual network interface card the traffic is going out from. I hear tg3 is one of the cards that triggers it, and I'm seeing it on my home box with 8139too's.

The fix is pretty simple, but hard to figure out unless you know what to look for: inside the domU, run

ethtool -K eth0 tx off

for each interface affected.
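
If a domU has several interfaces, a tiny helper can spell out the commands. This is only a sketch: the function names are made up, eth1 is a hypothetical second interface, and by default it just prints what it would run. The only real command involved is the ethtool invocation above.

```python
import subprocess


def tx_offload_off_commands(interfaces):
    """Build one `ethtool -K <iface> tx off` argv list per interface."""
    return [["ethtool", "-K", iface, "tx", "off"] for iface in interfaces]


def disable_tx_offload(interfaces, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for argv in tx_offload_off_commands(interfaces):
        if dry_run:
            print(" ".join(argv))
        else:
            # Needs root inside the domU; raises if ethtool fails.
            subprocess.check_call(argv)


disable_tx_offload(["eth0", "eth1"])
```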

See for the very small amount of extra information that is out there.

  • xen
  • networking
  • computer

iBook--, Thinkpad++

Life sucks and then your computer breaks. As soon as a new X release is out, and it seems dual head on iBook is a possibility, the darn thing decides to fry its logic board. Again. Thankfully, Apple may make the repair for free, if the symptoms match the manufacturing problem. Again.

Well, the good news is that after 3 years of using the iBook, I got a new laptop. A Lenovo Thinkpad x60s, weighing just 1.3kg. It's so light I always think I forgot to put it in the backpack.

Ubuntu Dapper (flight 7) seems to work pretty well on the x60s. Trouble spots so far:

  • install CD corrupted display during X autoconfig: screen was in text mode, mostly black, with two or three character-size grey rectangles -- hitting enter blindly let it continue and reboot the machine
  • suspend and hibernation fail on resume
  • wlan hung once, and didn't recover until I rebooted into Windows -- unngh. Look at this:

    ipw3945: Error sending SCAN_ABORT_CMD: time out after 500ms.
    ipw3945: Radio Frequency Kill Switch is On:
    Kill switch must be turned off for wireless networking to work.
    ipw3945: Error sending ADD_STA: time out after 500ms.
    ipw3945: Error sending RATE_SCALE: time out after 500ms.

    After that, any attempt to use the interface ended with:

    ADDRCONF(NETDEV_UP): eth1: link is not ready
  • hotplugging the UltraBay docking station does not seem to work in Linux

I especially love the dual headedness, after fighting with the ATI driver in the iBook.

Now I need to see about hooking the fingerprint reader up to PAM.

  • computer
  • hardware
  • ubuntu
2006-05-22T07:36+03:00, originally published 2006-05-21T22:41+03:00

My iBook has two heads

Finally, after two years of hacks, my iBook 2.2 knows how to multihead! And no silly clone mode only, totally different image and external output at 1600x1200 at 85Hz. This is nice! Thank you people for version 7, thank you X Strike Force!

Update: well, now suspending fails and booting the machine results in a black screen in over half of the tries. Bah.

  • ibook
  • computer
  • hardware
2006-05-02T22:04+03:00, originally published 2006-04-23T18:50+03:00

render_pattern: Repeat patterns easily in Nevow templates

After render_fragment, dialtone mentioned render_pattern, which would get one or many patterns from the page and put them in the current tag. Well, that's easy to write:

def render_pattern(self, name):
    """
    Find and render a pattern.


    <span nevow:pattern="foo">
      I'm very repetititive.
    </span>
    <li nevow:render="pattern foo">
      this text will get removed when rendering
    </li>
    <li nevow:render="pattern foo"/>
    """
    def f(ctx, data):
        # load the whole page template and look the pattern up in it
        doc = self.docFactory.load(ctx)
        patterns = inevow.IQ(doc).allPatterns(name)
        return ctx.tag.clear()[patterns]
    return f

Updated to adapt doc to inevow.IQ before calling allPatterns.

  • nevow
  • twisted
  • python
  • programming
2005-12-27T11:48+02:00, originally published 2005-12-21T19:18+02:00

render_fragment: Reusable fragment embedding in Nevow templates

This Nevow renderer came up on #twisted.web. Thanks to rwall and dialtone for input.

def render_fragment(self, name):
    """
    Find and render a fragment, with optional docFactory.

    Find a fragment factory from self via attributes named
    fragment_* and replace content of current tag with said
    fragment.

    If pattern docFactory is found under this tag, pass it as
    docFactory to the fragment factory.


    class MyFrag(rend.Fragment):

    class MyPage(rend.Page):
        fragment_foo = MyFrag

    and give MyPage a template with

    <!-- no docFactory -->
    <div nevow:render="fragment foo">
      this text will get removed when rendering
    </div>

    <!-- with docFactory -->
    <div nevow:render="fragment foo">
      this text will get removed when rendering
      <span nevow:pattern="docFactory">
        but this whole tag will be passed as docFactory to MyFrag.
      </span>
    </div>
    """
    def f(ctx, data):
        callable = getattr(self, 'fragment_%s' % name, None)
        if callable is None:
            # fallback: render an error message in place of the fragment
            callable = lambda **kwargs: (
                "The fragment named '%s' was not found in %r." % (name, self))
        kwargs = {}
        try:
            docFactory = ctx.tag.onePattern('docFactory')
        except stan.NodeNotFound:
            pass
        else:
            kwargs['docFactory'] = loaders.stan(docFactory)
        return ctx.tag.clear()[callable(**kwargs)]
    return f
  • nevow
  • twisted
  • python
  • programming
2005-12-21T22:45+02:00, originally published 2005-12-21T18:40+02:00

render_if: Conditional Parts in Nevow Templates

This Nevow renderer has saved me a lot of time:

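def render_if(self, ctx, data):
    # pick the patterns named after the truth value of data: "True" or "False"
    r = ctx.tag.allPatterns(str(bool(data)))
    return ctx.tag.clear()[r]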

Use it like this:

<nevow:invisible nevow:render="if" nevow:data="items">
  <ul nevow:pattern="True" nevow:render="sequence">
    <li nevow:pattern="header">The items are a-coming!</li>
    <li nevow:pattern="item">(the items will be here)</li>
  </ul>
</nevow:invisible>

And now, if the list returned by data_items is empty, there will be no <ul> tag at all in the output.
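
Stripped of Nevow, the idea is just "turn the truth value of the data into the string True or False and use that as the pattern name". Here is a plain-Python sketch of that lookup. The function select_patterns and the dict standing in for the template's patterns are made up for illustration; no Nevow API is used.

```python
def select_patterns(patterns, data):
    """Return the patterns registered under "True" or "False" for data."""
    # str(bool([])) is "False", str(bool([1])) is "True"
    return patterns.get(str(bool(data)), [])


# Only a "True" pattern is defined, like in the template above.
patterns = {"True": ["<ul>...</ul>"]}

print(select_patterns(patterns, ["an item"]))
print(select_patterns(patterns, []))
```

With no "False" pattern defined, empty data simply selects nothing, which is why the whole list disappears from the output.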

I just realized non-boolean tests may be wanted -- for example, test if a string matches a regexp. You could do that by mangling the data before render_if, but that's not nice, because then you don't have access to the original data inside nevow:pattern="True". So, instead let's parametrize the test:

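def render_ifparam(self, name):
    tester = getattr(self, 'tester_%s' % name, None)

    if tester is None:
        callable = lambda context, data: context.tag[
            "The tester named '%s' was not found in %r." % (name, self)]
        return callable

    def f(ctx, data):
        # assumes the tester takes (ctx, data); its result is cast to
        # bool to pick the pattern named "True" or "False"
        r = ctx.tag.allPatterns(str(bool(tester(ctx, data))))
        return ctx.tag.clear()[r]

    return f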

Note how we still cast the return value of the tester to boolean. You could avoid that and call the renderer render_switch. Adding support for Deferred tests would be quite easy, too. The only ugly part is I don't know of any way to make the same renderer work nicely for nevow:render="if" and nevow:render="ifparam foo".

[Updated to add return f, also renamed second render_if to render_ifparam to clarify things a bit. Thanks k3mper.]

  • nevow
  • twisted
  • python
  • programming
2006-03-22T22:52+02:00, originally published 2005-12-17T16:59+02:00

turku-dev: Developer meeting in Turku (Kehittäjätapaaminen Turussa)

This entry is about a local software developer gathering, and was originally written in Finnish.


An informal get-together for people who make software for a living and/or as a hobby, or who are otherwise interested in the subject.

We get to know people, chat, and eat. If you want to talk about the cool new software you are working on, you will probably find someone like-minded. If you need help with a tricky problem, someone has probably done something similar before. And knowing people will not hurt your career, either.

It does not matter whether your tool is C, Perl, Java, Python, Ruby, or PHP; or Linux, BSD, OS X, or even Windows. Free/open source software is important to many of us, so you will not be able to avoid it entirely, but the point is simply to bring like-minded people together.


Restaurant Harald in downtown Turku, see the map.

We are mostly interested in activity in the Turku region, but ideas about a "road show" have been floated, so in the future the developer meeting might come to your local pub, too.


This Saturday, 26 November, starting at about 12:30. For as long as the enthusiasm lasts.

The next one will probably be sometime in January, and from then on perhaps every month or two.


This event has no official organizer, and it is not affiliated with any association or the like. I started talking about it with people I know; Tero Kuusela has done almost all of the preparation work.

Backgrounds and interests of the people interested so far: VSTKY, Linux-Aktivaattori, Debian, Python, Linux kernel, Google Summer of Code, etc.


Join the mailing list. The list address is and you join by sending a message to and replying to the confirmation request.

To help estimate the head count, please let us know in advance that you are coming, by writing to Tero Kuusela

  • turku-dev
  • finland
  • software-development
  • meeting
  • lang:fi
2005-11-25T15:33+02:00, originally published 2005-11-22T15:48+02:00

The MochiKit screencast is very nice

"It's simply a more convenient syntax."

"MochiKit is full of more convenient syntax."

The MochiKit screencast is great. I think screencasts are a great way to introduce people to new software.

  • javascript
  • ajax
  • twisted
  • python

New website template

Just finished a new website layout. I'm reasonably pleased with it.

  • web

Python is confusing

>>> def simple(): yield 'a'
...
>>> ', '.join(simple())
'a'
>>> def horrible():
...     if ' ' not in False: yield 'a'
...
>>> ', '.join(horrible())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sequence expected, generator found

But it does accept generators!

(Yes, I know what triggers it to say that. It's still horribly misleading.)
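
For anyone wondering what the trigger is: as far as I can tell, the generator body itself raises a TypeError on the first iteration (a bool is not something you can use `in` on), and the old join replaced the message of whatever TypeError it hit while consuming its argument. Here is a sketch for a current Python 3, where I would expect the original message to come through instead. The helper try_join is made up for the demonstration.

```python
def simple():
    yield 'a'


def horrible():
    if ' ' not in False:  # raises TypeError as soon as the generator runs
        yield 'a'


def try_join(gen_func):
    """Join the generator's output, or report the TypeError it caused."""
    try:
        return ', '.join(gen_func())
    except TypeError as e:
        return 'TypeError: %s' % e


print(try_join(simple))
print(try_join(horrible))
```

The exact wording of the second message depends on the Python version, but it should now point at the bool rather than at the generator.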

  • python
  • programming

Using nevow.guard the smart way

<ronwalf> ok, I give up... How do I get the AVATAR_LOGIN stuck
between the SessionWrapped resource ul and the current
resource url

Well, we aim to please.

def getActionURL(ctx):
    request = inevow.IRequest(ctx)
    current = url.URL.fromRequest(request).clear()
    root = request.getRootURL()
    root = url.URL.fromString(root)
    assert root is not None
    root = root.pathList()
    me = current.pathList(copy=True)
    diff = len(me) - len(root)
    assert diff >= 0
    action = current
    if diff == 1:
        action = action.curdir()
    else:
        while diff > 1:
            diff -= 1
            action = action.parent()
    action = action.child(guard.LOGIN_AVATAR)
    for element in me[len(root):]:
        action = action.child(element)
    return action

Comment from ronwalf (on IRC) on 2005-09-17T00:06:57:

<ronwalf> Better.  after root = root.pathList()
<ronwalf> if root == ['']: root = []
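
The path juggling is easier to see on plain lists. Here is a simplified sketch of the same idea, including ronwalf's tweak, without any Nevow URL objects: the login segment goes between the root path and whatever is left of the current path. The function name login_action_path is made up, and the value '__login__' is my assumption for what guard.LOGIN_AVATAR contains; the sketch returns an absolute path list rather than a relative URL like getActionURL does.

```python
LOGIN_SEGMENT = '__login__'  # assumed stand-in for guard.LOGIN_AVATAR


def login_action_path(root, current, login_segment=LOGIN_SEGMENT):
    """Insert the login segment between the root path and the rest."""
    if root == ['']:
        # ronwalf's fix: a bare "/" root has no real segments
        root = []
    assert current[:len(root)] == root, 'current URL must live under root'
    return root + [login_segment] + current[len(root):]


print('/'.join(login_action_path(['app'], ['app', 'foo', 'bar'])))
print('/'.join(login_action_path([''], ['foo'])))
```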
  • python
  • twisted
  • programming
2005-09-17T00:06:57Z, originally published 2005-09-16T22:59+02:00