Better CoreOS in a VM experience

(This was all tested with CoreOS beta 991.2.0; the stable 835.13.0 fails to mount 9p volumes.)

CoreOS comes with instructions for running it in QEMU. After the setup, it comes down to something like

#!/bin/sh
exec ./coreos_production_qemu.sh \
    -user-data cloud-config.yaml \
    -nographic \
    "$@"

with cloud-config.yaml looking like

#cloud-config

hostname: mytest
users:
  - name: jdoe
    groups:
      - sudo
      - rkt
    ssh_authorized_keys:
      - "ssh-ed25519 blahblah [email protected]"

I used that for many experiments, but felt it was less than ideal.

I wanted to move beyond -net user. That just means calling QEMU directly instead of using coreos_production_qemu.sh -- no big deal. But it meant I would be writing my own QEMU runner, no matter what.

I also wanted more efficient usage of my resources -- after all, the whole point of me running several virtual machines on one physical machine is to make these test setups more economical.

The provided QEMU images waste disk and bandwidth. Every VM stores two copies of the CoreOS /usr image, just like a physical machine would. Copy-on-write trickery on the initial image will not help beyond the first auto-update, as each VM independently downloads and applies the updates. This means that if you run a small test cluster of, say, 5 VMs, you'll end up with 10 copies of CoreOS on disk, and 5x the bandwidth usage for updates.

Imitating physical computers with virtual machines is great if you're trying to learn how the CoreOS update mechanism works, but once you're to the point of wanting to just run services, it's simply not needed.

CoreOS does have a supported mode where it does not use the USR-A and USR-B partitions: PXE booting, starting a computer by requesting the software over the network. I could even skip the virtual networking and use this with QEMU by launching the kernel and initrd directly, no need for PXE itself. However, this is wasteful in another way: it holds the complete /usr partition contents in RAM, using about 180MB -- once per VM. There is also an annoying delay of 15+ seconds in VM startup, presumably related to the large initrd image, and later the kernel spends 1.2 seconds uncompressing it into a tmpfs (measured on an i5-5300U laptop).
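For reference, that direct launch looks roughly like this (a sketch with minimal kernel arguments; QEMU's -kernel and -initrd options take the place of a PXE server):

qemu-system-x86_64 \
    -machine accel=kvm -cpu host \
    -m 1024 \
    -nographic \
    -kernel coreos_production_pxe.vmlinuz \
    -initrd coreos_production_pxe_image.cpio.gz \
    -append 'coreos.autologin console=ttyS0'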

Digging into the PXE image, I find that it actually stores the /usr contents as a squashfs -- which is a real filesystem that can be stored on block devices, as opposed to just unpacking a cpio to a tmpfs. The PXE image does what's called a "loopback mount", where a file is treated like a block device. In the PXE scenario, the file is held in RAM in a tmpfs; I can just put those bytes on a block device, and boot that!

(The Live CD seems to also hold /usr contents in tmpfs just like the PXE variant, even though it could fetch them on demand from the ISO. The squashfs image is random-access, unlike the usual cpio.gz that's used for initramfs contents. In later versions, CoreOS could switch their ISO images to use the trick I'll explain below -- at the cost of physical machines needing to spin up a CD more often than once per boot. The live CD has another downside that made me avoid it: to pass kernel parameters, I'd have to resort to kludges like creating a boot floppy image with syslinux and the right parameters on it.)

So, I set about fixing the wasted disk and bandwidth problem. Here's a story of an afternoon project.

Using the /usr image directly

Instead of holding an extra copy of the /usr image data in RAM, we can make it available as a block device, and load blocks on demand.

For that, we need the /usr squashfs image as a standalone file, not inside the cpio. It's not available as a separate download, but we can extract it from the PXE image:

wget http://beta.release.core-os.net/amd64-usr/current/coreos_production_pxe.vmlinuz
wget http://beta.release.core-os.net/amd64-usr/current/coreos_production_pxe.vmlinuz.sig
gpg --verify coreos_production_pxe.vmlinuz.sig
wget http://beta.release.core-os.net/amd64-usr/current/coreos_production_pxe_image.cpio.gz
wget http://beta.release.core-os.net/amd64-usr/current/coreos_production_pxe_image.cpio.gz.sig
gpg --verify coreos_production_pxe_image.cpio.gz.sig

zcat coreos_production_pxe_image.cpio.gz \
    | cpio -i --quiet --sparse --to-stdout usr.squashfs \
    >usr.squashfs
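If you want to poke around in the result, a squashfs is directly mountable; a loopback mount on the host (assuming your host kernel has squashfs support) shows the familiar /usr tree:

mkdir -p /tmp/usr
sudo mount -o loop,ro usr.squashfs /tmp/usr
ls /tmp/usr
sudo umount /tmp/usr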

Prepare a root filesystem

We also need to prepare a disk image that will be used for storing the root filesystem. CoreOS won't boot right with a fully blank disk. If it did, I would have used qcow2 as the format, but since I need to provide some sort of structure for the root filesystem, let's go with a raw disk image.

I might have been able to set up the right GPT partition UUIDs for the initrd to mkfs things for me, but that seemed too complicated, and I doubted it'd support my "just the root" scenario as well as their nine-partition layout.

To keep it simple, we won't bother to use partitions; the whole block device is just one filesystem.

>rootfs.img
chattr +C rootfs.img    # disable copy-on-write, in case the host filesystem is btrfs
truncate -s 4G rootfs.img
mkfs.ext4 rootfs.img
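The truncate call makes the file sparse, so those 4GB are only an upper limit; the host allocates blocks only as the guest writes to them. You can compare the apparent size to the actual allocation with:

du -h --apparent-size rootfs.img
du -h rootfs.img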

Prepare user_data

This was previously done inside coreos_production_qemu.sh with a temp dir, but we'll just pass a directory as virtfs following the "config drive" convention. Let's move our previous file into the right place:

mkdir -p config/openstack/latest
mv cloud-config.yaml config/openstack/latest/user_data
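That path is the one coreos-cloudinit expects on a config drive; as a quick sanity check,

find config -type f

should print config/openstack/latest/user_data.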

Finally, run QEMU

qemu-system-x86_64 \
    -name mycoreosvm \
    -nographic \
    -machine accel=kvm -cpu host -smp 4 \
    -m 1024 \
    \
    -net nic,vlan=0,model=virtio \
    -net user,vlan=0,hostfwd=tcp::2222-:22,hostname=mycoreosvm \
    \
    -fsdev local,id=config,security_model=none,readonly,path=config \
    -device virtio-9p-pci,fsdev=config,mount_tag=config-2 \
    \
    -drive if=virtio,file=usr.squashfs,format=raw,serial=usr.readonly \
    -drive if=virtio,file=rootfs.img,format=raw,discard=on,serial=rootfs \
    \
    -kernel coreos_production_pxe.vmlinuz \
    -append 'mount.usr=/dev/disk/by-id/virtio-usr.readonly mount.usrflags=ro root=/dev/disk/by-id/virtio-rootfs rootflags=rw console=tty0 console=ttyS0 coreos.autologin'
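The serial= options are what tie this together: Linux exposes a virtio disk's serial number as /dev/disk/by-id/virtio-<serial>, which is how the mount.usr and root arguments above can name the right disks without guessing at vda versus vdb.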

You'll be greeted with the Linux boot messages and finally

This is mycoreosvm (Linux x86_64 4.4.6-coreos) 06:14:10
SSH host key: SHA256:t+WkofIWxkARu1hezwPnS/vgTJXUcPidA3UxKr+1uGA (DSA)
SSH host key: SHA256:cT32H33EVCHSnrCRsB+I9GG7AgXQWfyjk7JFuEzAqFU (ECDSA)
SSH host key: SHA256:NFgc7BLbeyS3SslpscSSNHNzc7lXzx6vKqBmUp+5T7Q (ED25519)
SSH host key: SHA256:pK8Dknoib61FnIwMQ6u4F4FxeSMIRq9zYsrJd0N3MPY (RSA)
eth0: 10.0.2.15 fe80::5054:ff:fe12:3456

mycoreosvm login: core (automatic login)

CoreOS stable (991.2.0)
Last login: Fri Apr  1 06:02:25 +0000 2016 on /dev/tty1.
Update Strategy: No Reboots
core@mycoreosvm ~ $

Success!
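Since the hostfwd rule above forwards host port 2222 to the VM's SSH port, you can also log in from the host (as the user set up in cloud-config.yaml):

ssh -p 2222 jdoe@localhost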

As usual with QEMU, press C-a x to exit.

Stay tuned for part 2, where we will make the VM even leaner.

2020-01-21T20:49:33-07:00, originally published 2016-03-31T16:37:27-08:00