Better CoreOS in a VM experience
(This was all tested with CoreOS beta 991.2.0; the stable 835.13.0 fails to mount 9p volumes.)
CoreOS comes with instructions on how to run it in QEMU. After the setup, it comes down to something like
#!/bin/sh
exec ./coreos_production_qemu.sh \
    -user-data cloud-config.yaml \
    -nographic \
    "$@"
with cloud-config.yaml looking like
#cloud-config
hostname: mytest
users:
  - name: jdoe
    groups:
      - sudo
      - rkt
    ssh_authorized_keys:
      - "ssh-ed25519 blahblah [email protected]"
I used that for many experiments, but felt it was less than ideal.
I wanted to move beyond -net user. That just means calling QEMU directly, instead of using coreos_production_qemu.sh, no big deal.
But it meant I would be writing my own QEMU runner, no matter what.
I also wanted more efficient usage of my resources -- after all, the whole point of me running several virtual machines on one physical machine is to make these test setups more economical.
The provided QEMU images waste disk and bandwidth. Every VM stores two copies of the CoreOS /usr image, just like a physical machine would. Copy-on-write trickery on the initial image will not help beyond the first auto-update, as each VM independently downloads and applies the updates. This means that if you run a small test cluster of, say, 5 VMs, you'll end up with 10 copies of CoreOS, and 5x the necessary bandwidth usage.
Imitating physical computers with virtual machines is great if you're trying to learn how the CoreOS update mechanism works, but once you're at the point of just wanting to run services, it's simply not needed.
CoreOS does have a supported mode where it does not use the USR-A and USR-B partitions: PXE booting, that is, starting a computer by requesting the software over the network. I could even skip the virtual networking and use this with QEMU by launching the kernel and initrd directly, with no need for PXE itself. However, this is wasteful in another way: it holds the complete /usr partition contents in RAM, using about 180MB, once per VM. There is also an annoying delay of 15+ seconds in VM startup, presumably related to the large initrd image, and later the kernel spends 1.2 seconds uncompressing it into a tmpfs (measured on an i5-5300U laptop).
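For reference, that direct kernel boot looks something like this (just a sketch; the memory size and the autologin flag here are my choices, not requirements):

qemu-system-x86_64 \
    -nographic \
    -machine accel=kvm -cpu host \
    -m 1024 \
    -kernel coreos_production_pxe.vmlinuz \
    -initrd coreos_production_pxe_image.cpio.gz \
    -append 'console=ttyS0 coreos.autologin'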
Digging into the PXE image, I find that it actually stores the /usr contents as a squashfs -- which is a real filesystem that can be stored on block devices, as opposed to just unpacking a cpio to a tmpfs. The PXE image does what's called a "loopback mount", where a file is treated like a block device. In the PXE scenario, the file is held in RAM in a tmpfs; I can just put those bytes on a block device, and boot that!
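A loopback mount is easy to try by hand on a Linux host, too; this sketch assumes the standalone usr.squashfs file we'll extract below:

sudo mount -o loop,ro usr.squashfs /mnt
ls /mnt
sudo umount /mnt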
(The Live CD seems to also hold /usr contents in tmpfs just like the PXE variant, even though it could fetch them on demand from the ISO. The squashfs image is random-access, unlike the usual cpio.gz that's used for initramfs contents. In later versions, CoreOS could switch their ISO images to use the trick I'll explain below -- at the cost of physical machines needing to spin up a CD more often than once per boot. The live CD has another downside that made me avoid it: to pass kernel parameters, I'd have to resort to kludges like creating a boot floppy image with syslinux and the right parameters on it.)
So, I set about fixing the wasted disk and bandwidth problem. Here's the story of an afternoon project.
Using the /usr image directly
Instead of holding an extra copy of the /usr image data in RAM, we can make it available as a block device, and load blocks on demand. For that, we need the /usr squashfs image as a standalone file, not inside the cpio. It's not available as a separate download, but we can extract it from the PXE image:
wget http://beta.release.core-os.net/amd64-usr/current/coreos_production_pxe.vmlinuz
wget http://beta.release.core-os.net/amd64-usr/current/coreos_production_pxe.vmlinuz.sig
gpg --verify coreos_production_pxe.vmlinuz.sig
wget http://beta.release.core-os.net/amd64-usr/current/coreos_production_pxe_image.cpio.gz
wget http://beta.release.core-os.net/amd64-usr/current/coreos_production_pxe_image.cpio.gz.sig
gpg --verify coreos_production_pxe_image.cpio.gz.sig
zcat coreos_production_pxe_image.cpio.gz \
    | cpio -i --quiet --sparse --to-stdout usr.squashfs \
    >usr.squashfs
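As an optional sanity check, squashfs-tools can read the superblock of the extracted file (assuming you have unsquashfs installed):

unsquashfs -s usr.squashfs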
Prepare a root filesystem
We also need to prepare a disk image that will be used for storing the root filesystem. CoreOS won't boot right with a fully blank disk. If it did, I would have used qcow2 as the format, but now I need to provide some sort of structure for the root filesystem, so let's go with a raw disk image.
I might have been able to set up the right GPT partition UUIDs for the initrd to mkfs things for me, but that seemed too complicated, and I doubted it'd support my "just the root" scenario as well as their nine-partition layout.
To keep it simple, we won't bother to use partitions; the whole block device is just one filesystem.
>rootfs.img
chattr +C rootfs.img
truncate -s 4G rootfs.img
mkfs.ext4 rootfs.img
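The empty redirect creates the file first so that chattr can mark it no-copy-on-write while it's still empty (that only matters on btrfs hosts, and is harmless elsewhere); truncate then grows it into a 4GB sparse file. If you're curious, you can check both:

ls -ls rootfs.img
lsattr rootfs.img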
Prepare user_data
This was previously done inside coreos_production_qemu.sh with a temp dir, but we'll just pass a directory as virtfs, following the "config drive" convention. Let's move our previous file into the right place:
mkdir -p config/openstack/latest
mv cloud-config.yaml config/openstack/latest/user_data
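At this point the directory should contain exactly one file:

find config -type f
# config/openstack/latest/user_data

The mount_tag=config-2 on the 9p device in the QEMU invocation below is what lets the guest recognize this share as its config drive, mirroring the config-2 volume label a real OpenStack config drive would carry.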
Finally, run QEMU
qemu-system-x86_64 \
    -name mycoreosvm \
    -nographic \
    -machine accel=kvm -cpu host -smp 4 \
    -m 1024 \
    \
    -net nic,vlan=0,model=virtio \
    -net user,vlan=0,hostfwd=tcp::2222-:22,hostname=mycoreosvm \
    \
    -fsdev local,id=config,security_model=none,readonly,path=config \
    -device virtio-9p-pci,fsdev=config,mount_tag=config-2 \
    \
    -drive if=virtio,file=usr.squashfs,format=raw,serial=usr.readonly \
    -drive if=virtio,file=rootfs.img,format=raw,discard=on,serial=rootfs \
    \
    -kernel coreos_production_pxe.vmlinuz \
    -append 'mount.usr=/dev/disk/by-id/virtio-usr.readonly mount.usrflags=ro root=/dev/disk/by-id/virtio-rootfs rootflags=rw console=tty0 console=ttyS0 coreos.autologin'
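The serial= options are what make the root= and mount.usr= parameters work: udev exposes each virtio disk's serial number under /dev/disk/by-id/ inside the guest, so once booted you should see something like:

ls /dev/disk/by-id/
# should list virtio-rootfs and virtio-usr.readonly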
You'll be greeted with the Linux bootstrap messages and finally
This is mycoreosvm (Linux x86_64 4.4.6-coreos) 06:14:10
SSH host key: SHA256:t+WkofIWxkARu1hezwPnS/vgTJXUcPidA3UxKr+1uGA (DSA)
SSH host key: SHA256:cT32H33EVCHSnrCRsB+I9GG7AgXQWfyjk7JFuEzAqFU (ECDSA)
SSH host key: SHA256:NFgc7BLbeyS3SslpscSSNHNzc7lXzx6vKqBmUp+5T7Q (ED25519)
SSH host key: SHA256:pK8Dknoib61FnIwMQ6u4F4FxeSMIRq9zYsrJd0N3MPY (RSA)
eth0: 10.0.2.15 fe80::5054:ff:fe12:3456
mycoreosvm login: core (automatic login)
CoreOS stable (991.2.0)
Last login: Fri Apr 1 06:02:25 +0000 2016 on /dev/tty1.
Update Strategy: No Reboots
core@mycoreosvm ~ $
Success!
As usual with QEMU, press C-a x to exit.
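Thanks to the hostfwd rule above, you can also SSH in from the host as the user from our cloud-config (assuming the matching private key is available):

ssh -p 2222 jdoe@localhost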
Stay tuned for part 2, where we will make the VM even leaner.