Deploying Raspberry Pi as LAVA dispatcher for USB-attached DUTs
December 2, 2021
Today I will talk about using a Raspberry Pi 4B (8GB) with an SSD disk as infrastructure for deploying a LAVA dispatcher used for testing USB-attached devices such as the 96Boards Nitrogen board.
Introduction
This somewhat lengthy post goes into the practical details of setting up a miniature infrastructure and test stack on a single board or a cluster of identical boards. It is separated into two main parts - the physical part with the bare-metal OS (infrastructure) and the software-defined services part built on top (test/payload). You can use it as a story, as a tutorial or as a quick Google search result for a particular problem (perhaps).
I wanted to approach this as an infrastructure problem, with a separate layer for managing the hardware and the base OS and another layer for whatever is needed by the testing stack. This split seems natural in environments with separate testing and infrastructure teams, whose goals differ.
Infrastructure layer
At the infrastructure layer we're using a Raspberry Pi 4B with 8GB of RAM, an up-to-date EEPROM bootloader configured to boot from USB, and a USB-SATA adapter with a low-cost 128GB SATA SSD. The SSD is significantly faster and more robust than a micro SD card.
Ubuntu 20.04 LTS + cloud-init
For the management-side operating system I've been using Ubuntu 20.04 LTS. This version is, at the time of this writing, the latest long-term-support release. In the future I plan to upgrade to 22.04 LTS as that may remove one step from the setup process.
The setup process involves two stages: preparing the hardware itself (assembling everything, updating the boot firmware using Raspbian, wiring everything together) and setting up the software (putting an initial image on the SSD).
Start with ubuntu-20.04.3-preinstalled-server-arm64+raspi.img.xz. You can copy it to your SSD with dd, for example, assuming that your SSD is at /dev/sdb and the downloaded image is in the current directory: xzcat ubuntu-20.04.3-preinstalled-server-arm64+raspi.img.xz | sudo dd of=/dev/sdb conv=sparse bs=4M. This should take just a moment, since we use conv=sparse to detect and skip writing all-zero blocks.
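If you are unsure which device the SSD shows up as, here is a quick sketch of the whole write step (with /dev/sdb as an assumed example; double-check before writing):

# Identify the SSD attached via the USB-SATA adapter; /dev/sdb below is an assumption.
lsblk -o NAME,SIZE,MODEL,TRAN
# Write the image, show progress and flush caches before unplugging.
xzcat ubuntu-20.04.3-preinstalled-server-arm64+raspi.img.xz | sudo dd of=/dev/sdb conv=sparse bs=4M status=progress
sync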
We don't want to use the default ubuntu user. Instead we want some admin accounts, ssh keys and the like. We can achieve that with cloud-init by preparing a user-data file. Here's an edited (abbreviated) file I've used:
#cloud-config
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Huawei Inc.
users:
- name: zyga
gecos: Zygmunt Krynicki
primary_group: zyga
groups: users, adm, sudo, lxd
sudo: ALL=(ALL) NOPASSWD:ALL
shell: /bin/bash
ssh_authorized_keys:
- "ssh-rsa (edited-out) zyga@hostname"
chpasswd:
list: [zyga:password]
expire: true
hostname: pi4-1
ntp:
enabled: true
timezone: Europe/Warsaw
packages:
- ssh
package_update: true
package_upgrade: true
package_reboot_if_required: true
This file is extremely basic and has some short-cuts applied. The real file I've used registered the system with our management and VPN systems. This did bring additional complexity I will talk about later. For the moment you can see that it gives me a single user account, with an authorized key and a fixed password that has to be changed on first boot. There's also a fixed hostname, NTP and time zone configuration, an initial set of packages to install and a request to update everything on first boot. SSH is important since we want to be able to log in and administer the device remotely.
Normally I would not set the password at all, but during development it is very useful to be able to log in interactively from the console or over the serial port. The cloud-init user-data file can be provided in one of several ways, but one that's low-cost and easy to do is to copy the file to the root directory of the boot partition (which is the first partition in the image). We'll do that shortly but first... some caveats.
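For a one-off manual setup, outside of the Makefile shown later, copying the file over could look roughly like this (the /dev/sdb1 partition and /mnt/pi-boot mount point are assumptions; adjust to your setup):

# Mount the FAT boot partition (the first partition of the image) and drop user-data into its root directory.
sudo mkdir -p /mnt/pi-boot
sudo mount /dev/sdb1 /mnt/pi-boot
sudo cp user-data /mnt/pi-boot/
sudo umount /mnt/pi-boot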
In a perfect world that would be all we need. I sincerely hope that will be true once Ubuntu 22.04 ships, with no more u-boot, a more recent kernel, and improved systemd and cloud-init. For the current Ubuntu 20.04 LTS, there are a few extra steps to cover.
Caveat: mass storage mode vs UAS mode
This doesn't apply to all USB-SATA adapters, but at least with Linux 5.4 and the USB-SATA adapter with vendor:product 152d:0578 I had to add a quirk to force the device to operate in mass-storage mode. UAS mode was buggy and the device would never boot.
We can do that by editing the default kernel command line. The boot firmware loads it from cmdline.txt (by default). All I had to do was to append usb-storage.quirks=152d:0578:u to the parameter list there.
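Applied from the machine used to flash the disk, this could look roughly as follows (again assuming the boot partition is /dev/sdb1; note that cmdline.txt must remain a single line):

# Mount the boot partition and append the usb-storage quirk to the kernel command line.
sudo mkdir -p /mnt/pi-boot
sudo mount /dev/sdb1 /mnt/pi-boot
sudo sed -i 's/$/ usb-storage.quirks=152d:0578:u/' /mnt/pi-boot/cmdline.txt
cat /mnt/pi-boot/cmdline.txt
sudo umount /mnt/pi-boot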
Caveat: USB boot support
Ubuntu 20.04 uses the following boot chain:
- The system firmware from EEPROM looks for boot media (configurable) and picks the SD card (default).
- The card is searched for a FAT partition with several files, notably start.elf, fixup.dat and config.txt (and several others, those don't matter to us).
- The config.txt file instructs the firmware to load the kernel from a file specific to the version of the board used; here it would have been uboot_rpi_4.bin.
- That in turn loads boot.scr and then picks up vmlinuz, de-compresses it in memory, loads initrd.img and the system starts running Linux.
There's only one problem. The version of u-boot used here doesn't support the USB chip used on the board, so we cannot boot from our SSD. Ooops!
Fortunately, the current boot firmware (start.elf and, earlier, a copy of bootcode.bin which is read from the on-board EEPROM) is capable of doing that directly. Moreover, it also supports compressed kernels, another limitation that is now lifted.
In general the firmware loads config.txt. On Ubuntu 20.04 that file is set to load two include files (this is processed by start.elf): syscfg.txt and usercfg.txt. I've used the first one to tell the boot firmware to ignore u-boot and load the kernel and initrd directly. Since start.elf has already been loaded from USB mass storage, we have no problems loading all the other files as well. We can use our USB-SATA adapter just fine.
Here's the syscfg.txt I've made. I left the comments in so that you can see what the various parts are for. You can remove everything but the last three lines, where we instruct the bootloader to load the kernel from vmlinuz. This overrides an earlier definition, in config.txt, which sets kernel= to one of the u-boot images.
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Huawei Inc.
dtparam=audio=on
dtparam=i2c_arm=on
dtparam=spi=on
# Enable UART for debugging.
enable_uart=1
# Use debug version of the boot firmware.
start_debug=1
# Enable UART in the 2nd stage bootloader (start.elf).
# This is useful for diagnosing boot problems.
uart_2ndstage=1
# Move bluetooth to slower UART to assist in early boot debugging.
dtoverlay=miniuart-bt
# Ask the firmware to only poll the SD card once. This avoids a bug where udev
# is getting stuck when booting from USB and the SD card is not populated.
dtparam=sd_poll_once
# Load the kernel directly, without looping via u-boot. This is what allows us
# to boot from USB. It is not needed for u-boot assisted SD card boots.
kernel=vmlinuz
initramfs initrd.img followkernel
cmdline=cmdline.txt
In my setup I've used a Makefile to program and alter the image. As you can see I'm copying user-data, cmdline.txt and syscfg.txt all in one step. The Makefile rule below is edited for brevity.
flash-%: MEDIA_DEVICE ?= $(error set MEDIA_DEVICE= to the path of the block device)
flash-%: BOOT_PART ?= 1
flash-%: IMAGE_XZ ?= ubuntu-20.04.3-preinstalled-server-arm64+raspi.img.xz
flash-%: D := $(shell mktemp -d)
flash-%: %/user-data %/cmdline.txt %/syscfg.txt
	test `id -u` -eq 0
	xzcat $(IMAGE_XZ) | dd of=$(MEDIA_DEVICE) conv=sparse bs=4M
	unshare -m /bin/sh -c "mount $(MEDIA_DEVICE)$(BOOT_PART) $(D) && cp -v $^ $(D)"
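A hedged usage example: assuming the per-host files live in a directory named after the target (e.g. pi4-1/user-data, pi4-1/cmdline.txt and pi4-1/syscfg.txt, matching the pattern rule above), flashing would look like this:

# Run as root; MEDIA_DEVICE must point at the whole disk, not a partition.
sudo make flash-pi4-1 MEDIA_DEVICE=/dev/sdb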
Caveat: initial system time
Raspberry Pi 4 is popular and fun but, like all the boards in its lineage, it does not come with a battery-powered real-time clock. When you boot the system, it's 1970 all over again. Parts of the OS have been tuned to do better than that, by loading timestamps from various places, like a built-in timestamp in systemd (system time must not be earlier than systemd's build time) or support packages that store and restore the time at shutdown and startup. Nothing here can make a device that was left in cold storage for a while and then booted while disconnected from the network know what time it is. It just doesn't have the hardware to do that.
This makes first boot annoyingly complicated. We want to go to the network and fetch packages. That implies verifying certificates and their associated validity window. I didn't debug this deeply, but at least in Ubuntu 20.04 there's no way to make the first package installation (or the addition of a 3rd-party repository) wait until the system has obtained its initial time from network time servers.
What I've found is that at least the version of systemd used in Ubuntu 20.04 has one way of setting the initial time: systemd will stat /var/lib/systemd/timesync/clock and use its mtime as the earliest valid system time. This is paramount when you create an initial disk image now and quickly boot the system with it, as we can make sure that the system has a roughly correct time early enough, before it starts to initiate HTTPS connections and needs to validate certificates.
This is a kludge. Ideally cloud-init should have a way to force the rest of the commands to wait for NTP time sync, if enabled.
In my helper Makefile I solved it like this:
flash-%: ROOT_PART ?= 2
flash-%: D := $(shell mktemp -d)
	unshare -m /bin/sh -c "mount $(MEDIA_DEVICE)$(ROOT_PART) $(D) && mkdir -v -p $(D)/var/lib/systemd/timesync && touch $(D)/var/lib/systemd/timesync/clock"
The variables don't matter, except for D which, as you can see, is a temporary directory. All that matters is that we touch the aforementioned file on the second partition of the image.
First boot
With all of this done and a few goats sacrificed, we can now plug in network and power and boot the system for the first time.
The first boot is critical. Network must be working, stars must align. If things fail here, the device will just sit idle, forever. Nothing will be re-tried. Normally cloud-init is used in the cloud, where network typically works. If something fails in your local environment you have two options:
- Start over and re-write the disk.
- Log in interactively, if you can, to run cloud-init clean and reboot.
For debugging you may want to run cloud-init collect-logs and inspect the resulting tarball on your preferred system. It contains crucial information from the first boot. Two things to look out for: DNS issues and the exact moment the system time jumps from the fake time baked into the image to the actual local time.
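A few commands that may help when poking at a misbehaving first boot, run on the Pi itself:

# Block until cloud-init finishes (or fails) and print an extended status report.
cloud-init status --wait --long
# Bundle logs and cloud-init data into a tarball for offline inspection.
sudo cloud-init collect-logs
# Wipe cloud-init state (including logs) and re-run everything on the next boot.
sudo cloud-init clean --logs && sudo reboot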
Once the system boots this part is done. Good work. You have a solid enough infrastructure to provision the next level of the stack and let your users in.
Optional extras
Landscape
If you are a Landscape user, you may want to extend your user-data file with a section that looks like this one:
landscape:
client:
url: "https://landscape.canonical.com/message-system"
# NOTE: this cannot use https, don't change it!
ping_url: "http://landscape.canonical.com/ping"
data_path: "/var/lib/landscape/client"
tags: ""
computer_title: "{{ v1.local-hostname }}"
account_name: "your landscape account name"
Note the {{ ... }} fragment. This is a Jinja template. If you use it, you have to put this header at the top of the user-data file:
## template: jinja
#cloud-config
Use that exact number of hashes and spaces, or the template magic won't kick in.
This will set everything up to register your system with Landscape. Unless you use an on-prem installation the URLs are valid and ready to go. You may want to set an account key as well, if you use one.
You should check if landscape-client.service is enabled. It will work on first boot, but after the first reboot I've seen cases where the service was just inactive. You may want to add systemctl enable landscape-client.service to your runcmd: section.
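To check the state after the first reboot, and bring the client back if it went inactive, something along these lines works (a sketch, run inside the system):

# Verify that the Landscape client survived the first reboot.
systemctl is-enabled landscape-client.service
systemctl status landscape-client.service --no-pager
# If it came up disabled or inactive, enable and start it in one go.
sudo systemctl enable --now landscape-client.service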
Tailscale
If you want to use Tailscale to VPN into your infrastructure, you will want to add the repository and register with it on the first boot. There are some caveats here as well, sadly. First of all, the Tailscale SSL certificate is from Let's Encrypt and, at least for the Ubuntu 20.04.3 image I've been using, is not accepted until you update your SSL certificates. This means that you cannot just apt-get update, as apt will not accept the certificate from Tailscale until you update ca-certificates.
You can use this snippet as a starting point and perhaps manage it via Landscape with package profiles and user scripts. Alternatively you can do it in runcmd directly, if you don't mind running curl | sudo sh style programs.
apt:
sources:
tailscale.list:
source: deb https://pkgs.tailscale.com/stable/ubuntu focal main
key: |
-----BEGIN PGP PUBLIC KEY BLOCK-----
mQINBF5UmbgBEADAA5mxC8EoWEf53RVdlhQJbNnQW7fctUA5yNcGUbGGGTk6XFqO
nlek0Us0FAl5KVBgcS0Bj+VSwKVI/wx91tnAWI36CHeMyPTawdT4FTcS2jZMHbcN
UMqM1mcGs3wEQmKz795lfy2cQdVktc886aAF8hy1GmZDSs2zcGMvq5KCNPuX3DD5
INPumZqRTjwSwlGptUZrJpKWH4KvuGr5PSy/NzC8uSCuhLbFJc1Q6dQGKlQxwh+q
AF4uQ1+bdy92GHiFsCMi7q43hiBg5J9r55M/skboXkNBlS6kFviP+PADHNZe5Vw0
0ERtD/HzYb3cH5YneZuYXvnJq2/XjaN6OwkQXuqQpusB5fhIyLXE5ZqNlwBzX71S
779tIyjShpPXf1HEVxNO8TdVncx/7Zx/FSdwUJm4PMYQmnwBIyKlYWlV2AGgfxFk
mt2VexyS5s4YA1POuyiwW0iH1Ppp9X14KtOfNimBa0yEzgW3CHTEg55MNZup6k2Q
mRGtRjeqM5cjrq/Ix15hISmgbZogPRkhz/tcalK38WWAR4h3N8eIoPasLr9i9OVe
8aqsyXefCrziaiJczA0kCqhoryUUtceMgvaHl+lIPwyW0XWwj+0q45qzjLvKet+V
Q8oKLT1nMr/whgeSJi99f/jE4sWIbHZ0wwR02ZCikKnS05arl3v+hiBKPQARAQAB
tERUYWlsc2NhbGUgSW5jLiAoUGFja2FnZSByZXBvc2l0b3J5IHNpZ25pbmcga2V5
KSA8aW5mb0B0YWlsc2NhbGUuY29tPokCTgQTAQgAOBYhBCWWqZ6qszghiTwKeUWM
qDKVf1hoBQJeVJm4AhsDBQsJCAcCBhUKCQgLAgQWAgMBAh4BAheAAAoJEEWMqDKV
f1hoWHEP/1DYd9WZrodyV5zy1izvj0FXtUReJi374gDn3cHrG6uYtXcE9HWZhxQD
6nDgYuey5sBhLvPQiE/sl5GYXNw/O95XVk8HS54BHCCYq1GeYkZaiCGLGFBA08JK
7PZItGsfdJHwHfhSMtGPS7Cpmylje9gh8ic56NAhC7c5tGTlD69Y8zGHjnRQC6Hg
wF34jdp8JTQpSctpmiOxOXN+eH8N59zb0k30CUym1Am438AR0PI6RBTnubBH+Xsc
eQhLJnmJ1bM6GP4agXw5T1G/qp95gjIddHXzOkEvrpVfJFCtp91VIlBwycspKYVp
1IKAdPM6CVf/YoDkawwm4y4OcmvNarA5dhWBG0Xqse4v1dlYbiHIFcDzXuMyrHYs
D2Wg8Hx8TD64uBHY0fp24nweCLnaZCckVUsnYjb0A494lgwveswbZeZ6JC5SbDKH
Tc2SE4jq+fsEEJsqsdHIC04d+pMXI95HinJHU1SLBTeKLvEF8Zuk7RTJyaUTjs7h
Ne+xWDmRjjR/D/GXBxNrM9mEq6Jvp/ilYTdWwAyrSmTdotHb+NWjAGpJWj5AZCH9
HeBr2mtVhvTu3KtCQmGpRiR18zMbmemRXUh+IX5hpWGzynhtnSt7vXOvhJdqqc1D
VennRMQZMb09wJjPcvLIApUMl69r29XmyB59NM3UggK/UCJrpYfmuQINBF5UmbgB
EADTSKKyeF3XWDxm3x67MOv1Zm3ocoe5xGDRApPkgqEMA+7/mjVlahNXqA8btmwM
z1BH5+trjOUoohFqhr9FPPLuKaS/pE7BBP38KzeA4KcTiEq5FQ4JzZAIRGyhsAr+
6bxcKV/tZirqOBQFC7bH2UAHH7uIKHDUbBIDFHjnmdIzJ5MBPMgqvSPZvcKWm40g
W+LWMGoSMH1Uxd+BvW74509eezL8p3ts42txVNvWMSKDkpiCRMBhfcf5c+YFXWbu
r5qus2mnVw0hIyYTUdRZIkOcYBalBjewVmGuSIISnUv76vHz133i0zh4JcXHUDqc
yLBUgVWckqci32ahy3jc4MdilPeAnjJQcpJVBtMUNTZ4KM7UxLmOa5hYwvooliFJ
wUFPB+1ZwN8d+Ly12gRKf8qA/iL8M5H4nQrML2dRJ8NKzP2U73Fw+n6S1ngrDX8k
TPhQBq4EDjDyX7SW3Liemj5BCuWJAo53/2cL9P9I5Nu3i2pLJOHzjBSXxWaMMmti
kopArlSMWMdsGgb0xYX+aSV7xW+tefYZJY1AFJ1x2ZgfIc+4zyuXnHYA2jVYLAfF
pApqwwn8JaTJWNhny/OtAss7XV/WuTEOMWXaTO9nyNmHla9KjxlBkDJG9sCcgYMg
aCAnoLRUABCWatxPly9ZlVbIPPzBAr8VN/TEUbceAH0nIwARAQABiQI2BBgBCAAg
FiEEJZapnqqzOCGJPAp5RYyoMpV/WGgFAl5UmbgCGwwACgkQRYyoMpV/WGji9w/8
Di9yLnnudvRnGLXGDDF2DbQUiwlNeJtHPHH4B9kKRKJDH1Rt5426Lw8vAumDpBlR
EeuT6/YQU+LSapWoDzNcmDLzoFP7RSQaB9aL/nJXv+VjlsVH/crpSTTgGDs8qGsL
O3Y2U1Gjo5uMBoOfXwS8o1VWO/5eUwS0KH7hpbOuZcf9U9l1VD2YpGfnMwX1rnre
INJqseQAUL3oyNl76gRzyuyQ4AIA06r40hZDgybH0ADN1JtfVk8z4ofo/GcfoXqm
hifWJa2SwwHeijhdN1T/kG0FZFHs1DBuBYJG3iJ3/bMeL15j1OjncIYIYccdoEUd
uHnp4+ZYj5kND0DFziTvOC4WyPpv3BlBVariPzEnEqnhjx5RYwMabtTXoYJwUkxX
2gAjKqh2tXissChdwDGRNASSDrChHLkQewx+SxT5kDaOhB84ZDnp+urn9A+clLkN
lZMsMQUObaRW68uybSbZSmIWFVM1GovRMgrPG3T6PAykQhFyE/kMFrv5KpPh7jDj
5JwzQkxLkFMcZDdS43VymKEggxqtM6scIRU55i059fLPAVXJG5in1WhMNsmt49lb
KqB6je3plIWOLSPuCJ/kR9xdFp7Qk88GCXEd0+4z/vFn4hoOr85NXFtxhS8k9GfJ
mM/ZfUq7YmHR+Rswe0zrrCwTDdePjGMo9cHpd39jCvc=
=AIVM
-----END PGP PUBLIC KEY BLOCK-----
runcmd:
- tailscale up -authkey your-tailscale-key-here
Midpoint
We have a nice system running, but it should be managed as just another server in your fleet. What I've described here is the bare minimum I happen to do. Your site may use other software and stacks to ease remote administration at scale.
Using a Raspberry Pi gives us the advantage of having a low-cost system with various interesting peripherals that often come in handy when doing device testing.
In the next chapter we will look at the next part of the stack, our LAVA dispatcher.
The test/content layer
We now have a nice and mostly vanilla low-cost system. Let's use it for deploying LAVA. After a few iterations with this idea I'm deploying LAVA inside a virtual machine, with USB pass-through offering access to USB-serial adapters and the USB-attached 96Boards Nitrogen micro-controller board.
Why like that? Let's break it down:
1) We can destroy and re-provision anything on top of this system without touching the hardware. This is just nice for management, as the hardware may be locked in a lab room somewhere, with controlled access and you may be far away, for instance working from home. This part is non-controversial.
2) We can use the system for other things. In particular it's probably the low-cost aarch64 system you always wanted to try. You can enable it in CI (we'll talk about resource management later) and let your developers compile and test on aarch64 natively.
3) LAVA is tested this way. It likes to have access to a full-system-like environment, with a kernel image around and things to use for libguestfs, with udev rules and all the other dirty parts of the plumbing layer that may be needed. This lets your test team set up anything they want without having to worry about a too-tightly controlled single-process container environment.
4) It works in practice with USB devices. Some requirements are harder: if you need access to GPIOs or other ports you may need additional software to either move the device over entirely or mediate access to a shared resource (e.g. so that only a specific pin can be controlled). That can be done naturally by a privileged and managed service that runs on the host and that something in the guest VM or container talks to.
5) Lastly, VM vs container: initially I used system containers for this but, at least with LAVA and device access, this was cumbersome. While LXD does work admirably well, some of the finer points of being able to talk to udev (which is not present in a system container) are missing. Using a VM is a cheap way to avoid that. At the time of this writing, system containers have better USB hot-plug support, but that's only useful if you can use them in the first place. If you have to unplug hardware from your system, you may need to reboot the virtual machine for the software to notice. At least until LXD is improved.
Let's look at how we're going to provide that next layer.
LXD
Deploying LXD on Ubuntu is a breeze. It's pre-installed. If you want to create a LXD cluster you can do that too, but in that case it's recommended to set up a snap cohort so that your entire cluster sees the same exact version of LXD as it refreshes. You can do that with snap create-cohort lxd on any one system, and then snap refresh lxd --cohort=... with the cohort key printed earlier. Setting up an LXD cluster is well documented and I won't cover it here.
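A minimal sketch of the cohort handshake (the key below is a placeholder for the long base64 string that create-cohort prints):

# On any one machine: create a cohort key for the lxd snap.
sudo snap create-cohort lxd
# On every machine in the cluster: pin the lxd snap to that cohort.
sudo snap refresh lxd --cohort="<cohort-key-from-previous-step>"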
To set up a VM with some sane defaults run this command:
lxc launch ubuntu:20.04 --vm -c limits.memory=4GB -c limits.cpu=2 -c security.secureboot=false
Let's break it down:
- First we pick ubuntu:20.04 as our guest. You can pick anything you want to use but if you want to use it as a virtual machine, you should really stick to the images: remote where LXD maintainers publish tested images that bundle the LXD management agent. The agent is important for the system to act in a way that is possible to control from the outside.
- The second argument, --vm, tells LXD to create a virtual machine instead of a container.
- Next we set the amount of memory and virtual CPUs to present to the guest system. Lastly we ask LXD to disable secure boot.
- Ubuntu 20.04 aarch64 images apparently don't have the right signatures. I think this is fixed in later versions. If you try 22.04 you may give it a go without that argument. When experimenting pass --console as well, to instantly attach a virtual console to the system and see what's going on. You can also use that to interact with the virtual EFI boot firmware.
Wait for the system to spin up; lxc list should show the IP address it was assigned. You can jump in with lxc shell (you have to pass the randomly-generated system name as well) and look around. Once you have that working with whatever OS you've chosen, stop the system with lxc stop and let's set up USB pass-through.
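Putting that exploratory loop together, it could look like this (lava0 is an example name; if you omit it LXD picks a random one):

# Launch a named VM so the follow-up commands are easier to type.
lxc launch ubuntu:20.04 lava0 --vm -c limits.memory=4GB -c limits.cpu=2 -c security.secureboot=false
# Once the agent is up, check the assigned address.
lxc list lava0
# Get a root shell inside the guest and look around.
lxc shell lava0
# Stop it before configuring USB pass-through.
lxc stop lava0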
USB pass-through
The easiest way to do this interactively is to run lxc config edit. This launches your $EDITOR and lets you just edit whatever you want. In our case we want to edit the empty devices: {} section so that it contains two virtual devices: one for the USB Nitrogen board and one more for the USB FTDI serial adapter. This is what it looks like:
devices:
ftdi-usb:
productid: "6001"
type: usb
vendorid: "0403"
nitrogen-usb:
productid: "0204"
type: usb
vendorid: 0d28
Note that I've removed the {} value that devices: was initially set to. You can see that we just tell LXD to pass through two USB devices and specify their product and vendor IDs.
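You can find the vendor:product pairs with lsusb on the host. As an alternative to editing the YAML, the same devices can be added from the command line; lava0 is again an assumed instance name:

# On the host: list USB devices and note the ID pairs (vendorid:productid).
lsusb
# Attach the FTDI serial adapter and the Nitrogen board to the (stopped) instance.
lxc config device add lava0 ftdi-usb usb vendorid=0403 productid=6001
lxc config device add lava0 nitrogen-usb usb vendorid=0d28 productid=0204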
Important: if you have multiple matching devices on your host they will all be forwarded. I've started working on a patch that lets you pick the exact device but it is not yet merged upstream.
Save the file and exit your editor. Re-start the machine with lxc start. Wait for it to boot and run lxc shell to get in. Run lsusb and compare that with the output of the same command running on the host. Success? Almost.
By default, and for good reasons, LXD uses cloud images and those have a kernel tuned for virtual environments. Those don't ship with your USB-to-serial adapter drivers and a lot of other junk not needed when you want to spin those virtual machines up in seconds.
We have to change that. Fortunately, for Ubuntu at least, it's super easy to just install linux-image-generic and reboot. Having done that, you can see that uname -a will talk about the generic kernel and that /dev/serial/by-id is populated with nice symbolic links to your devices attached to the host.
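Done by hand inside the guest, this step looks roughly like this:

# Install the generic kernel, which ships the USB-serial and other hardware drivers.
sudo apt update
sudo apt install -y linux-image-generic
sudo reboot
# After the reboot: confirm the kernel flavour and that the serial adapters are visible.
uname -r
ls -l /dev/serial/by-id/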
Can we automate that? Sure! It's all cloud-init again, this time wrapped in an LXD profile. Let's see what this looks like:
LXD profiles
LXD has a system of profiles which let us extract a piece of configuration from a specific system and apply it to a class of systems. This works with storage, network, devices, limits and pretty much anything else LXD supports.
Let's create a profile for our class of systems. Let's call it lava-dispatcher, since the profile will be applied to all the dispatchers in our fleet. If you've deployed LXD as a cluster, you can define the profile once and spin up many dispatchers, for example one per node. Let's create the profile with lxc profile create lava-dispatcher and define it in a YAML file for our convenience.
While you can use lxc profile edit lava-dispatcher to set things up interactively, the text you will be presented with is generated by LXD from its internal database. We want to store our config in git, so let's define a file and then load it into the running LXD (cluster).
Here's the file I've prepared:
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Huawei Inc.
config:
boot.autostart: true
limits.cpu: 2
limits.memory: 4GB
security.secureboot: false
user.user-data: |
#cloud-config
packages:
# Install the generic kernel for access to various USB drivers.
- linux-image-generic
package_update: true
package_upgrade: true
package_reboot_if_required: true
devices:
ftdi-usb:
productid: "6001"
type: usb
vendorid: "0403"
nitrogen-usb:
productid: "0204"
type: usb
vendorid: 0d28
As you can see it has a few things. At the top we tell LXD to auto-start the instance with this profile; we want to be able to reboot the infrastructure host without having to remember to boot individual payloads. We also work around the aarch64 secure-boot problem. We set the limits we've talked about earlier, and we have the devices down at the bottom. In the middle we have another user-data file, embedded inside the LXD configuration (yaml-in-yaml). Note that the | character makes the whole section one large string, without any structure, so you have to be careful as LXD cannot validate the user-data file for you. Here we install the generic kernel and tell the system to update and reboot if necessary.
Let's try it out. Every time you make changes to lava-dispatcher.yaml you can run lxc profile edit lava-dispatcher < lava-dispatcher.yaml to load the profile into the cluster. You cannot do that while there are virtual machines running with that profile enabled, as LXD will refuse to apply changes to this type of system on the fly.
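The full round trip for keeping the profile in git could look like this; lava-dispatcher.yaml is the file shown above:

# Create the (initially empty) profile once, then load the YAML kept in git.
lxc profile create lava-dispatcher
lxc profile edit lava-dispatcher < lava-dispatcher.yaml
# Inspect what LXD actually stored.
lxc profile show lava-dispatcher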
Let's destroy our initial virtual machine with lxc delete --force instance-name and try to set everything up in one step with lxc launch ubuntu:20.04 --vm --profile default --profile lava-dispatcher. We pass --profile twice since the default profile defines networking and storage and our lava-dispatcher profile defines just the details that make the instance into a valid host for a future LAVA dispatcher.
Wait for the instance to spin up and explore it interactively. It should be ready to go, but setup will take considerably longer this time, since it will update lots of packages and reboot. Just give it time. You can observe the console with lxc console if you want to. Remember that to quit the console you have to press ctrl+a q.
From here on you can finish automating your software deployment, connect to ansible, set up users or do anything else that makes sense.
Docker CE
Since LAVA uses Docker we may want to use a more up-to-date version of the Docker package. Again, cloud-init can help us do that.
We want to modify two parts of the profile: the apt: part will list a new repository to load and the packages: part will just tell cloud-init to install docker-ce. This is what it looks like:
config:
# ...
user.user-data: |
#cloud-config
apt:
sources:
docker.list:
source: deb https://download.docker.com/linux/ubuntu focal stable
keyid: 9DC858229FC7DD38854AE2D88D81803C0EBFCD88
packages:
# ...
- docker-ce
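Once an instance built with the updated profile finishes its first boot, a quick smoke test from the host could be (lava0 being an assumed instance name):

# Run a throwaway container inside the guest to confirm docker-ce is functional.
lxc exec lava0 -- docker run --rm hello-world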
Tailscale (inside)
If you want to let someone into the instance you can set up... another Tailscale tunnel. Again, since the instance runs with a working real-time clock, you don't need to jump through hoops this time.
config:
# ...
user.user-data: |
#cloud-config
apt:
sources:
tailscale.list:
source: deb https://pkgs.tailscale.com/stable/ubuntu focal main
keyid: 2596A99EAAB33821893C0A79458CA832957F5868
packages:
# ...
- tailscale
runcmd:
- tailscale up -authkey your-tailscale-key-here
Conclusion
There are a lot of messy wires and a fair amount of hand-holding, but also a healthy amount of code and automation in this setup. Cloud-init is not everything, but it does help considerably to spin things up the way you want. From there on you only need to keep holding the steering wheel and drive your infrastructure where you want to.