Deploying Raspberry Pi as LAVA dispatcher for USB-attached DUTs

Today I will talk about using a Raspberry Pi 4B (8GB) with an SSD as infrastructure for deploying a LAVA dispatcher used for testing USB-attached devices such as the 96Boards Nitrogen board.


This somewhat lengthy post goes into the practical details of setting up a miniature infrastructure and test stack on a single board or a cluster of identical boards. It is separated into two main parts: the physical part with the bare-metal OS (infrastructure) and the software-defined services part built on top (test/payload). You can use it as a story, as a tutorial or (perhaps) as a quick Google search result for a particular problem.

I wanted to approach this problem as an infrastructure problem, with one layer for managing the hardware and the base OS and another layer for whatever is needed by the testing stack. This split seems natural in environments with separate testing and infrastructure teams, whose goals differ.

Infrastructure layer

At the infrastructure layer we're using a Raspberry Pi 4B with 8GB of RAM, an up-to-date EEPROM bootloader configured to boot from USB, and a USB-SATA adapter with a low-cost 128GB SATA SSD. SSDs are significantly faster and more robust than micro SD cards.

Ubuntu 20.04 LTS + cloud-init

For the management side operating system I've been using Ubuntu 20.04 LTS. This version is, at the time of this writing, the latest long-term-support release. In the future I plan to upgrade to 22.04 LTS as that may cut one step required from the setup process.

The setup process involves two stages: preparing the hardware itself (assembling everything, updating the boot firmware using Raspbian, wiring everything together) and software setup (putting some initial image on the SSD).

Start with ubuntu-20.04.3-preinstalled-server-arm64+raspi.img.xz. You can copy it to your SSD with dd. For example, assuming that your SSD is at /dev/sdb and the downloaded image is in the current directory: xzcat ubuntu-20.04.3-preinstalled-server-arm64+raspi.img.xz | sudo dd of=/dev/sdb conv=sparse bs=4M. This should take just a moment, since we use conv=sparse to detect and skip writing all-zero blocks.
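
Since dd will happily overwrite whatever device it is pointed at, it may be worth guarding the command. Below is a minimal sketch of such a guard, assuming a Linux host with /proc/mounts. The check_target helper is a hypothetical name used for illustration, not something from the setup described in this post.

```shell
#!/bin/sh
# check_target DEV: succeed only if DEV is a block device with nothing mounted.
# Hypothetical helper for illustration; assumes Linux (/proc/mounts).
check_target() {
    dev=$1
    if [ ! -b "$dev" ]; then
        echo "error: $dev is not a block device" >&2
        return 1
    fi
    # A prefix match on /dev/sdb also catches /dev/sdb1, /dev/sdb2 and so on.
    if grep -q "^$dev" /proc/mounts 2>/dev/null; then
        echo "error: $dev or one of its partitions is mounted" >&2
        return 1
    fi
    return 0
}

# Intended use (left commented out because it overwrites the target):
# check_target /dev/sdb && \
#     xzcat ubuntu-20.04.3-preinstalled-server-arm64+raspi.img.xz | \
#     sudo dd of=/dev/sdb conv=sparse bs=4M
```

The prefix match is deliberately conservative: it may refuse a device whose name merely starts with the same characters, which is the safer failure mode here.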

We don't want to use the default ubuntu user. Instead we want some admin accounts, ssh keys and the like. We can achieve that with cloud-init by preparing a user-data file. Here's an edited (abbreviated) file I've used:

#cloud-config
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Huawei Inc.

users:
- name: zyga
  gecos: Zygmunt Krynicki
  primary_group: zyga
  groups: users, adm, sudo, lxd
  shell: /bin/bash
  ssh_authorized_keys:
  - "ssh-rsa (edited-out) zyga@hostname"

chpasswd:
  list: [zyga:password]
  expire: true

hostname: pi4-1

ntp:
  enabled: true

timezone: Europe/Warsaw

packages:
  - ssh

package_update: true
package_upgrade: true
package_reboot_if_required: true

This file is extremely basic and has some short-cuts applied. The real file I've used registered the system with our management and our VPN systems. This did bring additional complexity I will talk about later. For the moment you can see that it gives me a single user account, with an authorized key and a fixed password that has to be changed on first boot. There's also a fixed hostname, NTP and time zone configuration, an initial set of packages to install and a request to update everything on first boot. SSH is important since we want to be able to log in and administer the device remotely.

Normally I would not set the password at all but during development it is very useful to be able to login interactively from the console or over the serial port. The cloud-init user-data file can be provided in one of several ways, but one that's low-cost and easy to do is to copy the file to the root directory of the boot partition (which is the first partition in the image). We'll do that shortly but first... some caveats.

In a perfect world that would be all that we need. I sincerely hope that will be true once Ubuntu 22.04 ships, with no more u-boot, a more recent kernel, and improved systemd and cloud-init. For the current Ubuntu 20.04 LTS, there are a few extra steps to cover.

Caveat: mass storage mode vs UAS mode

This doesn't apply to all USB-SATA adapters, but at least with Linux 5.4 and with the USB-SATA adapter using vendor:product 152d:0578 I had to add a quirk to force the device to operate in mass-storage mode. UAS (USB Attached SCSI) mode was buggy and the device would never boot.

We can do that by editing the default kernel command line. The boot firmware loads it from cmdline.txt (by default). All I had to do was to append usb-storage.quirks=152d:0578:u to the parameter list there.
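
The edit is easy to script. The following is a minimal sketch, assuming cmdline.txt holds a single line of space-separated parameters. The append_cmdline_param helper is a hypothetical name, and the sample file contents are illustrative rather than copied from the Ubuntu image.

```shell
#!/bin/sh
# append_cmdline_param FILE PARAM: add PARAM to a single-line kernel command
# line file unless it is already there. Hypothetical helper for illustration.
append_cmdline_param() {
    file=$1
    param=$2
    # Split the line into one parameter per line and look for an exact match.
    if tr ' ' '\n' < "$file" | grep -qxF -- "$param"; then
        return 0
    fi
    # The file must stay a single line: take it, append, write it back.
    line=$(head -n 1 "$file")
    printf '%s %s\n' "$line" "$param" > "$file"
}

# Demonstration on a scratch file; the initial contents are made up.
demo=$(mktemp)
printf '%s\n' 'console=serial0,115200 root=LABEL=writable rootwait' > "$demo"
append_cmdline_param "$demo" 'usb-storage.quirks=152d:0578:u'
append_cmdline_param "$demo" 'usb-storage.quirks=152d:0578:u'   # second call changes nothing
cat "$demo"
```

Making the helper idempotent means it is safe to re-run against an already modified image.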

Caveat: USB boot support

Ubuntu 20.04 uses the following boot chain:

  • System firmware from EEPROM looks for boot media (configurable) and picks SD card (default).
  • The card is searched for a FAT partition with several files, notably start.elf, fixup.dat and config.txt (and several others, those don't matter to us).
  • The config.txt file instructs the firmware to load the kernel from a file specific to the version of the board used, here it would have been uboot_rpi_4.bin.
  • That in turn loads boot.scr and then picks up vmlinuz, de-compresses it in memory and loads it, and the system starts running Linux.

There's only one problem. The version of u-boot used here doesn't support the USB chip used on the board, so we cannot boot from our SSD. Ooops!

Fortunately, the current boot firmware (start.elf and, before it, a copy of bootcode.bin which is read from the on-board EEPROM) is capable of doing that directly. Moreover, it also supports compressed kernels, so another limitation is now lifted.

In general the firmware loads config.txt. On Ubuntu 20.04 that file is set to load two include files (this is processed by start.elf) - those are syscfg.txt and usercfg.txt. I've used the first one to tell the boot firmware to ignore u-boot and load the kernel and initrd directly. Since start.elf has already been loaded from USB mass storage, we have no problems loading all the other files as well. We can use our USB-SATA adapter just fine.

Here's the syscfg.txt I've made. I left the comments so that you can see what various parts are for. You can remove everything but the last three lines, where we instruct the bootloader to load the kernel from vmlinuz. This overrides an earlier definition, in config.txt, which sets kernel= to one of the u-boot images.

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Huawei Inc.


# Enable UART for debugging.

# Use debug version of the boot firmware.

# Enable UART in the 2nd stage bootloader (start.elf).
# This is useful for diagnosing boot problems.

# Move bluetooth to slower UART to assist in early boot debugging.

# Ask the firmware to only poll the SD card once. This avoids a bug where udev
# is getting stuck when booting from USB and the SD card is not populated.

# Load the kernel directly, without looping via u-boot. This is what allows us
# to boot from USB. It is not needed for u-boot assisted SD card boots.
kernel=vmlinuz
initramfs initrd.img followkernel

In my setup I've used a Makefile to program and alter the image. As you can see I'm copying user-data, cmdline.txt and syscfg.txt all in one step. The makefile rule below is edited for brevity.

flash-%: MEDIA_DEVICE ?= $(error set MEDIA_DEVICE= to the path of the block device)
flash-%: BOOT_PART ?= 1
flash-%: IMAGE_XZ ?= ubuntu-20.04.3-preinstalled-server-arm64+raspi.img.xz
flash-%: D := $(shell mktemp -d)
flash-%: %/user-data %/cmdline.txt %/syscfg.txt
        test `id -u` -eq 0
        xzcat $(IMAGE_XZ) | dd of=$(MEDIA_DEVICE) conv=sparse bs=4M
        unshare -m /bin/sh -c "mount $(MEDIA_DEVICE)$(BOOT_PART) $(D) && cp -v $^ $(D)"

Caveat: initial system time

Raspberry Pi 4 is popular and fun but, like all the boards in its lineage, it does not come with a battery-powered real time clock. When you boot the system, it's 1970 all over again. Parts of the OS have been tuned to do better than that, by loading timestamps from various places, like a built-in timestamp in systemd (system time must not be earlier than systemd build time) or support packages that store and restore the time at shutdown and startup. Still, nothing here can make a device that was left in cold storage for a while, and then booted while disconnected from the network, know what time it is. It just doesn't have the hardware to do that.

This makes first boot annoyingly complicated. We want to go to the network and fetch packages. That implies verifying certificates and their associated validity window. I didn't debug this deeply but at least in Ubuntu 20.04 there's no way to synchronize the first attempt to install a package (or add a 3rd party repository) with the system having obtained initial synchronization from network time servers.

What I've found is that, at least the version of systemd used in Ubuntu 20.04 has one way of setting initial time. Systemd will stat /var/lib/systemd/timesync/clock and use the mtime as the earliest valid system time. This is paramount for the case of creating an initial disk image now and quickly booting the system with it, as we can make sure that the system will have so-so time early enough when you start to initiate https connections and need to validate certificates.

This is a kludge. Ideally cloud-init should have a way to force the rest of the commands to wait for NTP time sync, if enabled.
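
Until something like that exists, one possible workaround is to poll systemd for its synchronization flag before running anything that needs valid certificates. This is a minimal sketch, assuming systemd-timesyncd is the time client and that timedatectl show -p NTPSynchronized --value is available. The wait_for_time_sync helper is a hypothetical name, and it only gates the commands you place after it.

```shell
#!/bin/sh
# wait_for_time_sync TRIES [PROBE...]: poll once per second, up to TRIES times,
# until the probe command prints "yes". Hypothetical helper for illustration.
wait_for_time_sync() {
    tries=$1
    shift
    # Default probe: ask systemd whether the clock has been synchronized.
    [ $# -gt 0 ] || set -- timedatectl show -p NTPSynchronized --value
    while [ "$tries" -gt 0 ]; do
        if [ "$("$@" 2>/dev/null)" = "yes" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Demonstration with a stand-in probe so that it runs anywhere.
wait_for_time_sync 1 echo yes && echo "clock is synchronized"

# On the device itself, for example ahead of adding a repository:
# wait_for_time_sync 120 && apt-get update
```

Passing the probe as arguments keeps the loop testable without systemd; on a real system the default probe is used.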

In my helper Makefile I solved it like this:

flash-%: ROOT_PART ?= 2
flash-%: D := $(shell mktemp -d)
    unshare -m /bin/sh -c "mount $(MEDIA_DEVICE)$(ROOT_PART) $(D) && mkdir -v -p $(D)/var/lib/systemd/timesync && touch $(D)/var/lib/systemd/timesync/clock"

The variables don't matter, except for D which, as you can see, is a temporary directory. All that matters is that we touch the aforementioned file on the second partition of the image.
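
Because the trick only helps while the image is fresh, it can be handy to check how old the seeded timestamp is before booting a disk that has been sitting on a shelf. This is a minimal sketch, assuming GNU stat. The clock_age helper is a hypothetical name, and the scratch directory stands in for the mounted root partition.

```shell
#!/bin/sh
# clock_age ROOT: print how many seconds old ROOT/var/lib/systemd/timesync/clock is.
# Hypothetical helper; relies on GNU stat (-c %Y prints mtime as epoch seconds).
clock_age() {
    clock=$1/var/lib/systemd/timesync/clock
    now=$(date +%s)
    mtime=$(stat -c %Y "$clock") || return 1
    echo $((now - mtime))
}

# Demonstration against a scratch directory instead of a real root partition.
scratch=$(mktemp -d)
mkdir -p "$scratch/var/lib/systemd/timesync"
touch "$scratch/var/lib/systemd/timesync/clock"
echo "clock file is $(clock_age "$scratch") seconds old"
```

If the reported age runs into weeks or months, re-touching the file (or re-flashing) before first boot avoids starting with a clock that is too far in the past.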

First boot

With all of this done and a few goats sacrificed, we can now plug in network and power and boot the system for the first time.

The first boot is critical. Network must be working, stars must align. If things fail here, the device will just sit idle, forever. Nothing will be re-tried. Normally cloud-init is used in the cloud, where network typically works. If something fails in your local environment you have two options:

  • Start over and re-write the disk.
  • Log in interactively, if you can, to run cloud-init clean and reboot.

For debugging you may want to run cloud-init collect-logs and inspect the resulting tarball on your preferred system. It contains crucial information from first boot. Two things to look out for: DNS issues and the exact moment system time jumps from the fake time baked into the image to the actual local time.

Once the system boots this part is done. Good work. You have a solid enough infrastructure to provision the next level of the stack and let your users in.

Optional extras


If you are a Landscape user, you may want to extend your user-data file with a section that looks like this one:

landscape:
  client:
    url: ""
    # NOTE: this cannot use https, don't change it!
    ping_url: ""
    data_path: "/var/lib/landscape/client"
    tags: ""
    computer_title: "{{ v1.local-hostname }}"
    account_name: "your landscape account name"

Note the {{ ... }} fragment. This is a jinja template. If you use that you have to put one at the top of the user-data file:

## template: jinja

Use that exact number of hashes and spaces, or the template magic won't kick in.

This will set everything up to register your system with Landscape. Unless you use an on-prem installation the URLs are valid and ready to go. You may want to set an account key as well, if you use one.

You should check if landscape-client.service is enabled. It will work on first boot but after the first reboot, I've seen cases where the service was just inactive. You may want to add systemctl enable landscape-client.service to your runcmd: section.


If you want to use tailscale to VPN into your infrastructure you will want to add the repository and register to it on the first boot. There are some caveats here as well, sadly. First of all, the tailscale SSL certificate is from Let's Encrypt and, at least for the Ubuntu 20.04.3 image I've been using, is not valid until you update your SSL certificates. This means that you cannot just apt-get update, as apt will not accept the certificate from tailscale until you update ca-certificates.

You can use this snippet as a starting point and perhaps manage it via Landscape with package profiles and user scripts. Alternatively you can do it in runcmd directly, if you don't mind running curl | sudo sh style programs.

      source: deb focal main
      key: |
        -----BEGIN PGP PUBLIC KEY BLOCK-----

        -----END PGP PUBLIC KEY BLOCK-----

 - tailscale up -authkey your-tailscale-key-here


We have a nice system running but it should be managed as just another server in your fleet. What I've described here is the bare minimum I happen to do. Your site may use other software and stacks to ease remote administration at scale.

Using a Raspberry Pi gives us the advantage of having a low-cost system with various interesting peripherals that often come in handy when doing device testing.

In the next chapter we will look at the next part of the stack, our LAVA dispatcher.

The test/content layer

We now have a nice and mostly vanilla low-cost system. Let's use it for deploying LAVA. After a few iterations with this idea I'm deploying LAVA inside a virtual machine, with USB pass-through offering access to USB-serial adapters and a USB-attached 96Boards Nitrogen micro-controller board.

Why like that? Let's break it down:

1) We can destroy and re-provision anything on top of this system without touching the hardware. This is just nice for management, as the hardware may be locked in a lab room somewhere, with controlled access and you may be far away, for instance working from home. This part is non-controversial.

2) We can use the system for other things. In particular it's probably the low-cost aarch64 system you always wanted to try. You can enable it in CI (we'll talk about resource management later) and let your developers compile and test on aarch64 natively.

3) LAVA is tested this way. It likes to have access to a full-system like environment. With a kernel image around and things to use for libguestfs. With udev rules and all the other dirty parts of the plumbing layer that may be needed. This lets your test team set anything up they want and not have to worry about a too-tightly controlled single-process container environment.

4) It works in practice with USB devices. Some requirements are harder: if you need access to GPIOs or other ports you may need additional software to either move the device over entirely or mediate access to a shared resource (e.g. so that only a specific pin can be controlled). That can be done naturally by a privileged and managed service that runs on the host and that something in the guest VM or container talks to.

5) Lastly VM vs container. Initially I've used system containers for this but, at least with LAVA and device access, this was cumbersome. While LXD does work admirably well, some of the finer points of being able to talk to udev (which is not present in a system container) are missing. Using a VM is a cheap way to avoid that. At the time of this writing, system containers have better USB hot-plug support but that's only useful if you can use them in the first place. If you have to unplug hardware from your system you may need to reboot the virtual machine for the software to notice that. At least until LXD is improved.

Let's look at how we're going to provide that next layer.


Deploying LXD on Ubuntu is a breeze. It's pre-installed. If you want to create an LXD cluster you can do that too, but in that case it's recommended to set up a snap cohort so that your entire stack sees the same exact version of LXD as it refreshes. You can do that with snap create-cohort lxd on any one system, and then snap refresh lxd --cohort=... with the cohort key printed earlier. Setting up an LXD cluster is well documented and I won't cover it here.

To set up a VM with some sane defaults run this command:

lxc launch ubuntu:20.04 --vm -c limits.memory=4GB -c limits.cpu=2 -c security.secureboot=false

Let's break it down:

  • First we pick ubuntu:20.04 as our guest. You can pick anything you want to use but if you want to use it as a virtual machine, you should really stick to the images: remote where LXD maintainers publish tested images that bundle the LXD management agent. The agent is important for the system to act in a way that is possible to control from the outside.
  • The second argument, --vm, tells LXD to create a virtual machine instead of a container.
  • Next we set the amount of memory and virtual CPUs to present to the guest system.
  • Lastly we ask LXD to disable secure boot. Ubuntu 20.04 aarch64 images apparently don't have the right signatures. I think this is fixed with later versions. If you try 22.04 you may give it a go without that argument.

When experimenting pass --console as well, to instantly attach a virtual console to the system and see what's going on. You can also use that to interact with the virtual EFI boot firmware.

Wait for the system to spin up; lxc list should show the IP address it was assigned. You can jump in with lxc shell (you have to pass the randomly-generated system name as well) and look around. Once you have that working with your OS of choice, stop the system with lxc stop and let's set up USB pass-through.

USB pass-through

The easiest way to do this interactively is to run lxc config edit with the name of your instance. This launches your $EDITOR and lets you just edit whatever you want. In our case we want to edit the empty devices: {} section so that it contains two virtual devices: one for the USB-attached Nitrogen board and one more for the USB FTDI serial adapter. This is what it looks like:

    productid: "6001"
    type: usb
    vendorid: "0403"
    productid: "0204"
    type: usb
    vendorid: 0d28

Note that I've removed the {} value that devices: was initially set to. You can see that we just tell LXD to pass through two USB devices and specify their product and vendor IDs.

Important: if you have multiple matching devices on your host they will all be forwarded. I've started working on a patch that lets you pick the exact device but it is not yet merged upstream.
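
Given that behaviour, it helps to know how many devices on the host match a given ID before starting the instance. This is a minimal sketch that counts matches in lsusb-style output. The count_usb_matches helper is a hypothetical name, and the sample lines (bus numbers, device numbers and descriptions) are made up for the demonstration.

```shell
#!/bin/sh
# count_usb_matches VENDOR:PRODUCT: read lsusb-style lines on stdin and print
# how many of them carry the given ID. Hypothetical helper for illustration.
count_usb_matches() {
    # grep -c exits non-zero when the count is 0; we still want "0" printed.
    grep -ci -- "ID $1" || true
}

# Sample input shaped like lsusb output; the details are invented.
sample='Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 005: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 001 Device 006: ID 0d28:0204 NXP ARM mbed
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub'

printf '%s\n' "$sample" | count_usb_matches 0403:6001   # the FTDI serial adapters
printf '%s\n' "$sample" | count_usb_matches 0d28:0204   # the Nitrogen board

# On the real host: lsusb | count_usb_matches 0403:6001
```

Any count above one means that every one of those devices will show up inside the guest.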

Save the file and exit your editor. Re-start the machine with lxc start. Wait for it to boot and run lxc shell to get in. Run lsusb and compare that with the output of the same command running on the host. Success? Almost.

By default, and for good reasons, LXD uses cloud images and those have a kernel tuned for virtual environments. Those don't ship with your USB-to-serial adapter drivers and a lot of other junk not needed when you want to spin those virtual machines up in seconds.

We have to change that. Fortunately, for Ubuntu at least, it's super easy to just install linux-image-generic and reboot. Having done that you can see that uname -a will talk about the generic kernel and that /dev/serial/by-id is populated with nice symbolic links to your devices attached to the host.

Can we automate that? Sure! It's all cloud-init again. This time wrapped in an LXD profile. Let's see what this looks like:

LXD profiles

LXD has a system of profiles which let us extract a piece of configuration from a specific system and apply it to a class of systems. This works with storage, network, devices, limits and pretty much anything else LXD supports.

Let's create a profile for our class of systems. Let's call it lava-dispatcher, since the profile will be applied to all the dispatchers in our fleet. If you've deployed LXD as a cluster, you can define the profile once and spin up many dispatchers, for example one per node. Let's create the profile with lxc profile create lava-dispatcher and define it in a yaml file for our convenience.

While you can use lxc profile edit lava-dispatcher to set things up interactively, the text you will be presented is generated by LXD from the internal database. We want to store our config in git so let's define a file and then load it into the running LXD (cluster).

Here's the file I've prepared:

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Huawei Inc.
  boot.autostart: true
  limits.cpu: 2
  limits.memory: 4GB
  security.secureboot: false
  user.user-data: |
      # Install the generic kernel for access to various USB drivers.
      - linux-image-generic
    package_update: true
    package_upgrade: true
    package_reboot_if_required: true
    productid: "6001"
    type: usb
    vendorid: "0403"
    productid: "0204"
    type: usb
    vendorid: 0d28

As you can see it has a few things. At the top we tell LXD to auto-start the instance with this profile. We want to be able to reboot the infrastructure host without having to remember to boot individual payloads. We also work around aarch64 secure-boot problem. We set the limits we've talked about earlier, we also have the devices down at the bottom. In the middle we have another user-data file, embedded inside the LXD configuration (yaml-in-yaml). Note that the | character makes the whole section one large string, without any structure, so you have to be careful as LXD cannot validate the user-data file for you. Here we install the generic kernel and tell the system to update and reboot if necessary.
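
To make the nesting explicit, here is a sketch of how the pieces described above might fit together in one complete file, written out from a shell script so that it can live in git next to the rest of the tooling. The device names ftdi-serial and nitrogen are placeholders chosen for this example (LXD accepts any name), the description text is invented, and the values are quoted as strings to stay on the safe side.

```shell
#!/bin/sh
# Write a complete LXD profile to a file.
# Device names "ftdi-serial" and "nitrogen" are placeholders for illustration.
profile=${PROFILE_FILE:-$(mktemp -d)/lava-dispatcher.yaml}
cat > "$profile" <<'EOF'
config:
  boot.autostart: "true"
  limits.cpu: "2"
  limits.memory: 4GB
  security.secureboot: "false"
  user.user-data: |
    #cloud-config
    packages:
      # Install the generic kernel for access to various USB drivers.
      - linux-image-generic
    package_update: true
    package_upgrade: true
    package_reboot_if_required: true
description: LAVA dispatcher (sketch)
devices:
  ftdi-serial:
    type: usb
    vendorid: "0403"
    productid: "6001"
  nitrogen:
    type: usb
    vendorid: "0d28"
    productid: "0204"
EOF
echo "wrote $profile"

# Loading it into LXD (needs a host with LXD, so it is not run here):
# lxc profile edit lava-dispatcher < "$profile"
```

Everything indented under user.user-data is the embedded cloud-init document, so its indentation has to stay consistent relative to the | line.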

Let's try it out. Every time you make changes to lava-dispatcher.yaml you can run lxc profile edit lava-dispatcher < lava-dispatcher.yaml to load the profile into the cluster. You cannot do that when there are virtual machines running with that profile enabled, as LXD will refuse to apply changes to this type of system on the fly.

Let's destroy our initial virtual machine with lxc delete --force instance-name and try to set everything up in one step with lxc launch ubuntu:20.04 --vm --profile default --profile lava-dispatcher. We pass --profile twice since the default profile defines networking and storage and our lava-dispatcher profile defines just the details that make the instance into a valid host for a future LAVA dispatcher.

Wait for the instance to spin up and explore it interactively. It should be ready to go but setup will be considerably longer, since it will update lots of packages and reboot. Just give it time. You can observe the console with lxc console if you want to. Remember that to quit the console you have to press ctrl+a q.

From here on you can finish automating your software deployment, connect to ansible, set up users or do anything else that makes sense.

Docker CE

Since LAVA uses Docker we may want to use a more up-to-date version of the Docker package. Again, cloud-init can help us do that.

We want to modify two parts of the profile: the apt: part will list a new repository to load and the packages: part will just tell cloud-init to install docker-ce. This is what it looks like:

    # ...
    user.user-data: |
                    source: deb focal stable
                    keyid: 9DC858229FC7DD38854AE2D88D81803C0EBFCD88
                # ...
                - docker-ce

Tailscale (inside)

If you want to let someone into the instance you can set up... another tailscale tunnel. Again, since the instance runs with a working real-time clock, you don't need to jump through hoops this time.

    # ...
    user.user-data: |
                    source: deb focal main
                    keyid: 2596A99EAAB33821893C0A79458CA832957F5868
                # ...
                - tailscale
            - tailscale up -authkey your-tailscale-key-here


There's a lot of messy wires and hand-holding but also a healthy amount of code and automation in this setup. Cloud-init is not everything but it does help to spin things up the way you want to considerably. From there on you only need to keep holding the steering wheel and drive your infrastructure where you want to.
