Update from the PipeWire hackfest

As the third and final day of the PipeWire hackfest draws to a close, I thought I’d summarise some of my thoughts on the goings-on and the future.

Thanks

Before I get into the details, I want to send out a big thank you to:

  • Christian Schaller for all the hard work of organising the event and Wim Taymans for the work on PipeWire so far (and in the future)
  • The GNOME Foundation, for sponsoring the event as a whole
  • Qualcomm, who are funding my presence at the event
  • Collabora, for sponsoring dinner on Monday
  • Everybody who attended and participated for their time and thoughtful comments

Background

For those of you who are not familiar with it, PipeWire (previously Pinos, previously PulseVideo) was Wim’s effort at providing secure, multi-program access to video devices (like webcams, or the desktop for screen capture). As he went down that rabbit hole, he wrote SPA, a lightweight general-purpose framework for representing a streaming graph, and this led to the idea of expanding the project to include support for low latency audio.

The Linux userspace audio story has, for the longest time, consisted of two top-level components: PulseAudio which handles consumer audio (power efficiency, wide range of arbitrary hardware), and JACK which deals with pro audio (low latency, high performance). Consolidating this into a good out-of-the-box experience for all use-cases has been a long-standing goal for myself and others in the community that I have spoken to.

An Opportunity

From a PulseAudio perspective, it has been hard to achieve the 1-to-few millisecond latency numbers that would be absolutely necessary for professional audio use-cases. A lot of work has gone into improving this situation, most recently with David Henningsson’s shared-ringbuffer channels that made client/server communication more efficient.

At the same time, application sandboxing frameworks such as Flatpak have added security requirements that were not accounted for when PulseAudio was written. Examples include choosing which devices an application has access to (or can even know of) and which applications can act as control entities (set routing, enable/disable devices, etc.). Some work has gone into this: Ahmed Darwish did some key work to get memfd support in PulseAudio, and Wim has prototyped an access-control mechanism module to enable a Flatpak portal for sound.

All this said, there are still fundamental limitations in architectural decisions in PulseAudio that would require significant plumbing to address. With Wim’s work on PipeWire and his extensive background with GStreamer and PulseAudio itself, I think we have an opportunity to revisit some of those decisions with the benefit of a decade’s worth of learning deploying PulseAudio in various domains starting from desktops/laptops to phones, cars, robots, home audio, telephony systems and a lot more.

Key Ideas

There are some core ideas of PipeWire that I am quite excited about.

The first of these is the graph. Like JACK, the entities that participate in the data flow are represented by PipeWire as nodes in a graph, and routing between nodes is very flexible — you can route applications to playback devices and capture devices to applications, but you can also route applications to other applications, and this is notionally the same thing.

The second idea is a bit more radical — PipeWire itself only “runs” the graph. The actual connections between nodes are created and managed by a “session manager”. This allows us to completely separate the data flow from policy, which means we could write completely separate policy for desktop use cases vs. specific embedded use cases. I’m particularly excited to see this be scriptable in a higher-level language, which is something Bastien has already started work on!
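To make the split between running the graph and deciding its shape a bit more concrete, here is a toy sketch in Python. None of these names come from PipeWire: `Graph`, `link` and `desktop_policy` are hypothetical, and the real project is written in C with a very different API. The only point is that the graph connects whatever it is told to, while a separate piece of policy decides what to connect.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy stand-in for the daemon: it stores nodes and links, and holds no policy."""
    nodes: dict = field(default_factory=dict)   # name -> kind ("app" or "sink")
    links: set = field(default_factory=set)     # (from_name, to_name) pairs

    def add_node(self, name, kind):
        self.nodes[name] = kind

    def link(self, src, dst):
        # The graph connects whatever it is told to; app-to-app is no special case.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both ends of a link must be known nodes")
        self.links.add((src, dst))


def desktop_policy(graph, default_sink):
    """Hypothetical session-manager policy: route every unlinked app to one sink."""
    linked = {src for src, _ in graph.links}
    for name, kind in graph.nodes.items():
        if kind == "app" and name not in linked:
            graph.link(name, default_sink)


graph = Graph()
graph.add_node("speakers", "sink")
graph.add_node("synth", "app")
graph.add_node("effects", "app")
graph.link("synth", "effects")       # app-to-app routing, set up explicitly
desktop_policy(graph, "speakers")    # policy fills in the rest
print(sorted(graph.links))
```

Swapping `desktop_policy` for a different function (say, one written for a car or a set-top box) would change the routing without touching `Graph` at all, which is the separation described above.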

A powerful idea in PulseAudio was rewinding — the ability to send out huge buffers to the device, but the flexibility to rewind that data when things changed (a new stream got added, or the stream moved, or the volume changed). While this is great for power saving, it is a significant amount of complexity in the code. In addition, with some filters in the data path, rewinding can break the algorithm by introducing non-linearity. PipeWire doesn’t support rewinds, and we will need to find a good way to manage latencies to account for low power use cases. One example is that we could have the session manager bump up the device latency when we know latency doesn’t matter (Android does this when the screen is off).
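For readers who have not met rewinding before, the following is a deliberately tiny model of the idea in Python. It is not PulseAudio code: `RewindableBuffer` and its methods are made up for illustration, and real rewinds also have to deal with hardware pointers, resamplers and filters, which is where the complexity mentioned above comes from.

```python
class RewindableBuffer:
    """Toy playback buffer: the device reads at read_pos, the server appends at the end."""

    def __init__(self):
        self.samples = []
        self.read_pos = 0    # how far the "hardware" has played

    @property
    def write_pos(self):
        return len(self.samples)

    def write(self, data):
        self.samples.extend(data)

    def play(self, count):
        # The device consumes samples; these can no longer be changed.
        self.read_pos = min(self.read_pos + count, self.write_pos)

    def rewind(self, count):
        """Discard up to `count` queued samples that have not been played yet."""
        rewound = min(count, self.write_pos - self.read_pos)
        if rewound:
            del self.samples[self.write_pos - rewound:]
        return rewound


buf = RewindableBuffer()
buf.write([1.0] * 8)      # queue a large chunk at full volume
buf.play(3)               # three samples have already gone out
n = buf.rewind(8)         # volume changed: take back whatever is still unplayed
buf.write([0.5] * n)      # re-render the unplayed part at half volume
print(n, buf.samples)
```

Only the unplayed tail can be taken back, so the change takes effect right after the samples the device has already consumed rather than a whole buffer later.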

There are a bunch of other things that are in the process of being fleshed out, like being able to represent the hardware as a graph as well, to have a clearer idea of what is going on within a node. More updates as these things are more concrete.

The Way Forward

There is a good summary by Christian about our discussion about what is missing and how we can go about trying to make a smooth transition for PulseAudio users. There is, of course, a lot to do, and my ideal outcome is that we one day flip a switch and nobody knows that we have done so.

In practice, we’ll need to figure out how to make this transition seamless for most people, while folks with custom setups will need to be given a long runway and clear documentation to know what to do. It’s way too early to talk about this in more specifics, however.

Configuration

One key thing that PulseAudio does right (I know there are people who disagree!) is having a custom configuration that automagically works on a lot of Intel HDA-based systems. We’ve been wondering how to deal with this in PipeWire, and the path we think makes sense is to transition to ALSA UCM configuration. This is not as flexible as we need it to be, but I’d like to extend it for that purpose if possible. This would ideally also help consolidate the various methods of configuration being used by the various Linux userspaces.

To that end, I’ve started trying to get a UCM setup on my desktop that PulseAudio can use, and be functionally equivalent to what we do with our existing configuration. There are missing bits and bobs, and I’m currently focusing on the ones related to hardware volume control. I’ll write about this in the future as the effort expands out to other hardware.

Onwards and upwards

The transition to PipeWire is unlikely to be quick, completely painless, or free of contention. For those who are worried about the future, know that any switch is still a long way away. In the meantime, however, constructive feedback and comments are welcome.

PulseAudio vs. AudioFlinger: Fight!

I’ve been meaning to try this for a while, and we’ve heard a number of requests from the community as well. Recently, I got some time here at Collabora to give it a go — that is, to get PulseAudio running on an Android device and see how it compares with Android’s AudioFlinger.

The Contenders

Let’s introduce our contenders first. For those who don’t know, PulseAudio is pretty much a de-facto standard part of the Linux audio stack. It sits on top of ALSA which provides a unified way to talk to the audio hardware and provides a number of handy features that are useful on desktops and embedded devices. I won’t rehash all of these, but this includes a nice modular framework, a bunch of power saving features, flexible routing, and lots more. PulseAudio runs as a daemon, and clients usually use the libpulse library to communicate with it.

In the other corner, we have Android’s native audio system: AudioFlinger. AudioFlinger was written from scratch for Android. It provides an API for playback/recording as well as a control mechanism for implementing policy. It does not depend on ALSA, but instead allows for a sort of HAL that vendors can implement any way they choose. Applications generally play audio via layers built on top of AudioFlinger. Even if you write a native application, it would use the OpenSL ES implementation, which goes through AudioFlinger. The actual service runs as a thread of the mediaserver daemon, but this is merely an implementation detail.

Note: all my comments about AudioFlinger and Android in general are based on documentation and code for Android 4.0 (Ice Cream Sandwich).

The Arena

My test-bed for the tests was the Galaxy Nexus running Android 4.0 which we shall just abbreviate to ICS. I picked ICS since it is the current platform on which Google is building, and hopefully represents the latest and greatest in AudioFlinger development. The Galaxy Nexus runs a Texas Instruments OMAP4 processor, which is also really convenient since this chip has pretty good support for running stock Linux (read on to see how useful this was).

Preparations

The first step in getting PulseAudio on Android was deciding between using the Android NDK like a regular application or integrating it into the base Android system. I chose the latter: even though this was a little more work initially, it made more sense in the long run since PulseAudio really belongs in the base system.

The next task was to get the required dependencies ported to Android. Fortunately, a lot of the ground work for this was already done by some of the awesome folks at Collabora. Derek Foreman’s androgenizer tool is incredibly handy for converting an autotools-based build to Android–friendly makefiles. With Reynaldo Verdejo and Alessandro Decina’s prior work on GStreamer for Android as a reference, things got even easier.

The most painful bit was libltdl, which we use for dynamically loading modules. Once this was done, the other dependencies were quite straightforward to port over. As a bonus, the Android source already ships an optimised version of Speex which we use for resampling, and it was easy to reuse this as well.

As I mentioned earlier, vendors can choose how they implement their audio abstraction layer. On the Galaxy Nexus, this is built on top of standard ALSA drivers, and the HAL talks to the drivers via a minimalist tinyalsa library. My first hope was to use this, but there was a whole bunch of functions missing that PulseAudio needed. The next approach was to use salsa-lib, which is a stripped down version of the ALSA library written for embedded devices. This too had some missing functions, but these were fewer and easy to implement (and are now upstream).

Now if only life were that simple. :) I got PulseAudio running on the Galaxy Nexus with salsa-lib, and even got sound out of the HDMI port. Nothing from the speakers though (they’re driven by a TI twl6040 codec). Just to verify, I decided to port the full alsa-lib and alsa-utils packages to debug what’s happening (by this time, I’m familiar enough with androgenizer for all this to be a breeze). Still no luck. Finally, with some pointers from the kind folks at TI (thanks Liam!), I got current UCM configuration files for OMAP4 boards, and some work-in-progress patches to add UCM support to PulseAudio, and after a couple of minor fixes, wham! We have output. :)

(For those who don’t know about UCM — embedded chips are quite different from desktops and expose a huge amount of functionality via ALSA mixer controls. UCM is an effort to have a standard, meaningful way for applications and users to use these.)

In production, it might be handy to write light-weight UCM support for salsa-lib or just convert the UCM configuration into PulseAudio path/profile configuration (bonus points if it’s an automated tool). For our purposes, though, just using alsa-lib is good enough.

To make the comparison fair, I wrote a simple test program that reads raw PCM S16LE data from a file and plays it via the AudioTrack interface provided by AudioFlinger or the PulseAudio Asynchronous API. Tests were run with the brightness fixed, wifi off, and USB port connected to my laptop (for adb shell access).
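The file-reading half of such a test program is simple enough to sketch. The snippet below is Python rather than the actual test code, and `read_s16le_chunks` is a name invented here. It also assumes stereo data, which the post does not state. The playback half (AudioTrack or the PulseAudio asynchronous API) is left out entirely.

```python
import io
import struct


def read_s16le_chunks(stream, channels, frames_per_chunk):
    """Yield tuples of signed 16-bit little-endian samples, one chunk at a time."""
    bytes_per_frame = 2 * channels
    while True:
        raw = stream.read(bytes_per_frame * frames_per_chunk)
        # Drop a trailing partial frame, if the file ends in the middle of one.
        raw = raw[: len(raw) - (len(raw) % bytes_per_frame)]
        if not raw:
            return
        yield struct.unpack("<%dh" % (len(raw) // 2), raw)


# Two stereo frames of made-up data standing in for a real file opened with open(path, "rb").
fake_file = io.BytesIO(struct.pack("<4h", 0, -1, 32767, -32768))
chunks = list(read_s16le_chunks(fake_file, channels=2, frames_per_chunk=1))
print(chunks)
```

Each chunk would then be handed to whichever playback API is under test, so that both sides are fed exactly the same data.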

All tests were run with the CPU frequency pegged at 350 MHz and with 44.1 and 48 kHz samples. Five readings were recorded, and the median value was finally taken.

Round 1: CPU

First, let’s take a look at how the two compare in terms of CPU usage. The numbers below are the percentage CPU usage taken as the sum of all threads of the audio server process and the audio thread in the client application using top (which is why the granularity is limited to an integer percentage).

    44.1 kHz        48 kHz
    AF     PA       AF     PA
    1%     1%       2%     0%

At 44.1 kHz, the two are essentially the same. Both cases are causing resampling to occur (the native sample rate for the device is 48 kHz). Resampling is done using the Speex library, and we’re seeing minuscule amounts of CPU usage even at 350 MHz, so it’s clear that the NEON optimisations are really paying off here.

The astute reader would have noticed that since the device’s native sample rate is 48 kHz, the CPU usage for 48 kHz playback should be less than for 44.1 kHz. This is true with PulseAudio, but not with AudioFlinger! The reason for this little quirk is that AudioFlinger provides 44.1 kHz samples to the HAL (which means the stream is resampled there), and then the HAL needs to resample it again to 48 kHz to bring it to the device’s native rate. From what I can tell, this is a matter of convention with regards to what audio HALs should expect from AudioFlinger (do correct me if I’m mistaken about the rationale).

So round 1 leans slightly in favour of PulseAudio.

Round 2: Memory

Comparing the memory consumption of the server process is a bit meaningless, because the AudioFlinger daemon thread shares an address space with the rest of the mediaserver process. For the curious, the resident set size was: AudioFlinger — 6,796 KB, PulseAudio — 3,024 KB. Again, this doesn’t really mean much.

We can, however, compare the client process’ memory consumption. This is RSS in kilobytes, measured using top.

    44.1 kHz              48 kHz
    AF         PA         AF         PA
    2600 kB    3020 kB    2604 kB    3020 kB

The memory consumption is comparable between the two, but leans in favour of AudioFlinger.

Round 3: Power

I didn’t have access to a power monitor, so I decided to use a couple of indirect metrics to compare power utilisation. The first of these is PowerTOP, which is actually a Linux desktop tool for monitoring various power metrics. Happily, someone had already ported PowerTOP to Android. The tool reports, among other things, the number of wakeups-from-idle per second for the processor as a whole, and on a per-process basis. Since there are multiple threads involved, and PowerTOP’s per-process measurements are somewhat cryptic to add up, I used the global wakeups-from-idle per second. The “Idle” value counts the number of wakeups when nothing is happening. The actual value is very likely so high because the device is connected to my laptop in USB debugging mode (lots of wakeups from USB, and the device is prevented from going into a full sleep).

            44.1 kHz          48 kHz
    Idle    AF       PA       AF       PA
    79.6    107.8    87.3     108.5    85.7

The second, similar, data point is the number of interrupts per second reported by vmstat. These corroborate the numbers above:

            44.1 kHz      48 kHz
    Idle    AF     PA     AF     PA
    190     266    215    284    207

PulseAudio’s power-saving features are clearly highlighted in this comparison. Once the idle baseline is subtracted, AudioFlinger causes roughly three to five times as many additional wakeups per second as PulseAudio does. Things might actually be worse on older hardware with less optimised drivers than the Galaxy Nexus (I’d appreciate reports from running similar tests on a Nexus S or any other device with ALSA support to confirm this).
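The comparison only makes sense relative to the idle baseline, so here is the arithmetic spelled out in Python. The numbers are copied from the two tables above; `overhead_ratio` is just a helper name made up for this sketch.

```python
# Wakeups-from-idle per second (PowerTOP) and interrupts per second (vmstat),
# copied from the two tables above.
idle_wakeups, idle_interrupts = 79.6, 190
wakeups = {"44.1 kHz": {"AF": 107.8, "PA": 87.3}, "48 kHz": {"AF": 108.5, "PA": 85.7}}
interrupts = {"44.1 kHz": {"AF": 266, "PA": 215}, "48 kHz": {"AF": 284, "PA": 207}}


def overhead_ratio(readings, idle):
    """Ratio of AudioFlinger's cost above idle to PulseAudio's cost above idle."""
    return {rate: (r["AF"] - idle) / (r["PA"] - idle) for rate, r in readings.items()}


wakeup_ratio = overhead_ratio(wakeups, idle_wakeups)
interrupt_ratio = overhead_ratio(interrupts, idle_interrupts)
for rate in wakeups:
    print(f"{rate}: wakeups x{wakeup_ratio[rate]:.1f}, interrupts x{interrupt_ratio[rate]:.1f}")
```

All four ratios come out between about 3 and 5.5. Comparing the raw readings without subtracting idle would make the two look much closer than they are, since most of the wakeups in every column come from the USB debugging connection rather than from audio.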

For those of you who aren’t familiar with PulseAudio, the reason we manage to get these savings is our timer-based scheduling mode. In this mode, we fill up the hardware buffer as much as possible and go to sleep (disabling ALSA interrupts while we’re at it, if possible). We only wake up when the buffer is nearing empty, and fill it up again. More details can be found in this old blog post by Lennart.
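The effect of buffer size on wakeups is easy to see in a small simulation. This is a simplified model in Python, not PulseAudio’s scheduler: `count_wakeups` is a made-up function, “watermark” is used here to mean the point at which the buffer counts as nearly empty, and the buffer sizes are hypothetical values chosen only to show the trend.

```python
def count_wakeups(buffer_ms, watermark_ms, duration_ms):
    """Simulate a fill-then-sleep loop and count how often the server wakes up.

    The buffer drains in real time. Each wakeup refills it completely, then the
    server sleeps until only `watermark_ms` of audio is left.
    """
    if not 0 <= watermark_ms < buffer_ms:
        raise ValueError("watermark must be smaller than the buffer")
    now_ms = 0
    wakeups = 0
    while now_ms < duration_ms:
        wakeups += 1                          # wake up and fill the buffer
        now_ms += buffer_ms - watermark_ms    # sleep until the watermark is reached
    return wakeups


# Hypothetical buffer sizes, with the watermark at a tenth of the buffer.
for buffer_ms in (20, 200, 2000):
    total = count_wakeups(buffer_ms, buffer_ms // 10, 10_000)
    print(f"{buffer_ms} ms buffer -> {total} wakeups in 10 s")
```

In this model the number of wakeups falls in proportion to the buffer size, which is why filling the hardware buffer as far as possible pays off whenever latency is not a concern.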

Round 4: Latency

I’ve only had the Galaxy Nexus to actually try this out with, but I’m pretty certain I’m not the only person seeing latency issues on Android. On the Galaxy Nexus, for example, the best latency I can get appears to be 176 ms. This is pretty high for certain types of applications, particularly ones that generate tones based on user input.

With PulseAudio, where we dynamically adjust buffering based on what clients request, I was able to drive down the total buffering to approximately 20 ms (too much lower, and we started getting dropouts). There is likely room for improvement here, and it is something on my todo list, but even out-of-the-box, we’re doing quite well.
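To put the two latency figures in terms of actual buffer sizes, here is a quick calculation in Python. It assumes 16-bit stereo audio at the device’s native 48 kHz. The 16-bit part matches the S16LE test data, but stereo is an assumption of this sketch, and `buffer_bytes` is a helper invented for it.

```python
def buffer_bytes(latency_ms, rate_hz, channels=2, bytes_per_sample=2):
    """Bytes of PCM needed to hold `latency_ms` of audio."""
    frames = rate_hz * latency_ms // 1000
    return frames * channels * bytes_per_sample


for label, latency_ms in (("AudioFlinger, best observed", 176), ("PulseAudio, approx.", 20)):
    size = buffer_bytes(latency_ms, 48_000)
    print(f"{label}: {latency_ms} ms -> {size} bytes at 48 kHz")
```

Under those assumptions, 20 ms is 960 frames (3,840 bytes) of queued audio, against 8,448 frames (33,792 bytes) for 176 ms.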

Round 5: Features

With the hard numbers out of the way, I’d like to talk a little bit about what else PulseAudio brings to the table. In addition to a playback/record API, AudioFlinger provides a mechanism for enforcing various bits of policy such as volumes and setting the “active” device amongst others. PulseAudio exposes similar functionality, some as part of the client API and the rest via the core API exposed to modules.

From SoC vendors’ perspective, it is often necessary to support both Android and standard Linux on the same chip. Being able to focus only on good quality ALSA drivers and knowing that this will ensure quality on both these systems would be a definite advantage in this case.

The current Android system leaves power management to the audio HAL. This means that each vendor needs to implement this themselves. Letting PulseAudio manage the hardware based on requested latencies and policy gives us a single point of control, greatly simplifying the task of power-management and avoiding code duplication.

There are a number of features that PulseAudio provides that can be useful in the various scenarios where Android is used. For example, we support transparently streaming audio over the network, which could be a handy way of supporting playing audio from your phone on your TV completely transparently and out-of-the-box. We also support compressed formats (AC3, DTS, etc.) which the ongoing Android-on-your-TV efforts could likely take advantage of.

Edit: As someone pointed out on LWN, I missed one thing — AudioFlinger has an effect API that we do not yet have in PulseAudio. It’s something I’d definitely like to see added to PulseAudio in the future.

Ding! Ding! Ding!

That pretty much concludes the comparison of these two audio daemons. Since the Android-side code is somewhat under-documented, I’d welcome comments from readers who are familiar with the code and history of AudioFlinger.

I’m in the process of pushing all the patches I’ve had to write to the various upstream projects. A number of these are merely build system patches to integrate with the Android build system, and I’m hoping projects are open to these. Instructions on building this code will be available on the PulseAudio Android wiki page.

For future work, it would be interesting to write a wrapper on top of PulseAudio that exposes the AudioFlinger audio and policy APIs — this would basically let us run PulseAudio as a drop-in AudioFlinger replacement. In addition, there are potential performance benefits that can be derived from using Android-specific infrastructure such as Binder (for IPC) and ashmem (for transferring audio blocks as shared memory segments, something we support on desktops using the standard Linux SHM mechanism which is not available on Android).

If you’re an OEM who is interested in this work, you can get in touch with us — details are on the Collabora website.

I hope this is useful to some of you out there!

Well done, Adobe!

In an unsurprising turn of events, Adobe completely fails to play well with modern Linux systems. Well done, guys. Well done, indeed.

p.s.: I was quite happy to see that the Google Talk plugin has proper PulseAudio support (thanks to the WebRTC née GIPS code, it looks like).

LPC ho!

I’m going to be at the Linux Plumbers’ Conference next week, speaking about the things we’ve been doing to make passthrough audio on Linux kick ass.

If you’re around and interested, do drop by!

More PulseAudio power goodness

[tl;dr: if you’re using GNOME or a GStreamer-based player, not using the Rhythmbox crossfading backend, and want to try to save ~0.5 W of power, jump to the end of the post]

Lennart pointed to another blog post about actually putting PulseAudio’s power-saving capabilities to use on your system. The latter provides a hack-ish way to increase buffering in PulseAudio to the maximum possible, reducing the number of wakeups. I’m going to talk about that a bit.

Summarising the basic idea, we want music players to decode a large chunk of data and give it to PA so that we can then fill up ALSA’s hardware buffer, sleep till it’s almost completely consumed, fill it again, sleep, repeat. More details in this post from Lennart.

The native GNOME audio/video players don’t talk to PulseAudio directly; they use GStreamer, which has a pulsesink element that actually talks to PulseAudio. We could configure things so that we send a large amount (say 2 seconds’ worth) to PulseAudio, sleep, and then wake up periodically to push out more. Now in the audio player (say Rhythmbox), the user hits next, prev, or pause. We need to effect this change immediately, even though we’ve already sent out 2 seconds of data (it would suck if you hit pause and the actual pause happened 2 seconds later, wouldn’t it?). PulseAudio already solves this because it can internally “rewind” the buffer and overwrite it if required. GStreamer can and does take advantage of this by sending pause and other control messages out of band from the data.

This all works well for relatively simple GStreamer pipelines. However, if you want to do something more complicated, like Rhythmbox’ crossfading backend, things start to break. PulseAudio doesn’t offer an API to do fades, and since we don’t do rewinds in GStreamer, we need to apply effects such as fades with a latency equal to the amount of buffering we’re asking PulseAudio to do. This makes for unhappy users.

Well, all is not as bleak as it seems. There was some discussion on the PA mailing list, and the need for a proper fade API (really, a generic effects API) is clear. There have even been attempts to solve this in GStreamer.

But you want to save 0.5 W of power now! Okay, if you’re not using the Rhythmbox crossfading backend (or are okay with disabling it), this will work for Rhythmbox, Banshee, pre-3.0 Totem, and really any GNOMEy player that uses gconfaudiosink (which will soon be replaced by gsettingsaudiosink, I guess). Run this on the command line:

gconftool-2 --type string \
    --set /system/gstreamer/0.10/default/musicaudiosink \
    "pulsesink latency-time=100000 buffer-time=2000000"

On my machine, this brings down the number of wakeups per second because of alsa-sink to ~2.7 (corresponding nicely to the ~350ms of hardware buffer that I have). With Totem 3.0, this may or may not work, depending on whether your distribution gives gconfaudiosink a higher rank than pulseaudiosink.
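If you want to script this, or just see what the two numbers mean, here is a small Python sketch that builds the same gconftool-2 invocation. The helper names are made up. The key, the flags and the property values are exactly the ones from the command above, and nothing here actually runs gconftool-2. Both latency-time and buffer-time are GStreamer sink properties expressed in microseconds.

```python
import shlex

KEY = "/system/gstreamer/0.10/default/musicaudiosink"


def sink_description(latency_time_us=None, buffer_time_us=None):
    """Build the pulsesink description string; omit both values to get the plain sink."""
    parts = ["pulsesink"]
    if latency_time_us is not None:
        parts.append(f"latency-time={latency_time_us}")
    if buffer_time_us is not None:
        parts.append(f"buffer-time={buffer_time_us}")
    return " ".join(parts)


def gconf_command(description):
    """Return the gconftool-2 invocation from the post as a shell-quoted string."""
    argv = ["gconftool-2", "--type", "string", "--set", KEY, description]
    return " ".join(shlex.quote(arg) for arg in argv)


print(gconf_command(sink_description(100_000, 2_000_000)))   # apply the tweak
print(gconf_command(sink_description()))                     # plain "pulsesink" again

# 2,000,000 us is 2 s of buffering; one refill per ~350 ms buffer is under 3 wakeups/s.
print(2_000_000 / 1_000_000, "s buffered;", round(1 / 0.350, 1), "refills/s at 350 ms")
```

The second command is the one with only the bare pulsesink value, which is handy to keep around in case something does break.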

This is clearly just a stop-gap till we can get things done the Right Way™ at the system level, so really, if things break, you get to keep the pieces. If you need to, you can undo this change by running the same command without the latency-time=… and buffer-time=… bits. That said, if something does break, do leave a comment below so I can add it to the list of things that we need to test the final solution with.

GNOME3 Power Settings

Richard Hughes recently posted about the recent GNOME3 Power Settings design that got a lot of people (myself included) hot and bothered. As I said in my comment, I think that a lot of people prefer that their laptop stay on when the lid is closed. There are clearly others who, like myself, would prefer to maintain the normal behaviour when an external monitor is plugged in.

So Nirbheek Chauhan and I designed a couple of quick mockups that I think would work well. This doesn’t address customising behaviour with an external monitor, but I don’t feel nearly as strongly about that being hidden in dconf-editor as I do about the rest.

My mockup

Nirbheek's mockup

While Nirbheek’s version looks decidedly prettier, I think the meaning of the icons is not absolutely obvious. This might be solvable by some explanatory text above and mouse-overs.

While doing all this, though, it’s clear that it is really hard to design a UI that you think will please enough people, and really easy to make assumptions about what “people” want and how they use their computers. So kudos to the GNOME3 UI designers for taking up this difficult job and I hope they take all the feedback flying around in a positive spirit (even if the messages are often not quite positive-sounding ;) )

Pure EFI Linux Boot on Macbooks

My company was really kind to get me a Macbook Pro (the 13.3-inch “5.5” variant). It is an awesome piece of hardware! (especially after my own PoS HP laptop I’ve been cussing at for a while now)

That said, I still don’t like the idea of running a proprietary operating system on it (as beautiful as OS X is ;)), so I continue to happily use Gentoo. The standard amd64 install works just fine with some minor hiccups (keyboard doesn’t work on the LiveCD, kernel only shows a console with vesafb).

The one thing that did bother me is BIOS-emulation. For those coming from the PC world, Macs don’t have a BIOS. They run something called EFI which is significantly more advanced (though I think the jury’s out on quirkiness issues and Linus certainly doesn’t approve of the added complexity).

Anyway, in order to support booting other OSes (=> Windows) exactly as they would on PCs, Apple has added a BIOS emulation layer. This is how Ubuntu (at least as of 9.10) boots on Macbooks. Given that both the bootloader (be it Grub2 or elilo) and the Linux kernel support booting in an EFI environment, it rubbed me the wrong way to take the easy way out and just boot them in BIOS mode. There is a reasonable technical argument for this – I see no good reason to add one more layer of software (read bugs) when there is no need at all. After a lot of pain, I did manage to make Linux boot in EFI-only mode. There is not enough (accurate, easily-findable) documentation out there, so this is hard-won knowledge. :) I’m putting this up to help others avoid this pain.

Here’s what I did (I might be missing some stuff since this was done almost a month ago). The basic boot steps look something like this:

  1. EFI firmware starts on boot
  2. Starts rEFIt, a program that extends the default bootloader to provide a nice bootloader menu, shell, etc.
  3. Scans FAT/HFS partitions (no ext* support, despite some claims on the Internet) for bootable partitions (i.e. having a /efi/… directory with valid boot images)
  4. Runs the Grub2 EFI image from a FAT partition
  5. Loads the Linux kernel (and initrd/initramfs if any) from /boot
  6. Kernel boots normally with whatever your root partition is

Now you could use elilo instead of Grub2, but I found it to not work well (or at all) for me, so I just used Grub2 (1.97.1, with a minor modification that just adds an “efi” USE-flag to build with --with-platform=efi). While I could make /boot a FAT partition, this would break the installkernel script (it’s run by make install in your kernel source directory), which makes symlinks for your latest/previous kernel image.

Instructions for installing the Grub2 EFI image are here. Just ignore the “bless” instructions (that’s for OS X), and put the EFI image and other stuff in something like /efi/grub (the /efi is mandatory). You can create a basic config file using grub-mkconfig and then tweak it to taste. The Correct Way™ to do this, though, is to edit the files in /etc/grub.d/.

Of course, you need to enable EFI support in the kernel, but that’s it. With this, you’re all set for the (slightly obsessive-compulsive) satisfaction of not having to enable yet another layer to support yet another proprietary interface, neither of which you have visibility or control over.

It’s pronounced Gwahdec

I’ve been terrible about it, but here’s the big update — I just got back today after spending the last week at the Gran Canaria Desktop Summit, location of the first co-located GUADEC and aKademy. It’s been amazing, and I don’t know where to start. Let’s try the beginning.

The GNOME Foundation has funded a very significant part of my expenses for this trip (making it possible at all), so a huge thanks to the Travel Committee for giving me this opportunity. :) To summarise …

Sponsored by GNOME!

Shreyas and I reached Gran Canaria early in the morning of Day 1, but were too tired to make it to the first 2 keynotes. We woke up and had breakfast by the beach (the apartment we were in was <100 steps from the beach, and the auditorium was a 20 minute walk down the same beach; photos soon).

We did make it to Richard Stallman’s talk. It was quite generic, not surprisingly about software freedom, and nothing new to most of us. Of note were the great vitriol towards C# and the heathens who use it to create new software and a rather terrible and inappropriate attempt at humour that has been blogged about to death.

I met a huge number of people subsequently, some who’ve been at FOSS.IN before, and many whom I only knew by their online presence. The second half of the day was devoted to a number of Lightning Talks. I was pleasantly surprised to see the amount of work happening on semantic-aware projects. Good stuff.

Way too sleepy to continue making sense. More details on subsequent days, photos and so forth to come soon.

Edit: In the name of avoiding further procrastination, here are the photos.