CATEGORY: Blog

PulseAudio vs. AudioFlinger: Fight!

I’ve been meaning to try this for a while, and we’ve heard a number of requests from the community as well. Recently, I got some time here at Collabora to give it a go — that is, to get PulseAudio running on an Android device and see how it compares with Android’s AudioFlinger.

The Contenders

Let’s introduce our contenders first. For those who don’t know, PulseAudio is pretty much a de-facto standard part of the Linux audio stack. It sits on top of ALSA which provides a unified way to talk to the audio hardware and provides a number of handy features that are useful on desktops and embedded devices. I won’t rehash all of these, but this includes a nice modular framework, a bunch of power saving features, flexible routing, and lots more. PulseAudio runs as a daemon, and clients usually use the libpulse library to communicate with it.

In the other corner, we have Android’s native audio system — AudioFlinger. AudioFlinger was written from scratch for Android. It provides an API for playback/recording as well as a control mechanism for implementing policy. It does not depend on ALSA, but instead allows for a sort of HAL that vendors can implement any way they choose. Applications generally play audio via layers built on top of AudioFlinger. Even if you write a native application, it would use OpenSL ES implementation which goes through AudioFlinger. The actual service runs as a thread of the mediaserver daemon, but this is merely an implementation detail.

Note: all my comments about AudioFlinger and Android in general are based on documentation and code for Android 4.0 (Ice Cream Sandwich).

The Arena

My test-bed for the tests was the Galaxy Nexus running Android 4.0 which we shall just abbreviate to ICS. I picked ICS since it is the current platform on which Google is building, and hopefully represents the latest and greatest in AudioFlinger development. The Galaxy Nexus runs a Texas Instruments OMAP4 processor, which is also really convenient since this chip has pretty good support for running stock Linux (read on to see how useful this was).

Preparations

The first step in getting PulseAudio on Android was deciding between using the Android NDK like a regular application or integrate into the base Android system. I chose the latter — even though this was a little more work initially, it made more sense in the long run since PulseAudio really belongs to the base-system.

The next task was to get the required dependencies ported to Android. Fortunately, a lot of the ground work for this was already done by some of the awesome folks at Collabora. Derek Foreman’s androgenizer tool is incredibly handy for converting an autotools-based build to Android–friendly makefiles. With Reynaldo Verdejo and Alessandro Decina’s prior work on GStreamer for Android as a reference, things got even easier.

The most painful bit was libltdl, which we use for dynamically loading modules. Once this was done, the other dependencies were quite straightforward to port over. As a bonus, the Android source already ships an optimised version of Speex which we use for resampling, and it was easy to reuse this as well.

As I mentioned earlier, vendors can choose how they implement their audio abstraction layer. On the Galaxy Nexus, this is built on top of standard ALSA drivers, and the HAL talks to the drivers via a minimalist tinyalsa library. My first hope was to use this, but there was a whole bunch of functions missing that PulseAudio needed. The next approach was to use salsa-lib, which is a stripped down version of the ALSA library written for embedded devices. This too had some missing functions, but these were fewer and easy to implement (and are now upstream).

Now if only life were that simple. :) I got PulseAudio running on the Galaxy Nexus with salsa-lib, and even got sound out of the HDMI port. Nothing from the speakers though (they’re driven by a TI twl6040 codec). Just to verify, I decided to port the full alsa-lib and alsa-utils packages to debug what’s happening (by this time, I’m familiar enough with androgenizer for all this to be a breeze). Still no luck. Finally, with some pointers from the kind folks at TI (thanks Liam!), I got current UCM configuration files for OMAP4 boards, and some work-in-progress patches to add UCM support to PulseAudio, and after a couple of minor fixes, wham! We have output. :)

(For those who don’t know about UCM — embedded chips are quite different from desktops and expose a huge amount of functionality via ALSA mixer controls. UCM is an effort to have a standard, meaningful way for applications and users to use these.)

In production, it might be handy to write light-weight UCM support for salsa-lib or just convert the UCM configuration into PulseAudio path/profile configuration (bonus points if it’s an automated tool). For our purposes, though, just using alsa-lib is good enough.

To make the comparison fair, I wrote a simple test program that reads raw PCM S16LE data from a file and plays it via the AudioTrack interface provided by AudioFlinger or the PulseAudio Asynchronous API. Tests were run with the brightness fixed, wifi off, and USB port connected to my laptop (for adb shell access).

All tests were run with the CPU frequency pegged at 350 MHz and with 44.1 and 48 kHz samples. Five readings were recorded, and the median value was finally taken.

Round 1: CPU

First, let’s take a look at how the two compare in terms of CPU usage. The numbers below are the percentage CPU usage taken as the sum of all threads of the audio server process and the audio thread in the client application using top (which is why the granularity is limited to an integer percentage).

44.1 kHz 48 kHz
AF PA AF PA
1% 1% 2% 0%

At 44.1 kHz, the two are essentially the same. Both cases are causing resampling to occur (the native sample rate for the device is 48 kHz). Resampling is done using the Speex library, and we’re seeing minuscule amounts of CPU usage even at 350 MHz, so it’s clear that the NEON optimisations are really paying off here.

The astute reader would have noticed that since the device’ native sample rate is 48 kHz, the CPU usage for 48 kHz playback should be less than for 44.1 kHz. This is true with PulseAudio, but not with AudioFlinger! The reason for this little quirk is that AudioFlinger provides 44.1 kHz samples to the HAL (which means the stream is resampled there), and then the HAL needs to resample it again to 48 kHz to bring it to the device’ native rate. From what I can tell, this is a matter of convention with regards to what audio HALs should expect from AudioFlinger (do correct me if I’m mistaken about the rationale).

So round 1 leans slightly in favour of PulseAudio.

Round 2: Memory

Comparing the memory consumption of the server process is a bit meaningless, because the AudioFlinger daemon thread shares an address space with the rest of the mediaserver process. For the curious, the resident set size was: AudioFlinger — 6,796 KB, PulseAudio — 3,024 KB. Again, this doesn’t really mean much.

We can, however, compare the client process’ memory consumption. This is RSS in kilobytes, measured using top.

44.1 kHz 48 kHz
AF PA AF PA
2600 kB 3020 kB 2604 kB 3020 kB

The memory consumption is comparable between the two, but leans in favour of AudioFlinger.

Round 3: Power

I didn’t have access to a power monitor, so I decided to use a couple of indirect metrics to compare power utilisation. The first of these is PowerTOP, which is actually a Linux desktop tool for monitoring various power metrics. Happily, someone had already ported PowerTOP to Android. The tool reports, among other things, the number of wakeups-from-idle per second for the processor as a whole, and on a per-process basis. Since there are multiple threads involved, and PowerTOP’s per-process measurements are somewhat cryptic to add up, I used the global wakeups-from-idle per second. The “Idle” value counts the number of wakeups when nothing is happening. The actual value is very likely so high because the device is connected to my laptop in USB debugging mode (lots of wakeups from USB, and the device is prevented from going into a full sleep).

44.1 kHz 48 kHz
Idle AF PA AF PA
79.6 107.8 87.3 108.5 85.7

The second, similar, data point is the number of interrupts per second reported by vmstat. These corroborate the numbers above:

44.1 kHz 48 kHz
Idle AF PA AF PA
190 266 215 284 207

PulseAudio’s power-saving features are clearly highlighted in this comparison. AudioFlinger causes about three times the number of wakeups per second that PulseAudio does. Things might actually be worse on older hardware with less optimised drivers than the Galaxy Nexus (I’d appreciate reports from running similar tests on a Nexus S or any other device with ALSA support to confirm this).

For those of you who aren’t familiar with PulseAudio, the reason we manage to get these savings is our timer-based scheduling mode. In this mode, we fill up the hardware buffer as much as possible and go to sleep (disabling ALSA interrupts while we’re at it, if possibe). We only wake up when the buffer is nearing empty, and fill it up again. More details can be found in this old blog post by Lennart.

Round 4: Latency

I’ve only had the Galaxy Nexus to actually try this out with, but I’m pretty certain I’m not the only person seeing latency issues on Android. On the Galaxy Nexus, for example, the best latency I can get appears to be 176 ms. This is pretty high for certain types of applications, particularly ones that generate tones based on user input.

With PulseAudio, where we dynamically adjust buffering based on what clients request, I was able to drive down the total buffering to approximately 20 ms (too much lower, and we started getting dropouts). There is likely room for improvement here, and it is something on my todo list, but even out-of-the-box, we’re doing quite well.

Round 5: Features

With the hard numbers out of the way, I’d like to talk a little bit about what else PulseAudio brings to the table. In addition to a playback/record API, AudioFlinger provides mechanism for enforcing various bits of policy such as volumes and setting the “active” device amongst others. PulseAudio exposes similar functionality, some as part of the client API and the rest via the core API exposed to modules.

From SoC vendors’ perspective, it is often necessary to support both Android and standard Linux on the same chip. Being able to focus only on good quality ALSA drivers and knowing that this will ensure quality on both these systems would be a definite advantage in this case.

The current Android system leaves power management to the audio HAL. This means that each vendor needs to implement this themselves. Letting PulseAudio manage the hardware based on requested latencies and policy gives us a single point of control, greatly simplifying the task of power-management and avoiding code duplication.

There are a number of features that PulseAudio provides that can be useful in the various scenarios where Android is used. For example, we support transparently streaming audio over the network, which could be a handy way of supporting playing audio from your phone on your TV completely transparently and out-of-the-box. We also support compressed formats (AC3, DTS, etc.) which the ongoing Android-on-your-TV efforts could likely take advantage of.

Edit: As someone pointed out on LWN, I missed one thing — AudioFlinger has an effect API that we do not yet have in PulseAudio. It’s something I’d definitely like to see added to PulseAudio in the future.

Ding! Ding! Ding!

That pretty much concludes the comparison of these two audio daemons. Since the Android-side code is somewhat under-documented, I’d welcome comments from readers who are familiar with the code and history of AudioFlinger.

I’m in the process of pushing all the patches I’ve had to write to the various upstream projects. A number of these are merely build system patches to integrate with the Android build system, and I’m hoping projects are open to these. Instructions on building this code will be available on the PulseAudio Android wiki page.

For future work, it would be interesting to write a wrapper on top of PulseAudio that exposes the AudioFlinger audio and policy APIs — this would basically let us run PulseAudio as a drop-in AudioFlinger replacement. In addition, there are potential performance benefits that can be derived from using Android-specific infrastructure such as Binder (for IPC) and ashmem (for transferring audio blocks as shared memory segments, something we support on desktops using the standard Linux SHM mechanism which is not available on Android).

If you’re an OEM who is interested in this work, you can get in touch with us — details are on the Collabora website.

I hope this is useful to some of you out there!

i’m in yur analog gain, controlling it

Longish day, but I did want to post something fun before going to sleep — I just pushed out patches to hook up the WebRTC folks’ analog gain control to PulseAudio. So your mic will automatically adjust the input level based on how loud you’re speaking. It’s quite quick to adapt if you’re too loud, but a bit slow when the input signal is too soft. This isn’t bad, since the former is a much bigger problem than the latter.

Also, we’ve switched to the WebRTC canceller as the default canceller (you can still choose the Speex canceller manually, though). Overall, the quality is pretty good. I’d do a demo, but it’s effectively had zero learning time in my tests, so we’re not too far from a stage where this is a feature that, if we’re doing it right you won’t notice it exists.

There lot’s of things, big and small that need to be added and tweaked, but this does go some way towards bringing a hassle-free VoIP experience on Linux closer to reality. Once again, kudos to the folks at Google for the great work and for opening up this code. Also a shout-out to fellow Collaboran Sjoerd Simons for bouncing ideas and giving me those much-needed respites from talking to myself. :)

Notes from the Prague Audio BoFs

As I’d blogged about last week, we had a couple of Audio BoF sessions last week. Here’s a summary of what was discussed. I’ve collected items in relevance order rather than chronological order to make for easier reading. I think I have everything covered, I’ll update this post if one of the attendees points out something I missed or got wrong.

  • Surround: There were a number of topics that came up with regards to multichannel/surround support:

    • There seems to be some lack of uniformity of channel maps, particularly in non-HDA hardware. While it was agreed that this should be fixed, we need some directed testing and bug reports to actually be able to fix this.

    • Multichannel mixers, while theoretically supported, are not actually exposed by any current drivers. It was suggested that these could be exposed without breaking existing applications by having new MC mixers exposed with device names corresponding to the surround PCM device (like “surround51″).

    • We need a way to query channel maps on a given PCM. This will be implemented as a new ALSA API which could be called after the PCM is opened. (AI: Takashi)

    • It would be good to have a way to configure the channel map as well (if supported by the hardware?). The suggestion was to do this as was done in PulseAudio, where an arbitrary channel map could be specified. (NB: is there hardware that supports multiple/arbitrary channel maps? If yes, how do we handle this?)

  • Routing: Unsurprisingly, we came back to the problem of building a simplified mixer graph for PulseAudio.

    • The current status is that ALSA builds a simplified mixer for use by userspace, and PulseAudio further simplifies this by means of some name-based guessing.

    • PulseAudio would ideally like a simplified version of the original mixer graph, but something more complete than what we get now

    • However, since PulseAudio has fairly unique requirements of what information it wants, it probably makes sense to have ALSA provide the entire graph and leave the simplification task to PulseAudio (discussion on this approach continues)

    • There was no consensus on who would do this or how this should be done (creating a new serialisation format, exposing what HDA provides, adding node metadata to ALSA mixer controls, or something else altogether)

    • As an interim step, it was agreed that it would be possible to provide ordering in the simplified ALSA mixer (that is, add metadata to the control to signal what control comes “before” it and what comes “after” it). This should go some way in making the PA mixer simplification logic simpler and more robust. (AI: Takashi)

  • HDMI: A couple of things came up in discussion about the status of HDMI.

    • There was a question about the reliability of ELD information as this will be used more in future versions of PulseAudio. There did not appear to be conclusive evidence in either direction, so we will probably assume that it is reliable and deal with reliability problems as they arise.

    • It was mentioned that it might be desirable to only expose the ALSA device if a receiver is plugged in. This had been mooted earlier as something to do in PulseAudio as an alternative. One thing to consider with this approach is dealing with a device switch on the receiver side. Doing this without a notification to userspace was generally agreed to be a bad idea.

  • Jack detection: The long-standing debate on exposing jacks as input devices or ALSA controls came to a conclusion, with the resolution being that jacks would be exposed as ALSA controls. This requires a change in the kernel (and potentially alsa-lib?) which should not be too complex. Actual buttons (such as volume/mute control) will continue to be input devices. Once this is done, the pending jack detection patches will be adapted to use the new interface. (AI: Takashi (patches are in a branch already!), David)

  • UCM: Another long-standing issue was the merging of the ALSA UCM patches into PulseAudio. Most of the problems thus far had been due to an incomplete understanding of how ALSA and PA concepts mapped to each other. Some consensus was arrived at in this regard after a lengthy discussion:

    • As is the case now, every ALSA PCM maps to a PA sink

    • Each UCM verb maps to a PA card profile

    • Each combination of UCM devices that can be used concurrently maps to a PA port

    • Each UCM modifier is mapped to an intended role on the corresponding sink

    The code should (as is in the patches currently submitted) be part of the PA ALSA module, and there will be changes required to use the UCM-specified mixer list instead of PA’s guessing mechanism. (AI: ???)

    (NB: It was mentioned that PulseAudio needs to support multiple intended roles for a sink/source. This is actually already supported — the intended roles property is a whitespace-separated list of roles)

    (NB2: There was further discussion with the Linaro folks this week about the UCM bits, and there’s likely going to be an IRC/phone gathering to clarify things further in the near future)

  • GStreamer latency settings: We currently do not actually use PulseAudio’s power saving features from GStreamer for several reasons. Suggestions to over come this were mooted. While no definite agreement was reached, one suggestion was to add a “powersave” profile to pulsesink that chose higher latency/buffer-time values. Players would need to set this if they are not using features that break when these values are raised.

  • Corking: The statelessness of current the corking mechanism was discussed in one session, and between the PulseAudio developers later. The problem is that we need to be able to track cork/uncork reasons more closely. This would give us more metadata that is needed to make policy decisions without breaking streams. Particularly, for example, if PA corks a music stream due to an incoming call, then the user plays, then pauses music, and then the call ends, we must not uncork the music stream. We intend to deal with this with two changes:

    • We need to add a per-cause cork/uncork request count

    • We need to associate a “generation” with cork/uncork requests, so certain conditions (such as user intervention) can bump the generation counter, and uncork requests corresponding to old cork requests will be ignored

    This will make it possible to track the various bits of state we need to do the right thing for cases like the one mentioned before.

So that’s that — lots of things discussed, lots of things to do! Thanks to everyone who came for participating.

PulseAudio 1.1 (the echo release?)

Yep, if we keep this up, it could even become a habit!

PulseAudio 1.1 is out. It’s mostly a bunch of bug fixes on top of 1.0. Most important of these are fixes for: a libpulse dependency on libsamplerate (if enabled) which would make our LGPL license invalid, broken Skype audio capture (because we changed from a 3 number version to 2 numbers), broken startup without a DBus session bus running, and not going crazy on USB disconnects.

This should be a very safe upgrade, so grab it while it’s hot!

Alternate sample rates

I’ve just pushed a bunch of patches by Pierre-Louis Bossart that can have a pretty decent CPU/power impact. These introduce the concept of an “alternate sample rate”.

Currently, PulseAudio runs all your devices at a default sample rate, which is set to 44.1 kHz on most systems (this can be configured). All streams running at different sample rates are resampled to this sample rate. Pierre’s patches add an alternate sample rate that we try to switch to under certain circumstances if it means that we can save on resampling cost. This would happen if the stream uses exactly the alternate sample rate, or some integral-or-so multiple of it.

The default value for the alternate sample rate is 48 kHz. So if you’re playing a movie off a DVD where the audio track is typically a 48 kHz stream, and your card supports it, we switch to 48 kHz and avoid resampling altogether. Similarly, while making voice calls, common sample rates are 8, 16, and 32 kHz. These can be resampled to 48 kHz much faster than to 44.1 kHz.

Now for the big caveat — this won’t work if there’s any other stream connected to your sink/source. So if your music player is playing (or even paused) when you get that voip call, we can’t update the rate. This situation can probably be improved by at least allowing corked streams have their sample rate change (so having some random stream connected but not playing — I’m looking at you, Flash! — won’t block rate updates altogether). Hopefully we’ll get this fixed before this feature is released in PulseAudio 2.0.

Thanks to Pierre for all his work on this, and to my company, Collabora, for giving me some time for upstream work!

More conferences than you can shake a stick at

Prague is an interesting place to be at this time of the year — next week it’s playing host to the Real Time Linux Workshop. The week after that, we have the Kernel Summit, GStreamer Conference, Embedded Linux Conference Europe and LinuxCon Europe. I’m going to be at the last 3, and there’s some great audio stuff happening!

On the evening of Oct 23rd, we’re having an Audio BoF to discuss pending issues that cut across the stack — ALSA, PulseAudio, GStreamer and any other similar bits that people have complaints about.

Then there’s GstConf, where there are going to be a bunch of awesome talks. Also included is a talk by yours truly about recent developments in the PulseAudio world.

At some point during that week, possibly Oct 26th morning, plans are afoot to have an ALSA BoF to discuss the state and future of the HDA driver.

There are also rumours of excellent beer that will need to be scrupulously verified. ;)

It’s going to be a pretty exciting week!

1.w00t!

As Colin Guthrie reports, PulseAudio 1.0 is now out the door! There’s a lot of new things in the release, and we should be getting a much more regular release schedule going. Head over to the full release notes for more details.

A lot of people have contributed to this release and thanks to them all. Special props to Colin all the patch-herding, tireless help, and code ninjutsu!

p.s.: Gentoo packages are already available, of course. :)