January 16, 2012

PulseAudio vs. AudioFlinger: Fight!

I’ve been meaning to try this for a while, and we’ve heard a number of requests from the community as well. Recently, I got some time here at Collabora to give it a go — that is, to get PulseAudio running on an Android device and see how it compares with Android’s AudioFlinger.

The Contenders

Let’s introduce our contenders first. For those who don’t know, PulseAudio is pretty much a de-facto standard part of the Linux audio stack. It sits on top of ALSA which provides a unified way to talk to the audio hardware and provides a number of handy features that are useful on desktops and embedded devices. I won’t rehash all of these, but this includes a nice modular framework, a bunch of power saving features, flexible routing, and lots more. PulseAudio runs as a daemon, and clients usually use the libpulse library to communicate with it.

In the other corner, we have Android’s native audio system — AudioFlinger. AudioFlinger was written from scratch for Android. It provides an API for playback/recording as well as a control mechanism for implementing policy. It does not depend on ALSA, but instead allows for a sort of HAL that vendors can implement any way they choose. Applications generally play audio via layers built on top of AudioFlinger. Even if you write a native application, it would use OpenSL ES implementation which goes through AudioFlinger. The actual service runs as a thread of the mediaserver daemon, but this is merely an implementation detail.

Note: all my comments about AudioFlinger and Android in general are based on documentation and code for Android 4.0 (Ice Cream Sandwich).

The Arena

My test-bed for the tests was the Galaxy Nexus running Android 4.0 which we shall just abbreviate to ICS. I picked ICS since it is the current platform on which Google is building, and hopefully represents the latest and greatest in AudioFlinger development. The Galaxy Nexus runs a Texas Instruments OMAP4 processor, which is also really convenient since this chip has pretty good support for running stock Linux (read on to see how useful this was).

Preparations

The first step in getting PulseAudio on Android was deciding between using the Android NDK like a regular application or integrate into the base Android system. I chose the latter — even though this was a little more work initially, it made more sense in the long run since PulseAudio really belongs to the base-system.

The next task was to get the required dependencies ported to Android. Fortunately, a lot of the ground work for this was already done by some of the awesome folks at Collabora. Derek Foreman’s androgenizer tool is incredibly handy for converting an autotools-based build to Android–friendly makefiles. With Reynaldo Verdejo and Alessandro Decina’s prior work on GStreamer for Android as a reference, things got even easier.

The most painful bit was libltdl, which we use for dynamically loading modules. Once this was done, the other dependencies were quite straightforward to port over. As a bonus, the Android source already ships an optimised version of Speex which we use for resampling, and it was easy to reuse this as well.

As I mentioned earlier, vendors can choose how they implement their audio abstraction layer. On the Galaxy Nexus, this is built on top of standard ALSA drivers, and the HAL talks to the drivers via a minimalist tinyalsa library. My first hope was to use this, but there was a whole bunch of functions missing that PulseAudio needed. The next approach was to use salsa-lib, which is a stripped down version of the ALSA library written for embedded devices. This too had some missing functions, but these were fewer and easy to implement (and are now upstream).

Now if only life were that simple. :) I got PulseAudio running on the Galaxy Nexus with salsa-lib, and even got sound out of the HDMI port. Nothing from the speakers though (they’re driven by a TI twl6040 codec). Just to verify, I decided to port the full alsa-lib and alsa-utils packages to debug what’s happening (by this time, I’m familiar enough with androgenizer for all this to be a breeze). Still no luck. Finally, with some pointers from the kind folks at TI (thanks Liam!), I got current UCM configuration files for OMAP4 boards, and some work-in-progress patches to add UCM support to PulseAudio, and after a couple of minor fixes, wham! We have output. :)

(For those who don’t know about UCM — embedded chips are quite different from desktops and expose a huge amount of functionality via ALSA mixer controls. UCM is an effort to have a standard, meaningful way for applications and users to use these.)

In production, it might be handy to write light-weight UCM support for salsa-lib or just convert the UCM configuration into PulseAudio path/profile configuration (bonus points if it’s an automated tool). For our purposes, though, just using alsa-lib is good enough.

To make the comparison fair, I wrote a simple test program that reads raw PCM S16LE data from a file and plays it via the AudioTrack interface provided by AudioFlinger or the PulseAudio Asynchronous API. Tests were run with the brightness fixed, wifi off, and USB port connected to my laptop (for adb shell access).

All tests were run with the CPU frequency pegged at 350 MHz and with 44.1 and 48 kHz samples. Five readings were recorded, and the median value was finally taken.

Round 1: CPU

First, let’s take a look at how the two compare in terms of CPU usage. The numbers below are the percentage CPU usage taken as the sum of all threads of the audio server process and the audio thread in the client application using top (which is why the granularity is limited to an integer percentage).

44.1 kHz		48 kHz
AF	PA	AF	PA
1%	1%	2%	0%

At 44.1 kHz, the two are essentially the same. Both cases are causing resampling to occur (the native sample rate for the device is 48 kHz). Resampling is done using the Speex library, and we’re seeing minuscule amounts of CPU usage even at 350 MHz, so it’s clear that the NEON optimisations are really paying off here.

The astute reader would have noticed that since the device’ native sample rate is 48 kHz, the CPU usage for 48 kHz playback should be less than for 44.1 kHz. This is true with PulseAudio, but not with AudioFlinger! The reason for this little quirk is that AudioFlinger provides 44.1 kHz samples to the HAL (which means the stream is resampled there), and then the HAL needs to resample it again to 48 kHz to bring it to the device’ native rate. From what I can tell, this is a matter of convention with regards to what audio HALs should expect from AudioFlinger (do correct me if I’m mistaken about the rationale).

So round 1 leans slightly in favour of PulseAudio.

Round 2: Memory

Comparing the memory consumption of the server process is a bit meaningless, because the AudioFlinger daemon thread shares an address space with the rest of the mediaserver process. For the curious, the resident set size was: AudioFlinger — 6,796 KB, PulseAudio — 3,024 KB. Again, this doesn’t really mean much.

We can, however, compare the client process’ memory consumption. This is RSS in kilobytes, measured using top.

44.1 kHz		48 kHz
AF	PA	AF	PA
2600 kB	3020 kB	2604 kB	3020 kB

The memory consumption is comparable between the two, but leans in favour of AudioFlinger.

Round 3: Power

I didn’t have access to a power monitor, so I decided to use a couple of indirect metrics to compare power utilisation. The first of these is PowerTOP, which is actually a Linux desktop tool for monitoring various power metrics. Happily, someone had already ported PowerTOP to Android. The tool reports, among other things, the number of wakeups-from-idle per second for the processor as a whole, and on a per-process basis. Since there are multiple threads involved, and PowerTOP’s per-process measurements are somewhat cryptic to add up, I used the global wakeups-from-idle per second. The “Idle” value counts the number of wakeups when nothing is happening. The actual value is very likely so high because the device is connected to my laptop in USB debugging mode (lots of wakeups from USB, and the device is prevented from going into a full sleep).

	44.1 kHz		48 kHz
Idle	AF	PA	AF	PA
79.6	107.8	87.3	108.5	85.7

The second, similar, data point is the number of interrupts per second reported by vmstat. These corroborate the numbers above:

	44.1 kHz		48 kHz
Idle	AF	PA	AF	PA
190	266	215	284	207

PulseAudio’s power-saving features are clearly highlighted in this comparison. AudioFlinger causes about three times the number of wakeups per second that PulseAudio does. Things might actually be worse on older hardware with less optimised drivers than the Galaxy Nexus (I’d appreciate reports from running similar tests on a Nexus S or any other device with ALSA support to confirm this).

For those of you who aren’t familiar with PulseAudio, the reason we manage to get these savings is our timer-based scheduling mode. In this mode, we fill up the hardware buffer as much as possible and go to sleep (disabling ALSA interrupts while we’re at it, if possibe). We only wake up when the buffer is nearing empty, and fill it up again. More details can be found in this old blog post by Lennart.

Round 4: Latency

I’ve only had the Galaxy Nexus to actually try this out with, but I’m pretty certain I’m not the only person seeing latency issues on Android. On the Galaxy Nexus, for example, the best latency I can get appears to be 176 ms. This is pretty high for certain types of applications, particularly ones that generate tones based on user input.

With PulseAudio, where we dynamically adjust buffering based on what clients request, I was able to drive down the total buffering to approximately 20 ms (too much lower, and we started getting dropouts). There is likely room for improvement here, and it is something on my todo list, but even out-of-the-box, we’re doing quite well.

Round 5: Features

With the hard numbers out of the way, I’d like to talk a little bit about what else PulseAudio brings to the table. In addition to a playback/record API, AudioFlinger provides mechanism for enforcing various bits of policy such as volumes and setting the “active” device amongst others. PulseAudio exposes similar functionality, some as part of the client API and the rest via the core API exposed to modules.

From SoC vendors’ perspective, it is often necessary to support both Android and standard Linux on the same chip. Being able to focus only on good quality ALSA drivers and knowing that this will ensure quality on both these systems would be a definite advantage in this case.

The current Android system leaves power management to the audio HAL. This means that each vendor needs to implement this themselves. Letting PulseAudio manage the hardware based on requested latencies and policy gives us a single point of control, greatly simplifying the task of power-management and avoiding code duplication.

There are a number of features that PulseAudio provides that can be useful in the various scenarios where Android is used. For example, we support transparently streaming audio over the network, which could be a handy way of supporting playing audio from your phone on your TV completely transparently and out-of-the-box. We also support compressed formats (AC3, DTS, etc.) which the ongoing Android-on-your-TV efforts could likely take advantage of.

Edit: As someone pointed out on LWN, I missed one thing — AudioFlinger has an effect API that we do not yet have in PulseAudio. It’s something I’d definitely like to see added to PulseAudio in the future.

Ding! Ding! Ding!

That pretty much concludes the comparison of these two audio daemons. Since the Android-side code is somewhat under-documented, I’d welcome comments from readers who are familiar with the code and history of AudioFlinger.

I’m in the process of pushing all the patches I’ve had to write to the various upstream projects. A number of these are merely build system patches to integrate with the Android build system, and I’m hoping projects are open to these. Instructions on building this code will be available on the PulseAudio Android wiki page.

For future work, it would be interesting to write a wrapper on top of PulseAudio that exposes the AudioFlinger audio and policy APIs — this would basically let us run PulseAudio as a drop-in AudioFlinger replacement. In addition, there are potential performance benefits that can be derived from using Android-specific infrastructure such as Binder (for IPC) and ashmem (for transferring audio blocks as shared memory segments, something we support on desktops using the standard Linux SHM mechanism which is not available on Android).

If you’re an OEM who is interested in this work, you can get in touch with us — details are on the Collabora website.

I hope this is useful to some of you out there!

Post meta

AuthorArun

PostedJanuary 16, 2012 — 5:52 pm

CategoriesBlog

Tagsandroid, collabora, f/oss, linux, pulseaudio, work

85 Comments

Add yours

Adam Williamson

January 16, 2012 — 11:21 pm

The lack of resampling of 48KHz files isn’t just good for CPU use, it’s good for audio snobs too. Of course, most music is 44.1KHz anyway, but there are some oddball 48KHz releases out there (I have some Japanese digital downloads which come at 48KHz).

Reply
D

January 16, 2012 — 11:37 pm

I did the same test with my netbook, the result was 0.4 watt less power consumption: http://linux-tipps.blogspot.com/2011/04/power-performance-of-pulseaudio-alsa.html. Could you also test how fast the battery runs out with the two methods playing a loopback?

Reply
- Arun
  
  January 19, 2012 — 9:09 am
  
  Indeed, this is something I’m hoping to get some time to try as well. Will report details when I do.
  
  Reply
Manuel

January 17, 2012 — 12:53 am

Nice to see that audio latency drop with PulseAudio, i was wondering, what was the samplerate when you measured 20ms? Also, what a bad thing is the 48-44.1-48 kHz thing? I just mean… WOW! Upsampling a downsampled sample? Is this a joke?!

Reply
- Arun
  
  January 20, 2012 — 9:47 am
  
  The sample rate isn’t too important — I can happily get this latency with 44.1 or 48 kHz.
  
  Reply
Mark Brown

January 17, 2012 — 1:29 am

I believe the use of 44.1kHz as the native format probably comes from someone assuming they’d never have to deal with hardware at any other rate as a quick thing during development rather than a carefully considered decision. For the most part this is actually a fairly reasonable decision, 48kHz hardware is relatively unusual in this space given that the overwhelming majority of audio devices like phones play from the CPU is 44.1kHz based.

Reply
- random PA user
  
  January 17, 2012 — 4:38 am
  
  Nowadays 48 kHz is starting to catch on but yes usually audio data sample rate is 44.1 kHz BUT if I’m not mistaken audio hardware including Intel HDA are almost always designed for 48 kHz.
  
  Reply
  - Ronald McMacdonald
    
    January 17, 2012 — 12:37 pm
    
    This isn’t true. Only the worst, cheapest audio devices only support 48 kHz. Modern devices use a PLL to generate a wide range of audio sample clocks. AFAICT, the TWL6040 (which is fairly cheapo) does support 44.1 kHz output, but only under certain conditions – probably whoever wrote the driver/HAL couldn’t be bothered to handle this properly.
    
    Reply
    - Mark Brown
      
      January 18, 2012 — 7:30 pm
      
      That’s not really true – there are a lot of devices out there which only have a single digital clock domain. In order to support interoperation with digital basebands (which operate at 8kHz and multiples thereof) they need to run the high rate audio at 48kHz rather than 44.1kHz even if they could also run at 44.1kHz.
      
      More flexible devices support multiple digital sample rates but there’s a cost there due to the need to do asynchronous sample rate conversion between the different domains
      
      Reply
  - Mark Brown
    
    January 18, 2012 — 7:33 pm
    
    The restrictions here usually come from other things you have to play with rather than the hardware itself. For example, anything that talks to telephony will have to be in an 8kHz based domain as that’s what the telephony world uses and HDMI tends to be restricted by the requirements of the video part of things.
    
    HDA itself is very flexible.
    
    Reply
Felipe Contreras

January 17, 2012 — 5:05 am

That’s nice, but how much of this performance difference comes from PulseAudio, and how much from ALSA?

Also, without looking at the actual code, one cannot make any conclusions. Maybe AudioFlinger API was misused? Impossible to tell.

Reply
- Arun
  
  January 17, 2012 — 10:33 am
  
  Both PulseAudio and AudioFlinger are using ALSA, so that bit is common. I’ll be posting all the code, including the test code for scrutiny as well.
  
  Reply
  - Felipe Contreras
    
    January 17, 2012 — 6:55 pm
    
    That wasn’t clear from your post, because you said “It does not depend on ALSA”. Anyway, the fact that AudioFlinger uses ALSA doesn’t mean it’s being used properly.
    
    Reply
    - Arun
      
      January 17, 2012 — 7:24 pm
      
      I thought it was clear — AudioFlinger uses a vendor-specific audio HAL and in the case of the Galaxy Nexus, the HAL uses ALSA. The usage is fairly straightforward and correct from my reading of the code.
      
      Reply
      - marko
        
        March 20, 2012 — 9:02 pm
        
        Which underlines the issue at hand: If you want intend to write software that works across all Android devices, you can’t assume you have ALSA sitting beneath AudioFlinger.
        
        Reply
Ikem

January 17, 2012 — 6:59 am

I see a need for an AudioFlinger <> Pulse Audio wrapper.

Reply
Dave

January 17, 2012 — 2:47 pm

I’d love to see a similar comparison for Jack. Then we could really start to see use of Android for music composition, recording, etc. Although 20ms is getting to the right sort of ballpark.

Reply
Daniel Thomspon

January 17, 2012 — 3:38 pm

I noticed you made a choice early in your work to build pulseaudio into the core of Android rather than using NDK.

Does that imply that an Android app. with native code could seek to target ALSA directly in order to get low latency access to the hardware?

I’d really love to port guitarix to my phone… although admittedly I’d probably have to figure out how to power a hi-Z buffer from the headset socket first.

Reply
- Arun
  
  January 19, 2012 — 9:08 am
  
  You can’t really target ALSA, because then you’d end up contending with AudioFlinger for access to the ALSA device. And of course, this would only work on devices that actually use ALSA in the audio HAL.
  
  Reply
  - Daniel Thomspon
    
    February 7, 2012 — 10:22 pm
    
    Thanks for the reply. I knew they would compete… I just hoped ALSA lib might win if I have audio focus but don’t start up any audio tracks.
    
    Reply
    - Eero Tamminen
      
      November 30, 2012 — 3:13 am
      
      At least in Maemo devices, pulseaudio did a lot of heavy audio processing that was missing from the lower ALSA layer. Some of these things were sound amplification and speaker protection. If you were using ALSA directly, you could break the speaker although volume sounded lower…
      
      Reply
sam

January 17, 2012 — 4:07 pm

@ARUN Thank you so much for your work. Really appreciated. I requested the port on bugzilla and now I get it. Can’t wait to give it a try.

This will be a perfect match for my pulseaudio custom mod for WR703N to form a $22 multi-room wireless audio solutions.

https://forum.openwrt.org/viewtopic.php?pid=150596

Reply
Vinod Koul

January 17, 2012 — 6:42 pm
few points: – In ICS, Flinger has added notion of dynamic buffer durations, so based on usage (low power) it can goto longer duration buffers. I am not sure if that is enabled for Nexus or not…
- on sample rate of 44.1, i believe the OMAP DSP (http://omappedia.org/wiki/Audio_Drive_Arch) also does voice processing. And it is easier to fix the DSP sample rate to one value, so my guess is they chose to fix at 48 and convert media stream to 48 always.
Reply
- Arun
  
  January 17, 2012 — 7:23 pm
  
  The code for that looks something like this:
  
  if (screen is on) latency = 22ms; else latency = 308ms;
  
  1
  2
  3
  4
  
  if (screen is on)
  latency = 22ms;
  else
  latency = 308ms;
  
  I made sure the screen was off (and verified that the additional buffering was happening) for my tests.
  
  Reply
  - Leander Faux
    
    January 19, 2012 — 1:59 am
    
    Would you mind pointing me to that code?
    
    Reply
    - Arun
      
      January 19, 2012 — 9:05 am
      
      Here you go — https://github.com/CyanogenMod/android_device_samsung_tuna/blob/ics/audio/audio_hw.c#L2287
      
      The actual code that sets up the latencies is in the out_write() function.
      
      Reply
    - Arun
      
      January 19, 2012 — 12:23 pm
      
      I misstated the numbers there, though. You can see the period size and number of periods there (not including additional buffering in AudioFlinger) — 22ms * 4 = 88ms when the screen is on, and 308ms * 2 = 616ms when the screen is off.
      
      Reply
ching

January 18, 2012 — 9:43 am
To be safe, i guess you should test the following too:
1. CPU load and power consumption when pulseaudio is serving “low-latency” client(s).
2. worst latency encountered and their distribution when serving “low-latency” client(s).
3. condition 2 + CPU under 100% loading.
P.S. I hate PA and i always perfer using OSS4/ALSA for lowest latency and cpu comsumption
Reply
- Arun
  
  January 19, 2012 — 9:07 am
  
  Well, this was mostly a comparison with AudioFlinger, and since that didn’t go below 176ms on this device, I didn’t do a more detailed comparison at low latency. The actual CPU usage didn’t shoot up by a lot though.
  
  As for 100% CPU loading — even on the desktop, we grant real-time priority to the audio process, so the rest of your system is going to be pretty unresponsive before audio starts dropping.
  
  Reply
  - ching
    
    January 19, 2012 — 11:51 am
    
    The 3 test cases represent daily use scenario:
    
    CPU load and power consumption when pulseaudio is serving “low-latency” client(s). – VoIP software
    
    worst latency encountered and their distribution when serving “low-latency” client(s). – VoIP software, Game and playing high quality movie
    
    condition 2 + CPU under 100% loading. – Game and playing high quality movie
    
    On my old PC, PA will use about 8% CPU when i play music, the CPU will be 0% if using Alsa directly. (my card supports hardware mixing…) –test 1
    
    I try to conference with 5 people in Skype, PA uses up to 20% CPU (around 4% increase for each people, i guest its sound mixing algorithm is O(n)), and audio begin to glitches. I cannot imagine what i happen if i add a few more people –test 2
    
    My Game simple have latency and glitches. The situation is even worse when i play games under virtual machine. — test 3
    
    My problems are gone after purging PA.
    
    I think the sound server must function well under “severe” environment, there are hundreds of “hungry” Android application waiting ahead.
    
    “As for 100% CPU loading — even on the desktop, we grant real-time priority to the audio process, so the rest of your system is going to be pretty unresponsive before audio starts dropping.” –> For gaming, this is crazy
    
    Reply
    - Arun
      
      January 20, 2012 — 9:52 am
      
      For test1 (and test2) — I’d be interested to get some more details, such as the kernel and PulseAudio versions, as well as the CPU frequency. top numbers are a percentage of current CPU frequency, so (just hypothetically) if your CPU goes down to 100 MHz, 8% might not be that bad. But this could also be a problem with bad drivers (causing more frequent wakeups in PA than they should), which would also affect your test3.
      
      If you’re up to it, do file a bug and we can try to figure out what’s happening (https://bugs.freedesktop.org/enter_bug.cgi?product=PulseAudio).
      
      Also, as for hungry Android applications — there aren’t that many now, given the state of AudioFlinger latency.
      
      Reply
      - ching
        
        January 20, 2012 — 2:20 pm
        
        My point is just the following:
        
        Audio should have low latency, even CPU is under full load by other program. Audio should not consume much power, even when serving low latency clients. more tests are needed to prove PA a solution for latency, quality control is important
        
        maybe my expection is too high :)
        
        Reply
        
        liam
        
        January 21, 2012 — 4:37 pm
        
        i believe he already explained to you as best he could without more specifics.PA requires rt-kit so it should rarely drop out due to high loads (this should be even more true in android with its preempt patch). Also, as he demonstrated, PA doesn’t have to use vqst amounts of cpu time. His example showed very low cpu usage (apparently lower on hie arm chip than your x86!) under optimal conditions FOR BOTH. If nothing else this should demonstrate the superior latency and power draw of PA compared to AF. BTW, what was the bug number of the report he asked you to submit? Im curious as to the causes myself.
        
        Reply
        
        ching
        
        January 23, 2012 — 5:56 pm
        
        i am just suggesting more tests on latency of PA. If you think Quality assurance is not important, please ignore all my word.
        
        Reply
        
        liam
        
        January 26, 2012 — 11:05 am
        
        Obviously I am not saying QA isn’t important, but that isn’t really the question here. OTOH if QA is important to you you would’ve filed the bug report he asked for, but, as you said, you don’t like PA I can see why you have little interest in actually improving it (i.e., filing bug reports).
        
        Reply
        
        itsme
        
        May 21, 2012 — 6:28 pm
        
        I think I understand what he wanted to say. He (and it’s also my opinion) thinks, that adding another layer of audio processing (PulseAudio) isn’t needed and only adds more overhead/latency. Since this does run only on phones which use ALSA, why not use ALSA directly?
        
        Reply
Konrad

January 24, 2012 — 6:54 pm

That’s impressive, good luck, maybe you’re the chosen one to free Android market from low latency issues :<

Reply
liam

January 26, 2012 — 11:26 am

Hi Arun,

In the interview you gave recently to Christian Schaller, you mentioned that you think that PA and JACK serve mutually exclusive purposes. Why is that the case? It seems as though you can have a sound server that serves both purposes (low power and latency) since Windows and Mac have done this (and PA was based off of CoreAudio, IIRC). Naively, I would think that unless PA has some inherent pathologies, you could set the policy such that it tries for maximum power savings (up to 2 sec buffer, IIRC, and depending on the hardware) but is overridden when a client requests low latency and is active. Since this is basically how things currently seem to work, there is obviously a hardware independent floor (meaning, aside from the drivers, and other systems) in how far down latency can be pushed with PA. Why is that?

Best/Liam

Reply
- Arun
  
  January 26, 2012 — 11:39 am
  
  Hey Liam.
  
  So the short story is that there are very likely optimisations to PulseAudio that can get our latency down by quite a bit. I’m hoping to look at this soon, time permitting. That said, JACK (at least as far as the literature says) can do something in the order of 1.33ms. They manage to do this by setting up a dedicated SHM pipeline practically from the app to the hardware, AIUI. Given the PA architecture, this is a lot harder to do. So we can likely bring down our latency floor to something that should be “good enough” for a number of latency-sensitive application, but getting to what JACK can do is a lot harder.
  
  Cheers.
  
  Reply
  - liam
    
    January 27, 2012 — 5:43 am
    
    Thanks for the explanation, Arun. Since JACK wants exclusive access to the device that seems to preclude easy co-operation with PA. Too bad. Hopefully you and the rest of the PA devs will be able to figure out the latency problems b/c, unless there is either a fundamental issue with PA (which you didn’t seem to indicate there was), or a problem with the kernel (assuming pre-emption patches at a minimum), Linux should be able to provide the same functionality from a single sound daemon as Win/Mac, or, failing that, a transparent co-operative mix of JACK/PA. Yeah, I know, easier said:)
    
    Best/Liam
    
    Reply
    - Gabriel Beddingfield
      
      January 29, 2012 — 1:36 am
      
      There’s two things that make JACK awesome:
      
      It allows applications to share audio data. So you could chain a synth application into a standalone filter/wah and then into a multi-track recording app and then monitor out the speakers.
      
      A fundamental part of JACK is that the applications must be written real-time safe in the audio callback, and the the callback runs with real-time priority (SCHED_FIFO).
      
      1 is irrelevant on Android. Not only is it tedious to set up something like this on a mobile device… I get the impression that Google discourages this kind of interaction between applications.
      
      Likewise, for #2: Android will ever allow an app to run with SCHED_FIFO. While the callback is being called, the application will have uninterrupted access to the CPU. For Pro-Audio this is filtered out by a discerning musician who chooses apps he can trust. On Android, someone can download a buggy app that totally locks up their phone. (Response: “Android sucks, bro!”)
      
      Without SCHED_FIFO, there’s no point in using JACK. Using “nice” is not sufficient.
      
      And then there’s power. JACK doesn’t care about power consumption. Period. It’s primary purpose is performance, and gets very unhappy if things aren’t performing. (Are you late coming back from a callback? That’s Game Over in JACK world. The audience just heard a pop over the public address and are not very impressed.)
      
      FWIW, I’ve made PA run on top of JACK for a MeeGo device. (Requires Jack2 and dbus.) It works OK… but audio performance (not power) was the priority for that device. Also, patching full-screen audio apps into an audio patch wasn’t terribly fun (as opposed to laying out the windows on your desktop and editing the patch).
      
      Reply
    - Daniel Thomspon
      
      February 7, 2012 — 10:20 pm
      
      Merging jack and PA is not so bad today.
      
      It is possible to set up the stack with jack at the bottom and PA running on top of jack.
      
      Even better because PA supports switches in output target at runtime you can let it use the hardware directly, suspend it, insert jack and let it go again. At this point PA will automatically switch to jack output (and switch back to raw hardware when jackd is killed).
      
      Its not entirely seamless (short muting period) but a good compromise for switch between high-power-low-latency audio (a.k.a. jack) and low-power-demand-led-latency (a.k.a. pulseaudio).
      
      Reply
- Colin Guthrie
  
  February 2, 2012 — 5:55 pm
  
  Hiya Liam,
  
  Just a couple extra points on Arun’s reply.
  
  It was my understanding that CoreAudio is not used on iOS? At least not the same incarnation as on the desktop space (please correct me if I’m wrong). I was always under the impression that CoreAudio served the Desktop and Pro uses cases well, but not so much the Mobile. PulseAudio on the other hand serves the Mobile and the Desktop use cases well but not the Pro.
  
  As for the really, really low latency cases (i.e. Pro), I think PulseAudio’s protocol may ultimately reach an event horizon down at the 2ms like latencies… as we try and use SHM for data transfer between client and server, we pass around handles to blocks of memory that contain audio data rather than passing the audio data itself. For sufficiently large buffers this is obviously the most efficient approach but as soon as you get down to really small payloads, the metadata becomes bigger than the data itself (or at least approaches it) and thus the efficient savings that were there previously actually become overheads.
  
  So with this in mind I think I have to agree with Arun and say that JACK and PA are tailored to mutually exclusive use cases.
  
  With the DBus Device Reservation Protocol supported by PA, JACK can now gracefully ask for exclusive access to the device which at least eases the pain. We can also automatically load jack sink modules into PA to allow sound to carry on playing (assuming it’s patched in JACK) when this happens. So I think cooperation is the way to go here.
  
  That said I fully support any work Arun or anyone else can do to reduce PA latency further for the cases that need it (e.g. games and voip), provided this does not damage the large latency benefits for e.g. media playback :)
  
  Cheers
  
  Col
  
  Reply
  - liam
    
    February 4, 2012 — 4:27 am
    
    Hi Colin,
    
    From the Apple dev docs ( https://developer.apple.com/library/ios/#documentation/MusicAudio/Conceptual/CoreAudioOverview/WhatisCoreAudio/WhatisCoreAudio.html#//apple_ref/doc/uid/TP40003577-CH3-SW17 ) they do use CoreAudio, but it has some differences from OSX’s CA (unable to create custom codecs/audio units). I assume that adding a simple switch/case wouldn’t be feasible? I really end up going back to OSX and Windows: they have a single sound server that provides for all use cases, so clearly it is possible to design a system that works everywhere. Besides, if you can get it down to 2ms (or even 5 or 7, for that matter) I don’t think anyone is going to complain about that being too large:) Having better cooperation between JACK/PA would be fine as long as it is pretty transparent to the user (namely, install JACK with PA by default and enable JACK when required by an application). Obviously there would need to be more involved, but the goal of a “single” solution seems like it should be sought after.
    
    Thanks for the response, Colin.
    
    Best/Liam
    
    Reply
liam

January 29, 2012 — 3:47 pm

Really nice explanations, Arun. I realised NICE is pretty useless as a mechanism for latency, hence why android makes extensive use of cgroups (I believe they employ three categories). An interesting feature from the 3.2 kernel is a notion of cfs scheduling (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ab84d31e15502fb626169ba2663381e34bf965b2) via cgroups. This could be of use to help “guarantee” certain applications don’t get relatively starved. Naturally, this would require a new cgroup type in Android that takes advantage of this, but considering that this is still marked experimental, I’m not sure they are even looking at it. Regarding those two features of JACK, they actually seem like something the musicians would like. Currently, they wouldn’t the first isn’t of much use since it doesn’t do a great job multitasking, but being able to string together (Gstreamer-like!) audio apps would go a long ways towards making portable devices really useful instruments, but, as you said, that is not now. The second feature actually seems commensurate with the kernels new scheduling ability (via the aformentioned cfs scheduling), and since android already employs the pre-emptive kernel they COULD expose this and at least give the users the option to run a low latency dj app, or whatever. It’s not as if we don’t currently have problem with apps sitting on the cpu refusing to budge until the kernel throws them off:) Besides, the preempt kernel should be able to throw off most misbehaving apps. The power part is, obviously, why PA is so nice, so, as I’ve mentioned, hopefully the remaining PA optimisations (along with making use of new kernel features) will make it “good enough” to use as a general purpose sound server (meaning able to offer, say, 5ms latency, when needed, and when not, 1000ms, or so). Lastly, regarding PA-on-JACK, was that a proof of concept, or was it an attempt to actually provide a low latency solution whilst still using PA? I’m guessing, since this was a Meego device, that it was some kind of handheld, thus power was at a premium. IOW, I’m guessing the goal wasn’t to run PA over JACK all the time, but only when needed (something like:kill PA->replace with JACK->run PA on JACK, then when not needed, kill JACK->start PA->cross fingers). Did it work? Were you able to achieve the performance required? This is really interesting. You should consider, when you have the time, writing another post explaining your experiences, and where you think PA could improve in order to become more useful for some niche apps.

Best/Liam

Reply
- Gabriel Beddingfield
  
  January 29, 2012 — 9:40 pm
  
  Liam,
  
  I think you’re replying to me… not Arun…. so I’ll respond. :-)
  
  Yes, I agree that CGroups is a possible fix for the RT-issue. ATM the complication is identifying which apps are JACK apps and should be brought into the audio/jack cgroup. I recall that cgroups are assigned on a per-process basis and there has to be a controller somewhere that sorts out the processes. I recall that Paul and Torben did some work to add this to the jack server… but I don’t know what the status of that is.
  
  As for the MeeGo device, it was the Indamixx 2. Here’s an engadget video: http://www.engadget.com/2011/05/25/meego-conference-2011-sights-and-sounds-video/
  
  The reason for PA+JACK was simple: MeeGo required PulseAudio to be MeeGo compliant. Also, it’s a decent alternative to the ALSA/JACK plugin so that non-JACK apps can use the JACK server. The objective was to run JACK all the time, so it was always PA-over-JACK. After I got it set up, I only messed with PA if there was a problem. Using JACK was the point of the device.
  
  As for performance, yes we were able to meet our goals by using JACK and a tuned (and relatively un-patched) kernel. Because of PA and some of the heavy-weight shrink-wrap audio apps… the latency out of the box was set to 20 ms. It could do about 5-10 ms if you killed PA and used some well-written audio apps.
  
  -gabriel
  
  Reply
liam

February 1, 2012 — 1:46 pm

Gabriel (!),

Yikes, sorry about the using the wrong name. I didn’t even notice the name HAD changed, to be honest. Sorry:) I’m glad you think cgroups might help. I have been having a conversation elsewhere about hybrid kernels, and the topic of XNU’s scheduler came up. He thought that he’d read that in the 10.7 kernel Apple had added the the ability to place the scheduler in user space. Naturally I did a bit of research but I was unable to find any documentation to support his claim (he said he would try and find it, so hopefully I’m wrong and it is available b/c I am curious as to what they had in mind) but I did find that Apple groups threads into 4 categories: Normal, System high priority, kernel-mode only, and Real-time (this mode seems to use the same semantics as the new cfs scheduler, which is suggestive to me — here’s the url if you are interested: http://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/KernelProgramming/scheduler/scheduler.html#//apple_ref/doc/uid/TP30000905-CH211-BEHJDFCA ) . This is pretty similar to the way Android works, and seems like an “easy” fit for Linux, in general. Currently, as I understand it, we use rtkit to boost priority when asked, and if allowed by system policy. So the burden would be on the app devs to support this (which they must do anyway to make use of rtkit). So, in short, the apps would tell need to tell us. I’m assuming, however, things aren’t as simple as I understand them. As for managing the cgroups, I’m not sure I see the problem. Assuming we already have a JACK (or whatever) bin, we can migrate a thread into a cgroup. I suppose you might make a slight tweak to rtkit to enable it to write to the mounted tasks file in /proc/cgroups/JACK/. This might be a bit of a security issue, however:) Thanks for the Indamixx2 link. The latency seemed excellent on the ivory app (I think that is what it was called). It was harder to tell on mixx, but that was simply b/c of the lack of sharp gestures the presenter was using, as opposed to the stabbing motions used when striking a key. So, it is at least nice to know PA can get low enough to be not noticeable (to me, at least). Out of curiosity, did you happen to measure latency when using just PA itself on the tablet? It seems a bit strange that PA simply residing on JACK would increase latency, especially if PA isn’t doing any routing. That is, I wonder how much latency was due to the apps themselves (the non-well optimized ones, that is), versus the presence of PA.

Best/Liam

Reply
sam

February 8, 2012 — 4:51 pm

Arun,

The pulseaudio website seems to be down. Could you send me the build instruction via Email?

Reply
Ross Bencina

February 13, 2012 — 9:32 am

All research and attention to the Android audio stack is a good thing and I reckon this post is great for exposing some shadowy details.

(But* I’m going to stick my neck out here: Seems to me, like AudioFlinger, PulseAudio was never designed for low latency. From what people tell me, PulseAudio attracts many of the same criticisms that AudioFlinger does (high latency, unsuitable for serious audio work).

End-users like PulseAudio because it provides a nice user experience but unlike CoreAudio or JACK, it provides crappy latency — because as far as I can tell, the PulseAudio team never focused on latency as a core requirement.

To give you an idea, CoreAudio can easily work with latency/ buffer sizes of 2ms or less. I’m not even sure a a CoreAudio-like architecture is viable on the Linux kernel, because CoreAudio appears to depend not just on low-jitter real-time thread priorities, but also on low-jitter absolute time wakeup (mach_wait_until API). This means you need not only real-time priorities and response to external events, but also low-jitter timed wait. Not sure if the Linux kernel has that, the Android kernel likely doesn’t.

There is a separate debate about whether a decoupled timer based scheduler is a better architecture than an interrupt scheduled one like JACK, ASIO etc. For extremely low latency scenarios, in tests of ASIO vs. CoreAudio with the same hardware, the interrupt driven mode (ASIO) seems to perform better — perhaps because it is limited only by the latency in scheduling soundcard interrupts, not also by timer jitter. On the other hand, the decoupled timer based approach might have better behaviour with regard to prioritising resources (?).

In my view it would be at best a step sideways to make PulseAudio a key component of the Android stack. I think the best way forward is to re-write AudioFlinger. It’s not that big and clearly there is room for improvement.

Reply
- Arun
  
  February 14, 2012 — 10:03 am
  
  My understanding is that iOS doesn’t actually share the same CoreAudio implementation as OS X. So if we’re looking at the audio daemon continuum as spanning embedded audio, desktop audio and pro audio, no solution does all three. PA does the first two, OS X’s CoreAudio implementation does the last two.
  
  With JACK, you can drive down latencies to the same sort of range as CoreAudio on OS X. For the foreseeable future, we’re not aiming to duplicate that ability.
  
  I was only able to find this for an embedded comparison, but it more or less seems to make sense: http://www.musiquetactile.fr/android-is-far-behind-ios/
  
  So as I said, there are places where we can improve latency. Numbers will talk, but I’m hoping we PA can get closer to those iOS numbers than we currently are.
  
  Reply
  - Ian Meredith
    
    March 16, 2012 — 7:50 am
    
    Please keep us posted Arun, I’m very excited at the possibility to achieve low latency audio with an android device. Also excuse me if this a silly question, but if PA was to find its way to android, would that likely be in the form of a custom ROM?
    
    Reply
    - Arun
      
      March 19, 2012 — 8:18 am
      
      Having it in a custom ROM is easier, but it’s really going to be necessary to make it a drop-in replacement to not break existing apps. This is quite doable, so I don’t see why vendors and OEMs might not find this interesting.
      
      Reply
- Gabriel Beddingfield
  
  February 25, 2012 — 10:44 pm
  
  Ross: FWIW, I agree. I think it would be better to refactor AF’s internals than to switch to PA.
  
  Reply
  - Arun
    
    March 19, 2012 — 8:18 am
    
    Please see my reply to Ian above.
    
    Reply
Sarshar

February 25, 2012 — 2:30 pm

Hi dear Arun,

As far as I understood there’s a sound API to reduce that latency which is called PA. I just need to know is there any way to import it into our handheld kernel like Samsung Galaxy SII (i9100) ?! I’m waiting for this job because I need an app like iRig or Ghetto Amp to amp my guitar ;)

Thanks

Reply
- Arun
  
  March 19, 2012 — 8:21 am
  
  Keep watching this blog. I’m not able to dedicate all my time to this effort, but hopefully there will be progress to report in a while.
  
  Reply
Victor

March 19, 2012 — 3:32 am

re: OSX CoreAudio latency, my experience is this: 1. yes, iOS and OSX provide different versions, but the difference is mostly that iOS does not support certain things, like the HAL bit, only AudioUnits (AuHAL). 2. Minimum buffer size on standard HW (ie the iPad/iPhone codec) is 512 frames. Even if you request smaller buffer, you won’t get it. I still have to check with external HW (ie. Alesis iDock). So I guess latency is, with 512 in and 512 out at 44.1K, about 22ms.

Reply
- Victor
  
  March 19, 2012 — 6:00 am
  
  Actually, I had to go and check this. I was wrong, it’s possible to go lower than 512 samples. I tested on a 3G iphone and I can go down to 128 samples, about 3ms. Sorry about that!
  
  I would really love to see Android with this level of latency. The whole deployment thing is so much better than iOS.
  
  Reply
  - Arun
    
    March 19, 2012 — 8:23 am
    
    Getting it down to 3ms is going to be really hard given our architecture (which brings us to the previous comment about the actual OS X and iOS implementations being different). But we can certainly close the gap a bit!
    
    Reply
chambejp

April 18, 2012 — 5:48 am

We over at slatedroid are trying to get audio to work with our old pandigitsl novels. I’m very interested in using this on a ssm2602. Do you have a walkthrough or could you give me some tips and issues you had porting thus? Audioflinger refuses to work for us with ics. Email me so we can talk about this please.

Reply
itsme

May 22, 2012 — 1:19 am

Got your patches already accepted into android? I visited the issue details page, but it doesn’t look so :(

Reply
vonVideo

May 30, 2012 — 6:21 pm

1) PulseAudio was never very good on Linux machines although strongly supported by Collabora. The main reason for it is … latencies :-) 2) There is a solution for low latency audio playback on the Galaxy Nexus GSM. The latency is decreased from 88 ms to 20 ms : http://forum.xda-developers.com/showthread.php?t=1674836

Reply
Chris

May 31, 2012 — 12:24 am

I’m currently a second year CS student, and Android’s audio system is something I’m very interested in. I’d like to explore the idea of PulseAudio in Android, but unfortunately I don’t have a lot of experience with this kind of thing.

After building PulseAudio into Android, did you provide a Java API? Also, how hard do you think it would be to replace AudioFlinger entirely with PulseAudio? Would you need to port the existing Java API to use PulseAudio instead or would it be better to write a wrapper to provide the AudioFlinger API through PulseAudio?

Also, this is a little off topic, but I’m working on a project where I need to redirect the audio out stream to USB. Would I need to work with AudioFlinger directly or can this be done using the Java API?

Reply
android market

October 20, 2012 — 11:33 pm

Got your patches already accepted into android? I visited the issue details page, but it doesn’t look so :(

Reply
Jaxxed

February 5, 2013 — 6:49 pm

Just curious if there has been any progress on this.

PA 3.0 was released last December, and I am curious if anyone has tried the port instructions with v3

Reply
keith sherry

March 6, 2013 — 11:14 pm

I’m trying to sample audio coming in thru “HDMI In” can I use Pulse Audio or do you have a recommended API that I can possible use to do this?

Reply
- Arun
  
  March 7, 2013 — 10:38 am
  
  Are you on Android? If not, you can use the standard PulseAudio API.
  
  Reply
phil

March 12, 2013 — 11:30 pm

One reason I’d love to see PA on Android is for its ability to route all audio to airplay and similar remote media services. Currently I rely on specific apps to send files over to XBMC or airplay devices, and there are many apps that just won’t share in this way (streaming audio such as radio apps often).

I routinely used this on my laptop and I really miss it on my tablet. Latency? Well who needs ms when you are happy with a use case that allows for several seconds delay.

Maybe another use case for selling the concept to OEMs.

Reply
FAS

August 12, 2014 — 2:49 pm

Hi all , Concerning security…Using OpenSL ES ….do you think that the mic for recording can be activated without passing thru Binder (for IPC)?… And what about to activate also a second mic? I need to know if its possible to bypassing BINDEr in OpenSL ES

Reply
- Arun
  
  August 12, 2014 — 4:13 pm
  
  OpenSL ES would go through AudioFlinger as well (last I saw, the implementation was a wrapper on the AudioTrack API), so you shouldn’t have a concern there.
  
  Reply
  - FAS
    
    August 18, 2014 — 1:35 pm
    
    Hi Arun Using AUDIORECORD API you can record in 2 mics if you phone have them… :)But concerning OpenSL I know that this function (stereo dual channel) is not implemented, so I am wondering if the mic can be activated from NATIVE directly without any control because it is called outside JAVA enviroment…(from NATIVE) so manifest wont filter at this level. That´s why I need to know if OpenSL es goes allways thru AudioFlinger and audioflinger goes allways thru BINDER, couse its the only way to assure that the hardware mic cant be use directly without any kind of control. :). So I dont know if openES can talk to HAL directly without BINDER,. I hope this explanation can help you to understand my question :)… Any idea? BR
    
    Reply
    - Arun
      
      August 19, 2014 — 9:17 am
      
      For any application, the correct (only) way to talk to the audio stack is via AudioFlinger via binder. If OpenSLES doesn’t support device selection (but AudioRecord does), the “best” solution is to extend the Android OpenSLES implementation (whose source is available), to actually support it.
      
      The problem, of course, is that this requires you to make the change, get it accepted by Google, and then hopefully ship in the next version on Android.
      
      Does the Java API allow you to do device selection? If yes, that might be a way to work around this problem.
      
      Reply
FAS

August 13, 2014 — 3:15 pm

Hi Arun Using AUDIORECORD API you can record in 2 mics if you phone have them… :)But concerning OpenSL I know that this function (stereo dual channel) is not implemented, so I am wondering if the mic can be activated from NATIVE directly without any control because it is called outside JAVA enviroment…(from NATIVE) so manifest wont filter at this level. That´s why I need to know if OpenSL es goes allways thru AudioFlinger and audioflinger goes allways thru BINDER, couse its the only way to assure that the hardware mic cant be use directly without any kind of control. :). DSo I dont know if openes can talt to HAL directly without BINDER,. I hope this explanation can help you to understand my problem :)… Any idea? BR

Reply
Hardik

March 22, 2017 — 12:42 pm

Hi Arun,

First of all thanks a lot for this great idea. I just want to know that… is the “module-combine-sink” tested on Android ?

If I want to play simultaneous and same audio with HDMI and external Codec with I2s interface ?

Please share if anybody have experience on this or any idea to reach..

Reply
- Arun
  
  April 7, 2017 — 2:25 pm
  
  Hardik, there’s nothing h/w specific about module-combine-sink, so it will work on Android. And if your HDMI and I2S devices are separate cards, what you want to do is possible (even easy!).
  
  Reply
Patraputt

March 29, 2019 — 6:44 pm

Where is the apk?

Reply

PulseAudio vs. AudioFlinger: Fight!

The Contenders

The Arena

Preparations

Round 1: CPU

Round 2: Memory

Round 3: Power

Round 4: Latency

Round 5: Features

Ding! Ding! Ding!

85 Comments

Arun

Arun

random PA user

Ronald McMacdonald

Arun

Arun

marko

Dave

Arun

Eero Tamminen

sam

Vinod Koul

Arun

Leander Faux

Arun

Arun

ching

Arun

ching

Arun

ching

liam

ching

liam

itsme

Konrad

liam

Arun

liam

liam

liam

liam

sam

Arun

Arun

Arun

Sarshar

Arun

Victor

Victor

Arun

chambejp

itsme

vonVideo

Chris

Jaxxed

keith sherry

Arun

FAS

Arun

FAS

Arun

FAS

Hardik

Arun

Patraputt

9 Pingbacks

Leave a Reply Cancel reply