For those of you who missed my previous updates: last week we organised a PulseAudio miniconference in Copenhagen, Denmark. The organisation of all this was spearheaded by ALSA and PulseAudio hacker David Henningsson. The good folks organising the Ubuntu Developer Summit / Linaro Connect were kind enough to allow us to colocate this event. A big thanks to both of them for making this possible!
The conference was attended by the four current active PulseAudio developers: Colin Guthrie, Tanu Kaskinen, David Henningsson, and myself. We were joined by long-time contributors Janos Kovacs and Jaska Uimonen from Intel, Luke Yelavich, Conor Curran and Michał Sawicz.
We started the conference at around 9:30 am on November 2nd, and actually managed to keep to the final schedule(!), so I’m going to break this report down into sub-topics for each item which will hopefully make for easier reading than an essay. I’ve also put up some photos from the conference on the Google+ event.
Mission and Vision
We started off with a broad topic — what each of our personal visions/goals for the project are. Interestingly, two main themes emerged: having the most seamless desktop user experience possible, and making sure we are well-suited to the embedded world.
Most of us expressed interest in making sure that users of various desktops had a smooth, hassle-free audio experience. In the ideal case, they would never need to find out what PulseAudio is!
Orthogonally, a number of us are also very interested in making PulseAudio a strong contender in the embedded space (mobile phones, tablets, set top boxes, cars, and so forth). While we already find PulseAudio being used in some of these, there are areas where we can do better (more in later topics).
There was some reservation expressed about other, less-used features such as network playback being ignored because of this focus. The conclusion after some discussion was that this would not be the case, as a number of embedded use-cases do make use of these and other “fringe” features.
Increasing patch bandwidth
Contributors to PulseAudio will be aware that our patch queue has been growing for the last few months due to lack of developer time. We discussed several ways to deal with this problem, the most promising of which was a periodic triage meeting.
We will be setting up a rotating schedule where each of us will organise a meeting every 2 weeks (the period might change as we implement things) where we can go over outstanding patches and hopefully clear backlog. Colin has agreed to set up the first of these.
Routing infrastructure
Next on the agenda was a presentation by Janos Kovacs about the work they've been doing at Intel on enhancing PulseAudio's routing infrastructure. This is being built from the perspective of IVI systems (i.e., cars), which typically have fairly complex use cases involving multiple concurrent devices and users. The slides for the talk will be put up here shortly (edit: slides are now available).
The talk was mingled with a Q&A-style discussion with Janos and Jaska. The first item of discussion was consolidating Colin's priority-based routing ideas into the proposed infrastructure. The general thinking was that the two sets of ideas were broadly compatible and should be implementable in the new model.
There was also some discussion on merging the module-combine-sink functionality into PulseAudio's core, in order to make 1:N routing easier. Some alternatives using the module-filter-* modules were proposed. Further discussion will likely be required before this is resolved.
The next steps for this work are for Jaska and Janos to break up the code into smaller logical bits so that we can start to review the concepts and code in detail and work towards eventually merging as much as makes sense upstream.
Low-latency playback
This session was framed by the goal of improving latency for games on the desktop (although it does have other applications). The required latency for games was given as 16 ms (corresponding to a frame rate of 60 fps). A number of ideas to deal with the problem were brought up.
Firstly, it was suggested that the maxlength buffer attribute, set when setting up streams, could be used to signal a hard limit on stream latency: the client signals that it would prefer an underrun over a latency above maxlength.
Another long-standing item was to investigate the cause of underruns as we lower latency on the stream — David has already begun taking this up on the LKML.
Finally, another long-standing issue is the buffer attribute adjustment done during stream setup. This is not very well-suited to low-latency applications. David and I will be looking at this in coming days.
Merging per-user and system modes
Tanu led the topic of finding a way to deal with use-cases such as mpd or multi-user systems, where access to the PulseAudio daemon of the active user by another user might be desired. Multiple suggestions were put forward, though a definite conclusion was not reached, as further thought is required.
Tanu’s suggestion was a split between a per-user daemon to manage tasks such as per-user configuration, and a system-wide daemon to manage the actual audio resources. The rationale being that the hardware itself is a common resource and could be handled by a non-user-specific daemon instance. This approach has the advantage of having a single entity in charge of the hardware, which keeps a part of the implementation simpler. The disadvantage is that we will either sacrifice security (arbitrary users can “eavesdrop” using the machine’s mic), or security infrastructure will need to be added to decide what users are allowed what access.
I suggested that since these are broadly fringe use-cases, we should document how users can configure the system by hand for these purposes, the crux of the argument being that our architecture should be dictated by the main use-cases, and not the ancillary ones. The disadvantage of this approach is, of course, that configuration is harder for the minority that wishes multi-user access to the hardware.
Colin suggested a mechanism for users to be able to request access from an “active” PulseAudio daemon, which could trigger approval by the corresponding “active” user. The communication mechanism could be the D-Bus system bus between user daemons, and Ștefan Săftescu’s Google Summer of Code work to allow desktop notifications to be triggered from PulseAudio could be used to request authorisation.
David suggested that we could use the per-user/system-wide split, modified somewhat to introduce the concept of a “system-wide” card. This would be a device that is configured as being available to the whole system, and thus explicitly marked as not having any privacy guarantees.
In both the above cases, discussion continued about deciding how the access control would be handled, and this remains open.
We will be continuing to look at this problem until consensus emerges.
Improving (laptop) surround sound
The next topic was dealing with laptops that have a built-in 2.1-channel setup. The background of this is that there are a number of laptops with stereo speakers and a subwoofer. These are usually used as stereo devices, with the subwoofer implicitly being fed data by the audio controller in some hardware-dependent way.
The possibility of exposing this hardware more accurately was discussed. Some investigation is required to see how things are currently exposed for various hardware (my MacBook Pro exposes the subwoofer as a surround control, for example). We need to deal with correctly exposing the hardware at the ALSA layer, and then using that correctly in PulseAudio profiles.
This led to a discussion of how we could handle profiles for these. Ideally, we would have a stereo profile with the hardware dealing with upmixing, and a 2.1 profile that would be automatically triggered when a stream with an LFE channel was presented. This is a general problem while dealing with surround output on HDMI as well, and needs further thought as it complicates routing.
Testing
I gave a rousing speech about writing more tests using some of the new improvements to our testing framework. Much cheering and acknowledgement ensued.
Ed.: some literary liberties might have been taken in this section
Unified cross-distribution ALSA configuration
I missed a large part of this, unfortunately, but the crux of the discussion was around unifying cross-distribution sound configuration for those who wish to disable PulseAudio.
Base volumes
The next topic we took up was base volumes, and whether they are useful to most end users. For those unfamiliar with the concept: we sometimes see sinks/sources which support volume controls going to > 0 dB (which is the no-attenuation point). We provide the maximum allowed gain in ALSA as the maximum volume, and suggest that UIs show a marker for the base volume.
It was felt that this concept was irrelevant, and probably confusing, to most end users, and that we should suggest that UIs no longer show this information.
Relatedly, it was decided that having a per-port maximum volume configuration would be useful, so as to allow users to deal with hardware where the output might get too loud.
Devices with dynamic capabilities (HDMI)
Our next topic of discussion was finding a way to deal with devices such as those HDMI ports where the capabilities of the device could change at run time (for example, when you plug out a monitor and plug in a home theater receiver).
A few ideas to deal with this were discussed, and the best one seemed to be David’s proposal to always have a separate card for each HDMI device. The addition of dynamic profiles could then be exploited to only make profiles available when an actual device is plugged in (and conversely removed when the device is plugged out).
Splitting of configuration
It was suggested that we could split our current configuration files into three categories: core, policy and hardware adaptation. This was met with approval all-around, and the pre-existing ability to read configuration from subdirectories could be reused.
Another feature that was desired was the ability to ship multiple configurations for different hardware adaptations with a single package and have the correct one selected based on the hardware being run on. We did not know of a standard, architecture-independent way to determine hardware adaptation, so it was felt that the first step toward solving this problem would be to find or create such a mechanism. This could either then be used to set up configuration correctly in early boot, or by PulseAudio to do runtime configuration selection.
Relatedly, moving all distributed configuration to /usr/share/..., with overrides in /etc/pulse/... and $HOME, was suggested.
Better drain/underrun reporting
David volunteered to implement a per-sink-input timer for accurately determining when drain was completed, rather than waiting for the period of the entire buffer as we currently do. Unsurprisingly, no objections were raised to this solution to the long-standing issue.
In a similar vein, redefining the underflow event to mean a real device underflow (rather than the client-side buffer running empty) was suggested. After some discussion, we agreed that a separate event for device underruns would likely be better.
We called it a day at this point and dispersed beer-wards.
David very kindly invited us to spend a day after the conference hacking at his house in Lund, Sweden, just a short hop away from Copenhagen. We spent a short while in the morning talking about one last item on the agenda: helping to build a more seamless user experience. The idea was to figure out some tools to help users with problems quickly converge on what problem they might be facing (or help developers do the same). We looked at the Ubuntu apport audio debugging tool that David has written, and will try to adapt it for more general use across distributions.
The rest of the day was spent in more discussions on topics from the previous day, poring over code for some specific problems, and rolling out the first release candidate for the upcoming 3.0 release.
I am very happy that this conference happened, and am looking forward to being able to do it again next year. As you can see from the length of this post, there are a lot of things happening in this part of the stack, and lots more yet to come. It was excellent meeting all the fellow PulseAudio hackers, and my thanks to all of them for making it.
Finally, I wouldn’t be sitting here writing this report without support from Collabora, who sponsored my travel to the conference, so it’s fitting that I end this with a shout-out to them. :)
November 9, 2012 — 12:42 am
Great post! Thanks for this report. I’m actually very interested in the part where you talk about latency and underruns. As I’m the current maintainer of the SCHED_DEADLINE patchset, and I’d like to give it a useful use-case, I started to see if it could be beneficial for PulseAudio. PulseAudio makes use of RT policies to be able to correctly manage buffers and other stuff; I’ll probably try to figure out how to do the same things with deadline scheduling. I searched LKML, but I didn’t find anything regarding the problems you talked about. Can you please post a link to the discussion?
Thanks a lot, – Juri
November 9, 2012 — 9:30 am
Hi Juri, the SCHED_DEADLINE work looks really interesting in the PulseAudio context! David was looking at various causes of latency at the system level – his post can be found here: https://lkml.org/lkml/2012/11/5/74
I’d be happy to give you some examples of how to try out low latency playback and see its effects. One easy way: run pulseaudio with verbose output (-vvvv) and then run paplay --latency-msec=12 --process-time-msec=3 /path/to/some/wav/file. You’ll see output in the server logs from the alsa-sink bits about the minimum latency it is able to deliver amongst other things.
November 15, 2012 — 1:26 am
Thank you for the examples. Can you think of a particular configuration or system that would stress PulseAudio to the extent that the user starts to experience glitches? I tried to play a simple wav file through paplay as you told me; I also added “background” noise (compiling a kernel) and some real-time activities, but I didn’t see any problems. It would be great to be able to isolate a case in which the temporal isolation provided by SCHED_DEADLINE would be beneficial. Thanks a lot!
November 11, 2012 — 2:38 am
Hi Arun! Thanks for making the conference happen, and for publishing these exhaustive notes!
I had a question about the “base volume” part of this report, but first let me say that if I should ask on a different mailing list or location, I’m happy to do that.
I’m a user who semi-regularly uses pavucontrol’s UI to boost volume for audio sinks past 100%. The reason (crazy as it may sound) is that I have a passive speaker in my shower that I connect to my laptop to play music, and the passive speaker doesn’t quite sound loud enough when my laptop volume is at 100%. So I usually blast it to 110% or 120% in pavucontrol, and that does the trick. (Naturally, once I reach a certain point, ~130% in my case, the sound gets “clipped.”)
From what I read of your base volume discussion, you intend to make it so that user-oriented UIs no longer show options past 100%. Let me know if that’s a mis-read.
I have noticed that “regular user-oriented” UIs like the GNOME volume knob and Rhythmbox’s volume knob already don’t seem to let me go past 100%.
I’m hoping that your “base volume” changes still let me do the above in pavucontrol. In general, I find pavucontrol extremely useful but hardly mentioned nowadays (I’ve been using pulseaudio since before the name change mid last-decade), so I figure it’s considered an advanced tool for people like me and will continue to work in that fashion.
So… from what I can tell, the “base volume” discussion means that pavucontrol will still let me do what I’m doing now, and that regular UIs like Rhythmbox’s will continue to only show a range of 0 to 100% (where 100% is what you refer to as 0dB). You said you’d remove indications of 0dB AKA 100% in user-oriented UIs… does that mean pavucontrol will stop showing me the helpful 100% marker? Or just that non-pavucontrol UIs will only show a range of 0 to 100% (which as far as I can tell is what they already do)?
Thanks! Sorry for all that repetition; I’m hoping that by explaining what I think your note means, you’ll spot any confusion or mis-reading on my part quickly. Again, I appreciate all the work that goes into pulseaudio and also post-event notes; I’ve had a lot of experience with post-event writeups recently so I definitely know they take quite some thinking and writing!
November 13, 2012 — 12:03 pm
Hi Asheesh, sorry about the long delay in replying! The short answer to your question is: no, your current functionality will not change at all.
The long answer is: base volumes are used to signal a case when 0 dB is not 100%. Some devices (usually capture, but sometimes also playback) have the ability to provide amplification, and this is exposed via ALSA. In such cases, we define the “base volume” as the 0 dB point, and 100% as the maximum amplification that the ALSA mixers expose.
Our current recommendation for volume UIs is to show this base volume as a tick somewhere between 0 and 100%. This is really not terribly useful for most users, so we’re inclined to stop showing that base volume tick. The rest of the functionality remains the same.
Hope this helps! :)
November 16, 2012 — 1:15 am
I use PulseAudio under Ubuntu 12.10 (32-bit), but the essential tools are still not there. For example, I can read all current hardware settings with the command “pactl list”, but I cannot configure those same settings, like latency or sample rate, and if I set a capture card playing with the command “pactl load-module module-loopback”, the delay is almost one minute from the sound source (USB EasyCap). A Linux tool that can control advanced PulseAudio behaviour is really necessary: for example, the ability to send the internal microphone’s output to the left speaker and the capture card’s input to the right speaker, and vice versa, and also to combine different sources into a common output (file or speaker)…
November 16, 2012 — 2:09 am
I would like to add that the opposite is also necessary, like the ability to send the internal microphone to the right, rear and center speakers, but not to the left speaker or the subwoofer. In sum: the ability to configure several sets that allow multiple inputs to multiple outputs and vice versa, or a single input to multiple outputs, or multiple inputs to a single output, or a single input to a single output… All this is essential both for home and business users, because in an office reception the computer can control where the receptionist’s mic plays, like having four speakers each in a different room, etc. Have a nice rest of the year and jolly holidays for Christmas and New Year!