I know it’s been ages, but I am now working on updating the webrtc-audio-processing library. You might remember this as the code that we split off from the webrtc.org code to use in the PulseAudio echo cancellation module.
This is basically just the AudioProcessing module, bundled as a standalone library so that we can use the fantastic AEC, AGC, and noise suppression implementation from that code base. For packaging simplicity, I made a copy of the necessary code, and wrote an autotools-based build system around that.
Now since I last copied the code, the library API has changed a bit — nothing drastic, just a few minor cleanups and removed API. This wouldn’t normally be a big deal since this code isn’t actually published as external API — it’s mostly embedded in the Chromium and Firefox trees, probably other projects too.
Since we are exposing a copy of this code as a standalone library, this means that there are two options — we could (a) just break the API, and all dependent code needs to be updated to be able to use the new version, or (b) write a small wrapper to try to maintain backwards compatibility.
I’m inclined to just break API and release a new version of the library which is not backwards compatible. My rationale for this is that I’d like to keep the code as close to what is upstream as possible, and over time it could become painful to maintain a bunch of backwards-compatibility code.
A nicer solution would be to work with upstream to make it possible to build the AudioProcessing module as a standalone library. While the folks upstream seemed amenable to the idea when this came up a few years ago, nobody has stepped up to actually do the work for this. In the mean time, a number of interesting features have been added to the module, and it would be good to pull this in to use in PulseAudio and any other projects using this code (more about this in a follow-up post).
So if you’re using webrtc-audio-processing, be warned that the next release will probably break API, and you’ll need to update your code. I’ll try to publish a quick update guide when releasing the code, but if you want to look at the current API, take a look at the current audio_processing.h.
p.s.: If you do use webrtc-audio-processing as a dependency, I’d love to hear about it. As far as I know, PulseAudio is the only user of this library at the moment.
This one’s a bit late, for reasons that’ll be clear enough later in this post. I had the happy opportunity to go to GUADEC in Gothenburg this year (after missing the last two, unfortunately). It was a great, well-organised event, and I felt super-charged again, meeting all the people making GNOME better every day.
I presented a status update of what we’ve been up to in the PulseAudio world in the past few years. Amazingly, all the videos are up already, so you can catch up with anything that you might have missed here.
We also had a meeting of PulseAudio developers which and a number of interesting topics of discussion came up (I’ll try to summarise my notes in a separate post).
A bunch of other interesting discussions happened in the hallways, and I’ll write about that if my investigations take me some place interesting.
Now the downside — I ended up missing the BoF part of GUADEC, and all of the GStreamer hackfest in Montpellier after. As it happens, I contracted dengue and I’m still recovering from this. Fortunately it was the lesser (non-haemorrhagic) version without any complications, so now it’s just a matter of resting till I’ve recuperated completely.
Nevertheless, the first part of the trip was great, and I’d like to thank the GNOME Foundation for sponsoring my travel and stay, without which I would have missed out on all the GUADEC fun this year.
I was in Depok, Indonesia last week to speak at GNOME Asia 2015. It was a great experience — the organisers did a fantastic job and as a bonus, the venue was incredibly pretty!
My talk was about the GNOME audio stack, and my original intention was to talk a bit about the APIs, how to use them, and how to choose which to use. After the first day, though, I felt like a more high-level view of the pieces would be more useful to the audience, so I adjusted the focus a bit. My slides are up here.
Nirbheek and I then spent a couple of days going down to Yogyakarta to cycle around, visit some temples, and sip some fine hipster coffee.
All in all, it was a week well spent. I’d like to thank the GNOME Foundation for helping me get to the conference!
One of the first tools that you should get if you’re hacking with GStreamer or want to play with the latest version without doing evil things to your system is probably the gst-uninstalled script. It’s the equivalent of Python’s virtualenv for hacking on GStreamer. :)
The documentation around getting this set up is a bit frugal, though, so here’s my attempt to clarify things. I was going to put this on our wiki, but that’s a bit search-engine unfriendly, so probably easiest to just keep it here. The setup I outline below can probably be automated further, and comments/suggestions are welcome.
First, get build dependencies for GStreamer core and plugins on your distribution. Commands to do this on some popular distributions follow. This will install a lot of packages, but should mean that you won’t have to play find-the-plugin-dependency for your local build.
Some of you might have been following all the brouhaha over Popcorn Time. I won’t get into the arguments that can be made for and against at the moment.
While poking around at what it was that Popcorn Time was doing, I stumbled upon peerflix, a Node.js-based application that takes a .torrent file that points to one big video file, and presents that as an HTTP stream. It has its own BitTorrent implementation where it prioritises early chunks of the file so that it is possible to start watching the video before the entire file has been downloaded. It also seeds the file while the video is being watched locally.
Seeing as I was at the GStreamer Hackfest in Munich when this came up in discussions, it seemed topical to have a GStreamer element to wrap this neat bit of functionality. Thus was peerflixsrc born. This is a simple source element that takes a URI to a torrent file (something like torrent+http://archive.org/some/video.torrent), fires up peerflix in the background, and provides the data from the corresponding HTTP stream. Conveniently enough, this can be launched using playbin or Totem (hinting at the possibilities of what can come next!). Here’s what it looks like…
The code is available now. To use it, build this copy of gst-plugins-bad using your favourite way, make sure you have peerflix installed (sudo npm install -g peerflix), and you’re good to go.
This is not quite mature enough to go into upstream GStreamer. The ugliest part is firing up a Node.js server to make this work, not the least because managing child processes on Linux is not the prettiest code you can write. Maybe someone wants to look at rewriting the torrent bits from peerflix in C? There don’t seem to be any decent C-based libraries for this out there, though.
In the mean time, enjoy this, and comments / patches welcome!
Last weekend, I was at the GStreamer Hackfest in Munich. As usual, it was a blast — we got much done, and it was a pleasure to meet the fine folks who bring you your favourite multimedia framework again. Thanks to the conference for providing funding to make this possible!
My plan was to work on making Totem’s support for passthrough audio work flawlessly (think allowing your A/V receiver to decode AC3/DTS if it allows it, with more complex things coming the future as we support it). We’ve had the pieces in place in GStreamer for a while now, and not having that just work with Totem has been a bit of a bummer for me.
The immediate blocker so far has been that Totem needs to add a filter (scaletempo) before the audio sink, which forces negotiation to always pick a software decoder. We solved this by adding the ability for applications to specify audio/video filters for playbin to plug in if it can. There’s a now-closed bug about it, for the curious. Hopefully, I’ll get the rest of the work to make Totem use this done soon, so things just work.
Now the reason that didn’t happen at the hackfest is that I got a bit … distracted … at the hackfest by another problem. More details in an upcoming post!
And we’re back … PulseAudio 4.0 is out! There’s both a short and super-detailed changelog in the release notes. For the lazy, this release brings a bunch of Bluetooth stability updates, better low latency handling, performance improvements, and a whole lot more. :)
One interesting thing is that for this release, we kept a parallel next branch open while master was frozen for stabilising and releasing. As a result, we’re already well on our way to 5.0 with 52 commits since 4.0 already merged into master.
And finally, I’m excited to announce PulseAudio is going to be carrying out two great projects this summer, as part of the Google Summer of Code! We are going to have Alexander Couzens (lynxis) working on a rewrite of module-tunnel using libpulse, mentored by Tanu Kaskinen. In addition to this, Damir Jelić (poljar) working on improvements to resampling, mentored by Peter Meerwald.
That’s just some of the things to look forward to in coming months. I’ve got a few more things I’d like to write about, but I’ll save that for another post.
That’s right — PulseAudio will be participating in the Google Summer of Code again this year! We had a great set of students and projects last year, and you’ve already seen some their work in the last release.
There are some more details on how to get involved on the mailing list. We’re looking forward to having another set of smart and enthusiastic new contributors this year!
p.s.: Mentors and students from organisations (GStreamer and BlueZ, for example), do feel free to get in touch with us if you have ideas for projects related to PulseAudio that overlap with those other projects.
For those of you who missed my previous updates, we recently organised a PulseAudio miniconference in Copenhagen, Denmark last week. The organisation of all this was spearheaded by ALSA and PulseAudio hacker, David Henningsson. The good folks organising the Ubuntu Developer Summit / Linaro Connect were kind enough to allow us to colocate this event. A big thanks to both of them for making this possible!
The conference was attended by the four current active PulseAudio developers: Colin Guthrie, Tanu Kaskinen, David Henningsson, and myself. We were joined by long-time contributors Janos Kovacs and Jaska Uimonen from Intel, Luke Yelavich, Conor Curran and Michał Sawicz.
We started the conference at around 9:30 am on November 2nd, and actually managed to keep to the final schedule(!), so I’m going to break this report down into sub-topics for each item which will hopefully make for easier reading than an essay. I’ve also put up some photos from the conference on the Google+ event.
Mission and Vision
We started off with a broad topic — what each of our personal visions/goals for the project are. Interestingly, two main themes emerged: having the most seamless desktop user experience possible, and making sure we are well-suited to the embedded world.
Most of us expressed interest in making sure that users of various desktops had a smooth, hassle-free audio experience. In the ideal case, they would never need to find out what PulseAudio is!
Orthogonally, a number of us are also very interested in making PulseAudio a strong contender in the embedded space (mobile phones, tablets, set top boxes, cars, and so forth). While we already find PulseAudio being used in some of these, there are areas where we can do better (more in later topics).
There was some reservation expressed about other, less-used features such as network playback being ignored because of this focus. The conclusion after some discussion was that this would not be the case, as a number of embedded use-cases do make use of these and other “fringe” features.
Increasing patch bandwidth
Contributors to PulseAudio will be aware that our patch queue has been growing for the last few months due to lack of developer time. We discussed several ways to deal with this problem, the most promising of which was a periodic triage meeting.
We will be setting up a rotating schedule where each of us will organise a meeting every 2 weeks (the period might change as we implement things) where we can go over outstanding patches and hopefully clear backlog. Colin has agreed to set up the first of these.
Next on the agenda was a presentation by Janos Kovacs about the work they’ve been doing at Intel with enhancing the PulseAudio’s routing infrastructure. These are being built from the perspective of IVI systems (i.e., cars) which typically have fairly complex use cases involving multiple concurrent devices and users. The slides for the talk will be put up here shortly (edit: slides are now available).
The talk was mingled with a Q&A type discussion with Janos and Jaska. The first item of discussion was consolidating Colin’s priority-based routing ideas into the proposed infrastructure. The broad thinking was that the ideas were broadly compatible and should be implementable in the new model.
There was also some discussion on merging the module-combine-sink functionality into PulseAudio’s core, in order to make 1:N routing easier. Some alternatives using te module-filter-* were proposed. Further discussion will likely be required before this is resolved.
The next steps for this work are for Jaska and Janos to break up the code into smaller logical bits so that we can start to review the concepts and code in detail and work towards eventually merging as much as makes sense upstream.
This session was taken up against the background of improving latency for games on the desktop (although it does have other applications). The indicated required latency for games was given as 16 ms (corresponding to a frame rate of 60 fps). A number of ideas to deal with the problem were brought up.
Firstly, it was suggested that the maxlength buffer attribute when setting up streams could be used to signal a hard limit on stream latency — the client signals that it will prefer an underrun, over a latency above maxlength.
Another long-standing item was to investigate the cause of underruns as we lower latency on the stream — David has already begun taking this up on the LKML.
Finally, another long-standing issue is the buffer attribute adjustment done during stream setup. This is not very well-suited to low-latency applications. David and I will be looking at this in coming days.
Merging per-user and system modes
Tanu led the topic of finding a way to deal with use-cases such as mpd or multi-user systems, where access to the PulseAudio daemon of the active user by another user might be desired. Multiple suggestions were put forward, though a definite conclusion was not reached, as further thought is required.
Tanu’s suggestion was a split between a per-user daemon to manage tasks such as per-user configuration, and a system-wide daemon to manage the actual audio resources. The rationale being that the hardware itself is a common resource and could be handled by a non-user-specific daemon instance. This approach has the advantage of having a single entity in charge of the hardware, which keeps a part of the implementation simpler. The disadvantage is that we will either sacrifice security (arbitrary users can “eavesdrop” using the machine’s mic), or security infrastructure will need to be added to decide what users are allowed what access.
I suggested that since these are broadly fringe use-cases, we should document how users can configure the system by hand for these purposes, the crux of the argument being that our architecture should be dictated by the main use-cases, and not the ancillary ones. The disadvantage of this approach is, of course, that configuration is harder for the minority that wishes multi-user access to the hardware.
Colin suggested a mechanism for users to be able to request access from an “active” PulseAudio daemon, which could trigger approval by the corresponding “active” user. The communication mechanism could be the D-Bus system bus between user daemons, and Ștefan Săftescu’s Google Summer of Code work to allow desktop notifications to be triggered from PulseAudio could be used to get to request authorisation.
David suggested that we could use the per-user/system-wide split, modified somewhat to introduce the concept of a “system-wide” card. This would be a device that is configured as being available to the whole system, and thus explicitly marked as not having any privacy guarantees.
In both the above cases, discussion continued about deciding how the access control would be handled, and this remains open.
We will be continuing to look at this problem until consensus emerges.
Improving (laptop) surround sound
The next topic dealt with being able to deal with laptops with a built-in 2.1 channel set up. The background of this is that there are a number of laptops with stereo speakers and a subwoofer. These are usually used as stereo devices with the subwoofer implicitly being fed data by the audio controller in some hardware-dependent way.
The possibility of exposing this hardware more accurately was discussed. Some investigation is required to see how things are currently exposed for various hardware (my MacBook Pro exposes the subwoofer as a surround control, for example). We need to deal with correctly exposing the hardware at the ALSA layer, and then using that correctly in PulseAudio profiles.
This led to a discussion of how we could handle profiles for these. Ideally, we would have a stereo profile with the hardware dealing with upmixing, and a 2.1 profile that would be automatically triggered when a stream with an LFE channel was presented. This is a general problem while dealing with surround output on HDMI as well, and needs further thought as it complicates routing.
I gave a rousing speech about writing more tests using some of the new improvements to our testing framework. Much cheering and acknowledgement ensued.
Ed.: some literary liberties might have been taken in this section
Unified cross-distribution ALSA configuration
I missed a large part of this unfortunately, but the crux if the discussion was around unifying cross-distribution sound configuration for those who wish to disable PulseAudio.
The next topic we took up was base volumes, and whether they are useful to most end users. For those unfamiliar with the concept, we sometimes see sinks/sources where which support volume controls going to > 0dB (which is the no=attenuation point). We provide the maximum allowed gain in ALSA as the maximum volume, and suggest that UIs show a marker for the base volume.
It was felt that this concept was irrelevant, and probably confusing to most end users, and that we suggest that UIs do not show this information any more.
Relatedly, it was decided that having a per-port maximum volume configuration would be useful, so as to allow users to deal with hardware where the output might get too loud.
Devices with dynamic capabilities (HDMI)
Our next topic of discussion was finding a way to deal with devices such as those HDMI ports where the capabilities of the device could change at run time (for example, when you plug out a monitor and plug in a home theater receiver).
A few ideas to deal with this were discussed, and the best one seemed to be David’s proposal to always have a separate card for each HDMI device. The addition of dynamic profiles could then be exploited to only make profiles available when an actual device is plugged in (and conversely removed when the device is plugged out).
Splitting of configuration
It was suggested that we could split our current configuration files into three categories: core, policy and hardware adaptation. This was met with approval all-around, and the pre-existing ability to read configuration from subdirectories could be reused.
Another feature that was desired was the ability to ship multiple configurations for different hardware adaptations with a single package and have the correct one selected based on the hardware being run on. We did not know of a standard, architecture-independent way to determine hardware adaptation, so it was felt that the first step toward solving this problem would be to find or create such a mechanism. This could either then be used to set up configuration correctly in early boot, or by PulseAudio for do runtime configuration selection.
Relatedly, moving all distributed configuration to /usr/share/..., with overrides in /etc/pulse/... and $HOME were suggested.
Better drain/underrun reporting
David volunteered to implement a per-sink-input timer for accurately determining when drain was completed, rather than waiting for the period of the entire buffer as we currently do. Unsurprisingly, no objections were raised to this solution to the long-standing issue.
In a similar vein, redefining the underflow event to mean a real device underflow (rather than the client-side buffer running empty) was suggested. After some discussion, we agreed that a separate event for device underruns would likely be better.
We called it a day at this point and dispersed beer-wards.
David very kindly invited us to spend a day after the conference hacking at his house in Lund, Sweden, just a short hop away from Copenhagen. We spent a short while in the morning talking about one last item on the agenda — helping to build a more seamless user experience. The idea was to figure out some tools to help users with problems quickly converge on what problem they might be facing (or help developers do the same). We looked at the Ubuntu apport audio debugging tool that David has written, and will try to adopt it for more general use across distributions.
The rest of the day was spent in more discussions on topics from the previous day, poring over code for some specific problems, and rolling out the first release candidate for the upcoming 3.0 release.
I am very happy that this conference happened, and am looking forward to being able to do it again next year. As you can see from the length of this post, there are lot of things happening in this part of the stack, and lots more yet to come. It was excellent meeting all the fellow PulseAudio hackers, and my thanks to all of them for making it.
Finally, I wouldn’t be sitting here writing this report without support from Collabora, who sponsored my travel to the conference, so it’s fitting that I end this with a shout-out to them. :)