Improvements to PulseAudio’s Echo Cancellation

As we approach the PulseAudio 9.0 release, I thought it would be a good time to talk about one of the things I had a chance to work on, that landed in this cycle.

Old-time readers will remember the work I had done in the past on echo cancellation. If you’re unfamiliar with the concept, imagine a situation where you’re making a call from your phone or laptop. You don’t have a headset, so you use your device’s speaker and microphone. Now when the person on the other end speaks, their voice is played out of your speaker and captured by your mic. This means that they can now also hear what they’re saying, with some lag — this is called echo. If this has happened to you, you know how annoying and disruptive it can be.

Using Acoustic Echo Cancellation (AEC), PulseAudio is able to detect this in the captured input, and remove the audio we recently played back. While doing this, we also run some other algorithms to enhance the captured input, such as noise suppression (great at damping out background and fan noise) and acoustic gain control, or AGC, which adjusts the mic volume so you are clearly audible). In addition to voice call use cases, this is also handy to have in other applications such as speech recognition (where you want the device to detect what a user is saying, while possibly playing out other sounds).

We don’t implement these algorithms ourselves in PulseAudio. The echo cancellation module — cunningly named module-echo-cancel — provides the infrastructure to plug in different echo canceller implementations. One of these that we support (and recommend), is based on Google’s [WebRTC.org] implementation which includes an extremely capable set of voice processing algorithms.

This is a large code-base, intended to support a full real-time communication stack, and we didn’t want to pick up all that code to include in PulseAudio. So what I did was to make a copy of the AudioProcessing module, wrap it in an easy-to-package library, and then used that from PulseAudio. Quite some time passed by, and I didn’t get a chance to update that code, until last October.

What’s New

The update brought us a number of things since the last one (5 years ago!):

  • The AGC module has essentially been rewritten. In practice, we see that it is slower to change the volume.

  • Voice Activity Detection (VAD) has also been split off into its own module and undergone significant changes.

  • Beamforming has been added, to allow you to use a set of microphones to be able to “point” your microphone array in a specific direction (more on this in a later post).

  • There is now an intelligibility enhancer for applying processing on the stream coming in from the far end (so you can hear the other side better). This feature has not been hooked up in PulseAudio yet.

  • There is a transient suppressor for when you’re on a laptop, and your microphone is picking up keystrokes. This can be important since the sound of the keystroke introduces sharp spikes or “transients” in the audio stream, which can throw off the echo canceller that works best with the frequency range of the human voice. This one seems to be work-in-progress, and not actually used yet.

In addition to this, I’ve also extended support in module-echo-cancel for performing cancellation on multiple channels. So we are now able to deal with hardware that has any number of playback and capture channels (and they don’t even need to be equal), and we no longer have the artificial restriction of having to downmix things to mono.

These changes are in the newly released webrtc-audio-processing v0.2. Unfortunately, we do break API with regards to the previous version. I wrote about this a while back, and hopefully the impact on other users of this library will be minimal.

All this work was made possible thanks to Aldebaran Robotics. A special shout-out to Julien Massot and his excellent team!

These features are already in our master branch, and will be part of the 9.0 release. If you’re using these features, let me know how things work for you, and watch out for a follow up post about beamforming.

If you or your company are looking for help with either PulseAudio or GStreamer, do take a look at the consulting services I currently provide.

Audio Devices and Configuration

This one’s going to be a bit of a long post. You might want to grab a cup of coffee before you jump in!

Over the last few years, I’ve spent some time getting PulseAudio up and running on a few Android-based phones. There was the initial Galaxy Nexus port, a proof-of-concept port of Firefox OS (git) to use PulseAudio instead of AudioFlinger on a Nexus 4, and most recently, a port of Firefox OS to use PulseAudio on the first gen Moto G and last year’s Sony Xperia Z3 Compact (git).

The process so far has been largely manual and painstaking, and I’ve been trying to make that easier. But before I talk about the how of that, let’s see how all this works in the first place.

Continue reading

A Quick Update

Happy 2016 everyone!

While I did mention a while back (almost two years ago, wow) that I was taking a break, I realised recently that I hadn’t posted an update from when I started again.

For the last year and a half, I’ve been providing freelance consulting around PulseAudio, GStreamer, and various other directly and tangentially related projects. There’s a brief list of the kind of work I’ve been involved in.

If you’re looking for help with PulseAudio, GStreamer, multimedia middleware or anything else you might’ve come across on this blog, do get in touch!

PulseAudio 7.1 is out

We just rolled out a minor bugfix release. Quick changelog:

  • Fix a crasher when using srbchannel
  • Fix a build system typo that caused symlinks to turn up in /
  • Make Xonar cards work better
  • Other minor bug fixes and improvements

More details on the mailing list.

Thanks to everyone who contributed with bug reports and testing. What isn’t generally visible is that a lot of this happens behind the scenes downstream on distribution bug trackers, IRC, and so forth.

PSA: Breaking webrtc-audio-processing API

I know it’s been ages, but I am now working on updating the webrtc-audio-processing library. You might remember this as the code that we split off from the webrtc.org code to use in the PulseAudio echo cancellation module.

This is basically just the AudioProcessing module, bundled as a standalone library so that we can use the fantastic AEC, AGC, and noise suppression implementation from that code base. For packaging simplicity, I made a copy of the necessary code, and wrote an autotools-based build system around that.

Now since I last copied the code, the library API has changed a bit — nothing drastic, just a few minor cleanups and removed API. This wouldn’t normally be a big deal since this code isn’t actually published as external API — it’s mostly embedded in the Chromium and Firefox trees, probably other projects too.

Since we are exposing a copy of this code as a standalone library, this means that there are two options — we could (a) just break the API, and all dependent code needs to be updated to be able to use the new version, or (b) write a small wrapper to try to maintain backwards compatibility.

I’m inclined to just break API and release a new version of the library which is not backwards compatible. My rationale for this is that I’d like to keep the code as close to what is upstream as possible, and over time it could become painful to maintain a bunch of backwards-compatibility code.

A nicer solution would be to work with upstream to make it possible to build the AudioProcessing module as a standalone library. While the folks upstream seemed amenable to the idea when this came up a few years ago, nobody has stepped up to actually do the work for this. In the mean time, a number of interesting features have been added to the module, and it would be good to pull this in to use in PulseAudio and any other projects using this code (more about this in a follow-up post).

So if you’re using webrtc-audio-processing, be warned that the next release will probably break API, and you’ll need to update your code. I’ll try to publish a quick update guide when releasing the code, but if you want to look at the current API, take a look at the current audio_processing.h.

p.s.: If you do use webrtc-audio-processing as a dependency, I’d love to hear about it. As far as I know, PulseAudio is the only user of this library at the moment.

GUADEC 2015

This one’s a bit late, for reasons that’ll be clear enough later in this post. I had the happy opportunity to go to GUADEC in Gothenburg this year (after missing the last two, unfortunately). It was a great, well-organised event, and I felt super-charged again, meeting all the people making GNOME better every day.

GUADEC picnic @ Gothenberg
GUADEC picnic @ Gothenberg

I presented a status update of what we’ve been up to in the PulseAudio world in the past few years. Amazingly, all the videos are up already, so you can catch up with anything that you might have missed here.

We also had a meeting of PulseAudio developers which and a number of interesting topics of discussion came up (I’ll try to summarise my notes in a separate post).

A bunch of other interesting discussions happened in the hallways, and I’ll write about that if my investigations take me some place interesting.

Now the downside — I ended up missing the BoF part of GUADEC, and all of the GStreamer hackfest in Montpellier after. As it happens, I contracted dengue and I’m still recovering from this. Fortunately it was the lesser (non-haemorrhagic) version without any complications, so now it’s just a matter of resting till I’ve recuperated completely.

Nevertheless, the first part of the trip was great, and I’d like to thank the GNOME Foundation for sponsoring my travel and stay, without which I would have missed out on all the GUADEC fun this year.

Sponsored by GNOME!
Sponsored by GNOME!

GNOME Asia 2015

I was in Depok, Indonesia last week to speak at GNOME Asia 2015. It was a great experience — the organisers did a fantastic job and as a bonus, the venue was incredibly pretty!

View from our room
View from our room

My talk was about the GNOME audio stack, and my original intention was to talk a bit about the APIs, how to use them, and how to choose which to use. After the first day, though, I felt like a more high-level view of the pieces would be more useful to the audience, so I adjusted the focus a bit. My slides are up here.

Nirbheek and I then spent a couple of days going down to Yogyakarta to cycle around, visit some temples, and sip some fine hipster coffee.

All in all, it was a week well spent. I’d like to thank the GNOME Foundation for helping me get to the conference!

Sponsored by GNOME!
Sponsored by GNOME!

Reviewing moved files with git

This might be a well-known trick already, but just in case it’s not…

Reviewing a patch can be a bit painful when a file that has been changed and moved or renamed at one go (and there can be perfectly valid reasons for doing this). A nice thing about git is that you can reference files in an arbitrary tree while using git diff, so reviewing such changes can become easier if you do something like this:

$ git am 0001-the-thing-I-need-to-review.patch
$ git diff HEAD^:old/path/to/file.c new/path/to/file.c

This just references file.c in its old path, which is available in the commit before HEAD, and compares it to the file at the new path in the patch you just merged.

Of course, you can also use this to diff a file at some arbitrary point in the past, or in some arbitrary branch, with the same file at the current HEAD or any other point.

Hopefully this is helpful to someone out there!

Update: As Alex Elsayed points out in the comments, git diff -M/-C can be used to similar effect. The above example, for example, could be written as:

$ git am 0001-the-thing-I-need-to-review.patch
$ git show -C

Notes from the PulseAudio Mini Summit 2014

The third week of October was quite action-packed, with a whole bunch of conferences happening in Düsseldorf. The Linux audio developer community as well as the PulseAudio developers each had a whole day of discussions related to a wide range of topics. I’ll be summarising the events of the PulseAudio mini summit day here. The discussion was split into two parts, the first half of the day with just the current core developers and the latter half with members of the community participating as well.

I’d like to thank the Linux Foundation for sparing us a room to carry out these discussions — it’s fantastic that we are able to colocate such meetings with a bunch of other conferences, making it much easier than it would otherwise be for all of us to converge to a single place, hash out ideas, and generally have a good time in real life as well!

Incontrovertible proof that all our users are happy
Happy faces — incontrovertible proof that everyone loves PulseAudio!

With a whole day of discussions, this is clearly going to be a long post, so you might want to grab a coffee now. :)

Continue reading

Quick-start guide to gst-uninstalled for GStreamer 1.x

One of the first tools that you should get if you’re hacking with GStreamer or want to play with the latest version without doing evil things to your system is probably the gst-uninstalled script. It’s the equivalent of Python’s virtualenv for hacking on GStreamer. :)

The documentation around getting this set up is a bit frugal, though, so here’s my attempt to clarify things. I was going to put this on our wiki, but that’s a bit search-engine unfriendly, so probably easiest to just keep it here. The setup I outline below can probably be automated further, and comments/suggestions are welcome.

  • First, get build dependencies for GStreamer core and plugins on your distribution. Commands to do this on some popular distributions follow. This will install a lot of packages, but should mean that you won’t have to play find-the-plugin-dependency for your local build.

    • Fedora: $ sudo yum-builddep gstreamer1-*
    • Debian/Ubuntu: $ sudo apt-get build-dep gstreamer1.0-plugins-{base,good,bad,ugly}
    • Gentoo: having the GStreamer core and plugin packages should suffice
    • Others: drop me a note with the command for your favourite distro, and I’ll add it here
  • Next, check out the code (by default, it will turn up in ~/gst/master)

    • $ curl https://cgit.freedesktop.org/gstreamer/gstreamer/plain/scripts/create-uninstalled-setup.sh | sh
    • Ignore the pointers to documentation that you see — they’re currently defunct
  • Now put the gst-uninstalled script somewhere you can get to it easily:

    • $ ln -sf ~/gst/master/gstreamer/scripts/gst-uninstalled ~/bin/gst-master
    • (the -master suffix for the script is important to how the script works)
  • Enter the uninstalled environment:

    • $ ~/bin/gst-master
    • (this puts you in the directory with all the checkouts, and sets up a bunch of environment variables to use your uninstalled setup – check with echo $GST_PLUGIN_PATH)
  • Time to build

    • $ ./gstreamer/scripts/git-update.sh
  • Take it out for a spin

    • $ gst-inspect-1.0 filesrc
    • $ gst-launch-1.0 playbin uri=file:///path/to/some/file
    • $ gst-discoverer-1.0 /path/to/some/file
  • That’s it! Some tips:

    • Remember that you need to run ~/bin/gst-master to enter the environment for each new shell
    • If you start up a GStreamer app from your system in this environment, it will use your uninstalled libraries and plugins
    • You can and should periodically update you tree by rerunning the git-update.sh script
    • To run gdb on gst-launch, you need to do something like:
    • $ libtool --mode=execute gdb --args gstreamer/tools/gst-launch-1.0 videotestsrc ! videoconvert ! xvimagesink
    • I find it useful to run cscope on the top-level tree, and use that for quick code browsing

Update: Fixed create-uninstalled.sh link to use https (thanks to Victor for pointing this out).