As we approach the PulseAudio 9.0 release, I thought it would be a good time to talk about one of the things I had a chance to work on, that landed in this cycle.
Old-time readers will remember the work I had done in the past on echo cancellation. If you’re unfamiliar with the concept, imagine a situation where you’re making a call from your phone or laptop. You don’t have a headset, so you use your device’s speaker and microphone. Now when the person on the other end speaks, their voice is played out of your speaker and captured by your mic. This means that they can now also hear their own voice, with some lag — this is called echo. If this has happened to you, you know how annoying and disruptive it can be.
Using Acoustic Echo Cancellation (AEC), PulseAudio is able to detect this in the captured input, and remove the audio we recently played back. While doing this, we also run some other algorithms to enhance the captured input, such as noise suppression (great at damping out background and fan noise) and automatic gain control (AGC, which adjusts the mic volume so you are clearly audible). In addition to voice call use cases, this is also handy to have in other applications such as speech recognition (where you want the device to detect what a user is saying, while possibly playing out other sounds).
We don’t implement these algorithms ourselves in PulseAudio. The echo cancellation module — cunningly named module-echo-cancel — provides the infrastructure to plug in different echo canceller implementations. One of these that we support (and recommend) is based on Google’s [WebRTC.org] implementation, which includes an extremely capable set of voice processing algorithms.
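As a rough sketch (not from the post itself, and the source/sink names below are hypothetical), this is the kind of configuration line that selects the WebRTC implementation when loading the module in default.pa:

```
# Load the echo canceller, picking the WebRTC-based implementation.
# The source_name/sink_name values are made-up examples; clients that
# record from ec_source get the cleaned-up, echo-cancelled signal.
load-module module-echo-cancel aec_method=webrtc source_name=ec_source sink_name=ec_sink
```

The same line can be issued at runtime with pactl instead of being placed in default.pa.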
This is a large code-base, intended to support a full real-time communication stack, and we didn’t want to pick up all that code to include in PulseAudio. So what I did was to make a copy of the AudioProcessing module, wrap it in an easy-to-package library, and then use that from PulseAudio. Quite some time passed, and I didn’t get a chance to update that code until last October.
The update brought us a number of things since the last one (5 years ago!):
The AGC module has essentially been rewritten. In practice, we see that it is slower to change the volume than before.
Voice Activity Detection (VAD) has also been split off into its own module and undergone significant changes.
Beamforming has been added, allowing you to use a set of microphones to “point” your microphone array in a specific direction (more on this in a later post).
There is now an intelligibility enhancer for applying processing on the stream coming in from the far end (so you can hear the other side better). This feature has not been hooked up in PulseAudio yet.
There is a transient suppressor for when you’re on a laptop, and your microphone is picking up keystrokes. This can be important since the sound of the keystroke introduces sharp spikes or “transients” in the audio stream, which can throw off the echo canceller that works best with the frequency range of the human voice. This one seems to be a work in progress, and is not actually used yet.
In addition to this, I’ve also extended support in module-echo-cancel for performing cancellation on multiple channels. So we are now able to deal with hardware that has any number of playback and capture channels (and they don’t even need to be equal), and we no longer have the artificial restriction of having to downmix things to mono.
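As a hedged sketch of how this might be wired up (the master device names below are placeholders for whatever your hardware actually exposes), you point the module at specific playback and capture devices and let it negotiate their channel counts:

```
# default.pa sketch: cancel echo between a multichannel mic array and a
# set of speakers. The device names are placeholders, not real identifiers;
# use "pactl list sources short" and "pactl list sinks short" to find yours.
load-module module-echo-cancel aec_method=webrtc source_master=alsa_input.mic_array sink_master=alsa_output.speakers
```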
These changes are in the newly released webrtc-audio-processing v0.2. Unfortunately, we do break API with respect to the previous version. I wrote about this a while back, and hopefully the impact on other users of this library will be minimal.
All this work was made possible thanks to Aldebaran Robotics. A special shout-out to Julien Massot and his excellent team!
These features are already in our master branch, and will be part of the 9.0 release. If you’re using these features, let me know how things work for you, and watch out for a follow up post about beamforming.
If you or your company are looking for help with either PulseAudio or GStreamer, do take a look at the consulting services I currently provide.
May 20, 2016 — 11:35 am
Looking forward to the follow up post.
May 20, 2016 — 11:38 pm
Very cool stuff! Thanks for the details. So what happens if they already implement echo cancellation at the application level, such as in Skype and Google Hangouts?
May 20, 2016 — 11:52 pm
In the default setup, you get echo cancellation if your application explicitly requests it by setting the filter.want property on its stream to echo-cancel. That means it’s opt-in, and Google Hangouts or Skype won’t use it. Empathy does, fwiw.
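If you want to try the opt-in path yourself, something like the following should work from the command line (assuming module-filter-apply is loaded, as it is in the default configuration):

```shell
# Record with echo cancellation requested via the filter.want property.
# module-filter-apply sees the property on the stream and routes it
# through module-echo-cancel.
parecord --property=filter.want=echo-cancel recording.wav
```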
July 27, 2016 — 9:34 pm
Nice work! Apart from the beamforming, is the webrtc AEC in PulseAudio 9 much better than the same in PA 8?
July 27, 2016 — 9:36 pm
I didn’t do a direct comparison. I don’t think you should see a dramatic difference in the output, maybe some improvements to robustness.
October 14, 2017 — 3:02 pm
Hello Arun, I found a strange problem with echo in WebRTC/PulseAudio: https://github.com/jitsi/jitsi-meet/issues/2069 I can’t understand why this is not configured in PulseAudio by default. Or maybe it is a Debian-specific configuration?