Tag: collabora

Notes from the Prague Audio BoFs

As I’d blogged about earlier, we had a couple of Audio BoF sessions last week. Here’s a summary of what was discussed. I’ve collected items in relevance order rather than chronological order to make for easier reading. I think I have everything covered; I’ll update this post if one of the attendees points out something I missed or got wrong.

  • Surround: There were a number of topics that came up with regards to multichannel/surround support:

    • There seems to be some lack of uniformity of channel maps, particularly in non-HDA hardware. While it was agreed that this should be fixed, we need some directed testing and bug reports to actually be able to fix this.

    • Multichannel mixers, while theoretically supported, are not actually exposed by any current drivers. It was suggested that these could be added without breaking existing applications by exposing new multichannel mixers with device names corresponding to the surround PCM device (like “surround51”).

    • We need a way to query channel maps on a given PCM. This will be implemented as a new ALSA API which can be called after the PCM is opened (a sketch of what this might look like follows this list). (AI: Takashi)

    • It would be good to have a way to configure the channel map as well (if supported by the hardware?). The suggestion was to do this as was done in PulseAudio, where an arbitrary channel map could be specified. (NB: is there hardware that supports multiple/arbitrary channel maps? If yes, how do we handle this?)
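
To make the channel-map query item above more concrete, here is a minimal sketch of how such an API might be used once it lands. The function names (snd_pcm_get_chmap() and friends) are assumptions based on the discussion, not a shipped interface:

```c
/* Sketch only: assumes a channel-map query API along the lines discussed
 * at the BoF (snd_pcm_get_chmap() etc.); names may differ when it ships. */
#include <stdio.h>
#include <stdlib.h>
#include <alsa/asoundlib.h>

int main(void)
{
    snd_pcm_t *pcm;
    snd_pcm_chmap_t *map;
    unsigned int i;

    if (snd_pcm_open(&pcm, "surround51", SND_PCM_STREAM_PLAYBACK, 0) < 0)
        return 1;

    /* The PCM must be configured before its channel map can be queried */
    snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                       SND_PCM_ACCESS_RW_INTERLEAVED, 6, 48000, 1, 500000);

    map = snd_pcm_get_chmap(pcm);
    if (map) {
        for (i = 0; i < map->channels; i++)
            printf("channel %u: %s\n", i, snd_pcm_chmap_name(map->pos[i]));
        free(map);
    }

    snd_pcm_close(pcm);
    return 0;
}
```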

  • Routing: Unsurprisingly, we came back to the problem of building a simplified mixer graph for PulseAudio.

    • The current status is that ALSA builds a simplified mixer for use by userspace, and PulseAudio further simplifies this by means of some name-based guessing (a sketch of walking this simplified mixer follows this list).

    • PulseAudio would ideally like a simplified version of the original mixer graph, but something more complete than what we get now.

    • However, since PulseAudio has fairly specific requirements about what information it wants, it probably makes sense to have ALSA provide the entire graph and leave the simplification task to PulseAudio (discussion on this approach continues).

    • There was no consensus on who would do this or how it should be done (creating a new serialisation format, exposing what HDA provides, adding node metadata to ALSA mixer controls, or something else altogether).

    • As an interim step, it was agreed that it would be possible to provide ordering in the simplified ALSA mixer (that is, add metadata to the control to signal what control comes “before” it and what comes “after” it). This should go some way in making the PA mixer simplification logic simpler and more robust. (AI: Takashi)
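
For context on the status quo: the simplified mixer that ALSA exposes today is a flat list of “simple” elements with no graph structure, which is exactly why ordering metadata would help. A minimal sketch of walking it with alsa-lib (the card name “hw:0” is just an example):

```c
/* Walk the flattened "simple mixer" view that PulseAudio consumes today.
 * Note there is no before/after relationship between elements. */
#include <stdio.h>
#include <alsa/asoundlib.h>

int main(void)
{
    snd_mixer_t *mixer;
    snd_mixer_elem_t *elem;

    if (snd_mixer_open(&mixer, 0) < 0)
        return 1;
    snd_mixer_attach(mixer, "hw:0");
    snd_mixer_selem_register(mixer, NULL, NULL);
    snd_mixer_load(mixer);

    for (elem = snd_mixer_first_elem(mixer); elem;
         elem = snd_mixer_elem_next(elem))
        printf("%s\n", snd_mixer_selem_get_name(elem));

    snd_mixer_close(mixer);
    return 0;
}
```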

  • HDMI: A couple of things came up in discussion about the status of HDMI.

    • There was a question about the reliability of ELD information as this will be used more in future versions of PulseAudio. There did not appear to be conclusive evidence in either direction, so we will probably assume that it is reliable and deal with reliability problems as they arise.

    • It was mentioned that it might be desirable to only expose the ALSA device if a receiver is plugged in. An alternative mooted earlier was to handle this in PulseAudio instead. One thing to consider with either approach is dealing with a device switch on the receiver side. Doing this without a notification to userspace was generally agreed to be a bad idea.

  • Jack detection: The long-standing debate on exposing jacks as input devices or ALSA controls came to a conclusion, with the resolution being that jacks would be exposed as ALSA controls. This requires a change in the kernel (and potentially alsa-lib?) which should not be too complex. Actual buttons (such as volume/mute control) will continue to be input devices. Once this is done, the pending jack detection patches will be adapted to use the new interface. (AI: Takashi (patches are in a branch already!), David)
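
Once this lands, reading a jack state should look roughly like the sketch below. This is an assumption about the eventual interface; the control name (“Headphone Jack”) and its placement on the card interface are illustrative:

```c
/* Sketch: reading a jack exposed as an ALSA control. The control name
 * "Headphone Jack" is an assumption for illustration. */
#include <stdio.h>
#include <alsa/asoundlib.h>

int main(void)
{
    snd_hctl_t *hctl;
    snd_hctl_elem_t *elem;
    snd_ctl_elem_id_t *id;
    snd_ctl_elem_value_t *value;

    if (snd_hctl_open(&hctl, "hw:0", 0) < 0)
        return 1;
    snd_hctl_load(hctl);

    snd_ctl_elem_id_alloca(&id);
    snd_ctl_elem_id_set_interface(id, SND_CTL_ELEM_IFACE_CARD);
    snd_ctl_elem_id_set_name(id, "Headphone Jack");

    elem = snd_hctl_find_elem(hctl, id);
    if (elem) {
        snd_ctl_elem_value_alloca(&value);
        snd_hctl_elem_read(elem, value);
        printf("Headphone %s\n",
               snd_ctl_elem_value_get_boolean(value, 0)
               ? "plugged" : "unplugged");
    }

    snd_hctl_close(hctl);
    return 0;
}
```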

  • UCM: Another long-standing issue was the merging of the ALSA UCM patches into PulseAudio. Most of the problems thus far had been due to an incomplete understanding of how ALSA and PA concepts mapped to each other. Some consensus was arrived at in this regard after a lengthy discussion:

    • As is the case now, every ALSA PCM maps to a PA sink

    • Each UCM verb maps to a PA card profile

    • Each combination of UCM devices that can be used concurrently maps to a PA port

    • Each UCM modifier is mapped to an intended role on the corresponding sink

    The code should, as in the currently submitted patches, be part of the PA ALSA module, and there will be changes required to use the UCM-specified mixer list instead of PA’s guessing mechanism (a sketch of the UCM side of this mapping follows below). (AI: ???)

    (NB: It was mentioned that PulseAudio needs to support multiple intended roles for a sink/source. This is actually already supported — the intended roles property is a whitespace-separated list of roles)

    (NB2: There was further discussion with the Linaro folks this week about the UCM bits, and there’s likely going to be an IRC/phone gathering to clarify things further in the near future)
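
For reference, the UCM side of this mapping is queryable today through alsa-lib’s use-case API. A minimal sketch of listing the verbs (the would-be card profiles) and switching to one; the card name “SDP4430” and verb “HiFi” are just examples:

```c
/* List UCM verbs (each would map to a PA card profile) and set one. */
#include <stdio.h>
#include <alsa/use-case.h>

int main(void)
{
    snd_use_case_mgr_t *mgr;
    const char **verbs;
    int i, n;

    if (snd_use_case_mgr_open(&mgr, "SDP4430") < 0)
        return 1;

    n = snd_use_case_get_list(mgr, "_verbs", &verbs);
    for (i = 0; i < n; i++)
        printf("verb (card profile): %s\n", verbs[i]);
    if (n > 0)
        snd_use_case_free_list(verbs, n);

    /* Switching card profile would amount to setting the verb */
    snd_use_case_set(mgr, "_verb", "HiFi");

    snd_use_case_mgr_close(mgr);
    return 0;
}
```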

  • GStreamer latency settings: We currently do not actually use PulseAudio’s power saving features from GStreamer, for several reasons. Suggestions to overcome this were mooted. While no definite agreement was reached, one suggestion was to add a “powersave” profile to pulsesink that chooses higher latency/buffer-time values. Players would need to set this profile if they are not using features that break when these values are raised.
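
No such profile exists yet, but to illustrate what it would control: pulsesink already exposes buffer-time and latency-time properties (in microseconds), and a “powersave” profile would essentially pick large values for them. A sketch with illustrative numbers:

```c
/* Raise pulsesink's buffer/latency targets to cut down on wakeups.
 * The values here are illustrative, not an agreed-upon profile. */
#include <gst/gst.h>

int main(int argc, char *argv[])
{
    GstElement *sink;

    gst_init(&argc, &argv);

    sink = gst_element_factory_make("pulsesink", NULL);
    if (!sink)
        return 1;

    g_object_set(sink,
                 "buffer-time", (gint64) 2000000,  /* 2 s */
                 "latency-time", (gint64) 500000,  /* 0.5 s */
                 NULL);

    gst_object_unref(sink);
    return 0;
}
```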

  • Corking: The statelessness of the current corking mechanism was discussed in one session, and between the PulseAudio developers later. The problem is that we need to be able to track cork/uncork reasons more closely. This would give us the metadata needed to make policy decisions without breaking streams. For example, if PA corks a music stream due to an incoming call, and the user then plays and pauses the music during the call, we must not uncork the music stream when the call ends. We intend to deal with this with two changes:

    • We need to add a per-cause cork/uncork request count

    • We need to associate a “generation” with cork/uncork requests, so certain conditions (such as user intervention) can bump the generation counter, and uncork requests corresponding to old cork requests will be ignored

    This will make it possible to track the various bits of state we need to do the right thing for cases like the one mentioned before.
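
None of this is implemented yet, but here is a sketch of the per-cause state those two changes imply (all names are illustrative):

```c
#include <stdbool.h>

/* Per-stream, per-cause cork bookkeeping (sketch, names illustrative) */
typedef struct {
    unsigned requests;    /* outstanding cork requests for this cause */
    unsigned generation;  /* generation of the most recent request */
} cork_cause_state;

static unsigned current_generation; /* bumped on e.g. user intervention */

static void cork(cork_cause_state *c)
{
    c->requests++;
    c->generation = current_generation;
}

/* Returns true if the stream should actually be uncorked */
static bool uncork(cork_cause_state *c)
{
    if (c->requests > 0)
        c->requests--;
    /* Uncorks matching a cork from an older generation are ignored */
    return c->requests == 0 && c->generation == current_generation;
}
```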

So that’s that — lots of things discussed, lots of things to do! Thanks to everyone who came for participating.

PragueAudio

For those who are in Prague for GstConf, LinuxCon, ELCE, etc. — don’t forget we’ve a couple of interesting audio-related things happening: the Audio BoF on the evening of Oct 23rd, and the ALSA BoF on the state and future of the HDA driver (possibly on the morning of Oct 26th), both of which I blogged about earlier.

If you’re around and interested, do drop in!

Alternate sample rates

I’ve just pushed a bunch of patches by Pierre-Louis Bossart that can yield a pretty decent CPU and power saving. These introduce the concept of an “alternate sample rate”.

Currently, PulseAudio runs all your devices at a default sample rate, which is set to 44.1 kHz on most systems (this can be configured). All streams running at different sample rates are resampled to this sample rate. Pierre’s patches add an alternate sample rate that we try to switch to under certain circumstances if it means that we can save on resampling cost. This would happen if the stream uses exactly the alternate sample rate, or some integral-or-so multiple of it.
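
Both rates can be set in daemon.conf; the alternate-sample-rate option below is what these patches introduce:

```
; /etc/pulse/daemon.conf
default-sample-rate = 44100
alternate-sample-rate = 48000
```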

The default value for the alternate sample rate is 48 kHz. So if you’re playing a movie off a DVD where the audio track is typically a 48 kHz stream, and your card supports it, we switch to 48 kHz and avoid resampling altogether. Similarly, while making voice calls, common sample rates are 8, 16, and 32 kHz. These can be resampled to 48 kHz much faster than to 44.1 kHz.

Now for the big caveat — this won’t work if there’s any other stream connected to your sink/source. So if your music player is playing (or even paused) when you get that voip call, we can’t update the rate. This situation can probably be improved by at least allowing corked streams to have their sample rate changed (so having some random stream connected but not playing — I’m looking at you, Flash! — won’t block rate updates altogether). Hopefully we’ll get this fixed before this feature is released in PulseAudio 2.0.

Thanks to Pierre for all his work on this, and to my company, Collabora, for giving me some time for upstream work!

More conferences than you can shake a stick at

Prague is an interesting place to be at this time of the year — next week it’s playing host to the Real Time Linux Workshop. The week after that, we have the Kernel Summit, GStreamer Conference, Embedded Linux Conference Europe and LinuxCon Europe. I’m going to be at the last 3, and there’s some great audio stuff happening!

On the evening of Oct 23rd, we’re having an Audio BoF to discuss pending issues that cut across the stack — ALSA, PulseAudio, GStreamer and any other similar bits that people have complaints about.

Then there’s GstConf, where there are going to be a bunch of awesome talks. Also included is a talk by yours truly about recent developments in the PulseAudio world.

At some point during that week, possibly Oct 26th morning, plans are afoot to have an ALSA BoF to discuss the state and future of the HDA driver.

There are also rumours of excellent beer that will need to be scrupulously verified. ;)

It’s going to be a pretty exciting week!

1.w00t!

As Colin Guthrie reports, PulseAudio 1.0 is now out the door! There’s a lot of new things in the release, and we should be getting a much more regular release schedule going. Head over to the full release notes for more details.

A lot of people have contributed to this release, and thanks go to them all. Special props to Colin for all the patch-herding, tireless help, and code ninjutsu!

p.s.: Gentoo packages are already available, of course. :)

LPC ho!

I’m going to be at the Linux Plumbers’ Conference next week, speaking about the things we’ve been doing to make passthrough audio on Linux kick ass.

If you’re around and interested, do drop by!

Hello … hello … hello!

I have a secret to confess. I’ve spent a great deal of time over the last few months talking to myself. I can’t say I haven’t enjoyed it — it turns out my capacity to entertain myself is far greater than initially suspected. But I hear you ask … why?

Here at Collabora, I’ve been building on Wim’s previous work on adding echo cancellation to PulseAudio. Thanks go to Intel for supporting us in continuing this work. Before too long, all this work will be trickling down to your favourite Linux distribution and all your friends will stop hating you.

First, a quick recap on what acoustic echo cancellation (AEC) is. If you already know this, you might want to skip this paragraph and the next. Say you’re on your laptop, and you receive a voice call from your friend. You don’t have a pair of headphones lying around, so you’re just going to use your laptop’s built-in speakers and mic. When your friend speaks, what she says is played out the speakers, but is also captured by the microphone and she gets to hear herself speak, albeit a short while (a few hundred milliseconds or more) later. This is called acoustic echo, and can be frustrating enough to make conversation nigh impossible. There are other types of echo for phone systems, but that’s not interesting to us at the moment.

This problem is common on pretty much all devices that you use to make phone calls. Astute readers will ask why they don’t actually face this problem on their phone. That’s because your phone (or, if you have a cheap phone, your phone company) has special software hidden away that removes the echo before sending your signal along to the other end. On laptops, which are general-purpose hardware, the job of echo cancellation is left to either your operating system (Windows XP onwards, for example) or your chat client (Skype, for example) to provide.

On Linux, we implement echo cancellation as a PulseAudio module (code-ninja Wim Taymans wrote this last year). We use the Speex DSP library to perform the actual echo cancellation. The code’s quite modular, so it’s not very hard to plug in alternate echo cancellers (we even include an alternate implementation, which isn’t quite as effective as Speex).

Recently, we plugged in some more bits from the Speex library to do noise suppression and digital gain control (so you can quit twiddling with your mic volume for the other end to be able to hear you). We also added a bunch of fixes to reduce CPU consumption significantly — this should be good enough to run on a netbook and reasonably recent ARM platforms.

While all this sounds nice, I think a demo would sound (haha!) nicer …

Without AEC: /downloads/pulseaudio/aec/call-no-aec (or download ogg, aac)

With AEC: /downloads/pulseaudio/aec/call-with-aec (or download ogg, aac)

This is a recording of a call between my laptop and N900. The laptop is playing audio out the speakers and recording with the built-in mic. What you hear is the conversation as heard on the N900.

All this echo cancelling goodness will come to a Linux distribution near you in the upcoming 1.0 release of PulseAudio. The next version of the GNOME IM client, Empathy (3.2), will actually make use of this functionality. In due time, we intend to make all voice applications use echo cancellation by default (so if you’re writing a VoIP application and don’t want it, you’ll need to set a special stream property to opt out — filter.suppress="echo-cancel").
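
For VoIP application authors, here is a sketch of what that opt-out could look like, setting the property on the stream’s proplist before creating the stream (the function and stream names are illustrative):

```c
#include <pulse/pulseaudio.h>

/* Create a phone stream that opts out of the echo-cancel filter */
pa_stream *make_voip_stream(pa_context *ctx, const pa_sample_spec *ss)
{
    pa_proplist *props = pa_proplist_new();
    pa_stream *stream;

    pa_proplist_sets(props, PA_PROP_MEDIA_ROLE, "phone");
    pa_proplist_sets(props, "filter.suppress", "echo-cancel");

    stream = pa_stream_new_with_proplist(ctx, "voip-stream", ss, NULL, props);
    pa_proplist_free(props);
    return stream;
}
```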

For the impatient among you, you can try all this out by getting recent testing versions of PulseAudio (I know packages are available for Ubuntu, Debian, Gentoo and Mageia at least). To force your phone streams to use echo cancellation, just run pactl load-module module-echo-cancel, and you’re done.
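
The module also takes arguments if you want to be explicit about devices or the canceller implementation. For example (argument names as documented for the module at the time of writing, so double-check against your version; the device names are placeholders):

```
pactl load-module module-echo-cancel aec_method=speex \
    source_master=<your-mic-source> sink_master=<your-speaker-sink>
```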

There’s still some work to be done, refining quality and using other AEC implementations (in the short-term, the WebRTC one looks promising). Things don’t work at all if you’re using different devices for playback and capture (e.g. laptop speakers and webcam mic). These are things that will be addressed in coming weeks and months.

Desktop Summit 2011

I’m in Berlin at the Desktop Summit, so you can drop me a note and we can meet if you want to yell about PulseAudio things that annoy you (or even, y’know, things you like).

Updates from the Rygel + DLNA world

Things have been awfully quiet since Zeeshan posted about the work we’ve been doing on DLNA support in Rygel. Since I’ve released GUPnP DLNA 0.3.0, I thought this was a good time to explain what we’ve been up to. This is also a sort of expansion of my Lightning Talk from GUADEC, since 5 minutes weren’t enough to establish all the background I would have liked to.

For those that don’t know, DLNA is a consortium that aims to standardise how various media devices around your house communicate with each other (that is, your home theater, TV, laptop, phone, tablet, …). One piece of this problem is having a standard way of identifying the type of a file, and communicating this between devices. For example, say your laptop (a MediaServer in DLNA parlance) is sharing the movies you’ve got with your TV (a MediaPlayer), and your TV can play only up to 720p H.264-encoded video. When the MediaServer shares files, it needs to provide sufficient information about each file so that the MediaPlayer knows whether it can play it, and can be intelligent about which files show up in its UI.

How the DLNA specification achieves this is by using “profiles”. For each media format supported by the DLNA specification, a number of profiles are defined that identify the audio/video codec used, the container, and (in a sense) the complexity of decoding the file. (For multimedia geeks, that translates to things like the codec profile, resolution, framerate/samplerate, bitrate, etc.)

For example, if a file is indicated to be of a DLNA profile named AAC_ISO_320, this indicates that it is an audio file encoded with the AAC codec, contained in an MP4 container (that’s the “ISO” part), with a bitrate of at most 320 kbps. Similarly, a file with profile AVC_MP4_MP_SD_MPEG1_L3 contains H.264 (a.k.a. AVC) video coded in the H.264 Main Profile at resolutions up to 720×576, with MP3 audio, in an MP4 container (there are more restrictions, but I don’t want to swamp you with details).

So now we have a problem statement – given a media file, we need to get the corresponding DLNA profile. It’s easiest to break this problem into 3 pieces:

  1. Discovery: First we need to get all the metadata that the DLNA specification requires us to check. Using GStreamer and Edward’s gst-convenience library, getting the metadata we needed was reasonably simple (a sketch of this step follows this list). Where the metadata wasn’t available (mostly codec profiles and bitrate), I’ve tried to expose the required data from the corresponding GStreamer plugin.

  2. DLNA Profiles: I won’t rant much about the DLNA specification, because that’s a whole series of blog posts in itself, but the spec is sometimes overly restrictive and doesn’t support a number of popular formats (Matroska, AVI, DivX, Ogg, Theora). With this in mind, we decided that it would be nice to have a generic way to store the constraints specified by the DLNA specification and use them in our library. We chose to store the profile constraints in XML files. This allows non-programmers to tweak the profile data when their devices resort to non-standard methods to work around the limitations of the DLNA spec.

  3. Matching: With 1. and 2. above in place, we just need some glue code to take the metadata from discovery and match it with the profiles loaded from disk. For the GStreamer hackers in the audience, the profile storage format we chose looks suspiciously like serialized GstCaps, so matching allows us to reuse some GStreamer code. Another advantage of this will be revealed soon.
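
To make the discovery step concrete, here is a minimal sketch using GstDiscoverer (part of gst-convenience at the time of writing, since merged into GStreamer’s pbutils); the URI is just an example:

```c
/* Synchronously discover a file's metadata for profile matching */
#include <gst/gst.h>
#include <gst/pbutils/pbutils.h>

int main(int argc, char *argv[])
{
    GstDiscoverer *disc;
    GstDiscovererInfo *info;
    GError *err = NULL;

    gst_init(&argc, &argv);
    gst_pb_utils_init();

    disc = gst_discoverer_new(5 * GST_SECOND, &err);
    if (!disc)
        return 1;

    info = gst_discoverer_discover_uri(disc, "file:///tmp/test.mp4", &err);
    if (info) {
        g_print("duration: %" GST_TIME_FORMAT "\n",
                GST_TIME_ARGS(gst_discoverer_info_get_duration(info)));
        gst_discoverer_info_unref(info);
    }

    g_object_unref(disc);
    return 0;
}
```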

So there you have it folks, this covers the essence of what GUPnP DLNA does. So what’s next?

  1. Frankie Says Relax: Since the DLNA spec can often be too strict about what media is supported, we’ve decided to introduce a soon-to-come “relaxed mode” which should make a lot more of your media match some profile.

  2. I Can Haz Transcoding: While considering how to store the DLNA profiles loaded from the XML on disk, we chose to use GstEncodingProfiles from the gst-convenience library, since the restrictions defined by the DLNA spec closely resemble the kind of restrictions you’d expect to set while encoding a file (codec, bitrate, resolution, etc. again). One nice fallout of this is that, in theory, it should be easy to reuse these to transcode media that doesn’t match any profile (the encodebin plugin from gst-convenience makes this a piece of cake). That is, if GStreamer can play your media, Rygel will be able to stream it.
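
A sketch of that transcoding fallout: build an encoding profile of roughly the shape our DLNA profiles take, and hand it to encodebin (the caps strings are illustrative, not an exact DLNA profile):

```c
#include <gst/gst.h>
#include <gst/pbutils/encoding-profile.h>

/* Build an encodebin configured for an MP4/AAC-shaped target
 * (call gst_init() first; caps here are illustrative) */
static GstElement *make_dlna_encoder(void)
{
    GstEncodingContainerProfile *container;
    GstCaps *mp4, *aac;
    GstElement *encode;

    mp4 = gst_caps_from_string("video/quicktime,variant=iso");
    aac = gst_caps_from_string("audio/mpeg,mpegversion=4");

    container = gst_encoding_container_profile_new("mp4", NULL, mp4, NULL);
    gst_encoding_container_profile_add_profile(container,
        (GstEncodingProfile *) gst_encoding_audio_profile_new(aac, NULL,
                                                              NULL, 0));

    encode = gst_element_factory_make("encodebin", NULL);
    g_object_set(encode, "profile", container, NULL);

    gst_caps_unref(mp4);
    gst_caps_unref(aac);
    return encode;
}
```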

Apart from this, we’ll be adding support for more profiles, extending the API as more uses arise, adding more automated tests, and on and on. If you’re interested in the code, check out (sic) the repository on Gitorious.

GUADEC 2010 :(

Hopefully that title was provocative enough. ;) No, GUADEC seemed to be a smashing success. If only I had been able to attend instead of lying in bed for 2 days, ill and wondering at the general malignancy of a Universe that would do this to me.

Collabora Multimedians, looking for a canal

Nevertheless, I had a great time meeting all the cool folks at Collabora Multimedia at our company meeting. Managed to trundle out for my Rygel + DLNA lightning talk (more updates on this in a subsequent post). Things did get better subsequently, and I had an amazing week-long vacation in Germany, and now I’m back at home with my ninja skillz fully recharged!