LPC ho!

I’m going to be at the Linux Plumbers’ Conference next week, speaking about the things we’ve been doing to make passthrough audio on Linux kick ass.

If you’re around and interested, do drop by!

Hello … hello … hello!

I have a secret to confess. I’ve spent a great deal of time over the last few months talking to myself. I can’t say I haven’t enjoyed it — it turns out my capacity to entertain myself is far greater than initially suspected. But I hear you ask … why?

Here at Collabora, I’ve been building on Wim’s previous work on adding echo cancellation to PulseAudio. Thanks go to Intel for supporting us in continuing this work. Before too long, all this work will be trickling down to your favourite Linux distribution and all your friends will stop hating you.

First, a quick recap on what acoustic echo cancellation (AEC) is. If you already know this, you might want to skip this paragraph and the next. Say you’re on your laptop, and you receive a voice call from your friend. You don’t have a pair of headphones lying around, so you’re just going to use your laptop’s built-in speakers and mic. When your friend speaks, what she says is played out the speakers, but is also captured by the microphone and she gets to hear herself speak, albeit a short while (a few hundred milliseconds or more) later. This is called acoustic echo, and can be frustrating enough to make conversation nigh impossible. There are other types of echo in phone systems, but those aren’t interesting to us at the moment.

This problem is common on pretty much all devices that you use to make phone calls. Astute readers will ask why they don’t actually face this problem on their phone. That’s because your phone (or, if you have a cheap phone, your phone company) has special software hidden away that removes the echo before sending your signal along to the other end. On laptops, which are general-purpose hardware, the job of echo cancellation is left to either your operating system (Windows XP onwards, for example) or your chat client (Skype, for example) to provide.

On Linux, we implement echo cancellation as a PulseAudio module (code-ninja Wim Taymans wrote this last year). We use the Speex DSP library to perform the actual echo cancellation. The code’s quite modular, so it’s not very hard to plug in alternate echo cancellers (we even include an alternate implementation, which isn’t quite as effective as Speex).
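To give a feel for what an echo canceller does, here is a toy sketch in Python. It is not the Speex algorithm and not code from the PulseAudio module: it is a textbook NLMS adaptive filter, and the function name, tap count, step size and test signal are all made up for illustration. The idea is to learn how the speaker signal shows up in the mic signal, and then subtract that estimate.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Toy NLMS echo canceller (illustrative only, not what Speex does).

    far_end: samples sent to the speakers
    mic:     samples captured by the microphone (echo plus local sound)
    Returns the mic signal with the estimated echo subtracted.
    """
    weights = [0.0] * taps   # running estimate of the speaker-to-mic echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    cleaned = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate   # what is left once the estimated echo is removed
        power = sum(h * h for h in history) + eps
        # Nudge the echo-path estimate in the direction that shrinks the error.
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


# Made-up test signal: the "echo" is the far end, delayed by 3 samples and attenuated.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0] * 3 + [0.6 * s for s in far_end[:-3]]
cleaned = nlms_echo_cancel(far_end, mic)
```

In a real call the mic also picks up your own voice, and room echoes are far longer than 8 samples, so production echo cancellers are considerably more involved than this.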

Recently, we plugged in some more bits from the Speex library to do noise suppression and digital gain control (so you can quit twiddling with your mic volume for the other end to be able to hear you). We also added a bunch of fixes to reduce CPU consumption significantly — this should be good enough to run on a netbook and reasonably recent ARM platforms.

While all this sounds nice, I think a demo would sound (haha!) nicer …

Without AEC: /downloads/pulseaudio/aec/call-no-aec (or download ogg, aac)

With AEC: /downloads/pulseaudio/aec/call-with-aec (or download ogg, aac)

This is a recording of a call between my laptop and N900. The laptop is playing audio out the speakers and recording with the built-in mic. What you hear is the conversation as heard on the N900.

All this echo cancelling goodness will come to a Linux distribution near you in the upcoming 1.0 release of PulseAudio. The next version of the GNOME IM client, Empathy (3.2), will actually make use of this functionality. In due time, we intend to make it so that all voice applications will end up using this functionality (so if you’re writing a VoIP application and don’t want to use this functionality, you need to set a special stream property to disable this — filter.suppress="echo-cancel").

For the impatient among you, you can try all this out by getting recent testing versions of PulseAudio (I know packages are available for Ubuntu, Debian, Gentoo and Mageia at least). To force your phone streams to use echo cancellation, just run pactl load-module module-echo-cancel, and you’re done.

There’s still some work to be done, refining quality and using other AEC implementations (in the short-term, the WebRTC one looks promising). Things don’t work at all if you’re using different devices for playback and capture (e.g. laptop speakers and webcam mic). These are things that will be addressed in coming weeks and months.

Desktop Summit 2011

I’m in Berlin at the Desktop Summit, so you can drop me a note and we can meet if you want to yell about PulseAudio things that annoy you (or even, y’know, things you like).

More PulseAudio power goodness

[tl;dr — if you’re using GNOME or a GStreamer-based player, not using the Rhythmbox crossfading backend, and want to try to save ~0.5 W of power, jump to end of the post]

Lennart pointed to another blog post about actually putting PulseAudio’s power-saving capabilities to use on your system. The latter provides a hack-ish way to increase buffering in PulseAudio to the maximum possible, reducing the number of wakeups. I’m going to talk about that a bit.

Summarising the basic idea, we want music players to decode a large chunk of data and give it to PA so that we can then fill up ALSA’s hardware buffer, sleep till it’s almost completely consumed, fill it again, sleep, repeat. More details in this post from Lennart.

The native GNOME audio/video players don’t talk to PulseAudio directly; they use GStreamer, which has a pulsesink element that actually talks to PulseAudio. We could configure things so that we send a large amount (say 2 seconds’ worth) to PulseAudio, sleep, and then wake up periodically to push out more. Now in the audio player (say Rhythmbox), the user hits next, prev, or pause. We need to effect this change immediately, even though we’ve already sent out 2 seconds of data (it would suck if you hit pause and the actual pause happened 2 seconds later, wouldn’t it?). PulseAudio already solves this because it can internally “rewind” the buffer and overwrite it if required. GStreamer can and does take advantage of this by sending pause and other control messages out of band from the data.

This all works well for relatively simple GStreamer pipelines. However, if you want to do something more complicated, like Rhythmbox’ crossfading backend, things start to break. PulseAudio doesn’t offer an API to do fades, and since we don’t do rewinds in GStreamer, we need to apply effects such as fades with a latency equal to the amount of buffering we’re asking PulseAudio to do. This makes for unhappy users.

Well, all is not as bleak as it seems. There was some discussion on the PA mailing list, and the need for a proper fade API (really, a generic effects API) is clear. There have even been attempts to solve this in GStreamer.

But you want to save 0.5 W of power now! Okay, if you’re not using the Rhythmbox crossfading backend (or are okay with disabling it), you can run the following on the command line. It applies to Rhythmbox, Banshee, pre-3.0 Totem (and really any GNOMEy player that uses gconfaudiosink, which will soon be replaced by gsettingsaudiosink, I guess):

gconftool-2 --type string \
    --set /system/gstreamer/0.10/default/musicaudiosink \
    "pulsesink latency-time=100000 buffer-time=2000000"

On my machine, this brings down the number of wakeups per second because of alsa-sink to ~2.7 (corresponding nicely to the ~350 ms of hardware buffer that I have). With Totem 3.0, this may or may not work, depending on whether your distribution gives gconfaudiosink a higher rank than pulsesink.
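The arithmetic behind those numbers, as a quick Python sketch. It assumes roughly one wakeup per hardware buffer’s worth of audio, which is a simplification, and the helper names are mine. GStreamer’s latency-time and buffer-time properties are expressed in microseconds.

```python
def usec_to_seconds(usec):
    """Convert a GStreamer latency-time/buffer-time value (microseconds) to seconds."""
    return usec / 1_000_000


def wakeups_per_second(buffer_seconds):
    """Idealised rate: one wakeup each time a full buffer has been played out."""
    return 1.0 / buffer_seconds


print(usec_to_seconds(2_000_000))           # buffer-time from the command above
print(usec_to_seconds(100_000))             # latency-time from the command above
print(round(wakeups_per_second(0.350), 2))  # a ~350 ms hardware buffer
```

For a 350 ms buffer this idealised estimate is a little under 3 wakeups per second, in the same ballpark as the ~2.7 measured; both the buffer size and the measurement are approximate.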

This is clearly just a stop-gap till we can get things done the Right Way™ at the system level, so really, if things break, you get to keep the pieces. If you need to, you can undo this change by running the same command without the latency-time=… and buffer-time=… bits. That said, if something does break, do leave a comment below so I can add it to the list of things that we need to test the final solution with.

GNOME Asia 2011

Just a quick (and late!) heads-up for all of you who missed it: the GNOME Asia Summit 2011 is happening in Bangalore this week, with a bunch of really cool people doing hackfests through the week, and a whole bunch of talks on Saturday and Sunday (April 2nd and 3rd).

I’ll be presenting a talk titled DLNA in a GNOME 3 World, talking about Rygel and the work we’ve been doing on gupnp-dlna to make DLNA rock on GNOME.

If you’re in or around Bangalore and contribute to or are interested in contributing to GNOME, you really have no excuse to not attend (heck, entry’s free). This applies doubly to students who are looking for cool stuff to do for the Google Summer of Code this year. So, do drop by and say hello! :)

George Orwell on literature and intellectual honesty

If you find yourself saying tl;dr very often, you should probably stop reading now.

Madhu, being the awesome cousin that she is, sent me Books v. Cigarettes a while ago. It’s an anthology of assorted George Orwell articles and musings, amongst which is The Prevention Of Literature, a powerful essay about the function of intellectual honesty in society and its impact on literature. It makes for a brilliant read and got me wondering about how this applies today.

I have no idea about the state of Chinese literature, but I can’t help but believe that exactly the sort of intellectual repression that he talks about must be playing a large part in the killing of Indian literature as well. This is my opinion, from extremely limited reading of Indian writing in English, but I hear similar complaints from friends who read Hindi literature too.

The mass media are a laugh riot of dishonesty, and I know of no real reporting counter-culture, underground or otherwise (the closest that I’m aware of is Kafila, but the authors there seem to be foaming-at-the-mouth more often than not … meh).

So what is one to do?

Footnote: Guess what turned up on Kafila today.

GNOME3 Power Settings

Richard Hughes recently posted about the new GNOME3 Power Settings design that got a lot of people (myself included) hot and bothered. As I said in my comment, I think that a lot of people prefer that their laptop stay on when the lid is closed. There are clearly others who, like myself, would prefer to maintain the normal behaviour when an external monitor is plugged in.

So Nirbheek Chauhan and I designed a couple of quick mockups that I think would work well. This doesn’t address customising behaviour with an external monitor, but I don’t feel nearly as strongly about that being hidden in dconf-editor as I do about the rest.

My mockup

Nirbheek's mockup

While Nirbheek’s version looks decidedly prettier, I think the meaning of the icons is not absolutely obvious. This might be solvable by some explanatory text above and mouse-overs.

While doing all this, though, it’s clear that it is really hard to design a UI that you think will please enough people, and really easy to make assumptions about what “people” want and how they use their computers. So kudos to the GNOME3 UI designers for taking up this difficult job and I hope they take all the feedback flying around in a positive spirit (even if the messages are often not quite positive-sounding ;) )

A Bibliophile’s Review of the Amazon Kindle

When it comes to books I’m really old school. Starting from the pleasure of discovering a book you’ve been dying to find, nestled between two otherwise forgettable books in the store, to the crinkling goodness of a new book, the reflexive care to not damage the spine unduly, inscriptions from decades past in second-hand books, the smell, the texture, everything. And don’t even get me started on the religious experience of visiting your favourite libraries. Stated another way, e-books are just fundamentally incompatible with my reading experience.

That is, until I had to move houses last year. It is not a pleasant experience to have to cart around a few hundred books, even within the same city. This, and the fact that some Dan McGirt books that I’ve wanted to read are only really available to me in e-book form finally pushed me to actually buy the Amazon Kindle.

My black Kindle 3G (3rd rev.)

My precioussss

About 3 months ago, I got a black Kindle 3G (the 3rd revision). Technical reviews abound, so I’m not going to talk about the technology much. I didn’t see any articles that really spoke about using it, which is far more relevant to potential buyers (I’m sure they’re there, I didn’t find any good ones is all). So this is my attempt at describing the bits of the Kindle experience that are relevant to others of my ilk (the ones who nodded along to the first paragraph, especially :D).

The Device

Well, I’m a geek, so I can’t avoid talking about the technology completely, but I’ll try to keep it to a minimum (also, it runs Linux, woohoo! :D ed: and GStreamer too, as Sebastian Dröge points out!).

I bought the Kindle with the 6″ display and free wireless access throughout the world (<insert caveat about coverage maps here>). The device itself is really slick, and the build quality is good. The keys on the keyboard feel hard to press, but this is presumably intentional, because you don’t want to randomly press keys while handling the device.

At first glance, the e-ink display on the new device is brilliant, the contrast in daylight is really good (more about this later). It’s light, and fairly easy to use (but I have a really high threshold for complex devices, so don’t take my word for it). The 3G coverage falls back to 2G mode in India. I’ve tried it around a bit in India, and the connectivity is pretty hit-or-miss. Maybe things will change for the better with the impending 3G rollout.

The battery life is either disappointing or awesome, depending on whether you’ve got wireless enabled or not. This is a bit of a nag, but you quickly get used to just switching off the wireless when you’re done shopping or browsing.

Reading

Obviously the meat and drink of this device is the reading experience. It is not the same as reading a book. There are a lot of small, niggling differences that will keep reminding you that you’re not reading a book, and this is something you’re just going to have to accept if you’re getting the device.

Firstly the way you hold the device is going to be different from holding a book. I generally hold a book along the spine with one hand, either at the top or bottom (depending on whether I’m sitting, lying down, etc.). You basically cannot hold the Kindle from above — there isn’t enough room. I alternate between holding the device on my palm (but it’s not small enough to hold comfortably like that, your mileage will vary depending on the size of your hand), grasping it between my fingers around the bottom left or right edge (this is where the hard keys on the keyboard help — you won’t press a key by mistake in this position), or I just rest the Kindle on a handy surface (table or lap while sitting, tummy while supine :) ).

Secondly, the light response of the device is very different from books. Paper is generally not too picky about the type of lighting (whiteness, diffused or direct, etc.). In daylight, the Kindle looks like a piece of white paper with crisp printing, which is nice. However, at night, it depends entirely on the kind of lighting you have. My house has mostly yellow-ish fluorescent lamps, so the display gets dull unless the room is very well lit. I also find that the contrast drops dramatically if the light source is not behind you (diffuse lighting might not be so great, in other words). There are some angles at which the display reflects lighting that’s behind/above you, but it’s not too bad.

The fonts and spacing on the Kindle are adjustable and this is one area in which it is hard to find fault with the device. Whatever your preference is in print (small fonts, large fonts, wide spacing, crammed text), you can get the same effect across all your books.

Flipping pages looks annoying when you see videos of the Kindle (since flipping requires a refresh of the whole screen), but in real life it’s fast enough to not annoy.

The Store

I’ve only used the Kindle Store from India, and in a word, it sucks. The number of books available is rubbish. I don’t care if they have almost(?) a million books, but if they don’t have Good Omens, Cryptonomicon, or most of Asimov’s Robot series, they’re fighting a losing battle as far as I’m concerned (these are all books that I’ve actually wanted to read/re-read since I got the Kindle).

When I do find a book I want, the pricing is inevitably ridiculous. I do not see what the publishers are smoking, but could someone please tell them that charging more than 2 times the price of a paperback for an e-book is just plain stupid? Have they learned nothing from the iTunes story? Speaking of which, the fact that the books I buy are locked by DRM to Kindle devices is very annoying.

While the reading experience is something I can get used to, this is the biggest problem I currently have. From my perspective, books have been the last bastion of purity where piracy is not the only available solution to work around the inability of various industry middlemen to find a reasonable way to deal with the Internet and its impact on creative content. I am really hoping that Amazon will get enough muscle soon to pull an Apple on the book industry and get the pricing to reasonable levels. And possibly go one step further and break down country-wise barriers. Otherwise, we’re just going to have to deal with another round of rampant piracy and broken systems to try to curb it.

(Editor’s note: This bit clearly bothers me a lot and deserves a blog post of its own, but let’s save that for another day)

The Ecosystem

A lot of my family and friends love reading books, and a large number of the books I buy go through many hands before finding their final resting place on my shelf. This is not just a matter of cost — there is a whole ecosystem of sharing your favourite books with like-minded people, discussing, and so on.

The Kindle device itself isn’t conducive to sharing (if I’m reading a book on the device, nobody else can use the device, obviously). Interestingly Amazon has recently introduced the idea of sharing books from the Kindle (something Barnes and Noble has had for a while). You can share books you’ve bought off the Kindle Store with someone else with a Kindle account, once, for a period of 2 weeks. This in itself is a really lame restriction, but even something more relaxed would be useless to me. Almost nobody I know has a device that supports the Kindle software (phones and laptops/desktops do not count as far as I am concerned).

So in my opinion, the complete break from the reading ecosystem is a huge negative for the Kindle experience. When I know I’m going to want to lend a book to someone, I immediately eliminate the possibility of buying it off the Kindle Store. This is true of all e-books, of course, and might become less of an issue in decades to come, but it is a real problem today.

Other Fluff

The Kindle comes with support for MP3s, browsing the Internet and some games (some noises about an app store have also been made). These are just fluff — I don’t care if my reading device has any of these things. Display technology is still quite far from getting to a point where convergence is possible without compromising the reading experience (yes, I’m including the Pixel Qi display in this assertion, but my opinion is only based on the several videos of devices using these displays).

The Verdict

Honestly, it’s not clear to me whether the Kindle is a keeper or not. It’s definitely a very nice device, technically. I think it’s possible for Amazon to improve the reading experience — I’m sure the display technology will get better with regards to response to different kinds of lighting. Some experimentation with design to make it work with standard reading postures would be nice too. The Kindle Store is a disaster for me, and I really hope Amazon and the publishing industry get their act together.

Maybe this article will be helpful to potential converts out there. If you’ve got questions about the Kindle or anything to add that I’ve missed, feel free to drop a comment.

Updates from the Rygel + DLNA world

Things have been awfully quiet since Zeeshan posted about the work we’ve been doing on DLNA support in Rygel. Since I’ve released GUPnP DLNA 0.3.0, I thought this would be a good time to explain what we’ve been up to. This is also a sort of expansion of my Lightning Talk from GUADEC, since 5 minutes weren’t enough to establish all the background I would have liked to.

For those that don’t know, the DLNA is a consortium that aims to standardise how various media devices around your house communicate with each other (that is, your home theater, TV, laptop, phone, tablet, …). One piece of this problem is having a standard way of identifying the type of a file, and communicating this between devices. For example, say your laptop (MediaServer in DLNA parlance) is sharing the movies you’ve got with your TV (MediaPlayer), and your TV can play only up to 720p H.264-encoded video. When the MediaServer is sharing files, it needs to provide sufficient information about the file so that the MediaPlayer knows whether it can play it or not, so that it can be intelligent about what files show up in its UI.

The DLNA specification achieves this by using “profiles”. For each media format supported by the DLNA specification, a number of profiles are defined that identify the audio/video codec used, the container, and (in a sense) the complexity of decoding the file. (For multimedia geeks, that translates to things like the codec profile, resolution, framerate/samplerate, bitrate, etc.)

For example, if a file is indicated to be of a DLNA profile named AAC_ISO_320, this indicates that this is an audio file encoded with the AAC codec, contained in an MP4 container (that’s “ISO”), with a bitrate of at most 320 kbps. Similarly, a file with profile AVC_MP4_MP_SD_MPEG1_L3 represents a file with H.264 (a.k.a. AVC) video coded in the H.264 Main Profile at specific resolutions up to 720×576, MP3 audio, in an MP4 container (there are more restrictions, but I don’t want to swamp you with details).

So now we have a problem statement – given a media file, we need to get the corresponding DLNA profile. It’s easiest to break this problem into 3 pieces:

  1. Discovery: First we need to get all the metadata that the DLNA specification requires us to check. Using GStreamer and Edward’s gst-convenience library, getting the metadata we needed was reasonably simple. Where the metadata wasn’t available (mostly codec profiles and bitrate), I’ve tried to expose the required data from the corresponding GStreamer plugin.

  2. DLNA Profiles: I won’t rant much about the DLNA specification, because that’s a whole series of blog posts in itself, but the spec is sometimes overly restrictive and doesn’t support a number of popular formats (Matroska, AVI, DivX, OGG, Theora). With this in mind, we decided that it would be nice to have a generic way to store the constraints specified by the DLNA specification and use them in our library. We chose to store the profile constraints in XML files. This allows non-programmers to tweak the profile data when their devices resort to non-standard methods to work around the limitations of the DLNA spec.

  3. Matching: With 1. and 2. above in place, we just need some glue code to take the metadata from discovery and match it with the profiles loaded from disk. For the GStreamer hackers in the audience, the profile storage format we chose looks suspiciously like serialized GstCaps, so matching allows us to reuse some GStreamer code. Another advantage of this will be revealed soon.
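The three pieces above can be sketched in a few lines of Python. This is only an illustration of the idea, not the GUPnP DLNA API: the dictionary layout, field names and helper functions are all hypothetical, the constraints are heavily simplified versions of the two profiles mentioned earlier, and the real library loads its profiles from XML and matches them using GStreamer caps.

```python
# Hypothetical, heavily simplified constraints for two DLNA profiles.
# The real profiles carry more restrictions than are shown here.
PROFILES = {
    "AAC_ISO_320": {
        "container": {"mp4"},
        "audio_codec": {"aac"},
        "video_codec": {None},           # audio-only profile
        "max_audio_bitrate": 320_000,    # bits per second
    },
    "AVC_MP4_MP_SD_MPEG1_L3": {
        "container": {"mp4"},
        "video_codec": {"h264"},
        "video_profile": {"main"},
        "audio_codec": {"mp3"},
        "max_width": 720,
        "max_height": 576,
    },
}


def matches(metadata, constraints):
    """Return True if the discovered metadata satisfies every constraint."""
    for key, wanted in constraints.items():
        if key.startswith("max_"):
            # Upper bound on a numeric field, e.g. max_width limits "width".
            value = metadata.get(key[len("max_"):])
            if value is None or value > wanted:
                return False
        elif metadata.get(key) not in wanted:
            return False
    return True


def find_profile(metadata):
    """Return the name of the first matching profile, or None."""
    for name, constraints in PROFILES.items():
        if matches(metadata, constraints):
            return name
    return None


# Made-up "discovery" results for three files.
song = {"container": "mp4", "audio_codec": "aac", "audio_bitrate": 256_000}
movie = {"container": "mp4", "video_codec": "h264", "video_profile": "main",
         "audio_codec": "mp3", "width": 720, "height": 576}
other = {"container": "matroska", "audio_codec": "vorbis"}

for media in (song, movie, other):
    print(find_profile(media))
```

Keeping the constraints as plain data rather than code is the point of the exercise: tweaking a profile means editing a table, not the matching logic.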

So there you have it folks, this covers the essence of what GUPnP DLNA does. So what’s next?

  1. Frankie Says Relax: Since the DLNA spec can often be too strict about what media is supported, we’ve decided to introduce a soon-to-come “relaxed mode” which should make a lot more of your media match some profile.

  2. I Can Haz Transcoding: While considering how to store the DLNA profiles loaded from the XML on disk, we chose to use GstEncodingProfiles from the gst-convenience library since the restrictions defined by the DLNA spec closely resemble the kind of restrictions you’d expect to set while encoding a file (codec, bitrate, resolution, etc. again). One nice fallout of this is that (in theory), it should be easy to reuse these to transcode media that doesn’t match any profile (the encodebin plugin from gst-convenience makes this a piece of cake). That is, if GStreamer can play your media, Rygel will be able to stream it.

Apart from this, we’ll be adding support for more profiles, extending the API as more uses arise, adding more automated tests, and on and on. If you’re interested in the code, check out (sic) the repository on Gitorious.

GUADEC 2010 :(

Hopefully that title was provocative enough. ;) No, GUADEC seemed to be a smashing success. If only I had been able to attend instead of lying in bed for 2 days, ill and wondering at the general malignancy of a Universe that would do this to me.

Collabora Multimedians, looking for a canal

Nevertheless, I had a great time meeting all the cool folks at Collabora Multimedia at our company meeting. Managed to trundle out for my Rygel + DLNA lightning talk (more updates on this in a subsequent post). Things did get better subsequently, and I had an amazing week-long vacation in Germany, and now I’m back at home with my ninja skillz fully recharged!