Tag: gstreamer

December 28, 2016

Synchronised Playback and Video Walls

Hello again, and I hope you’re having a pleasant end of the year (if you are, maybe don’t check the news until next year).

I’d written about synchronised playback with GStreamer a little while ago, and work on that has been continuing apace. Since I last wrote about it, a bunch of work has gone in:

Landed support for sending a playlist to clients (instead of a single URI)
Added the ability to start/stop playback
The API has been cleaned up considerably to allow us to consider including this upstream
The control protocol implementation was made an interface, so you don’t have to use the built-in TCP server (different use-cases might want different transports)
Made a bunch of robustness fixes and documentation
Introduced API for clients to send the server information about themselves
Also added API for the server to send video transformations for specific clients to apply before rendering

While the other bits are exciting in their own right, in this post I’m going to talk about the last two items.

Video walls

For those of you who aren’t familiar with the term, a video wall is just an array of displays stacked to make a larger display. These are often used in public installations.

One way to set up a video wall is to have each display connected to a small computer (such as the Raspberry Pi), and have them play a part of the entire video, cropped and scaled for the display that is connected. This might look something like:

A 4×4 video wall

The tricky part, of course, is synchronisation — which is where gst-sync-server comes in. Since we’re able to play a given stream in sync across devices on a network, the only missing piece was the ability to distribute a set of per-client transformations so that clients could apply those, and that is now done.

In order to keep things clean from an API perspective, I took the following approach:

Clients now have the ability to send a client ID and a configuration (which is just a dictionary) when they first connect to the server
The server API emits a signal with the client ID and configuration, which allows you to know when a client connects, what kind of display it’s running, and where it is positioned
The server now has additional fields to send a map of client ID to a set of video transformations

This allows us to do fancy things like having each client manage its own information with the server dynamically adapting the set of transformations based on what is connected. Of course, the simpler case of having a static configuration on the server also works.

Demo

Since seeing is believing, here’s a demo of the synchronised playback in action:

The setup is my laptop, which has an Intel GPU, and my desktop, which has an NVidia GPU. These are connected to two monitors (thanks go out to my good friends from Uncommon for lending me their thin-bezelled displays).

The video resolution is 1920×800, and I’ve adjusted the crop parameters to account for the bezels, so the video actually does look continuous. I’ve uploaded the text configuration if you’re curious about what that looks like.

As I mention in the video, the synchronisation is not as tight than I would like it to be. This is most likely because of the differing device configurations. I’ve been working with Nicolas to try to address this shortcoming by using some timing extensions that the Wayland protocol allows for. More news on this as it breaks.

More generally, I’ve done some work to quantify the degree of sync, but I’m going to leave that for another day.

p.s. the reason I used kmssink in the demo was that it was the quickest way I know of to get a full-screen video going — I’m happy to hear about alternatives, though

Future work

Make it real

My demo was implemented quite quickly by allowing the example server code to load and serve up a static configuration. What I would like is to have a proper working application that people can easily package and deploy on the kinds of embedded systems used in real video walls. If you’re interested in taking this up, I’d be happy to help out. Bonus points if we can dynamically calculate transformations based on client configuration (position, display size, bezel size, etc.)

Hardware acceleration

One thing that’s bothering me is that the video transformations are applied in software using GStreamer elements. This works fine(ish) for the hardware I’m developing on, but in real life, we would want to use OpenGL(ES) transformations, or platform specific elements to have hardware-accelerated transformations. My initial thoughts are for this to be either API on playbin or a GstBin that takes a set of transformations as parameters and internally sets up the best method to do this based on whatever sink is available downstream (some sinks provide cropping and other transformations).

Why not audio?

I’ve only written about video transformations here, but we can do the same with audio transformations too. For example, multi-room audio systems allow you to configure the locations of wireless speakers — so you can set which one’s on the left, and which on the right — and the speaker will automatically play the appropriate channel. Implementing this should be quite easy with the infrastructure that’s currently in place.

Merry Happy `.`

I hope you enjoyed reading that — I’ve had great responses from a lot of people about how they might be able to use this work. If there’s something you’d like to see, leave a comment or file an issue.

Happy end of the year, and all the best for 2017!

November 4, 2016

GStreamer and Synchronisation Made Easy

A lesser known, but particularly powerful feature of GStreamer is our ability to play media synchronised across devices with fairly good accuracy.

The way things stand right now, though, achieving this requires some amount of fiddling and a reasonably thorough knowledge of how GStreamer’s synchronisation mechanisms work. While we have had some excellent talks about these at previous GStreamer conferences, getting things to work is still a fair amount of effort for someone not well-versed with GStreamer.

As part of my work with the Samsung OSG, I’ve been working on addressing this problem, by wrapping all the complexity in a library. The intention is that anybody who wants to implement the ability for different devices on a network to play the same stream and have them all synchronised should be able to do so with a few lines of code, and the basic know-how for writing GStreamer-based applications.

I’ve started work on this already, and you can find the code in the creatively named gst-sync-server repo.

Design and API

Let’s make this easier by starting with a picture …

Let’s say you’re writing a simple application where you have two ore more devices that need to play the same video stream, in sync. Your system would consist of two entities:

A server: this is where you configure what needs to be played. It instantiates a GstSyncServer object on which it can set a URI that needs to be played. There are other controls available here that I’ll get to in a moment.
A client: each device would be running a copy of the client, and would get information from the server telling it what to play, and what clock to use to make sure playback is synchronised. In practical terms, you do this by creating a GstSyncClient object, and giving it a playbin element which you’ve configured appropriately (this usually involves at least setting the appropriate video sink that integrates with your UI).

That’s pretty much it. Your application instantiates these two objects, starts them up, and as long as the clients can access the media URI, you magically have two synchronised streams on your devices.

Control

The keen observers among you would have noticed that there is a control entity in the above diagram that deals with communicating information from the server to clients over the network. While I have currently implemented a simple TCP protocol for this, my goal is to abstract out the control transport interface so that it is easy to drop in a custom transport (Websockets, a REST API, whatever).

The actual sync information is merely a structure marshalled into a JSON string and sent to clients every time something happens. Once your application has some media playing, the next thing you’ll want to do from your server is control playback. This can include

Changing what media is playing (like after the current media ends)
Pausing/resuming the media
Seeking
“Trick modes” such as fast forward or reverse playback

The first two of these work already, and seeking is on my short-term to-do list. Trick modes, as the name suggets, can be a bit more tricky, so I’ll likely get to them after other things are done.

Getting fancy

My hope is to see this library being used in a few other interesting use cases:

Video walls: having a number of displays stacked together so you have one giant display — these are all effectively playing different rectangles from the same video
Multiroom audio: you can play the same music across different speakers in a single room, or multiple rooms, or even group sets of speakers and play different media on different groups
Media sharing: being able to play music or videos on your phone and have your friends be able to listen/watch at the same time (a silent disco app?)

What next

At this point, the outline of what I think the API should look like is done. I still need to create the transport abstraction, but that’s pretty much a matter of extracting out the properties and signals that are part of the existing TCP transport.

What I would like is to hear from you, my dear readers who are interested in using this library — does the API look like it would work for you? Does the transport mechanism I describe above cover what you might need? There is example code that should make it easier to understand how this library is meant to be used.

Depending on the feedback I get, my next steps will be to implement the transport interface, refine the API a bit, fix a bunch of FIXMEs, and then see if this is something we can include in gst-plugins-bad.

Feel free to comment either on the Github repository, on this blog, or via email.

And don’t forget to watch this space for some videos and measurements of how GStreamer synchronised fares in real life!

September 6, 2016

GStreamer on Android and universal builds

This is a quick PSA for those of you using the GStreamer binary builds for Android.

With the Android NDK r12, the default behaviour while building native code changed from building for armeabi to building for all ABIs. So if your app doesn’t specify APP_ABI in its Application.mk, you will now get an error about unsupported architectures. This was tracked as bug 770631.

The idea behind this change is that your Android app should ship versions of your native code for all supported architectures as a “universal” build, so it is accessible to as many devices as possible.

To deal with this, we now provide a universal tarball which contains binaries for all archiectures that we support. This is currently ARM, ARMv7-A, ARMv8-A (64-bit), x86, and x86-64. That leaves MIPS and MIPS64 that are not currently supported.

If you’ve been using the GStreamer Android binaries before GStreamer 1.9.2, then you should start using the universal tarball rather than the architecture-specific tarball. You will need minor updates to your native build, like we made to the player example. You probably want to put the gstAndroidRoot variable in ~/.gradle/gradle.properties instead, though.

As Sebastian announced, assuming all goes well with the universal tarballs, we will stop shipping the per-arch tarballs — they are redundant, and just take up CI and disk resources.

There are some things that I’d like for us to be able to do better. The first is that Android Studio doesn’t pick up native code with our current build approach. This is a limitation of the Android Gradle NDK plugin, which doesn’t support a custom build. This should change with Android Studio 2.2.

I would also like to integrate better with Android Studio — either be able to specify the GStreamer Android binary path in the UI (like you do for the SDK/NDK), or better yet, have it be possible to specify the dependency in Gradle, and have it be automatically pulled from the Internet. If any of you are familiar with how we can do this, please shout out!

June 29, 2016

Beamforming in PulseAudio

In case you missed it — we got PulseAudio 9.0 out the door, with the echo cancellation improvements that I wrote about. Now is probably a good time for me to make good on my promise to expand upon the subject of beamforming.

As with the last post, I’d like to shout out to the wonderful folks at Aldebaran Robotics who made this work possible!

Beamforming

Beamforming as a concept is used in various aspects of signal processing including radio waves, but I’m going to be talking about it only as applied to audio. The basic idea is that if you have a number of microphones (a mic array) in some known arrangement, it is possible to “point” or steer the array in a particular direction, so sounds coming from that direction are made louder, while sounds from other directions are rendered softer (attenuated).

Practically speaking, it should be easy to see the value of this on a laptop, for example, where you might want to focus a mic array to point in front of the laptop, where the user probably is, and suppress sounds that might be coming from other locations. You can see an example of this in the webcam below. Notice the grilles on either side of the camera — there is a microphone behind each of these.

Webcam with 2 mics

This raises the question of how this effect is achieved. The simplest approach is called “delay-sum beamforming”. The key idea in this approach is that if we have an array of microphones that we want to steer the array at a particular angle, the sound we want to steer at will reach each microphone at a different time. This is illustrated below. The image is taken from this great article describing the principles and math in a lot more detail.

Delay-sum beamforming

In this figure, you can see that the sound from the source we want to listen to reaches the top-most microphone slightly before the next one, which in turn captures the audio slightly before the bottom-most microphone. If we know the distance between the microphones and the angle to which we want to steer the array, we can calculate the additional distance the sound has to travel to each microphone.

The speed of sound in air is roughly 340 m/s, and thus we can also calculate how much of a delay occurs between the same sound reaching each microphone. The signal at the first two microphones is delayed using this information, so that we can line up the signal from all three. Then we take the sum of the signal from all three (actually the average, but that’s not too important).

The signal from the direction we’re pointing in is going to be strongly correlated, so it will turn out loud and clear. Signals from other directions will end up being attenuated because they will only occur in one of the mics at a given point in time when we’re summing the signals — look at the noise wavefront in the illustration above as an example.

Implementation

(this section is a bit more technical than the rest of the article, feel free to skim through or skip ahead to the next section if it’s not your cup of tea!)

The devil is, of course, in the details. Given the microphone geometry and steering direction, calculating the expected delays is relatively easy. We capture audio at a fixed sample rate — let’s assume this is 32000 samples per second, or 32 kHz. That translates to one sample every 31.25 µs. So if we want to delay our signal by 125µs, we can just add a buffer of 4 samples (4 × 31.25 = 125). Sound travels about 4.25 cm in that time, so this is not an unrealistic example.

Now, instead, assume the signal needs to be delayed by 80 µs. This translates to 2.56 samples. We’re working in the digital domain — the mic has already converted the analog vibrations in the air into digital samples that have been provided to the CPU. This means that our buffer delay can either be 2 samples or 3, not 2.56. We need another way to add a fractional delay (else we’ll end up with errors in the sum).

There is a fair amount of academic work describing methods to perform filtering on a sample to provide a fractional delay. One common way is to apply an FIR filter. However, to keep things simple, the method I chose was the Thiran approximation — the literature suggests that it performs the task reasonably well~~, and has the advantage of not having to spend a whole lot of CPU cycles first transforming to the frequency domain (which an FIR filter requires)~~(edit: converting to the frequency domain isn’t necessary, thanks to the folks who pointed this out).

I’ve implemented all of this as a separate module in PulseAudio as a beamformer filter module.

Now it’s time for a confession. I’m a plumber, not a DSP ninja. My delay-sum beamformer doesn’t do a very good job. I suspect part of it is the limitation of the delay-sum approach, partly the use of an IIR filter (which the Thiran approximation is), and it’s also entirely possible there is a bug in my fractional delay implementation. Reviews and suggestions are welcome!

A Better Implementation

The astute reader has, by now, realised that we are already doing a bunch of processing on incoming audio during voice calls — I’ve written in the previous article about how the webrtc-audio-processing engine provides echo cancellation, acoustic gain control, voice activity detection, and a bunch of other features.

Another feature that the library provides is — you guessed it — beamforming. The engineers at Google (who clearly are DSP ninjas) have a pretty good beamformer implementation, and this is also available via module-echo-cancel. You do need to configure the microphone geometry yourself (which means you have to manually load the module at the moment). Details are on our wiki (thanks to Tanu for that!).

How well does this work? Let me show you. The image below is me talking to my laptop, which has two microphones about 4cm apart, on either side of the webcam, above the screen. First I move to the right of the laptop (about 60°, assuming straight ahead is 0°). Then I move to the left by about the same amount (the second speech spike). And finally I speak from the center (a couple of times, since I get distracted by my phone).

The upper section represents the microphone input — you’ll see two channels, one corresponding to each mic. The bottom part is the processed version, with echo cancellation, gain control, noise suppression, etc. and beamforming.

WebRTC beamforming

You can also listen to the actual recordings …

… and the processed output.

Feels like black magic, doesn’t it?

Finishing thoughts

The webrtc-audio-processing-based beamforming is already available for you to use. The downside is that you need to load the module manually, rather than have this automatically plugged in when needed (because we don’t have a way to store and retrieve the mic geometry). At some point, I would really like to implement a configuration framework within PulseAudio to allow users to set configuration from some external UI and have that be picked up as needed.

Nicolas Dufresne has done some work to wrap the webrtc-audio-processing library functionality in a GStreamer element (and this is in master now). Adding support for beamforming to the element would also be good to have.

The module-beamformer bits should be a good starting point for folks who want to wrap their own beamforming library and have it used in PulseAudio. Feel free to get in touch with me if you need help with that.

January 4, 2016

A Quick Update

Happy 2016 everyone!

While I did mention a while back (almost two years ago, wow) that I was taking a break, I realised recently that I hadn’t posted an update from when I started again.

For the last year and a half, I’ve been providing freelance consulting around PulseAudio, GStreamer, and various other directly and tangentially related projects. There’s a brief list of the kind of work I’ve been involved in.

If you’re looking for help with PulseAudio, GStreamer, multimedia middleware or anything else you might’ve come across on this blog, do get in touch!

October 28, 2015

PSA: Breaking webrtc-audio-processing API

I know it’s been ages, but I am now working on updating the webrtc-audio-processing library. You might remember this as the code that we split off from the webrtc.org code to use in the PulseAudio echo cancellation module.

This is basically just the AudioProcessing module, bundled as a standalone library so that we can use the fantastic AEC, AGC, and noise suppression implementation from that code base. For packaging simplicity, I made a copy of the necessary code, and wrote an autotools-based build system around that.

Now since I last copied the code, the library API has changed a bit — nothing drastic, just a few minor cleanups and removed API. This wouldn’t normally be a big deal since this code isn’t actually published as external API — it’s mostly embedded in the Chromium and Firefox trees, probably other projects too.

Since we are exposing a copy of this code as a standalone library, this means that there are two options — we could (a) just break the API, and all dependent code needs to be updated to be able to use the new version, or (b) write a small wrapper to try to maintain backwards compatibility.

I’m inclined to just break API and release a new version of the library which is not backwards compatible. My rationale for this is that I’d like to keep the code as close to what is upstream as possible, and over time it could become painful to maintain a bunch of backwards-compatibility code.

A nicer solution would be to work with upstream to make it possible to build the AudioProcessing module as a standalone library. While the folks upstream seemed amenable to the idea when this came up a few years ago, nobody has stepped up to actually do the work for this. In the mean time, a number of interesting features have been added to the module, and it would be good to pull this in to use in PulseAudio and any other projects using this code (more about this in a follow-up post).

So if you’re using webrtc-audio-processing, be warned that the next release will probably break API, and you’ll need to update your code. I’ll try to publish a quick update guide when releasing the code, but if you want to look at the current API, take a look at the current audio_processing.h.

p.s.: If you do use webrtc-audio-processing as a dependency, I’d love to hear about it. As far as I know, PulseAudio is the only user of this library at the moment.

July 23, 2014

Quick-start guide to gst-uninstalled for GStreamer 1.x

Update: gst-build is the current way to build GStreamer for developement. I’m leaving the post up for posterity, but other than the note on getting dependencies, you should not be using this.

One of the first tools that you should get if you’re hacking with GStreamer or want to play with the latest version without doing evil things to your system is probably the gst-uninstalled script. It’s the equivalent of Python’s virtualenv for hacking on GStreamer. :)

The documentation around getting this set up is a bit frugal, though, so here’s my attempt to clarify things. I was going to put this on our wiki, but that’s a bit search-engine unfriendly, so probably easiest to just keep it here. The setup I outline below can probably be automated further, and comments/suggestions are welcome.

First, get build dependencies for GStreamer core and plugins on your distribution. Commands to do this on some popular distributions follow. This will install a lot of packages, but should mean that you won’t have to play find-the-plugin-dependency for your local build.
Fedora: $ sudo yum-builddep gstreamer1-*
Debian/Ubuntu: $ sudo apt-get build-dep gstreamer1.0-plugins-{base,good,bad,ugly}
Gentoo: having the GStreamer core and plugin packages should suffice
Others: drop me a note with the command for your favourite distro, and I’ll add it here
Next, check out the code (by default, it will turn up in ~/gst/master)
$ curl https://cgit.freedesktop.org/gstreamer/gstreamer/plain/scripts/create-uninstalled-setup.sh | sh
Ignore the pointers to documentation that you see — they’re currently defunct
Now put the gst-uninstalled script somewhere you can get to it easily:
$ ln -sf ~/gst/master/gstreamer/scripts/gst-uninstalled ~/bin/gst-master
(the -master suffix for the script is important to how the script works)
Enter the uninstalled environment:
$ ~/bin/gst-master
(this puts you in the directory with all the checkouts, and sets up a bunch of environment variables to use your uninstalled setup – check with echo $GST_PLUGIN_PATH)
Time to build
$ ./gstreamer/scripts/git-update.sh
Take it out for a spin
$ gst-inspect-1.0 filesrc
$ gst-launch-1.0 playbin uri=file:///path/to/some/file
$ gst-discoverer-1.0 /path/to/some/file
That’s it! Some tips:
Remember that you need to run ~/bin/gst-master to enter the environment for each new shell
If you start up a GStreamer app from your system in this environment, it will use your uninstalled libraries and plugins
You can and should periodically update you tree by rerunning the git-update.sh script
To run gdb on gst-launch, you need to do something like:
$ libtool --mode=execute gdb --args gstreamer/tools/gst-launch-1.0 videotestsrc ! videoconvert ! xvimagesink
I find it useful to run cscope on the top-level tree, and use that for quick code browsing

Update: Fixed create-uninstalled.sh link to use https (thanks to Victor for pointing this out).

March 19, 2014

Introducing `peerflixsrc`

Some of you might have been following all the brouhaha over Popcorn Time. I won’t get into the arguments that can be made for and against at the moment.

While poking around at what it was that Popcorn Time was doing, I stumbled upon peerflix, a Node.js-based application that takes a .torrent file that points to one big video file, and presents that as an HTTP stream. It has its own BitTorrent implementation where it prioritises early chunks of the file so that it is possible to start watching the video before the entire file has been downloaded. It also seeds the file while the video is being watched locally.

Seeing as I was at the GStreamer Hackfest in Munich when this came up in discussions, it seemed topical to have a GStreamer element to wrap this neat bit of functionality. Thus was peerflixsrc born. This is a simple source element that takes a URI to a torrent file (something like torrent+http://archive.org/some/video.torrent), fires up peerflix in the background, and provides the data from the corresponding HTTP stream. Conveniently enough, this can be launched using playbin or Totem (hinting at the possibilities of what can come next!). Here’s what it looks like…

Screenshot of Totem playing a torrent file directly using peerflixsrc

The code is available now. To use it, build this copy of gst-plugins-bad using your favourite way, make sure you have peerflix installed (sudo npm install -g peerflix), and you’re good to go.

This is not quite mature enough to go into upstream GStreamer. The ugliest part is firing up a Node.js server to make this work, not the least because managing child processes on Linux is not the prettiest code you can write. Maybe someone wants to look at rewriting the torrent bits from peerflix in C? There don’t seem to be any decent C-based libraries for this out there, though.

In the mean time, enjoy this, and comments / patches welcome!

March 19, 2014

GStreamer Hackfest 2014

Last weekend, I was at the GStreamer Hackfest in Munich. As usual, it was a blast — we got much done, and it was a pleasure to meet the fine folks who bring you your favourite multimedia framework again. Thanks to the conference for providing funding to make this possible!

My plan was to work on making Totem’s support for passthrough audio work flawlessly (think allowing your A/V receiver to decode AC3/DTS if it allows it, with more complex things coming the future as we support it). We’ve had the pieces in place in GStreamer for a while now, and not having that just work with Totem has been a bit of a bummer for me.

The immediate blocker so far has been that Totem needs to add a filter (scaletempo) before the audio sink, which forces negotiation to always pick a software decoder. We solved this by adding the ability for applications to specify audio/video filters for playbin to plug in if it can. There’s a now-closed bug about it, for the curious. Hopefully, I’ll get the rest of the work to make Totem use this done soon, so things just work.

Now the reason that didn’t happen at the hackfest is that I got a bit … distracted … at the hackfest by another problem. More details in an upcoming post!

February 2, 2014

Four years

Four years and what seems like a lifetime ago, I jumped aboard the ship Collabora Multimedia, and set sail for adventure and lands unknown. We sailed through strange new seas, to exotic lands, defeated many monsters, and, I feel, had some positive impact on the world around us. Last Friday, on my request, I got dropped back at the shore.

I’ve had an insanely fun time at Collabora, working with absurdly talented and dedicated people. Nevertheless, I’ve come to the point where I feel like I need something of a break. I’m not sure what’s next, other than a month or two of rest and relaxation — reading, cycling, travel, and catching up with some of the things I’ve been promising to do if only I had more time. Yes, that includes PulseAudio and GStreamer hacking as well. :-)

And there’ll be more updates and blog posts too!

Newer Posts Older Posts