GStreamer and Synchronisation Made Easy

A lesser known, but particularly powerful feature of GStreamer is our ability to play media synchronised across devices with fairly good accuracy.

The way things stand right now, though, achieving this requires some amount of fiddling and a reasonably thorough knowledge of how GStreamer’s synchronisation mechanisms work. While we have had some excellent talks about these at previous GStreamer conferences, getting things to work is still a fair amount of effort for someone not well-versed with GStreamer.

As part of my work with the Samsung OSG, I’ve been working on addressing this problem, by wrapping all the complexity in a library. The intention is that anybody who wants to implement the ability for different devices on a network to play the same stream and have them all synchronised should be able to do so with a few lines of code, and the basic know-how for writing GStreamer-based applications.

I’ve started work on this already, and you can find the code in the creatively named gst-sync-server repo.

Design and API

Let’s make this easier by starting with a picture …

Big picture of the architecture

Let’s say you’re writing a simple application where you have two ore more devices that need to play the same video stream, in sync. Your system would consist of two entities:

  • A server: this is where you configure what needs to be played. It instantiates a GstSyncServer object on which it can set a URI that needs to be played. There are other controls available here that I’ll get to in a moment.

  • A client: each device would be running a copy of the client, and would get information from the server telling it what to play, and what clock to use to make sure playback is synchronised. In practical terms, you do this by creating a GstSyncClient object, and giving it a playbin element which you’ve configured appropriately (this usually involves at least setting the appropriate video sink that integrates with your UI).

That’s pretty much it. Your application instantiates these two objects, starts them up, and as long as the clients can access the media URI, you magically have two synchronised streams on your devices.


The keen observers among you would have noticed that there is a control entity in the above diagram that deals with communicating information from the server to clients over the network. While I have currently implemented a simple TCP protocol for this, my goal is to abstract out the control transport interface so that it is easy to drop in a custom transport (Websockets, a REST API, whatever).

The actual sync information is merely a structure marshalled into a JSON string and sent to clients every time something happens. Once your application has some media playing, the next thing you’ll want to do from your server is control playback. This can include

  • Changing what media is playing (like after the current media ends)
  • Pausing/resuming the media
  • Seeking
  • “Trick modes” such as fast forward or reverse playback

The first two of these work already, and seeking is on my short-term to-do list. Trick modes, as the name suggets, can be a bit more tricky, so I’ll likely get to them after other things are done.

Getting fancy

My hope is to see this library being used in a few other interesting use cases:

  • Video walls: having a number of displays stacked together so you have one giant display — these are all effectively playing different rectangles from the same video

  • Multiroom audio: you can play the same music across different speakers in a single room, or multiple rooms, or even group sets of speakers and play different media on different groups

  • Media sharing: being able to play music or videos on your phone and have your friends be able to listen/watch at the same time (a silent disco app?)

What next

At this point, the outline of what I think the API should look like is done. I still need to create the transport abstraction, but that’s pretty much a matter of extracting out the properties and signals that are part of the existing TCP transport.

What I would like is to hear from you, my dear readers who are interested in using this library — does the API look like it would work for you? Does the transport mechanism I describe above cover what you might need? There is example code that should make it easier to understand how this library is meant to be used.

Depending on the feedback I get, my next steps will be to implement the transport interface, refine the API a bit, fix a bunch of FIXMEs, and then see if this is something we can include in gst-plugins-bad.

Feel free to comment either on the Github repository, on this blog, or via email.

And don’t forget to watch this space for some videos and measurements of how GStreamer synchronised fares in real life!


Add yours →

  1. Great! Rygel patches welcome! \o/ :D

  2. I’m building a videowall right now that could have used this.

    Some things to solve: handle disconnections and reconnection gracefully (and handle gstreamer clock – not something I’ve looked into yet).

    Some sort of playlist API

    • since it might take 150ms or more to get across Wi-Fi, you need to know what to play next before the current item has finished playing.

    • For images you need to know how long to display them for, since they eos immediately.

    Transformations – for a videowall each screen can display a part of a larger virtual image.

    • So the first two things I’m looking at after some API cleanups (that are more or less done now) are two things you mention — playlists, and transformations.

      Images should be relatively straight-forward to add on top of this (I’ll add it to the TODO).

      Connection/disconnection is handled gracefully — a client can turn up at any time and be in sync. The server deals with clients going away fine too. I also want to be able to deal with dynamically updating client transformations based on this (which I intend to have from the get-go while implementing transformations).

      Not sure what you’re referring to with regards to the GStreamer clock.

  3. Let’s say audio and video come from different servers, but both streams should be played in sync.

    Have you considered syncing playback in a setup with multiple servers and multiple clients?

  4. How accurate and fast you are gonna to be?

    In my case, i have amount of devices (>10) on a field. Some of them are wired. Other are wireless. They have different meaning: video source, audio, information providers, event registrations. All should be synchronized in a milliseconds.

    Will you handle time sync, delays, timeouts, watch dog, wrong packet order,….

    If it is so, we can to start work together on this library. Or it is not your scope?

    • The framework I’m building should allow you to do all this — I’ve seen sub 5 ms (worst case) sync on pretty terrible hardware over wifi, so the degree of synchronisation should be easy enough.

      My plan to do multi-room is make it generic, so that you can have client groups, and each group can have its own stream. This should fit what you need for clients to be able to play different things.

      Delay, timeouts, watchdog, packet order etc. are dependent on the transport, which is pretty much a plug-what-you-want mechanism, with a default TCP transport which at least takes care of packet order.

      Contributions are welcome, of course!

Leave a Reply