Hello again, and I hope you’re having a pleasant end of the year (if you are, maybe don’t check the news until next year).
I’d written about synchronised playback with GStreamer a little while ago, and work on that has been continuing apace. Since I last wrote about it, a bunch of work has gone in:
- Landed support for sending a playlist to clients (instead of a single URI)
- Added the ability to start and stop playback
- Cleaned up the API considerably, so that it can be considered for inclusion upstream
- Made the control protocol implementation an interface, so you don’t have to use the built-in TCP server (different use cases might want different transports)
- Made a bunch of robustness fixes and documentation improvements
- Introduced API for clients to send the server information about themselves
- Added API for the server to send video transformations for specific clients to apply before rendering
While the other bits are exciting in their own right, in this post I’m going to talk about the last two items.
For those of you who aren’t familiar with the term, a video wall is just an array of displays tiled to form one larger display. These are often used in public installations.
One way to set up a video wall is to connect each display to a small computer (such as a Raspberry Pi), and have each computer play its part of the full video, cropped and scaled for the display attached to it. This might look something like:
The tricky part, of course, is synchronisation, which is where gst-sync-server comes in. Since we can already play a given stream in sync across devices on a network, the only missing piece was a way to distribute a set of per-client transformations for clients to apply, and that is now done.
To keep things clean from an API perspective, I took the following approach:
- Clients can now send a client ID and a configuration (which is just a dictionary) when they first connect to the server
- The server emits a signal with the client ID and configuration, which lets you know when a client connects, what kind of display it has, and where it is positioned
- The server now has additional fields for sending a map of client ID to a set of video transformations
This allows us to do fancy things like having each client manage its own information while the server dynamically adapts the set of transformations to whatever is connected. Of course, the simpler case of a static configuration on the server also works.
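To make that flow concrete, here is a minimal Python sketch of the server-side bookkeeping. It is not gst-sync-server’s actual API: the class and method names (VideoWallServer, on_client_joined, dynamic_layout), the "column" configuration key and the 1920-pixel video width are all hypothetical, and a transformation is modelled simply as a dictionary of left/right crop values in pixels.

```python
from typing import Callable, Dict

Config = Dict[str, object]   # a client's configuration: just a dictionary
Transform = Dict[str, int]   # hypothetical: crop values in pixels

VIDEO_WIDTH = 1920  # assumption: width of the video being played


class VideoWallServer:
    """Hypothetical server-side bookkeeping; not the gst-sync-server API."""

    def __init__(self, layout: Callable[[Dict[str, Config]], Dict[str, Transform]]):
        self._configs: Dict[str, Config] = {}
        self._layout = layout
        self.transforms: Dict[str, Transform] = {}

    def on_client_joined(self, client_id: str, config: Config) -> None:
        # Plays the role of the "client connected" signal: store what the
        # client sent, then recompute the client-ID -> transformation map.
        self._configs[client_id] = config
        self.transforms = self._layout(dict(self._configs))


def dynamic_layout(configs: Dict[str, Config]) -> Dict[str, Transform]:
    # Give each connected client an equal-width slice of the frame, ordered
    # left to right by the (hypothetical) "column" each client reports.
    ordered = sorted(configs, key=lambda cid: configs[cid]["column"])
    slice_width = VIDEO_WIDTH // len(ordered)
    transforms = {}
    for i, client_id in enumerate(ordered):
        left = i * slice_width
        transforms[client_id] = {"left": left, "right": VIDEO_WIDTH - left - slice_width}
    return transforms


server = VideoWallServer(dynamic_layout)
server.on_client_joined("desktop", {"column": 1})  # alone: gets the whole frame
server.on_client_joined("laptop", {"column": 0})   # both slices are recomputed
print(server.transforms)
# {'laptop': {'left': 0, 'right': 960}, 'desktop': {'left': 960, 'right': 0}}
```

A static configuration is the degenerate case of the same design: pass a layout function that ignores the client configurations and returns a fixed table.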
Since seeing is believing, here’s a demo of the synchronised playback in action:
The setup is my laptop, which has an Intel GPU, and my desktop, which has an NVIDIA GPU. These are connected to two monitors (thanks go out to my good friends from Uncommon for lending me their thin-bezelled displays).
The video resolution is 1920×800, and I’ve adjusted the crop parameters to account for the bezels, so the video actually does look continuous. I’ve uploaded the text configuration if you’re curious about what that looks like.
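The post doesn’t spell out how the bezel-adjusted crop values were chosen, so here is one way they could be derived: treat the gap between the active areas of neighbouring panels as part of the picture that is hidden behind the bezels, convert millimetres to video pixels, and crop out each panel’s share. This is a sketch under stated assumptions: identical panels in a single row, hypothetical measurements (600 mm active width, 20 mm combined bezel gap), and only the horizontal direction. The left/right values are meant in the sense of the videocrop element’s left and right properties (pixels removed from each edge); the function name is hypothetical.

```python
from typing import List, Tuple


def bezel_crops(n_displays: int, active_width_mm: float, gap_mm: float,
                video_width: int) -> List[Tuple[int, int]]:
    """Return a (left, right) crop in video pixels for each display in a row.

    gap_mm is the distance between the active areas of adjacent panels
    (both bezels together). That strip of the picture is treated as hidden
    behind the bezels, so objects crossing the gap stay geometrically continuous.
    """
    total_mm = n_displays * active_width_mm + (n_displays - 1) * gap_mm
    px_per_mm = video_width / total_mm
    crops = []
    for i in range(n_displays):
        start_mm = i * (active_width_mm + gap_mm)  # left edge of this panel's active area
        end_mm = start_mm + active_width_mm        # right edge of this panel's active area
        left = round(start_mm * px_per_mm)
        right = video_width - round(end_mm * px_per_mm)
        crops.append((left, right))
    return crops


# Hypothetical measurements for two panels showing a 1920-pixel-wide video.
print(bezel_crops(2, 600, 20, 1920))  # [(0, 976), (976, 0)]
```

With these numbers each panel shows about 944 pixels of the width, and the 32 pixels in between are the part of the picture that falls "behind" the bezels.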
As I mention in the video, the synchronisation is not as tight as I would like it to be. This is most likely because of the differing device configurations. I’ve been working with Nicolas to try to address this shortcoming using some timing extensions that the Wayland protocol provides. More news on this as it breaks.
More generally, I’ve done some work to quantify the degree of sync, but I’m going to leave that for another day.
P.S. The reason I used kmssink in the demo is that it was the quickest way I know of to get full-screen video going. I’m happy to hear about alternatives, though.
Make it real
My demo was put together quite quickly by letting the example server code load and serve up a static configuration. What I would like is a proper application that people can easily package and deploy on the kinds of embedded systems used in real video walls. If you’re interested in taking this up, I’d be happy to help out. Bonus points if we can dynamically calculate transformations based on client configuration (position, display size, bezel size, etc.).
One thing that’s bothering me is that the video transformations are applied in software using GStreamer elements. This works fine(ish) on the hardware I’m developing on, but in real life we would want to use OpenGL (ES) transformations, or platform-specific elements, to get hardware-accelerated transformations. My initial thought is for this to be either API on playbin or a GstBin that takes a set of transformations as parameters and internally sets up the best way to apply them, based on whatever sink is available downstream (some sinks provide cropping and other transformations themselves).
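As a rough illustration of the software path described above, the sketch below turns a crop and the attached display’s resolution into a gst-launch-style bin description built from the stock videocrop and videoscale elements. The function name and its inputs are hypothetical, only the elements that are actually needed are inserted, and aspect-ratio handling is left out.

```python
from typing import Dict


def software_transform_description(crop: Dict[str, int], out_width: int,
                                   out_height: int) -> str:
    """Build a gst-launch-style bin description for the software path.

    crop: any of videocrop's left/right/top/bottom properties, in pixels.
    out_width/out_height: the attached display's resolution (assumed known,
    e.g. from the client's configuration).
    """
    elements = []
    # Only insert videocrop if there is actually something to crop.
    props = " ".join(f"{side}={crop[side]}"
                     for side in ("left", "right", "top", "bottom")
                     if crop.get(side, 0))
    if props:
        elements.append(f"videocrop {props}")
    # videoscale plus a caps filter scale the remaining region to the display size.
    elements.append("videoscale")
    elements.append(f"video/x-raw,width={out_width},height={out_height}")
    return " ! ".join(elements)


print(software_transform_description({"left": 976, "right": 0}, 1920, 1080))
# videocrop left=976 ! videoscale ! video/x-raw,width=1920,height=1080
```

From Python, one possible way to use such a string is to pass it to Gst.parse_bin_from_description(description, True) and set the resulting bin as playbin’s video-filter property. A hardware-accelerated variant would swap these elements for GL or platform-specific ones, or skip them entirely when the sink can crop by itself.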
Why not audio?
I’ve only written about video transformations here, but we can do the same with audio transformations too. For example, multi-room audio systems let you configure the locations of wireless speakers, so you can set which one is on the left and which is on the right, and each speaker then automatically plays the appropriate channel. Implementing this should be quite easy with the infrastructure that’s currently in place.
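The audio counterpart of a per-client crop could be as simple as telling each speaker which channel to keep. Below is a minimal sketch on raw interleaved samples; the position-to-channel mapping and the function name are hypothetical, and in a real pipeline this selection would more likely be done with GStreamer elements than in Python.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from a speaker's configured position to a channel
# index in interleaved stereo audio (channel 0 = left, channel 1 = right).
CHANNEL_FOR_POSITION: Dict[str, int] = {"left": 0, "right": 1}


def samples_for_speaker(interleaved: Sequence[int], position: str,
                        channels: int = 2) -> List[int]:
    """Pick out the samples for one speaker from interleaved audio."""
    index = CHANNEL_FOR_POSITION[position]
    # Interleaved layout is L0 R0 L1 R1 ..., so every `channels`-th sample
    # starting at `index` belongs to the requested channel.
    return list(interleaved[index::channels])


frames = [1, -1, 2, -2, 3, -3]
print(samples_for_speaker(frames, "left"))   # [1, 2, 3]
print(samples_for_speaker(frames, "right"))  # [-1, -2, -3]
```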
I hope you enjoyed reading that. I’ve had great responses from a lot of people about how they might be able to use this work. If there’s something you’d like to see, leave a comment or file an issue.
Happy end of the year, and all the best for 2017!
December 31, 2016 — 12:02 am
Does it handle things OK if you do something like reboot one of the clients while the videos are showing? Can it rejoin automatically and start playing the video at the right point?
December 31, 2016 — 12:38 am
Oh yes, having clients be able to join at any point has been part of the design from the start, so when a new client joins, it’ll jump to the right media in the playlist and seek forward to start from the right point too.
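The "jump to the right media and seek forward" step in the reply above comes down to a small calculation. Here is a minimal sketch of it, assuming the duration of every playlist item is known up front, playback hasn’t been paused, and the elapsed time since the playlist started is available from the shared clock. The function name is hypothetical and this is not gst-sync-server’s code; times are in nanoseconds, GStreamer’s time unit.

```python
from typing import Sequence, Tuple

SECOND = 10**9  # nanoseconds per second


def playlist_position(durations_ns: Sequence[int], elapsed_ns: int) -> Tuple[int, int]:
    """Return (item index, offset into that item) for a client joining late."""
    if elapsed_ns < 0:
        raise ValueError("elapsed time must be non-negative")
    remaining = elapsed_ns
    for index, duration in enumerate(durations_ns):
        if remaining < duration:
            return index, remaining  # play this item, seeking to `remaining`
        remaining -= duration        # this item has already finished
    raise ValueError("elapsed time is past the end of the playlist")


# A 10 s clip followed by a 20 s clip; a client joining 25 s in should
# start the second clip and seek 15 s into it.
print(playlist_position([10 * SECOND, 20 * SECOND], 25 * SECOND))
```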
December 31, 2016 — 1:42 am
Great work! Keep it up!
I would like to have some free time to use this somewhere around my house or something, because it’s so cool =)
December 31, 2016 — 8:43 am
Thanks :D Looking forward to hearing about what you end up building!
January 3, 2017 — 2:54 pm
Have you thought of using qr-lipsync (https://github.com/UbiCastTeam/qr-lipsync)? Just generate a 30 or 60 fps sample, display it on all your synced displays, and film the screens together with, e.g., your phone (ideally at a high frame rate); then analyse the video captured on your phone, and each QR code will contain the actual timestamp and frame count for you to compare. The code there is not designed to compare multiple QR codes (it checks video against audio), but that should be a piece of cake for you :)
January 4, 2017 — 2:03 pm
Very neat! I’ve been using an oscilloscope to do this with a tick waveform.