To Conference Organisers Everywhere…

(well, not exactly everywhere …)

This is not an easy post for me to write, being a bit of a criticism / “you can do better” note for organisers of conferences that cater to a global community.

It’s not easy because most of the conferences I attend are community driven, and I have helped organise community conferences in the past. It is a thankless job, a labour of love, and you mostly do not get to enjoy the fruits of that labour as others do.

The problem is that these conferences end up inadvertently excluding members who live in, for lack of a better phrase, the Global South.

Visas

It always surprises me when I meet someone who doesn’t realise that I can’t just book tickets to go anywhere in the world. Not because this is information that everyone should be aware of, but because this is such a basic aspect of travel for someone like me. As a holder of an Indian passport, I need to apply for a visa to travel to … well most countries.

The list of countries that require a visa is clearly defined by post-colonial geopolitics; this is just a fact of life and not something I can do very much about.

Getting a Visa

Applying for a visa is a cumbersome and intrusive process that I am now used to. The process varies from country to country, but it’s usually something like:

  • Get an invitation letter from conference organisers
  • Book all the tickets and accommodation for the trip
  • Provide bank statements for 3-6 months, income tax returns for ~3 years (in India, those statements need attestation by the bank)
  • Maybe provide travel and employment history for the past few years (how many years depends on the country)
  • Get an appointment from the embassy of the country you’re traveling to (or their service provider)
  • Submit your passport and application
  • Maybe provide documentation that was not listed on the provider’s website
  • Wait for your passport with visa (if granted) to be mailed back to you

The duration of the visa (that is, how long you can stay in the country) depends on the country.

In the EU, I am usually granted a visa for the exact dates of travel (so there is no flexibility to change plans). The UK allows you to pay more for a longer visa.

The US and Canada grant multi-year visas that allow one to visit for up to 6 months by default (in the US, whether you are permitted to enter and how long you may stay are determined by the person at the border).

Timelines

Now we get to the crux of the problem: this process can take anywhere from a few days (if you are very lucky) to a few months (if you are not).

Appointments are granted by the embassy or the third party that countries delegate application collection to, and these may or may not be easily available. Post-pandemic, I’ve seen that several embassies just aren’t accepting visitor visa appointments or have a multi-month wait.

If you do get an appointment, the processing time can vary again — sometimes it’s a matter of a few days, sometimes a few weeks. A lot of countries I have applied to recommend submitting your application at least 6 weeks in advance (counted from the date of your visa appointment, which might itself be several weeks in the future).

Conference Schedules

If you’re organising a conference, there are a few important dates:

  • When the conference dates are announced
  • When the call for participation goes out
  • When the call for participation closes
  • When speakers are notified
  • The conference itself

These dates are based on a set of complex factors — venue availability and confirmation, literally writing and publishing all the content of the website, paper committee availability, etc.

But if you’re in my position, you need at least 2-3 months between the first and the last step. If your attendance is conditional on speaking at the conference (for example, if your company will only sponsor you if you’re speaking), then you need a minimum of 2-3 months between when speakers are notified and the conference starts.

From what I see, this is not something that is top-of-mind for conference organisers. That may happen for a host of perfectly understandable reasons, but it also has a cost to the community and individuals who might want to participate.

Other Costs

Applying for a visa costs money. This can be anything from a few hundred to over a thousand US dollars.

It also costs you time — filling in the application, getting all the documentation in place, getting a physical visa photo (must be no older than 6 months), traveling to an appointment, waiting in line, etc. This can easily be a matter of a day if not more.

Finally, there is an emotional cost to all this — there is constant uncertainty throughout the process, a visa rejection means every subsequent application requires you to document that rejection and the reason for it, and you may find out only days before your planned travel whether you get to go at all.

What Can One Do?

All of this clearly sucks, but the problem of visas is too big and messy for any of us to have any real impact on, at least in the short term. But if you’re organising a conference, and you want a diverse audience, here are a few things you can do:

  • Announce the dates of the conference as early as possible (this allows participants to book travel and visa appointments, and maybe club together multiple conferences into one trip)
  • Provide invitation letters in a timely manner
  • Call for participation as early as possible
  • Notify speakers as soon as you can

I know of conferences that do some if not all of these things — you know who you are and you have my gratitude for it.

If you made it this far, thank you for reading.

GStreamer for your backend services

For the last year and a half, we at Asymptotic have been working with the excellent team at Daily. I’d like to share a little bit about what we’ve learned.

Daily is a real time calling platform as a service. One standard feature that users have come to expect in their calls is the ability to record them, or to stream their conversations to a larger audience. This involves mixing together all the audio/video from each participant and then storing it, or streaming it live via YouTube, Twitch, or any other third-party service.

As you might expect, GStreamer is a good fit for building this kind of functionality, where we consume a bunch of RTP streams, composite/mix them, and then send them out to one or more external services (Amazon’s S3 for recordings and HLS, or a third-party RTMP server).

I’ve written about how we implemented this feature elsewhere, but I’ll summarise briefly.

This is a slightly longer post than usual, so grab a cup of your favourite beverage, or jump straight to the summary section for the tl;dr.

Update from the PipeWire hackfest

As the third and final day of the PipeWire hackfest draws to a close, I thought I’d summarise some of my thoughts on the goings-on and the future.

Thanks

Before I get into the details, I want to send out a big thank you to:

  • Christian Schaller for all the hard work of organising the event and Wim Taymans for the work on PipeWire so far (and in the future)
  • The GNOME Foundation, for sponsoring the event as a whole
  • Qualcomm, who are funding my presence at the event
  • Collabora, for sponsoring dinner on Monday
  • Everybody who attended and participated, for their time and thoughtful comments

Background

For those of you who are not familiar with it, PipeWire (previously Pinos, previously PulseVideo) was Wim’s effort at providing secure, multi-program access to video devices (like webcams, or the desktop for screen capture). As he went down that rabbit hole, he wrote SPA, a lightweight general-purpose framework for representing a streaming graph, and this led to the idea of expanding the project to include support for low latency audio.

The Linux userspace audio story has, for the longest time, consisted of two top-level components: PulseAudio, which handles consumer audio (power efficiency, a wide range of arbitrary hardware), and JACK, which deals with pro audio (low latency, high performance). Consolidating this into a good out-of-the-box experience for all use-cases has been a long-standing goal for me and others in the community that I have spoken to.

An Opportunity

From a PulseAudio perspective, it has been hard to achieve the 1-to-few millisecond latency numbers that would be absolutely necessary for professional audio use-cases. A lot of work has gone into improving this situation, most recently with David Henningsson’s shared-ringbuffer channels that made client/server communication more efficient.

At the same time, application sandboxing frameworks such as Flatpak have added security requirements that were not accounted for when PulseAudio was written. Examples include choosing which devices an application has access to (or can even know of), or which applications can act as control entities (set routing, enable/disable devices, and so on). Some work has gone into this — Ahmed Darwish did some key work to get memfd support into PulseAudio, and Wim has prototyped an access-control module to enable a Flatpak portal for sound.

All this said, there are still fundamental limitations in architectural decisions in PulseAudio that would require significant plumbing to address. With Wim’s work on PipeWire and his extensive background with GStreamer and PulseAudio itself, I think we have an opportunity to revisit some of those decisions with the benefit of a decade’s worth of learning from deploying PulseAudio in domains ranging from desktops and laptops to phones, cars, robots, home audio, telephony systems and a lot more.

Key Ideas

There are some core ideas of PipeWire that I am quite excited about.

The first of these is the graph. Like JACK, the entities that participate in the data flow are represented by PipeWire as nodes in a graph, and routing between nodes is very flexible — you can route applications to playback devices and capture devices to applications, but you can also route applications to other applications, and this is notionally the same thing.

The second idea is a bit more radical — PipeWire itself only “runs” the graph. The actual connections between nodes are created and managed by a “session manager”. This allows us to completely separate the data flow from policy, which means we could write completely separate policy for desktop use cases vs. specific embedded use cases. I’m particularly excited to see this be scriptable in a higher-level language, which is something Bastien has already started work on!

A powerful idea in PulseAudio was rewinding — the ability to send out huge buffers to the device, but with the flexibility to rewind that data when things changed (a new stream got added, a stream moved, or the volume changed). While this is great for power saving, it adds a significant amount of complexity to the code. In addition, with some filters in the data path, rewinding can break the algorithm by introducing non-linearity. PipeWire doesn’t support rewinds, and we will need to find a good way to manage latencies to account for low-power use cases. One example is that the session manager could bump up the device latency when we know latency doesn’t matter (Android does this when the screen is off).

There are a bunch of other things that are in the process of being fleshed out, like being able to represent the hardware as a graph as well, to have a clearer idea of what is going on within a node. More updates as these things become more concrete.

The Way Forward

There is a good summary by Christian about our discussion about what is missing and how we can go about trying to make a smooth transition for PulseAudio users. There is, of course, a lot to do, and my ideal outcome is that we one day flip a switch and nobody knows that we have done so.

In practice, we’ll need to figure out how to make this transition seamless for most people, while folks with custom setups will need a long runway and clear documentation to know what to do. It’s way too early to talk about this in more specifics, however.

Configuration

One key thing that PulseAudio does right (I know there are people who disagree!) is having a custom configuration that automagically works on a lot of Intel HDA-based systems. We’ve been wondering how to deal with this in PipeWire, and the path we think makes sense is to transition to ALSA UCM configuration. This is not as flexible as we need it to be, but I’d like to extend it for that purpose if possible. This would ideally also help consolidate the various methods of configuration being used by the various Linux userspaces.

To that end, I’ve started trying to get a UCM setup on my desktop that PulseAudio can use, and be functionally equivalent to what we do with our existing configuration. There are missing bits and bobs, and I’m currently focusing on the ones related to hardware volume control. I’ll write about this in the future as the effort expands out to other hardware.

Onwards and upwards

The transition to PipeWire is unlikely to be quick, completely painless, or free of contention. For those who are worried about the future, know that any switch is still a long way away. In the meantime, constructive feedback and comments are welcome.

Applicative Functors for Fun and Parsing

PSA: This post has a bunch of Haskell code, but I’m going to try to make it more broadly accessible. Let’s see how that goes.

I’ve been proceeding apace with my third year of Abhinav’s Haskell classes at Nilenso, and we just got done with the section on Applicative Functors. I’m at that point where I finally “get” it, so I thought I’d document the process, and maybe capture my a-ha moment with Applicatives.

I should point out that the ideas and approach in this post are all based on Abhinav’s class material (and I’ve found them really effective in understanding the underlying concepts). Many thanks are due to him, and any lack of clarity you find ahead is in my own understanding.

Functors and Applicatives

Functors represent a type or a context on which we can meaningfully apply (map) a function. The Functor typeclass is pretty straightforward:
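
For reference, here is the class as it appears in the Prelude (leaving out methods we don’t need):

    class Functor f where
      fmap :: (a -> b) -> f a -> f b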

Easy enough. fmap takes a function that transforms something of type a to type b and a value of type a in a context f. It produces a value of type b in the same context.

The Applicative typeclass adds two things to Functor. Firstly, it gives us a means of putting things inside a context (also called lifting). Secondly, it gives us a means of applying a function that is itself inside a context.
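
Again from the Prelude, showing just the two methods we care about:

    class Functor f => Applicative f where
      pure  :: a -> f a
      (<*>) :: f (a -> b) -> f a -> f b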

We can see pure lifts a given value into a context. The apply function (<*>) intuitively looks like fmap, with the difference that the function is within a context. This becomes key when we remember that Haskell functions are curried (and can thus be partially applied). This would then allow us to write something like:
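
A minimal sketch of such a function (the name maybeAdd is mine):

    maybeAdd :: Maybe Int -> Maybe Int -> Maybe Int
    maybeAdd x y = (+) <$> x <*> y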

This function takes two numbers in the Maybe context (that is, they either exist, or are Nothing), and adds them. The result will be the sum if both numbers exist, or Nothing if either or both do not.

Go ahead and convince yourself that it is painful to express this generically with just fmap.

Parsers

There are many ways of looking at what a parser is. Let’s work with one definition: A parser,

  • Takes some input
  • Converts some or all of it into something else if it can
  • Returns whatever input was not used in the conversion

How do we represent something that converts something to something else? It’s a function, of course. Let’s write that down as a type:
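
A sketch of such a type, reconstructed from the description that follows:

    data Parser i o = Parser (i -> (Maybe o, i))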

This more or less directly maps to what we just said. A Parser is a data type which has two type parameters — an input type and an output type. It contains a function that takes one argument of the input type, and produces a tuple of Maybe the output type (signifying if parsing succeeded) and the rest of the input.

We can name the field runParser, so it becomes easier to get a hold of the function inside our Parser type:
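
As a record (this is the form the sketches below assume):

    newtype Parser i o = Parser { runParser :: i -> (Maybe o, i) }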

Parser combinators

The “rest” part is important because we would like to be able to chain small parsers together to make bigger parsers. We do this using “parser combinators” — functions that take one or more parsers and return a more complex parser formed by combining them in some way. We’ll see some of those ways as we go along.

Parser instances

Before we proceed, let’s define Functor and Applicative instances for our Parser type.
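
A sketch of the Functor instance, consistent with the Parser type above:

    instance Functor (Parser i) where
      fmap f p = Parser $ \input ->
        let (result, rest) = runParser p input
        in  (fmap f result, rest)  -- fmap over the Maybe result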

The intuition here is clear — if I have a parser that takes some input and provides some output, fmap-ing a function on that parser translates to applying that function to the output of the parser.
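
And a sketch of the Applicative instance (failure simply propagates Nothing):

    instance Applicative (Parser i) where
      pure x = Parser $ \input -> (Just x, input)

      pf <*> px = Parser $ \input ->
        case runParser pf input of
          (Nothing, rest) -> (Nothing, rest)
          (Just f, rest)  -> case runParser px rest of
                               (Nothing, rest') -> (Nothing, rest')
                               (Just x, rest')  -> (Just (f x), rest')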

The Applicative instance is a bit more involved than Functor. What we’re doing first is “running” the first parser which gives us the function we want to apply (remember that this is a curried function, so rather than parsing out a function, we are most likely parsing out a value and creating a function with that). If we succeed, then we run the second parser to get a value to apply the function to. If this is also successful, we apply the function to the value, and return the result within the parser context (i.e. the result, and the rest of the input).

Implementing some parsers

Now let’s take our new data type and instances for a spin. Before we write a real parser, let’s write a helper function. A common theme while parsing a string is matching a single character against a predicate — for example, “is this character a letter of the alphabet?”, or “is this character a semicolon?”. We write a function that takes a predicate and returns the corresponding parser:
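
A sketch of that helper, specialised to String input:

    satisfy :: (Char -> Bool) -> Parser String Char
    satisfy p = Parser $ \input ->
      case input of
        (c:rest) | p c -> (Just c, rest)
        _              -> (Nothing, input)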

Now let’s try to make a parser that takes a string, and if it finds an ASCII digit character, provides the corresponding integer value. We have a function from the Data.Char module to match ASCII digit characters — isDigit. We also have a function to take a digit character and give us an integer — digitToInt. Putting this together with satisfy above:
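
A sketch, along with the imports it needs:

    import Data.Char (digitToInt, isDigit)

    digit :: Parser String Int
    digit = digitToInt <$> satisfy isDigit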

And that’s it! Note how we used our higher-order satisfy function to match an ASCII digit character and the Functor instance to apply digitToInt to the result of that parser (reminder: <$> is just the infix form of writing fmap — this is the same as fmap digitToInt (satisfy isDigit)).

Another example — a character parser, which succeeds if the next character in the input is a specific character we choose.
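
A sketch (the name char is mine):

    char :: Char -> Parser String Char
    char c = satisfy (== c)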

Once again, the satisfy function makes this a breeze. I must say I’m pleased with the conciseness of this.

Finally, let’s combine character parsers to create a word parser — a parser that succeeds if the input is a given word.
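
A sketch of such a parser:

    word :: String -> Parser String String
    word ""     = pure ""
    word (c:cs) = (:) <$> char c <*> word cs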

A match on an empty word always succeeds. For anything else, we just break the parser down into a character parser for the first character and a recursive call to the word parser for the rest. Again, note the use of the Functor and Applicative instances. Let’s look at the type signature of the (:) (list cons) function, which prepends an element to a list:
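
From the Prelude:

    (:) :: a -> [a] -> [a]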

The function takes two arguments — a single element of type a, and a list of elements of type a. If we expand the types some more, we’ll see that the first argument we give it is a Parser String Char and the second is a Parser String [Char] (String is just an alias for [Char]).

In this way we are able to take the basic list prepend function and use it to construct a list of characters within the Parser context. (a-ha!?)

JSON

JSON is a relatively simple format to parse, and makes for a good example for building a parser. The JSON website has a couple of good depictions of the JSON language grammar front and center.

So that defines our parser problem then — we want to read a string input, and convert it into some sort of in-memory representation of the JSON value. Let’s see what that would look like in Haskell.
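
A sketch of such a representation — the constructor names other than JsonNull, JsonArray and JsonObject are my own choices, and Scientific (from the scientific package) is one way to get the arbitrary-precision number type discussed below:

    import Data.Scientific (Scientific)

    data JsonValue = JsonNull
                   | JsonBool Bool
                   | JsonString String
                   | JsonNumber Scientific
                   | JsonArray [JsonValue]
                   | JsonObject [(String, JsonValue)]
                   deriving (Eq, Show)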

The JSON specification does not really tell us what type to use for numbers. We could just use a Double, but to make things interesting, we represent it as an arbitrary precision floating point number.

Note that the JsonArray and JsonObject constructors are recursive, as they should be — a JSON array is an array of JSON values, and a JSON object is a mapping from string keys to JSON values.

Parsing JSON

We now have the pieces we need to start parsing JSON. Let’s start with the easy bits.

null

To parse a null we literally just look for the word “null”.
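
A sketch (the parser name jsonNull is mine; $> comes from Data.Functor):

    import Data.Functor (($>))

    jsonNull :: Parser String JsonValue
    jsonNull = word "null" $> JsonNull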

The $> operator is a flipped shortcut for fmap . const — it evaluates the argument on the left, and then fmaps the argument on the right onto it. If the word "null" parser is successful (Just "null"), we’ll fmap the JsonValue representing null to replace the string "null" (i.e. we’ll get a (Just JsonNull, <rest of the input>)).

true and false

First a quick detour:
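
A sketch of an Alternative instance for our Parser, matching the description below:

    import Control.Applicative (Alternative (..))

    instance Alternative (Parser i) where
      empty = Parser $ \input -> (Nothing, input)

      p1 <|> p2 = Parser $ \input ->
        case runParser p1 input of
          (Just o, rest) -> (Just o, rest)
          (Nothing, _)   -> runParser p2 input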

The Alternative instance is easy to follow once you understand Applicative. We define an empty parser that matches nothing. Then we define the alternative operator (<|>) as we might intuitively imagine.

We run the parser given as the first argument first, if it succeeds we are done. If it fails, we run the second parser on the whole input again, if it succeeds, we return that value. If both fail, we return Nothing.

Parsing true and false with this in our belt looks like:
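
A sketch, using the JsonBool constructor assumed earlier:

    jsonBool :: Parser String JsonValue
    jsonBool =     word "true"  $> JsonBool True
               <|> word "false" $> JsonBool False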

We are easily able to express the idea of trying to parse the string “true”, and if that fails, trying again for the string “false”. If either matches, we have a boolean value; if not, Nothing. Again, nice and concise.

String

This is only slightly more complex. We need a couple of helper functions first:
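
Sketches of the two helpers, matching the descriptions that follow:

    import Data.Char (digitToInt, isHexDigit)

    hexDigit :: Parser String Int
    hexDigit = digitToInt <$> satisfy isHexDigit

    -- Interpret a list of digits as a number in the given base
    digitsToNumber :: Int -> [Int] -> Integer
    digitsToNumber base =
      foldl (\num d -> num * fromIntegral base + fromIntegral d) 0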

hexDigit is easy to follow. It just matches anything from 0-9 and a-f or A-F.

digitsToNumber is a pure function that takes a list of digits, and interprets it as a number in the given base. We do some jumping through hoops with fromIntegral to take Int digits (mapping to a normal word-sized integer) and produce an Integer (arbitrary sized integer).

Now follow along one line at a time:
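
A sketch of the string parser and the character parser it relies on, following the JSON grammar as described below:

    import Control.Applicative (many)
    import Control.Monad (replicateM)
    import Data.Char (chr, isControl)

    jsonString :: Parser String JsonValue
    jsonString = JsonString <$> (char '"' *> many jsonChar <* char '"')

    jsonChar :: Parser String Char
    jsonChar =     satisfy (\c -> c /= '"' && c /= '\\' && not (isControl c))
               <|> word "\\\"" $> '"'
               <|> word "\\\\" $> '\\'
               <|> word "\\/"  $> '/'
               <|> word "\\b"  $> '\b'
               <|> word "\\f"  $> '\f'
               <|> word "\\n"  $> '\n'
               <|> word "\\r"  $> '\r'
               <|> word "\\t"  $> '\t'
               <|> chr . fromIntegral . digitsToNumber 16
                     <$> (word "\\u" *> replicateM 4 hexDigit)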

A string is a sequence of valid JSON characters, surrounded by quotes. The *> and <* operators allow us to chain parsers whose output we wish to discard (since the quotes are not part of the actual string itself). The many function comes from the Alternative typeclass. It represents zero or more repetitions of a given context — in our case, it tries to match zero or more jsonChar parsers.

So what does jsonChar do? Following the definition of a character in the JSON spec, first we try to match something that is not a quote ("), a backslash (\) or a control character. If that doesn’t match, we try to match the various escape characters that the specification mentions.

Finally, if we get a \u followed by 4 hexadecimal characters, we put them in a list (replicateM 4 hexDigit chains 4 hexDigit parsers and provides the output as a list), convert that list into a base 16 integer (digitsToNumber), and then convert that to a Unicode character (chr).

The order of chaining these parsers does matter for performance. The first parser in our <|> chain is the one that is most likely (most characters are not escaped). This follows from our definition of the Alternative instance. We run the first parser, then the second, and so on. We want this to succeed as early as possible so we don’t run more parsers than necessary.

Arrays

Arrays and objects have something in common — they have items which are separated by some value (commas for array values, commas for each key-value pair in an object, and colons separating keys and values). Let’s just factor this commonality out:
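
A sketch of that combinator:

    sepBy :: Parser i v -> Parser i s -> Parser i [v]
    sepBy v s = ((:) <$> v <*> many (s *> v)) <|> pure []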

We take a parser for our values (v), and a parser for our separator (s). We try to parse one or more v separated by s, or just return an empty list in the parser context if there are none.

Now we write our JSON array parser as:
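
A sketch, reusing char and sepBy from above:

    jsonArray :: Parser String JsonValue
    jsonArray = JsonArray <$> (char '[' *> (json `sepBy` char ',') <* char ']')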

Nice, that’s really succinct. But wait! What is json?

Putting it all together

We know that arrays contain JSON values. And we know how to parse some JSON values. Let’s try to put those together for our recursive definition:
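
A sketch of the top-level parser (leaving out the number and object parsers, as noted below):

    json :: Parser String JsonValue
    json = jsonNull <|> jsonBool <|> jsonString <|> jsonArray
           -- jsonNumber and jsonObject would be chained in here as well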

And that’s it!

The JSON object and number parsers follow the same pattern. So far we’ve ignored spaces in the input, but those can be consumed and ignored easily enough based on what we’ve learned.

You can find the complete code for this exercise on Github.

Some examples of what this looks like in the REPL:
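
Using the sketches above loaded into GHCi, a session might look something like this:

    ghci> runParser json "null"
    (Just JsonNull,"")
    ghci> runParser json "[true,false,\"hello\"]"
    (Just (JsonArray [JsonBool True,JsonBool False,JsonString "hello"]),"")
    ghci> runParser json "nope"
    (Nothing,"nope")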

Concluding thoughts

If you’ve made it this far, thank you! I realise this is long and somewhat dense, but I am very excited by how elegantly Haskell allows us to express these ideas, using fundamental aspects of its type(class) system.

A nice real world example of how you might use this is the optparse-applicative package which uses these ideas to greatly simplify the otherwise dreary task of parsing command line arguments.

I hope this post generates at least some of the excitement in you that it has in me. Feel free to leave your comments and thoughts below.

A Late GUADEC 2017 Post

It’s been a little over a month since I got back from Manchester, and this post should’ve come out earlier but I’ve been swamped.

The conference was absolutely lovely, and the organisation was 110% on point (serious kudos — I know first hand how hard that is). Others on Planet GNOME have written extensively about the talks, the social events, and everything in between that made it a great experience. What I would like to write about is why this year’s GUADEC was special to me.

GNOME turning 20 years old is obviously a large milestone, and one of the main reasons I wanted to make sure I was at Manchester this year. There were many occasions to take stock of how far we had come, where we are, and most importantly, to reaffirm who we are, and why we do what we do.

And all of this made me think of my own history with GNOME. In 2002/2003, Nat and Miguel came down to Bangalore to talk about some of the work they were doing. I know I wasn’t the only one who found their energy infectious, and at Linux Bangalore 2003, they got on stage, just sat down, and started hacking up a GtkMozEmbed-based browser. The idea itself was fun, but what I took away — and I know I wasn’t the only one — was the sheer, inclusive joy they took in creating something and sharing it with their audience.

For all of us working on GNOME in whatever way we choose to contribute, there is the immediate gratification of shaping this project, as well as the larger ideological underpinning of making everyone’s experience talking to their computers better and freer.

But I think it is also important to remember that all our efforts to make our community an inviting and inclusive space have a deep impact across the world. So much so that complete strangers from around the world are able to feel a sense of belonging to something much larger than themselves.

I am excited about everything we will achieve in the next 20 years.

(thanks go out to the GNOME Foundation for helping me attend GUADEC this year)

Stricter JSON parsing with Haskell and Aeson

I’ve been having fun recently, writing a RESTful service using Haskell and Servant. I did run into a problem that I couldn’t easily find a solution to on the magical bounty of knowledge that is the Internet, so I thought I’d share my findings and solution.

While writing this service (and practically any Haskell code), step 1 is of course defining our core types. Our REST endpoint is basically a CRUD app which exchanges these with the outside world as JSON objects. Doing this is delightfully simple:
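
A sketch of what that might look like — the Job type name is taken from later in the post, but the fields here are hypothetical:

    {-# LANGUAGE DeriveGeneric #-}

    import Data.Aeson   (FromJSON, ToJSON)
    import GHC.Generics (Generic)

    -- Hypothetical fields; the real type presumably has more
    data Job = Job
      { jobName     :: String
      , jobPriority :: Int
      } deriving (Show, Generic)

    instance FromJSON Job
    instance ToJSON Job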

That’s all it takes to get the basic type up with free serialization using Aeson and Haskell Generics. This is followed by a few more lines to hook up GET and POST handlers, we instantiate the server using warp, and we’re good to go. All standard stuff, right out of the Servant tutorial.

The POST request accepts a new object in the form of a JSON object, which is then used to create the corresponding object on the server. Standard operating procedure again, as far as RESTful APIs go.

The nice part about doing it like this is that the input is automatically validated based on types. So input like:
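
For instance, with the hypothetical fields above, a number where a string is expected:

    { "jobName": 123, "jobPriority": 1 }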

will result in:

Error in $: expected String, encountered Number

However, as this nice tour of how Aeson works demonstrates, if the input has keys that we don’t recognise, no error will be raised:
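
For example, the extra key here (again using the hypothetical fields) is silently ignored:

    { "jobName": "backup", "jobPriority": 1, "someUnknownKey": true }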

This behaviour is undesirable in use-cases such as mine — if the client is sending fields we don’t understand, I’d like the server to signal an error so the underlying problem can be caught early.

As it turns out, making the JSON parsing stricter, so that unrecognised or missing fields are caught, is just a little more involved. I couldn’t find a single place on the Internet describing how to do this, so here’s the best I could do:
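
The following is a sketch of the approach described below, not the post’s exact code — it assumes the hypothetical Job fields from earlier, and a pre-2.0 aeson where Object is a HashMap Text Value:

    {-# LANGUAGE DeriveDataTypeable #-}
    {-# LANGUAGE DeriveGeneric      #-}

    import           Data.Aeson
    import           Data.Aeson.Types    (Options (..))
    import           Data.Data           (Data, constrFields, dataTypeConstrs,
                                          dataTypeOf)
    import qualified Data.HashMap.Strict as HM
    import           Data.List           (sort)
    import qualified Data.Text           as T
    import           GHC.Generics        (Generic)

    -- The same hypothetical Job type as before, now also deriving Data
    data Job = Job
      { jobName     :: String
      , jobPriority :: Int
      } deriving (Show, Generic, Data)

    instance ToJSON Job

    instance FromJSON Job where
      parseJSON = withObject "Job" $ \o -> do
        -- Field names of the constructor, run through fieldLabelModifier so
        -- this keeps working if you're not using Aeson's default Options
        let constr = head (dataTypeConstrs (dataTypeOf (undefined :: Job)))
            fields = sort (map (T.pack . fieldLabelModifier defaultOptions)
                               (constrFields constr))
        -- The object's keys must match the constructor's fields exactly
        if sort (HM.keys o) == fields
          then genericParseJSON defaultOptions (Object o)
          else fail "unexpected or missing keys in JSON object"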

The idea is quite straightforward, and likely very easy to make generic. The Data.Data module lets us extract the constructor for the Job type, and the list of fields in that constructor. We just make sure that’s an exact match for the list of keys in the JSON object we parsed, and that’s it.

Of course, I’m quite new to the Haskell world so it’s likely there are better ways to do this. Feel free to drop a comment with suggestions! In the mean time, maybe this will be useful to others facing a similar problem.

Update: I’ve fixed parseJSON to properly use fieldLabelModifier from the default options, so that the comparison actually works when you’re not using Aeson’s default options. Thanks to /u/tathougies for catching that.

I’m also hoping to rewrite this in generic form using Generics, so watch this space for more updates.

Quantifying Synchronisation: Oscilloscope Edition

I’ve written a bit in my last two blog posts about the work I’ve been doing in inter-device synchronised playback using GStreamer. I introduced the library and then demonstrated its use in building video walls.

The important thing in synchronisation, of course, is how much in-sync are the streams? The video in my previous post gave a glimpse into that, and in this post I’ll expand on that with a more rigorous, quantifiable approach.

Before I start, a quick note: I am currently providing freelance consulting around GStreamer, PulseAudio and open source multimedia in general. If you’re looking for help with any of these, do get in touch.

The sync measurement setup

Quantifying what?

What is it that we are trying to measure? Let’s look at this in terms of the outcome — I have two computers, on a network. Using the gst-sync-server library, I play a stream on both of them. The ideal outcome is that the same video frame is displayed at exactly the same time, and the audio sample being played out of the respective speakers is also identical at any given instant.

As we saw previously, the video output is not a good way to measure what we want. This is because video displays are updated in sync with the display clock, over which consumer hardware generally does not have control. Besides, our eyes are not that sensitive to minor differences in timing unless the images are side by side. After all, we’re fooling them with static pictures that change every 16.67ms or so.

Using audio, though, we should be able to do better. Digital audio streams for music/videos typically consist of 44100 or 48000 samples a second, so we have a much finer granularity than video provides us. The human ear is also fairly sensitive to timing when it comes to sound — if you hear the same sound twice, separated by more than about 10 ms, you will hear two distinct sounds, and the echo will annoy you to no end.

Measuring audio is also good enough because once you’ve got audio in sync, GStreamer will take care of A/V sync itself.

Setup

Okay, so now we know what we want to measure — but how do we measure it? The setup is illustrated below:

Sync measurement setup illustrated

As before, I’ve set up my desktop PC and laptop to play the same stream in sync. The stream being played is a local audio file — I’m keeping the setup simple by not adding network streaming to the equation.

The audio itself is just a tick sound every second. The tick is a simple 440 Hz sine wave (A₄ for the musically inclined) that runs for 1600 samples. It sounds something like this:

I’ve connected the 3.5mm audio output of both computers to my faithful digital oscilloscope (a Tektronix TBS 1072B, if you wanted to know). Measuring synchronisation is then really a question of seeing how far apart the leading edges of the tick’s sine wave are on the two channels.

Of course, this assumes we’re not more than 1s out of sync (that’s the periodicity of the tick itself), and I’ve verified that by playing non-periodic sounds (any song or video) and making sure they’re in sync as well. You can trust me on this, or better yet, get the code and try it yourself! :)

The last piece to worry about — the network. How well we can sync the two streams depends on how well we can synchronise the clocks of the pipeline we’re running on each of the two devices. I’ll talk about how this works in a subsequent post, but my measurements are done on both a wired and wireless network.

Measurements

Before we get into it, we should keep in mind that due to how we synchronise streams — using a network clock — how in-sync our streams are will vary over time depending on the quality of the network connection.

If this variation is small enough, it won’t be noticeable. If it is large (tens of milliseconds), then we may start to notice it as echo, or as glitches when the pipeline tries to correct for the lack of sync.

In the first setup, my laptop and desktop are connected to each other directly via a LAN cable. The result looks something like this:

The first two images show the best case — we need to zoom in real close to see how out of sync the audio is, and it’s roughly 50µs.

The next two images show the “worst case”. This time, the zoomed out (5ms) version shows some out-of-sync-ness, and on zooming in, we see that it’s in the order of 500µs.

So even our bad case is actually quite good — sound travels at about 340 m/s, so 500µs is the equivalent of two speakers about 17cm apart.

Now let’s make things a little more interesting. With both my laptop and desktop connected to a wifi network:

On average, the sync can be quite okay. The first pair of images show sync to be within about 300µs.

However, the wifi on my desktop is flaky, so you can see the sync drift off by up to 2.5ms in the next pair. In my setup, it even goes off by 10-20ms before returning to the average case. The next two images show it going back and forth.

Why does this happen? Well, let’s take a quick look at what ping statistics from my desktop to my laptop look like:

Ping from desktop to laptop on wifi

That’s not good — you can see that the minimum, average and maximum RTT are very different. Our network clock logic probably needs some tuning to deal with this much jitter.

Conclusion

These measurements show that we can get some (in my opinion) pretty good synchronisation between devices using GStreamer. I wrote the gst-sync-server library to make it easy to build applications on top of this feature.

The obvious area to improve is how we cope with jittery networks. We’ve added some infrastructure to capture and replay clock synchronisation messages offline. What remains is to build a large enough body of good and bad cases, and then tune the sync algorithm to work as well as possible with all of these.

Also, Florent over at Ubicast pointed out a nice tool they’ve written to measure A/V sync on the same device. It would be interesting to modify this to allow for automated measurement of inter-device sync.

In a future post, I’ll write more about how we actually achieve synchronisation between devices, and how we can go about improving it.

Synchronised Playback and Video Walls

Hello again, and I hope you’re having a pleasant end of the year (if you are, maybe don’t check the news until next year).

I’d written about synchronised playback with GStreamer a little while ago, and work on that has been continuing apace. Since I last wrote about it, a bunch of work has gone in:

  • Landed support for sending a playlist to clients (instead of a single URI)

  • Added the ability to start/stop playback

  • The API has been cleaned up considerably to allow us to consider including this upstream

  • The control protocol implementation was made an interface, so you don’t have to use the built-in TCP server (different use-cases might want different transports)

  • Made a bunch of robustness fixes and documentation improvements

  • Introduced API for clients to send the server information about themselves

  • Also added API for the server to send video transformations for specific clients to apply before rendering

While the other bits are exciting in their own right, in this post I’m going to talk about the last two items.

Video walls

For those of you who aren’t familiar with the term, a video wall is just an array of displays stacked to make a larger display. These are often used in public installations.

One way to set up a video wall is to have each display connected to a small computer (such as the Raspberry Pi), and have them play a part of the entire video, cropped and scaled for the display that is connected. This might look something like:

A 4×4 video wall

The tricky part, of course, is synchronisation — which is where gst-sync-server comes in. Since we’re able to play a given stream in sync across devices on a network, the only missing piece was the ability to distribute a set of per-client transformations so that clients could apply those, and that is now done.

In order to keep things clean from an API perspective, I took the following approach:

  • Clients now have the ability to send a client ID and a configuration (which is just a dictionary) when they first connect to the server

  • The server API emits a signal with the client ID and configuration, which allows you to know when a client connects, what kind of display it’s running, and where it is positioned

  • The server now has additional fields to send a map of client ID to a set of video transformations

This allows us to do fancy things like having each client manage its own information with the server dynamically adapting the set of transformations based on what is connected. Of course, the simpler case of having a static configuration on the server also works.

Demo

Since seeing is believing, here’s a demo of the synchronised playback in action:

The setup is my laptop, which has an Intel GPU, and my desktop, which has an NVidia GPU. These are connected to two monitors (thanks go out to my good friends from Uncommon for lending me their thin-bezelled displays).

The video resolution is 1920×800, and I’ve adjusted the crop parameters to account for the bezels, so the video actually does look continuous. I’ve uploaded the text configuration if you’re curious about what that looks like.

As I mention in the video, the synchronisation is not as tight as I would like it to be. This is most likely because of the differing device configurations. I’ve been working with Nicolas to try to address this shortcoming by using some timing extensions that the Wayland protocol allows for. More news on this as it breaks.

More generally, I’ve done some work to quantify the degree of sync, but I’m going to leave that for another day.

p.s. the reason I used kmssink in the demo was that it was the quickest way I know of to get a full-screen video going — I’m happy to hear about alternatives, though

Future work

Make it real

My demo was implemented quite quickly by allowing the example server code to load and serve up a static configuration. What I would like is to have a proper working application that people can easily package and deploy on the kinds of embedded systems used in real video walls. If you’re interested in taking this up, I’d be happy to help out. Bonus points if we can dynamically calculate transformations based on client configuration (position, display size, bezel size, etc.)

Hardware acceleration

One thing that’s bothering me is that the video transformations are applied in software using GStreamer elements. This works fine(ish) for the hardware I’m developing on, but in real life, we would want to use OpenGL(ES) transformations, or platform specific elements to have hardware-accelerated transformations. My initial thoughts are for this to be either API on playbin or a GstBin that takes a set of transformations as parameters and internally sets up the best method to do this based on whatever sink is available downstream (some sinks provide cropping and other transformations).

Why not audio?

I’ve only written about video transformations here, but we can do the same with audio transformations too. For example, multi-room audio systems allow you to configure the locations of wireless speakers — so you can set which one’s on the left, and which on the right — and the speaker will automatically play the appropriate channel. Implementing this should be quite easy with the infrastructure that’s currently in place.

Merry Happy .

I hope you enjoyed reading that — I’ve had great responses from a lot of people about how they might be able to use this work. If there’s something you’d like to see, leave a comment or file an issue.

Happy end of the year, and all the best for 2017!

GStreamer and Synchronisation Made Easy

A lesser known, but particularly powerful feature of GStreamer is our ability to play media synchronised across devices with fairly good accuracy.

The way things stand right now, though, achieving this requires some amount of fiddling and a reasonably thorough knowledge of how GStreamer’s synchronisation mechanisms work. While we have had some excellent talks about these at previous GStreamer conferences, getting things to work is still a fair amount of effort for someone not well-versed with GStreamer.

As part of my work with the Samsung OSG, I’ve been working on addressing this problem, by wrapping all the complexity in a library. The intention is that anybody who wants to implement the ability for different devices on a network to play the same stream and have them all synchronised should be able to do so with a few lines of code, and the basic know-how for writing GStreamer-based applications.

I’ve started work on this already, and you can find the code in the creatively named gst-sync-server repo.

Design and API

Let’s make this easier by starting with a picture …

Big picture of the architecture

Let’s say you’re writing a simple application where you have two or more devices that need to play the same video stream, in sync. Your system would consist of two entities:

  • A server: this is where you configure what needs to be played. It instantiates a GstSyncServer object on which it can set a URI that needs to be played. There are other controls available here that I’ll get to in a moment.

  • A client: each device would be running a copy of the client, and would get information from the server telling it what to play, and what clock to use to make sure playback is synchronised. In practical terms, you do this by creating a GstSyncClient object, and giving it a playbin element which you’ve configured appropriately (this usually involves at least setting the appropriate video sink that integrates with your UI).

That’s pretty much it. Your application instantiates these two objects, starts them up, and as long as the clients can access the media URI, you magically have two synchronised streams on your devices.

Control

The keen observers among you would have noticed that there is a control entity in the above diagram that deals with communicating information from the server to clients over the network. While I have currently implemented a simple TCP protocol for this, my goal is to abstract out the control transport interface so that it is easy to drop in a custom transport (Websockets, a REST API, whatever).

The actual sync information is merely a structure marshalled into a JSON string and sent to clients every time something happens. Once your application has some media playing, the next thing you’ll want to do from your server is control playback. This can include

  • Changing what media is playing (like after the current media ends)
  • Pausing/resuming the media
  • Seeking
  • “Trick modes” such as fast forward or reverse playback

The first two of these work already, and seeking is on my short-term to-do list. Trick modes, as the name suggests, can be a bit more tricky, so I’ll likely get to them after other things are done.

Getting fancy

My hope is to see this library being used in a few other interesting use cases:

  • Video walls: having a number of displays stacked together so you have one giant display — these are all effectively playing different rectangles from the same video

  • Multiroom audio: you can play the same music across different speakers in a single room, or multiple rooms, or even group sets of speakers and play different media on different groups

  • Media sharing: being able to play music or videos on your phone and have your friends be able to listen/watch at the same time (a silent disco app?)

What next

At this point, the outline of what I think the API should look like is done. I still need to create the transport abstraction, but that’s pretty much a matter of extracting out the properties and signals that are part of the existing TCP transport.

What I would like is to hear from you, my dear readers who are interested in using this library — does the API look like it would work for you? Does the transport mechanism I describe above cover what you might need? There is example code that should make it easier to understand how this library is meant to be used.

Depending on the feedback I get, my next steps will be to implement the transport interface, refine the API a bit, fix a bunch of FIXMEs, and then see if this is something we can include in gst-plugins-bad.

Feel free to comment either on the Github repository, on this blog, or via email.

And don’t forget to watch this space for some videos and measurements of how GStreamer synchronisation fares in real life!

GStreamer on Android and universal builds

This is a quick PSA for those of you using the GStreamer binary builds for Android.

With the Android NDK r12, the default behaviour while building native code changed from building for armeabi to building for all ABIs. So if your app doesn’t specify APP_ABI in its Application.mk, you will now get an error about unsupported architectures. This was tracked as bug 770631.

The idea behind this change is that your Android app should ship versions of your native code for all supported architectures as a “universal” build, so it is accessible to as many devices as possible.

To deal with this, we now provide a universal tarball which contains binaries for all the architectures that we support. This is currently ARM, ARMv7-A, ARMv8-A (64-bit), x86, and x86-64. That leaves MIPS and MIPS64, which are not currently supported.

If you’ve been using the GStreamer Android binaries before GStreamer 1.9.2, then you should start using the universal tarball rather than the architecture-specific tarball. You will need minor updates to your native build, like we made to the player example. You probably want to put the gstAndroidRoot variable in ~/.gradle/gradle.properties instead, though.

As Sebastian announced, assuming all goes well with the universal tarballs, we will stop shipping the per-arch tarballs — they are redundant, and just take up CI and disk resources.

There are some things that I’d like for us to be able to do better. The first is that Android Studio doesn’t pick up native code with our current build approach. This is a limitation of the Android Gradle NDK plugin, which doesn’t support a custom build. This should change with Android Studio 2.2.

I would also like to integrate better with Android Studio — either be able to specify the GStreamer Android binary path in the UI (like you do for the SDK/NDK), or better yet, have it be possible to specify the dependency in Gradle, and have it be automatically pulled from the Internet. If any of you are familiar with how we can do this, please shout out!