In my previous post, I alluded to an exciting development for PipeWire. I’m now thrilled to officially announce that Asymptotic will be undertaking several important tasks for the project, thanks to funding from the Sovereign Tech Fund (now part of the Sovereign Tech Agency).
Some of you might be familiar with the Sovereign Tech Fund from their funding for GNOME, GStreamer and systemd – they have been investing in foundational open source technology, supporting the digital commons in key areas, a mission closely aligned with our own.
We will be tackling three key areas of work.
ASHA hearing aid support
I wrote a bit about our efforts on this front. We have already completed the PipeWire support for single ASHA hearing aids, and are actively working on support for stereo pairs.
Improvements to GStreamer elements
We have been working through the GStreamer+PipeWire todo list, fixing bugs and making it easier to build audio and video streaming pipelines on top of PipeWire. A number of usability improvements have already landed, and more work on this front continues.
A Rust-based client library
While we already have a pretty functional set of Rust bindings around the C-based libpipewire, we will be creating a pure Rust implementation of a PipeWire client, and providing that via a C API as well.
There are a number of advantages to this: type and memory safety being foremost, but we can also leverage Rust macros to eliminate a lot of boilerplate (there are community efforts in this direction already that we may be able to build upon).
This is a large undertaking, and this funding will allow us to tackle a big chunk of it – we are excited, and deeply appreciative of the work the Sovereign Tech Agency is doing in supporting critical open source infrastructure.
It’s 2025(!), and I thought I’d kick off the year with a post about some work
that we’ve been doing behind the scenes for a while. Grab a cup of
$beverage_of_choice, and let’s jump in with some context.
History: Hearing aids and Bluetooth
Various estimates put the number of people with some form of hearing loss at 5%
of the population. Hearing aids and cochlear implants are commonly used to help
deal with this (I’ll use “hearing aid” or “HA” in this post, but the same ideas
apply to both). Historically, these have been standalone devices, with some
primitive ways to receive audio remotely (hearing loops and telecoils).
As you might expect, the last couple of decades have seen advances that allow
consumer devices (such as phones, tablets, laptops, and TVs) to directly
connect to hearing aids over Bluetooth. This can provide significant quality of
life improvements – playing audio from a device’s speakers means the sound is
first distorted by the speakers, and then by the air between the speaker and
the hearing aid. Avoiding those two steps can make a big difference in the
quality of sound that reaches the user.
Comparison of audio paths
Unfortunately, the previous Bluetooth audio standards (BR/EDR and A2DP – used
by most Bluetooth audio devices you’ve come across) were not well-suited for
these use-cases, especially from a power-consumption perspective. This meant
that HA users would either have to rely on devices using proprietary protocols
(usually limited to Apple devices), or have a cumbersome additional dongle with
its own battery and charging needs.
Recent Past: Bluetooth LE
The more recent Bluetooth LE specification addresses some of the issues with
the previous spec (now known as Bluetooth Classic). It provides a low-power
base for devices to communicate with each other, and has been widely adopted in
consumer devices.
On top of this, we have the LE Audio standard, which provides audio streaming
services over Bluetooth LE for consumer audio devices and HAs. The hearing aid
industry has been an active participant in its development, and we should see
widespread support over time, I expect.
The base Bluetooth LE specification has been around since 2010, but the LE Audio
specification has only been public since 2021/2022. We’re still seeing devices
with LE Audio support trickle into the market.
In 2018, Google partnered with a hearing aid manufacturer to announce the ASHA
(Audio Streaming for Hearing Aids) protocol, presumably as a stop-gap. The
protocol uses Bluetooth LE (but not LE Audio) to support low-power audio
streaming to hearing aids, and is
publicly available.
Several devices have shipped with ASHA support in the last ~6 years.
A brief history of Bluetooth LE and audio
Hot Take: Obsolescence is bad UX
As end-users, we understand the push/pull of technological advancement and
obsolescence. As responsible citizens of the world, we also understand the
environmental impact of this.
The problem is much worse when we are talking about medical devices. Hearing
aids are expensive, and are expected to last a long time. It’s not uncommon for
people to use the same device for 5-10 years, or even longer.
In addition to the financial cost, there is also a significant emotional cost
to changing devices. There is usually a period of adjustment during which one
might be working with an audiologist to tune the device to one’s hearing.
Neuroplasticity allows the brain to adapt to the device and extract more
meaning over time. Changing devices effectively resets the process.
All this is to say that supporting older devices is a worthy goal in itself,
but has an additional set of dimensions in the context of accessibility.
HAs and Linux-based devices
Because of all this history, hearing aid manufacturers have traditionally
focused on mobile devices (i.e. Android and iOS). This is changing, with Apple
supporting its proprietary MFi (made for iPhone/iPad/iPod) protocol on macOS,
and Windows adding support for LE Audio on Windows 11.
This does leave the question of Linux-based devices, which is our primary
concern – can users of free software platforms also have an accessible user
experience?
A lot of work has gone into adding Bluetooth LE support in the Linux kernel and
BlueZ, and more still to add LE Audio support. PipeWire’s Bluetooth module now
includes support for LE Audio, and there is continuing effort to flesh this
out. Linux users with LE Audio-based hearing aids will be able to take
advantage of all this.
However, the ASHA specification was only ever supported on Android devices.
This is a bit of a shame, as there are likely a significant number of hearing
aids out there with ASHA support, which will hopefully still be around for the
next 5+ years. This felt like a gap that we could help fill.
Step 1: A Proof-of-Concept
We started out by looking at the ASHA specification, and the state of Bluetooth
LE in the Linux kernel. We spotted some things that the Android stack exposes
that BlueZ does not, but it seemed like all the pieces should be there.
Friend-of-Asymptotic
Ravi Chandra Padmala spent
some time with us to implement a proof-of-concept. This was a pretty intense
journey in itself, as we had to identify some good reference hardware (we found
an ASHA implementation on the onsemi
RSL10),
and clean out the pipes between the kernel and userspace (LE
connection-oriented channels, which ASHA relies on, weren’t commonly used at
that time).
We did eventually get the
proof-of-concept done, and this
gave us confidence to move to the next step of integrating this into BlueZ –
albeit after a hiatus for some paid work. We have to keep the lights on, after
all!
Step 2: ASHA in BlueZ
The BlueZ audio plugin implements various audio profiles within the BlueZ
daemon – this includes A2DP for Bluetooth Classic, as well as BAP for LE
Audio.
We decided to add ASHA support within this plugin. This would allow BlueZ to
perform privileged operations and then hand off a file descriptor for the
connection-oriented channel, so that any userspace application (such as
PipeWire) could actually stream audio to the hearing aid.
I implemented an initial version of the ASHA profile in the BlueZ audio plugin
last year, and thanks to Luiz Augusto von Dentz’
guidance and reviews, the plugin has
landed upstream.
This has been tested with a single hearing aid, and stereo support is pending.
In the process, we also found a small community of folks with deep interest in
this subject, and you can join us on #asha on the
BlueZ Slack.
Step 3: PipeWire support
To get end-to-end audio streaming working with any application, we need to
expose the BlueZ ASHA profile as a playback device on the audio server (i.e.,
PipeWire). This would make the HAs appear as just another audio output, and we
could route any or all system audio to it.
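Assuming that lands and the hearing aids show up as a regular PipeWire sink, routing audio to them should look the same as routing to any other output. A minimal sketch using WirePlumber’s wpctl (the sink ID below is a placeholder – use whatever wpctl status reports on your system):

  # list devices and sinks; the hearing aids should appear as a sink
  wpctl status

  # make the hearing aids the default output (42 is a placeholder ID from the listing above)
  wpctl set-default 42

  # optionally, adjust the volume on that sink
  wpctl set-volume 42 0.8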
My colleague, Sanchayan Maity, has been working
on this for the last few weeks. The code is all more or less in place now, and
you can track our progress on the PipeWire
MR.
Step 4 and beyond: Testing, stereo support, …
Once we have the basic PipeWire support in place, we will implement stereo
support (the spec does not support more than 2 channels), and then we’ll have a
bunch of testing and feedback to work with. The goal is to make this a solid
and reliable solution for folks on Linux-based devices with hearing aids.
Once that is done, there are a number of UI-related tasks that would be nice to
have in order to provide a good user experience. This includes things like
combining the left and right HAs to present them as a single device, and access
to any tuning parameters.
Getting it done
This project has been on my mind since the ASHA specification was announced,
and it has been a long road to get here. We are in the enviable position of
being paid to work on challenging problems, and we often contribute our work
upstream. However, there are many such projects that would be valuable to
society, but don’t necessarily have a clear source of funding.
In this case, we found ourselves in an interesting position – we have the
expertise and context around the Linux audio stack to get this done. Our
business model allows us the luxury of taking bites out of problems like this,
and we’re happy to be able to do so.
However, it helps immensely when we do have funding to take on this work
end-to-end – we can focus on the task entirely and get it done faster.
Onward…
I am delighted to announce that we were able to find the financial support to
complete the PipeWire work! Once we land basic mono audio support in the MR
above, we’ll move on to implementing stereo support in the BlueZ plugin and the
PipeWire module. We’ll also be testing with some real-world devices, and we’ll
be leaning on our community for more feedback.
This is an exciting development, and I’ll be writing more about it in a
follow-up post in a few days. Stay tuned!
All of us at Asymptotic are back home from the
exciting week at GStreamer Conference 2024 in Montréal, Canada last month. It was great to hang out with the community and see all the great work going on in the GStreamer ecosystem.
Montréal sunsets are 😍
There were some visa-related adventures leading up to the conference, but
thanks to the organising team (shoutout to Mark Filion and Tim-Philipp Müller), everything was sorted out in time and Sanchayan and Taruntej were able to make it.
This conference was also special because this year marks the 25th anniversary of the GStreamer project!
Sanchayan spoke about his work with the various QUIC elements in GStreamer. We already have the quinnquicsrc and quinnquicsink upstream, with a couple of plugins to allow (de)multiplexing of raw streams, as well as an implementation of RTP-over-QUIC (RoQ). We’ve also started work on Media-over-QUIC (MoQ) elements.
This has been a fun challenge for us, as we’re looking to build out a general-purpose toolkit for building QUIC application-layer protocols in GStreamer. Watch this space for more updates as we build out more functionality, especially around MoQ.
Clock Rate Matching in GStreamer & PipeWire (video)
My talk was about an interesting corner of GStreamer, namely clock rate
matching. This is a part of live pipelines that is often taken for granted, so I wanted to give folks a peek under the hood.
The idea for this talk was born out of some recent work we did to allow splitting up the graph clock in PipeWire from the PTP clock when sending AES67 streams on the network. I found the contrast between the PipeWire and GStreamer approaches thought-provoking, and wanted to share that with the community.
Next, Taruntej dove into how we optimised our usage of GStreamer in a real-time audio application on Windows. We had some pretty tight performance requirements for this project, and Taruntej spent a lot of time profiling and tuning the pipeline to meet them. He shared some of the lessons learned and the tools he used to get there.
Simplifying HLS playlist generation in GStreamer (video)
Sanchayan also walked us through the work he’s been doing to simplify HLS (HTTP Live Streaming) multivariant playlist generation. This should be a nice feature to round out GStreamer’s already strong support for generating HLS streams. We are also exploring the possibility of reusing the same code for generating DASH (Dynamic Adaptive Streaming over HTTP) manifests.
Hackfest
As usual, the conference was followed by a two-day hackfest. We worked on a few interesting problems:
Sanchayan addressed some feedback on the QUIC muxer elements, and then investigated extending the HLS elements for SCTE-35 marker insertion and DASH support
Taruntej worked on improvements to the threadshare elements, specifically to bring some ts-udpsrc element features in line with udpsrc
I spent some time reviewing a long-pending merge request to add soft-seeking support to the AWS S3 sink (so that it might be possible to upload seekable MP4s, for example, directly to S3). I also had a very productive conversation with George Kiagiadakis about how we should improve the PipeWire GStreamer elements (more on this soon!)
All in all, it was a great time, and I’m looking forward to the spring hackfest and the conference in the latter part of next year!
The WebRTC nerds among us will remember the first thing we learn about WebRTC, which is that it is a specification for peer-to-peer communication of media and data, but it does not specify how signalling is done.
Or put more simply, if you want to call someone on the web, WebRTC tells you how you can transfer audio, video and data, but it leaves out the bit about how you make the call itself: how you locate the person you’re calling, let them know you’d like to call them, and go through a few more steps before you can see and talk to each other.
WebRTC signalling
While this allows services to provide their own mechanisms to manage how WebRTC calls work, the lack of a standard mechanism means that general-purpose applications need to individually integrate each service that they want to support. For example, GStreamer’s webrtcsrc and webrtcsink elements support various signalling protocols, including Janus Video Rooms, LiveKit, and Amazon Kinesis Video Streams.
However, having a standard way for clients to do signalling would help developers focus on their application and worry less about interoperability with different services.
This is where WHIP (the WebRTC-HTTP Ingestion Protocol) and WHEP (the WebRTC-HTTP Egress Protocol) come in.
(author’s note: the puns really do write themselves :))
As the names suggest, the specifications provide a way to perform signalling using HTTP. WHIP gives us a way to send media to a server, to ingest into a WebRTC call or live stream, for example.
Conversely, WHEP gives us a way for a client to use HTTP signalling to consume a WebRTC stream – for example to create a simple web-based consumer of a WebRTC call, or tap into a live streaming pipeline.
WHIP and WHEP
With this view of the world, WHIP and WHEP can be used not just for calling applications, but also as an alternative way to ingest or play back live streams, with lower latency and a near-ubiquitous real-time communication API.
As you may know, GStreamer already provides developers with two ways to work with WebRTC streams:
webrtcbin: provides a low-level API, akin to the PeerConnection API that browser-based users of WebRTC will be familiar with
webrtcsrc and webrtcsink: provide high-level elements that can respectively produce/consume media from/to a WebRTC endpoint
At Asymptotic, my colleagues Tarun and Sanchayan have been using these building blocks to implement GStreamer elements for both the WHIP and WHEP specifications. You can find these in the GStreamer Rust plugins repository.
Our initial implementations were based on webrtcbin, but have since been moved over to the higher-level APIs to reuse common functionality (such as automatic encoding/decoding and congestion control). Tarun covered our work in a talk at last year’s GStreamer Conference.
Today, we have 4 elements implementing WHIP and WHEP.
Clients
whipclientsink: This is a webrtcsink-based implementation of a WHIP client, using which you can send media to a WHIP server. For example, streaming your camera to a WHIP server is as simple as:
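(A sketch rather than a tested invocation — the endpoint URL is a placeholder, and gst-inspect-1.0 whipclientsink will list the exact properties available on your version.)

  gst-launch-1.0 -e v4l2src ! video/x-raw ! queue \
    ! whipclientsink signaller::whip-endpoint="https://my.webrtc.server/whip/endpoint"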
whepclientsrc: This is work in progress and allows us to build player applications to connect to a WHEP server and consume media from it. The goal is to make playing a WHEP stream as simple as:
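(Again a sketch — since the element is still in progress, pad and property details may change, and the endpoint URL is a placeholder.)

  gst-launch-1.0 whepclientsrc signaller::whep-endpoint="https://my.webrtc.server/whep/endpoint" \
    ! decodebin ! videoconvert ! autovideosink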
The client elements fit quite neatly into how we might imagine GStreamer-based clients could work. You could stream arbitrary stored or live media to a WHIP server, and play back any media a WHEP server provides. Both pipelines implicitly benefit from GStreamer’s ability to use hardware-acceleration capabilities of the platform they are running on.
GStreamer WHIP/WHEP clients
Servers
whipserversrc: Allows us to create a WHIP server to which clients can connect and provide media, each of which will be exposed as GStreamer pads that can be arbitrarily routed and combined as required. We have an example server that can
play all the streams being sent to it.
whepserversink: Finally, there is ongoing work to publish arbitrary streams over WHEP for web-based clients to consume.
The two server elements open up a number of interesting possibilities. We can ingest arbitrary media with WHIP, and then decode and process, or forward it, depending on what the application requires. We expect that the server API will grow over time, based on the different kinds of use-cases we wish to support.
GStreamer WHIP/WHEP server
This is all pretty exciting, as we have all the pieces to create flexible pipelines for routing media between WebRTC-based endpoints without having to worry about service-specific signalling.
If you’re looking for help realising WHIP/WHEP based endpoints, or other media streaming pipelines, don’t hesitate to reach out to us!
It’s been a busy several months, but now that we have some breathing
room, I wanted to take stock of what we have done over the last year or so.
This is a good thing for most people and companies to do of course, but being a
scrappy, (questionably) young organisation, it’s doubly important for us to
introspect. This allows us to both recognise our achievements and ensure that
we are accomplishing what we have set out to do.
One thing that is clear to me is that we have been lagging in writing about
some of the interesting things that we have had the opportunity to work on,
so you can expect to see some more posts expanding on what you find below, as
well as some of the newer work that we have begun.
(note: I write about our open source contributions below, but needless to say,
none of it is possible without the collaboration, input, and reviews of members of
the community)
WHIP/WHEP client and server for GStreamer
If you’re in the WebRTC world, you likely have not missed the excitement around
standardisation of HTTP-based signalling protocols, culminating in the
WHIP and
WHEP specifications.
Tarun has been driving our client and server
implementations for both these protocols, and in the process has been
refactoring some of the webrtcsink and webrtcsrc code to make it easier to
add more signaller implementations. You can find out more about this work in
his talk at GstConf 2023
and we’ll be writing more about the ongoing effort here as well.
Low-latency embedded audio with PipeWire
Some of our work involves implementing a framework for very low-latency audio
processing on an embedded device. PipeWire is a good fit for this sort of
application, but we have had to implement a couple of features to make it work.
It turns out that doing timer-based scheduling can be more CPU intensive than
ALSA period interrupts at low latencies, so we implemented an IRQ-based
scheduling mode for PipeWire. This is now used by default when a pro-audio
profile is selected for an ALSA device.
In addition to this, we also implemented rate adaptation for USB gadget devices
using the USB Audio Class “feedback control” mechanism. This allows USB gadget
devices to adapt their playback/capture rates to the graph’s rate without
having to perform resampling on the device, saving valuable CPU and latency.
There is likely still some room to optimise things, so expect to hear more on
this front soon.
This should be useful in various embedded devices that have both the hardware
and firmware to make use of this power-saving feature.
GStreamer LC3 encoder and decoder
Tarun wrote a GStreamer plugin implementing the LC3 codec
using the liblc3 library. This is the
primary codec for next-generation wireless audio devices implementing the
Bluetooth LE Audio specification. The plugin is upstream and can be used to
encode and decode LC3 data already, but will likely be more useful when the
existing Bluetooth plugins that talk to Bluetooth devices gain LE Audio support.
QUIC plugins for GStreamer
Sanchayan implemented a QUIC source and sink plugin in
Rust, allowing us to start experimenting with the next generation of network
transports. For the curious, the plugins sit on top of the Quinn
implementation of the QUIC protocol.
There is a merge request open
that should land soon, and we’re already seeing folks using these plugins.
AWS S3 plugins
We’ve been fleshing out the AWS S3 plugins over the years, and we’ve added a
new awss3putobjectsink. This provides a better way to push small or sparse
data to S3 (subtitles, for example), without potentially losing data in
case of a pipeline crash.
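As a rough illustration, pushing a small subtitle file to S3 could look something like the following sketch (the bucket, key, and region are placeholders, and it’s worth confirming the exact property names with gst-inspect-1.0 awss3putobjectsink):

  gst-launch-1.0 filesrc location=captions.vtt \
    ! awss3putobjectsink bucket=my-bucket key=captions/captions.vtt region=us-east-1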
We’ll also be expecting this to look a little more like multifilesink,
allowing us to arbitrarily split up data and write to S3 directly as multiple
objects.
Update to webrtc-audio-processing
We also updated the webrtc-audio-processing
library, based on more recent upstream libwebrtc. This is one of those things
that becomes surprisingly hard as you get into it — packaging an API-unstable
library correctly, while supporting a plethora of operating system and
architecture combinations.
Clients
We can’t always speak publicly of the work we are doing with our clients, but
there have been a few interesting developments that we can speak about (and have).
Both Sanchayan and I spoke a bit about our work with WebRTC-as-a-service
provider, Daily. My talk at the GStreamer Conference
was a summary of the work I wrote about previously, covering what we learned
while building Daily’s live streaming, recording, and
other backend services. There were other clients we worked with during the
year with similar experiences.
Sanchayan spoke about the interesting approach to building
SIP support
that we took for Daily. This was a pretty fun project, allowing us to build a
modern server-side SIP client with GStreamer and SIP.js.
An ongoing project we are working on is building AES67 support using GStreamer
for FreeSWITCH, which essentially allows
bridging low-latency network audio equipment with existing SIP and related
infrastructure.
As you might have noticed from previous sections, we are also working on a
low-latency audio appliance using PipeWire.
Retrospective
All in all, we’ve had a reasonably productive 2023. There are things I know we
can do better in our upstream efforts to help move merge requests and issues,
and I hope to address this in 2024.
We have ideas for larger projects that we would like to take on. For some of these, we might be able to find clients who would be willing to pay for the work. For the ideas that we think are useful but may not find funding, we will continue to spend our spare time pushing them forward.
If you made it this far, thank you, and look out for more updates!
For the last year and a half, we at Asymptotic have been working with the excellent team at Daily. I’d like to share a little bit about what we’ve learned.
Daily is a real time calling platform as a service. One standard feature that users have come to expect in their calls is the ability to record them, or to stream their conversations to a larger audience. This involves mixing together all the audio/video from each participant and then storing it, or streaming it live via YouTube, Twitch, or any other third-party service.
As you might expect, GStreamer is a good fit for building this kind of functionality, where we consume a bunch of RTP streams, composite/mix them, and then send them out to one or more external services (Amazon’s S3 for recordings and HLS, or a third-party RTMP server).
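As a very rough sketch of the shape of such a pipeline — with test sources standing in for the per-participant RTP streams, a placeholder RTMP URL, and whatever AAC encoder you have available swapped in for voaacenc — it could look something like:

  gst-launch-1.0 \
    compositor name=mix sink_1::xpos=640 ! videoconvert \
      ! x264enc tune=zerolatency ! h264parse ! queue \
      ! flvmux name=mux streamable=true \
      ! rtmp2sink location="rtmp://live.example.com/app/stream" \
    videotestsrc is-live=true ! video/x-raw,width=640,height=360 ! queue ! mix.sink_0 \
    videotestsrc is-live=true pattern=ball ! video/x-raw,width=640,height=360 ! queue ! mix.sink_1 \
    audiotestsrc is-live=true ! audioconvert ! voaacenc ! queue ! mux.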
I’ve written about how we implemented this feature elsewhere, but I’ll summarise briefly.
This is a slightly longer post than usual, so grab a cup of your favourite beverage, or jump straight to the summary section for the tl;dr.
As the third and final day of the PipeWire hackfest draws to a close, I thought I’d summarise some of my thoughts on the goings-on and the future.
Thanks
Before I get into the details, I want to send out a big thank you to:
Christian Schaller for all the hard work of organising the event and Wim Taymans for the work on PipeWire so far (and in the future)
The GNOME Foundation, for sponsoring the event as a whole
Qualcomm, who are funding my presence at the event
Collabora, for sponsoring dinner on Monday
Everybody who attended and participated, for their time and thoughtful comments
Background
For those of you who are not familiar with it, PipeWire (previously Pinos, previously PulseVideo) was Wim’s effort at providing secure, multi-program access to video devices (like webcams, or the desktop for screen capture). As he went down that rabbit hole, he wrote SPA, a lightweight general-purpose framework for representing a streaming graph, and this led to the idea of expanding the project to include support for low latency audio.
The Linux userspace audio story has, for the longest time, consisted of two top-level components: PulseAudio which handles consumer audio (power efficiency, wide range of arbitrary hardware), and JACK which deals with pro audio (low latency, high performance). Consolidating this into a good out-of-the-box experience for all use-cases has been a long-standing goal for myself and others in the community that I have spoken to.
An Opportunity
From a PulseAudio perspective, it has been hard to achieve the 1-to-few millisecond latency numbers that would be absolutely necessary for professional audio use-cases. A lot of work has gone into improving this situation, most recently with David Henningsson’s shared-ringbuffer channels that made client/server communication more efficient.
At the same time, application sandboxing frameworks such as Flatpak have placed security requirements on us that were not accounted for when PulseAudio was written. Examples include choosing which devices an application has access to (or can even know of), or which applications can act as control entities (set routing etc., enable/disable devices). Some work has gone into this — Ahmed Darwish did some key work to get memfd support into PulseAudio, and Wim has prototyped an access-control mechanism module to enable a Flatpak portal for sound.
All this said, there are still fundamental limitations in architectural decisions in PulseAudio that would require significant plumbing to address. With Wim’s work on PipeWire and his extensive background with GStreamer and PulseAudio itself, I think we have an opportunity to revisit some of those decisions with the benefit of a decade’s worth of learning deploying PulseAudio in various domains starting from desktops/laptops to phones, cars, robots, home audio, telephony systems and a lot more.
Key Ideas
There are some core ideas of PipeWire that I am quite excited about.
The first of these is the graph. Like JACK, the entities that participate in the data flow are represented by PipeWire as nodes in a graph, and routing between nodes is very flexible — you can route applications to playback devices and capture devices to applications, but you can also route applications to other applications, and this is notionally the same thing.
The second idea is a bit more radical — PipeWire itself only “runs” the graph. The actual connections between nodes are created and managed by a “session manager”. This allows us to completely separate the data flow from policy, which means we could write completely separate policy for desktop use cases vs. specific embedded use cases. I’m particularly excited to see this be scriptable in a higher-level language, which is something Bastien has already started work on!
A powerful idea in PulseAudio was rewinding — the ability to send out huge buffers to the device, but the flexibility to rewind that data when things changed (a new stream got added, or the stream moved, or the volume changed). While this is great for power saving, it is a significant amount of complexity in the code. In addition, with some filters in the data path, rewinding can break the algorithm by introducing non-linearity. PipeWire doesn’t support rewinds, and we will need to find a good way to manage latencies to account for low power use cases. One example is that we could have the session manager bump up the device latency when we know latency doesn’t matter (Android does this when the screen is off).
There are a bunch of other things that are in the process of being fleshed out, like being able to represent the hardware as a graph as well, to have a clearer idea of what is going on within a node. More updates as these things are more concrete.
The Way Forward
There is a good summary by Christian about our discussion about what is missing and how we can go about trying to make a smooth transition for PulseAudio users. There is, of course, a lot to do, and my ideal outcome is that we one day flip a switch and nobody knows that we have done so.
In practice, we’ll need to figure out how to make this transition seamless for most people, while folks with custom setups will need to be given a long runway and clear documentation to know what to do. It’s way too early to talk about this in more specifics, however.
Configuration
One key thing that PulseAudio does right (I know there are people who disagree!) is having a custom configuration that automagically works on a lot of Intel HDA-based systems. We’ve been wondering how to deal with this in PipeWire, and the path we think makes sense is to transition to ALSA UCM configuration. This is not as flexible as we need it to be, but I’d like to extend it for that purpose if possible. This would ideally also help consolidate the various methods of configuration being used by the various Linux userspaces.
To that end, I’ve started trying to get a UCM setup on my desktop that PulseAudio can use, and be functionally equivalent to what we do with our existing configuration. There are missing bits and bobs, and I’m currently focusing on the ones related to hardware volume control. I’ll write about this in the future as the effort expands out to other hardware.
Onwards and upwards
The transition to PipeWire is unlikely to be quick, completely painless, or free of contention. For those who are worried about the future, know that any switch is still a long way away. In the mean time, however, constructive feedback and comments are welcome.
It’s been a little over a month since I got back from Manchester, and this post should’ve come out earlier but I’ve been swamped.
The conference was absolutely lovely, and the organisation was 110% on point (serious kudos, I know first hand how hard that is). Others on Planet GNOME have written extensively about the talks, the social events, and everything in between that made it a great experience. What I would like to write about is why this year’s GUADEC was special to me.
GNOME turning 20 years old is obviously a large milestone, and one of the main reasons I wanted to make sure I was at Manchester this year. There were many occasions to take stock of how far we had come, where we are, and most importantly, to reaffirm who we are, and why we do what we do.
And all of this made me think of my own history with GNOME. In 2002/2003, Nat and Miguel came down to Bangalore to talk about some of the work they were doing. I know I wasn’t the only one who found their energy infectious, and at Linux Bangalore 2003, they got on stage, just sat down, and started hacking up a GtkMozEmbed-based browser. The idea itself was fun, but what I took away — and I know I wasn’t the only one — is the sheer inclusive joy they shared in creating something and sharing that with their audience.
For all of us working on GNOME in whatever way we choose to contribute, there is the immediate gratification of shaping this project, as well as the larger ideological underpinning of making everyone’s experience talking to their computers better and free-er.
But I think it is also important to remember that all our efforts to make our community an inviting and inclusive space have a deep impact across the world. So much so that complete strangers from around the world are able to feel a sense of belonging to something much larger than themselves.
I am excited about everything we will achieve in the next 20 years.
(thanks go out to the GNOME Foundation for helping me attend GUADEC this year)
I’ve written a bit in my last two blog posts about the work I’ve been doing in inter-device synchronised playback using GStreamer. I introduced the library and then demonstrated its use in building video walls.
The important thing in synchronisation, of course, is how much in-sync are the streams? The video in my previous post gave a glimpse into that, and in this post I’ll expand on that with a more rigorous, quantifiable approach.
Before I start, a quick note: I am currently providing freelance consulting around GStreamer, PulseAudio and open source multimedia in general. If you’re looking for help with any of these, do get in touch.
The sync measurement setup
Quantifying what?
What is it that we are trying to measure? Let’s look at this in terms of the outcome — I have two computers, on a network. Using the gst-sync-server library, I play a stream on both of them. The ideal outcome is that the same video frame is displayed at exactly the same time, and the audio sample being played out of the respective speakers is also identical at any given instant.
As we saw previously, the video output is not a good way to measure what we want. This is because video displays are updated in sync with the display clock, over which consumer hardware generally does not have control. Besides, our eyes are not that sensitive to minor differences in timing unless images are side-by-side. After all, we’re fooling them with static pictures that change every 16.67 ms or so.
Using audio, though, we should be able to do better. Digital audio streams for music/videos typically consist of 44100 or 48000 samples a second, so we have a much finer granularity than video provides us. The human ear is also fairly sensitive to timing when it comes to sound. If the same sound arrives twice, more than about 10 ms apart, you will hear two distinct sounds, and the echo will annoy you to no end.
Measuring audio is also good enough because once you’ve got audio in sync, GStreamer will take care of A/V sync itself.
Setup
Okay, so now we know what we want to measure, but how do we measure it? The setup is illustrated below:
Sync measurement setup illustrated
As before, I’ve set up my desktop PC and laptop to play the same stream in sync. The stream being played is a local audio file — I’m keeping the setup simple by not adding network streaming to the equation.
The audio itself is just a tick sound every second. The tick is a simple 440 Hz sine wave (A₄ for the musically inclined) that runs for 1600 samples. It sounds something like this:
I’ve connected the 3.5mm audio output of both the computers to my faithful digital oscilloscope (a Tektronix TBS 1072B if you wanted to know). So now measuring synchronisation is really a question of seeing how far apart the leading edge of the sine wave on the tick is.
Of course, this assumes we’re not more than 1s out of sync (that’s the periodicity of the tick itself), and I’ve verified that by playing non-periodic sounds (any song or video) and making sure they’re in sync as well. You can trust me on this, or better yet, get the code and try it yourself! :)
The last piece to worry about — the network. How well we can sync the two streams depends on how well we can synchronise the clocks of the pipeline we’re running on each of the two devices. I’ll talk about how this works in a subsequent post, but my measurements are done on both a wired and wireless network.
Measurements
Before we get into it, we should keep in mind that due to how we synchronise streams — using a network clock — how in-sync our streams are will vary over time depending on the quality of the network connection.
If this variation is small enough, it won’t be noticeable. If it is large (10s of milliseconds), then we may start to notice it as echo, or as glitches when the pipeline tries to correct for the lack of sync.
In the first setup, my laptop and desktop are connected to each other directly via a LAN cable. The result looks something like this:
Sync on LAN, working well
Sync on LAN, working well, up close
Sync on LAN, slightly off
Sync on LAN, slightly off, up close
The first two images show the best case — we need to zoom in real close to see how out of sync the audio is, and it’s roughly 50µs.
The next two images show the “worst case”. This time, the zoomed out (5ms) version shows some out-of-sync-ness, and on zooming in, we see that it’s in the order of 500µs.
So even our bad case is actually quite good — sound travels at about 340 m/s, so 500µs is the equivalent of two speakers about 17cm apart.
Now let’s make things a little more interesting. With both my laptop and desktop connected to a wifi network:
Sync on wifi, okay on average
Sync on wifi, okay on average, up close
Sync on wifi, goes off on bad connection
Sync on wifi, goes off, up close
Sync on wifi, when it’s bad
Sync on wifi, when it’s good
On average, the sync can be quite okay. The first pair of images show sync to be within about 300µs.
However, the wifi on my desktop is flaky, so you can see it go off up to 2.5ms in the next pair. In my setup, it even goes off up to 10-20ms, before returning to the average case. The next two images show it go back and forth.
Why does this happen? Well, let’s take a quick look at what ping statistics from my desktop to my laptop look like:
Ping from desktop to laptop on wifi
That’s not good — you can see that the minimum, average and maximum RTT are very different. Our network clock logic probably needs some tuning to deal with this much jitter.
Conclusion
These measurements show that we can get some (in my opinion) pretty good synchronisation between devices using GStreamer. I wrote the gst-sync-server library to make it easy to build applications on top of this feature.
The obvious area to improve is how we cope with jittery networks. We’ve added some infrastructure to capture and replay clock synchronisation messages offline. What remains is to build a large enough body of good and bad cases, and then tune the sync algorithm to work as well as possible with all of these.
Also, Florent over at Ubicast pointed out a nice tool they’ve written to measure A/V sync on the same device. It would be interesting to modify this to allow for automated measurement of inter-device sync.
In a future post, I’ll write more about how we actually achieve synchronisation between devices, and how we can go about improving it.
Hello again, and I hope you’re having a pleasant end of the year (if you are, maybe don’t check the news until next year).
I’d written about synchronised playback with GStreamer a little while ago, and work on that has been continuing apace. Since I last wrote about it, a bunch of work has gone in:
Landed support for sending a playlist to clients (instead of a single URI)
Added the ability to start/stop playback
The API has been cleaned up considerably to allow us to consider including this upstream
The control protocol implementation was made an interface, so you don’t have to use the built-in TCP server (different use-cases might want different transports)
Made a bunch of robustness fixes and documentation improvements
Introduced API for clients to send the server information about themselves
Also added API for the server to send video transformations for specific clients to apply before rendering
While the other bits are exciting in their own right, in this post I’m going to talk about the last two items.
Video walls
For those of you who aren’t familiar with the term, a video wall is just an array of displays stacked to make a larger display. These are often used in public installations.
One way to set up a video wall is to have each display connected to a small computer (such as the Raspberry Pi), and have them play a part of the entire video, cropped and scaled for the display that is connected. This might look something like:
A 4×4 video wall
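To make the crop-and-scale part concrete, here is a hedged sketch of what a single tile might run — this shows the top-left quarter of a 2×2 wall using a test source, with kmssink as in the demo (autovideosink works for trying it on a desktop). In the real setup, the equivalent transformations are distributed by gst-sync-server rather than hard-coded like this:

  gst-launch-1.0 videotestsrc \
    ! video/x-raw,width=1920,height=1080 \
    ! videocrop left=0 top=0 right=960 bottom=540 \
    ! videoscale ! video/x-raw,width=1920,height=1080 \
    ! kmssink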
The tricky part, of course, is synchronisation — which is where gst-sync-server comes in. Since we’re able to play a given stream in sync across devices on a network, the only missing piece was the ability to distribute a set of per-client transformations so that clients could apply those, and that is now done.
In order to keep things clean from an API perspective, I took the following approach:
Clients now have the ability to send a client ID and a configuration (which is just a dictionary) when they first connect to the server
The server API emits a signal with the client ID and configuration, which allows you to know when a client connects, what kind of display it’s running, and where it is positioned
The server now has additional fields to send a map of client ID to a set of video transformations
This allows us to do fancy things like having each client manage its own information with the server dynamically adapting the set of transformations based on what is connected. Of course, the simpler case of having a static configuration on the server also works.
Demo
Since seeing is believing, here’s a demo of the synchronised playback in action:
The setup is my laptop, which has an Intel GPU, and my desktop, which has an NVidia GPU. These are connected to two monitors (thanks go out to my good friends from Uncommon for lending me their thin-bezelled displays).
The video resolution is 1920×800, and I’ve adjusted the crop parameters to account for the bezels, so the video actually does look continuous. I’ve uploaded the text configuration if you’re curious about what that looks like.
As I mention in the video, the synchronisation is not as tight as I would like it to be. This is most likely because of the differing device configurations. I’ve been working with Nicolas to try to address this shortcoming by using some timing extensions that the Wayland protocol allows for. More news on this as it breaks.
More generally, I’ve done some work to quantify the degree of sync, but I’m going to leave that for another day.
p.s. the reason I used kmssink in the demo was that it was the quickest way I know of to get a full-screen video going — I’m happy to hear about alternatives, though
Future work
Make it real
My demo was implemented quite quickly by allowing the example server code to load and serve up a static configuration. What I would like is to have a proper working application that people can easily package and deploy on the kinds of embedded systems used in real video walls. If you’re interested in taking this up, I’d be happy to help out. Bonus points if we can dynamically calculate transformations based on client configuration (position, display size, bezel size, etc.)
Hardware acceleration
One thing that’s bothering me is that the video transformations are applied in software using GStreamer elements. This works fine(ish) for the hardware I’m developing on, but in real life, we would want to use OpenGL(ES) transformations, or platform specific elements to have hardware-accelerated transformations. My initial thoughts are for this to be either API on playbin or a GstBin that takes a set of transformations as parameters and internally sets up the best method to do this based on whatever sink is available downstream (some sinks provide cropping and other transformations).
Why not audio?
I’ve only written about video transformations here, but we can do the same with audio transformations too. For example, multi-room audio systems allow you to configure the locations of wireless speakers — so you can set which one’s on the left, and which on the right — and the speaker will automatically play the appropriate channel. Implementing this should be quite easy with the infrastructure that’s currently in place.
Merry Happy .
I hope you enjoyed reading that — I’ve had great responses from a lot of people about how they might be able to use this work. If there’s something you’d like to see, leave a comment or file an issue.