June 25, 2025

The Unbearable Anger of Broken Audio

It should be surprising to absolutely nobody that the Linux audio stack is often the subject of varying levels of negative feedback, ranging from drive-by meme snark to apoplectic rage [1].

A lot of what computers are used for today involves audiovisual media in some form or the other, and having that not work can throw a wrench in just going about our day. So it is completely understandable for a person to get frustrated when audio on their device doesn’t work (or maybe worse, stops working for no perceivable reason).

It is also then completely understandable for this person to turn up on Matrix/IRC/GitLab and make their displeasure known to us in the PipeWire (and previously PulseAudio) community. After all, we’re the maintainers of the part of the audio stack most visible to you.

To add to this, we have two and a half decades’ worth of history in building the modern Linux desktop audio stack, which means there are historical artifacts in the stack (OSS -> ALSA -> ESD/aRTs -> PulseAudio/JACK -> PipeWire). And a lot of historical animus that apparently still needs venting.

In large centralised organisations, there is a support function whose (thankless) job it is to absorb some of that impact before passing it on to the people who are responsible for fixing the problem. In the F/OSS community, sometimes we’re lucky to have folks who step up to help users and triage issues. Usually though, it’s just maintainers managing this.

This has a number of … interesting … impacts for those of us who work in the space. For me this includes:

Developing thick skin
Trying to maintain equanimity while being screamed at
Knowing to step away from the keyboard when that doesn’t work
Repeated reminders that things do work for millions of users every day

So while the causes for the animosity are often sympathetic, this is not a recipe for a healthy community. I try to be judicious while invoking the fd.o Code of Conduct, but thick skin or not, abusive behaviour only results in a toxic community, so there are limits to that.

While I paint a picture of doom and gloom, most recent user feedback and issue reporting in the PipeWire community has been refreshingly positive. Even the trigger for this post is an issue from an extremely belligerent user (who I do sympathise with), who was quickly supplanted by someone else who has been extremely courteous in the face of what is definitely a frustrating experience.

So if I had to ask something of you, dear reader – the next time you’re angry with the maintainers of some free software you depend on, please get some of the venting out of your system in private (tell your friends how terrible we are, or go for a walk maybe), so we can have a reasonable conversation and make things better.

Thank you for reading!

[1] I’m not linking to examples, because that’s not the point of this post.

February 5, 2025

PipeWire ♥ Sovereign Tech Agency

In my previous post, I alluded to an exciting development for PipeWire. I’m now thrilled to officially announce that Asymptotic will be undertaking several important tasks for the project, thanks to funding from the Sovereign Tech Fund (now part of the Sovereign Tech Agency).

Some of you might be familiar with the Sovereign Tech Fund from their funding for GNOME, GStreamer and systemd – they have been investing in foundational open source technology, supporting the digital commons in key areas, a mission closely aligned with our own.

We will be tackling three key areas of work.

ASHA hearing aid support

I wrote a bit about our efforts on this front. We have already completed the PipeWire support for single ASHA hearing aids, and are actively working on support for stereo pairs.

Improvements to GStreamer elements

We have been working through the GStreamer+PipeWire todo list, fixing bugs and making it easier to build audio and video streaming pipelines on top of PipeWire. A number of usability improvements have already landed, and more work on this front continues.

A Rust-based client library

While we have a pretty functional set of Rust bindings around the C-based libpipewire already, we will be creating a pure Rust implementation of a PipeWire client, and providing that via a C API as well.

There are a number of advantages to this: type and memory safety being foremost, but we can also leverage Rust macros to eliminate a lot of boilerplate (there are community efforts in this direction already that we may be able to build upon).

This is a large undertaking, and this funding will allow us to tackle a big chunk of it – we are excited, and deeply appreciative of the work the Sovereign Tech Agency is doing in supporting critical open source infrastructure.

Watch this space for more updates!

January 7, 2025

A Brimful of ASHA

It’s 2025(!), and I thought I’d kick off the year with a post about some work that we’ve been doing behind the scenes for a while. Grab a cup of $beverage_of_choice, and let’s jump in with some context.

History: Hearing aids and Bluetooth

Various estimates put the number of people with some form of hearing loss at 5% of the population. Hearing aids and cochlear implants are commonly used to help deal with this (I’ll use “hearing aid” or “HA” in this post, but the same ideas apply to both). Historically, these have been standalone devices, with some primitive ways to receive audio remotely (hearing loops and telecoils).

As you might expect, the last couple of decades have seen advances that allow consumer devices (such as phones, tablets, laptops, and TVs) to directly connect to hearing aids over Bluetooth. This can provide significant quality of life improvements – playing audio from a device’s speakers means the sound is first distorted by the speakers, and then by the air between the speaker and the hearing aid. Avoiding those two steps can make a big difference in the quality of sound that reaches the user.

Comparison of audio paths: through the air vs. wireless audio (which has higher fidelity)

Unfortunately, the previous Bluetooth audio standards (BR/EDR and A2DP – used by most Bluetooth audio devices you’ve come across) were not well-suited for these use-cases, especially from a power-consumption perspective. This meant that HA users would either have to rely on devices using proprietary protocols (usually limited to Apple devices), or have a cumbersome additional dongle with its own battery and charging needs.

Recent Past: Bluetooth LE

The more recent Bluetooth LE specification addresses some of the issues with the previous spec (now known as Bluetooth Classic). It provides a low-power base for devices to communicate with each other, and has been widely adopted in consumer devices.

On top of this, we have the LE Audio standard, which provides audio streaming services over Bluetooth LE for consumer audio devices and HAs. The hearing aid industry has been an active participant in its development, and we should see widespread support over time, I expect.

The base Bluetooth LE specification has been around since 2010, but the LE Audio specification has only been public since 2021/2022. We’re still seeing devices with LE Audio support trickle into the market.

In 2018, Google partnered with a hearing aid manufacturer to announce the ASHA (Audio Streaming for Hearing Aids) protocol, presumably as a stop-gap. The protocol uses Bluetooth LE (but not LE Audio) to support low-power audio streaming to hearing aids, and is publicly available. Several devices have shipped with ASHA support in the last ~6 years.

A brief history of Bluetooth LE and audio

Hot Take: Obsolescence is bad UX

As end-users, we understand the push/pull of technological advancement and obsolescence. As responsible citizens of the world, we also understand the environmental impact of this.

The problem is much worse when we are talking about medical devices. Hearing aids are expensive, and are expected to last a long time. It’s not uncommon for people to use the same device for 5-10 years, or even longer.

In addition to the financial cost, there is also a significant emotional cost to changing devices. There is usually a period of adjustment during which one might be working with an audiologist to tune the device to one’s hearing. Neuroplasticity allows the brain to adapt to the device and extract more meaning over time. Changing devices effectively resets the process.

All this is to say that supporting older devices is a worthy goal in itself, but has an additional set of dimensions in the context of accessibility.

HAs and Linux-based devices

Because of all this history, hearing aid manufacturers have traditionally focused on mobile devices (i.e. Android and iOS). This is changing, with Apple supporting its proprietary MFi (made for iPhone/iPad/iPod) protocol on macOS, and Windows adding support for LE Audio on Windows 11.

This does leave the question of Linux-based devices, which is our primary concern – can users of free software platforms also have an accessible user experience?

A lot of work has gone into adding Bluetooth LE support in the Linux kernel and BlueZ, and more still to add LE Audio support. PipeWire’s Bluetooth module now includes support for LE Audio, and there is continuing effort to flesh this out. Linux users with LE Audio-based hearing aids will be able to take advantage of all this.

However, the ASHA specification was only ever supported on Android devices. This is a bit of a shame, as there are likely a significant number of hearing aids out there with ASHA support, which will hopefully still be around for the next 5+ years. This felt like a gap that we could help fill.

Step 1: A Proof-of-Concept

We started out by looking at the ASHA specification, and the state of Bluetooth LE in the Linux kernel. We spotted some things that the Android stack exposes that BlueZ does not, but it seemed like all the pieces should be there.

Friend-of-Asymptotic Ravi Chandra Padmala spent some time with us to implement a proof-of-concept. This was a pretty intense journey in itself, as we had to identify some good reference hardware (we found an ASHA implementation on the onsemi RSL10), and clean out the pipes between the kernel and userspace (LE connection-oriented channels, which ASHA relies on, weren’t commonly used at that time).

We did eventually get the proof-of-concept done, and this gave us confidence to move to the next step of integrating this into BlueZ – albeit after a hiatus of paid work. We have to keep the lights on, after all!

Step 2: ASHA in BlueZ

The BlueZ audio plugin implements various audio profiles within the BlueZ daemon – this includes A2DP for Bluetooth Classic, as well as BAP for LE Audio.

We decided to add ASHA support within this plugin. This would allow BlueZ to perform privileged operations and then hand off a file descriptor for the connection-oriented channel, so that any userspace application (such as PipeWire) could actually stream audio to the hearing aid.

I implemented an initial version of the ASHA profile in the BlueZ audio plugin last year, and thanks to Luiz Augusto von Dentz’ guidance and reviews, the plugin has landed upstream.

This has been tested with a single hearing aid, and stereo support is pending. In the process, we also found a small community of folks with deep interest in this subject, and you can join us on #asha on the BlueZ Slack.

Step 3: PipeWire support

To get end-to-end audio streaming working with any application, we need to expose the BlueZ ASHA profile as a playback device on the audio server (i.e., PipeWire). This would make the HAs appear as just another audio output, and we could route any or all system audio to it.

My colleague, Sanchayan Maity, has been working on this for the last few weeks. The code is all more or less in place now, and you can track our progress on the PipeWire MR.

Step 4 and beyond: Testing, stereo support, …

Once we have the basic PipeWire support in place, we will implement stereo support (the spec does not support more than 2 channels), and then we’ll have a bunch of testing and feedback to work with. The goal is to make this a solid and reliable solution for folks on Linux-based devices with hearing aids.

Once that is done, there are a number of UI-related tasks that would be nice to have in order to provide a good user experience. This includes things like combining the left and right HAs to present them as a single device, and access to any tuning parameters.

Getting it done

This project has been on my mind since the ASHA specification was announced, and it has been a long road to get here. We are in the enviable position of being paid to work on challenging problems, and we often contribute our work upstream. However, there are many such projects that would be valuable to society, but don’t necessarily have a clear source of funding.

In this case, we found ourselves in an interesting position – we have the expertise and context around the Linux audio stack to get this done. Our business model allows us the luxury of taking bites out of problems like this, and we’re happy to be able to do so.

However, it helps immensely when we do have funding to take on this work end-to-end – we can focus on the task entirely and get it done faster.

Onward…

I am delighted to announce that we were able to find the financial support to complete the PipeWire work! Once we land basic mono audio support in the MR above, we’ll move on to implementing stereo support in the BlueZ plugin and the PipeWire module. We’ll also be testing with some real-world devices, and we’ll be leaning on our community for more feedback.

This is an exciting development, and I’ll be writing more about it in a follow-up post in a few days. Stay tuned!

December 17, 2024

GStreamer + PipeWire: A Todo List

I wrote about our time at the GStreamer Conference in October, and one important thing I was able to do is spend some time with all-around great guy George reflecting on where the GStreamer plugins for PipeWire are, and what we need to do to get them to a rock-solid state.

This is a summary of our conversation, in the form of a to-do list of sorts…

Status Quo

Currently, we have two elements: pipewiresrc and pipewiresink. The two elements work with both audio and video, and instantiate a PipeWire capture and playback stream, respectively. The stream, as with any PipeWire client, appears as a node in the PipeWire graph.

Buffers are managed in the GStreamer pipeline using bufferpools, and recently Wim re-enabled exposing the stream clock as a GStreamer clock.

There have been a number of issues that have cropped up over time, and we’ve been plugging away at addressing them, but it was worth stepping back and looking at the whole for a bit.

Use Cases

The straightforward uses of these elements might be to represent client streams: pipewiresrc might connect to an audio capture device (like a microphone), or video capture device (like a webcam), and provide the data for downstream elements to consume. Similarly pipewiresink might be used to play audio to the system output (speakers or headphones, perhaps).

Because of the flexibility of the PipeWire API, these elements may also be used to provide a virtual capture or playback device. So pipewiresrc might provide a virtual audio sink, which applications could connect to in order to stream audio over the network (say, as a WebRTC stream).

Conversely, it is possible to use pipewiresink to provide a virtual capture device – for example, the pipeline might generate a video stream and expose it as a virtual camera for other applications to use.

We might even combine the two cases: one might connect to a webcam as a client, apply some custom video processing, and then expose that stream back as a virtual camera source as easily as:

pipewiresrc target-object="MyCamera" ! <some video filters> ! \
  pipewiresink provide=true stream-properties="props,media.class=Video/Source,media.role=Camera"


So we have a minor combinatorial explosion across 3 axes, and all combinations are valid:

pipewiresrc vs. pipewiresink
audio vs. video
stream vs. virtual device

For each of these combinations, we might have different behaviour across the various issues below.
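To make the size of that matrix concrete, here is a minimal sketch that simply enumerates the three axes listed above. The tuple names are purely illustrative and are not part of any GStreamer or PipeWire API.

```python
from itertools import product

# The three axes described above; the labels are illustrative only.
ELEMENTS = ("pipewiresrc", "pipewiresink")
MEDIA = ("audio", "video")
ROLES = ("stream", "virtual device")

# Every combination is valid, so the full matrix is the Cartesian product.
combinations = list(product(ELEMENTS, MEDIA, ROLES))

for element, media, role in combinations:
    print(f"{element:13} {media:6} {role}")
```

That is eight distinct cases to keep in mind for each of the issues that follow.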

Split ’em up?

Before we look at specific issues, it is worth pointing out that the PipeWire elements are unusual in that they support both audio and video with the same code. This seems like a tantalisingly elegant idea, and it’s quite neat that we are able to get this far with this unified approach.

However, as we examine the specific issues we are seeing, it does seem to emerge that the audio and video paths diverge in several ways. It may be time to consider whether the divergence merits just splitting them up into separate audio and video elements.

Linking

The first issue that comes to mind is how we might want PipeWire or WirePlumber to manage linking the nodes from the GStreamer pipeline with other nodes (devices or streams).

For the playback/capture stream use-cases, we would want the nodes to automatically be connected to a sink/source node when the GStreamer pipeline goes to the PAUSED or PLAYING state, and for that link to be torn down when leaving those states. It might be possible for the link to “move” if, for example, the default playback or capture device changes, though a “move” is really the removal of the current link with a new link following.

For the virtual device use-cases, the pipeline state should likely follow the link state. That is, when a node is connected to our virtual device, we want the GStreamer pipeline to start producing/consuming data, and when disconnected, it should go back to “sleep”, possibly running again later.

The latter is something that a GStreamer application using these plugins might have to manage manually, but simplifying this and supporting this via gst-launch-1.0 for easy command-line use would be nice to have.

There are already the beginnings of support for such usage via the provide property on pipewiresink, but more work is needed to make this truly usable.
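As a rough sketch of the "pipeline state follows link state" idea for virtual devices, the policy could be modelled as below. This is plain Python with no GStreamer or PipeWire calls: the class, its method names, and the choice of PAUSED as the "sleep" state are all assumptions made for illustration, not how the elements behave today.

```python
class VirtualDeviceStatePolicy:
    """Hypothetical helper: derive a target pipeline state from active links."""

    def __init__(self):
        # Identifiers of links currently connected to our virtual device node.
        self.links = set()

    def on_link_added(self, link_id):
        self.links.add(link_id)
        return self.target_state()

    def on_link_removed(self, link_id):
        # discard() tolerates a removal for a link we never saw.
        self.links.discard(link_id)
        return self.target_state()

    def target_state(self):
        # Run while at least one peer is linked, otherwise go back to "sleep".
        return "PLAYING" if self.links else "PAUSED"


policy = VirtualDeviceStatePolicy()
print(policy.on_link_added("app-1"))    # first consumer connects
print(policy.on_link_removed("app-1"))  # last consumer disconnects
```

An application would map the returned value onto an actual state change on its pipeline; doing that automatically is the part that would be nice to have from gst-launch-1.0.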

Bufferpools

Closely related to linking are buffers and bufferpools, as the process of linking nodes is what makes buffers for data exchange available to PipeWire nodes.

While bufferpools are a valuable concept for memory efficiency and avoiding unnecessary memcpy()s, they come with some complexity overhead in managing the pipeline. For one, as the number of buffers in a bufferpool is limited, it is possible to exhaust the set of buffers (with a large queue for example).

There are also some lifecycle complexities that arise from links coming and going, as the corresponding buffers also then go away from under us, something that GStreamer bufferpools are not designed for.

A solution to the first problem might be to avoid using bufferpools for some cases (for example, they might not be very valuable for audio). The solution to the lifecycle problem is a trickier one, and no clear answer is apparent yet, at least with the APIs as they stand.

We might also need to support resizing bufferpools for some cases, and that is not something that is easy to support with how buffer management currently happens in PipeWire (the stream API does not really give us much of a handle on this).
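The exhaustion problem is easy to reproduce with a toy model. The sketch below is a generic fixed-size pool, not the GStreamer or PipeWire buffer pool API; FixedBufferPool and its methods are hypothetical names, and the pool size of 4 is arbitrary.

```python
from collections import deque


class FixedBufferPool:
    """Toy fixed-size pool: buffers are pre-allocated and recycled, never grown."""

    def __init__(self, n_buffers, size):
        self._free = deque(bytearray(size) for _ in range(n_buffers))

    def acquire(self):
        # Return None instead of blocking, so the exhaustion is easy to see.
        if not self._free:
            return None
        return self._free.popleft()

    def release(self, buf):
        self._free.append(buf)


pool = FixedBufferPool(n_buffers=4, size=1024)

# A downstream queue that holds on to buffers starves the pool.
held = []
while (buf := pool.acquire()) is not None:
    held.append(buf)

print(f"queued {len(held)} buffers, pool empty: {pool.acquire() is None}")

# Only once the queue lets go of a buffer can the producer make progress again.
pool.release(held.pop(0))
print(f"after one release, buffer available: {pool.acquire() is not None}")
```

A real pool would typically block or fail the allocation rather than return None, which is exactly where a large queue downstream can stall the pipeline.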

Formats

In order to support the various use-cases, we want to be able to support either a fixed format (if we know what we are providing) or a negotiated format (if we can adapt in the GStreamer pipeline based on what PipeWire has/wants).
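The difference between the two modes can be sketched with a simple preference-ordered intersection. This is a deliberately simplified model of negotiation (real caps carry rates, channel counts, layouts and more); choose_format is a hypothetical helper, and the format lists are made up for the example.

```python
def choose_format(pipeline_formats, peer_formats):
    """Return the first format we can provide that the peer also accepts."""
    for fmt in pipeline_formats:
        if fmt in peer_formats:
            return fmt
    return None  # no common format: negotiation fails


# What the PipeWire side is assumed to accept in this example.
peer = ["F32LE", "S16LE"]

# Fixed format: we only offer one thing, so it either matches or fails.
print(choose_format(["S24LE"], peer))

# Negotiated format: we list what we can adapt to, in order of preference.
print(choose_format(["S24LE", "S16LE", "F32LE"], peer))
```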

There is also a large surface area of formats that PipeWire supports that we need to make sure we support well:

There are known issues with some planar video formats not being presented correctly from pipewiresrc
We do not expose planar audio formats, although both GStreamer and PipeWire support them
Support for DSD and passthrough audio (e.g. Dolby/DTS over HDMI) needs to be wired up
Support for compressed formats (we added PipeWire support for decode + render on a DSP)

Rate matching

While Wim recently added some rate matching code to pipewiresink, there is work to be done to make sure that if there is skew between the GStreamer pipeline’s data rate and the audio device rate, we can use PipeWire’s rate adaptation features to compensate for such skew. This should work in both pipewiresink and pipewiresrc.

For some background on this topic, check out my talk on clock rate matching from a couple of months ago.
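To give a feel for what "skew" means numerically, here is a minimal sketch that estimates the rate ratio between two clocks from paired timestamps using a least-squares slope. This is a generic textbook estimator chosen for illustration; it is not a description of the rate matching code in pipewiresink or of PipeWire's own rate adaptation. The 100 ppm skew, the 20 ms observation interval, and the function name are all assumptions.

```python
def estimate_rate(observations):
    """Least-squares slope of device time against pipeline time.

    observations: list of (pipeline_time_s, device_time_s) pairs.
    A result of 1.0 means the two clocks tick at the same rate.
    """
    n = len(observations)
    mean_x = sum(x for x, _ in observations) / n
    mean_y = sum(y for _, y in observations) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in observations)
    den = sum((x - mean_x) ** 2 for x, _ in observations)
    return num / den


# Synthetic data: the device clock runs 100 ppm fast, with a constant offset.
SKEW = 1.0001
observations = [(t, 0.5 + t * SKEW) for t in (i * 0.02 for i in range(50))]

rate = estimate_rate(observations)
print(f"estimated rate ratio: {rate:.6f}")

# Over one second of pipeline time at 48 kHz, the device wants this many frames.
print(f"frames per pipeline-second: {48000 * rate:.1f}")
```

Even a small ratio like this adds up to several frames per second, which is why it has to be compensated somewhere, typically by resampling with a slightly adjusted ratio.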

Device provider conflicts

While we are improving the out-of-the-box experience of these elements, unfortunately the PipeWire device provider currently supersedes all others (the GStreamer Device Provider API allows for discovering devices on the system and the elements used to access them).

The higher rank might make sense for video (as system integrators likely want to start preferring PipeWire elements over V4L2), but it can lead to a bad experience for audio (the PulseAudio elements work better today via PipeWire’s PulseAudio emulation layer).

We might temporarily drop the rank of PipeWire elements for audio to avoid autoplugging them while we fix the problems we have.
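The effect of rank on autoplugging can be sketched as a "highest rank wins" choice. The four constants below mirror GStreamer's GstRank values; the ranks assigned to the two sinks are hypothetical numbers picked to illustrate the situation described above, not the ranks the elements actually ship with, and pick_element is an invented helper.

```python
# Values of GStreamer's GstRank enumeration.
RANK_NONE, RANK_MARGINAL, RANK_SECONDARY, RANK_PRIMARY = 0, 64, 128, 256


def pick_element(ranks):
    """Pick the highest-ranked candidate; rank NONE is never autoplugged."""
    usable = {name: rank for name, rank in ranks.items() if rank > RANK_NONE}
    if not usable:
        return None
    return max(usable, key=usable.get)


# Hypothetical ranks: the PipeWire sink outranks the PulseAudio one.
ranks = {"pipewiresink": RANK_PRIMARY + 1, "pulsesink": RANK_PRIMARY}
print(pick_element(ranks))

# Temporarily dropping the PipeWire audio rank changes what gets autoplugged.
ranks["pipewiresink"] = RANK_NONE
print(pick_element(ranks))
```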

Probing formats

We create a “probe” stream in pipewiresink while getting ready to play audio, in order to discover what formats the device we would play to supports. This is required to detect supported formats and make decisions about whether to decode in GStreamer, what sample rate and format are preferred, etc.

Unfortunately, that also causes a “false” playback device startup sequence, which might manifest as a click or glitch on some hardware. Having a way to set up a probe that does not actually open the device would be a nice improvement.

Player state

There are a couple of areas where policy actions do not always surface well to the application/UI layer. One instance of this is where a stream is “corked” (maybe because only one media player should be active at a time) – we want to let the player know it has been paused, so it can update its state and let the UI know too. There is limited infrastructure for this already, via GST_MESSAGE_REQUEST_STATE.

Also, more of a session management (i.e. WirePlumber / system integration) thing, we do not really have the concept of a system-wide media player state. This would be useful if we want to exercise policy like “don’t let any media player play while we’re on a call”, and have that state be consistent across UI interactions (i.e. hitting play during a call does not trigger playback, maybe even lets the user know why it’s not working / provides an override).

November 6, 2024

GStreamer Conference 2024

All of us at Asymptotic are back home from the exciting week at GStreamer Conference 2024 in Montréal, Canada last month. It was great to hang out with the community and see all the great work going on in the GStreamer ecosystem.

There were some visa-related adventures leading up to the conference, but thanks to the organising team (shoutout to Mark Filion and Tim-Philipp Müller), everything was sorted out in time and Sanchayan and Taruntej were able to make it.

This conference was also special because this year marks the 25th anniversary of the GStreamer project!

Talks

We had 4 talks at the conference this year.

GStreamer & QUIC (video)

Sanchayan speaking about GStreamer and QUIC

Sanchayan spoke about his work with the various QUIC elements in GStreamer. We already have the quinnquicsrc and quinnquicsink upstream, with a couple of plugins to allow (de)multiplexing of raw streams as well as an implementation of RTP-over-QUIC (RoQ). We’ve also started work on Media-over-QUIC (MoQ) elements.

This has been a fun challenge for us, as we’re looking to build out a general-purpose toolkit for building QUIC application-layer protocols in GStreamer. Watch this space for more updates as we build out more functionality, especially around MoQ.

Clock Rate Matching in GStreamer & PipeWire (video)

Arun speaking about PipeWire delay-locked loops — Photo credit: Francisco

My talk was about an interesting corner of GStreamer, namely clock rate matching. This is a part of live pipelines that is often taken for granted, so I wanted to give folks a peek under the hood.

The idea of doing this talk was born out of some recent work we did to allow splitting up the graph clock in PipeWire from the PTP clock when sending AES67 streams on the network. I found the contrast between the PipeWire and GStreamer approaches thought-provoking, and wanted to share that with the community.

GStreamer for Real-Time Audio on Windows (video)

Next, Taruntej dove into how we optimised our usage of GStreamer in a real-time audio application on Windows. We had some pretty tight performance requirements for this project, and Taruntej spent a lot of time profiling and tuning the pipeline to meet them. He shared some of the lessons learned and the tools he used to get there.

Simplifying HLS playlist generation in GStreamer (video)

Sanchayan also walked us through the work he’s been doing to simplify HLS (HTTP Live Streaming) multivariant playlist generation. This should be a nice feature to round out GStreamer’s already strong support for generating HLS streams. We are also exploring the possibility of reusing the same code for generating DASH (Dynamic Adaptive Streaming over HTTP) manifests.

Hackfest

As usual, the conference was followed by a two-day hackfest. We worked on a few interesting problems:

Sanchayan addressed some feedback on the QUIC muxer elements, and then investigated extending the HLS elements for SCTE-35 marker insertion and DASH support
Taruntej worked on improvements to the threadshare elements, specifically to bring some ts-udpsrc element features in line with udpsrc
I spent some time reviewing a long-pending merge request to add soft-seeking support to the AWS S3 sink (so that it might be possible to upload seekable MP4s, for example, directly to S3). I also had a very productive conversation with George Kiagiadakis about how we should improve the PipeWire GStreamer elements (more on this soon!)

All in all, it was a great time, and I’m looking forward to the spring hackfest and conference in the latter part of next year!

September 4, 2024

GStreamer and WebRTC HTTP signalling

The WebRTC nerds among us will remember the first thing we learn about WebRTC, which is that it is a specification for peer-to-peer communication of media and data, but it does not specify how signalling is done.

Or put more simply, if you want to call someone on the web, WebRTC tells you how you can transfer audio, video and data, but it leaves out the bit about how you make the call itself: how do you locate the person you’re calling, let them know you’d like to call them, and a few following steps before you can see and talk to each other.

While this allows services to provide their own mechanisms to manage how WebRTC calls work, the lack of a standard mechanism means that general-purpose applications need to individually integrate each service that they want to support. For example, GStreamer’s webrtcsrc and webrtcsink elements support various signalling protocols, including Janus Video Rooms, LiveKit, and Amazon Kinesis Video Streams.

However, having a standard way for clients to do signalling would help developers focus on their application and worry less about interoperability with different services.

Standardising Signalling

With this motivation, the IETF WebRTC Ingest Signalling over HTTPS (WISH) workgroup has been working on two specifications:

WebRTC-HTTP Ingestion Protocol (WHIP)
WebRTC-HTTP Egress Protocol (WHEP)

(author’s note: the puns really do write themselves :))

As the names suggest, the specifications provide a way to perform signalling using HTTP. WHIP gives us a way to send media to a server, to ingest into a WebRTC call or live stream, for example.

Conversely, WHEP gives us a way for a client to use HTTP signalling to consume a WebRTC stream – for example to create a simple web-based consumer of a WebRTC call, or tap into a live streaming pipeline.

With this view of the world, WHIP and WHEP can be used not only for calling applications, but also as an alternative way to ingest or play back live streams, with lower latency and a near-ubiquitous real-time communication API.

In fact, several services already support this, including Dolby Millicast, LiveKit and Cloudflare Stream.
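To show how little there is to the HTTP side of WHIP, here is a minimal sketch that only builds the requests, without sending them. It assumes the basic flow from the WHIP specification: the client POSTs an SDP offer with Content-Type application/sdp, the server replies with 201 Created, an SDP answer and a Location header for the session, and the client later sends DELETE to that location to end it. The helper names, the placeholder SDP, the token and the session URL are invented for the example, and a real offer would come from a WebRTC stack.

```python
from urllib.request import Request


def build_whip_offer(endpoint, sdp_offer, bearer_token=None):
    """Build (but do not send) the POST that starts a WHIP session."""
    headers = {"Content-Type": "application/sdp"}
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    return Request(
        endpoint,
        data=sdp_offer.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def build_whip_teardown(session_url):
    """Build the DELETE that ends a session, using the URL from Location."""
    return Request(session_url, method="DELETE")


# Placeholder offer: a real one is generated by the WebRTC implementation.
offer = build_whip_offer(
    "https://my.webrtc/whip/room1", "v=0\r\n", bearer_token="example-token"
)
print(offer.get_method(), offer.full_url)

# Hypothetical session URL, standing in for the server's Location header.
teardown = build_whip_teardown("https://my.webrtc/whip/room1/session-id")
print(teardown.get_method(), teardown.full_url)
```

Everything after this exchange (ICE, DTLS, media) is ordinary WebRTC, which is what lets a generic client talk to any compliant service.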

WHIP and WHEP with GStreamer

We know GStreamer already provides developers two ways to work with WebRTC streams:

webrtcbin: provides a low-level API, akin to the PeerConnection API that browser-based users of WebRTC will be familiar with
webrtcsrc and webrtcsink: provide high-level elements that can respectively produce/consume media from/to a WebRTC endpoint

At Asymptotic, my colleagues Tarun and Sanchayan have been using these building blocks to implement GStreamer elements for both the WHIP and WHEP specifications. You can find these in the GStreamer Rust plugins repository.

Our initial implementations were based on webrtcbin, but have since been moved over to the higher-level APIs to reuse common functionality (such as automatic encoding/decoding and congestion control). Tarun covered our work in a talk at last year’s GStreamer Conference.

Today, we have 4 elements implementing WHIP and WHEP.

Clients

whipclientsink: This is a webrtcsink-based implementation of a WHIP client, using which you can send media to a WHIP server. For example, streaming your camera to a WHIP server is as simple as:

gst-launch-1.0 -e \
  v4l2src ! video/x-raw ! queue ! \
  whipclientsink signaller::whip-endpoint="https://my.webrtc/whip/room1"


whepclientsrc: This is work in progress and allows us to build player applications to connect to a WHEP server and consume media from it. The goal is to make playing a WHEP stream as simple as:

gst-launch-1.0 -e \
  whepclientsrc signaller::whep-endpoint="https://my.webrtc/whep/room1" ! \
  decodebin ! autovideosink


The client elements fit quite neatly into how we might imagine GStreamer-based clients could work. You could stream arbitrary stored or live media to a WHIP server, and play back any media a WHEP server provides. Both pipelines implicitly benefit from GStreamer’s ability to use hardware-acceleration capabilities of the platform they are running on.

Servers

whipserversrc: Allows us to create a WHIP server to which clients can connect and provide media, each of which will be exposed as GStreamer pads that can be arbitrarily routed and combined as required. We have an example server that can play all the streams being sent to it.
whepserversink: Finally we have ongoing work to publish arbitrary streams over WHEP for web-based clients to consume this media.

The two server elements open up a number of interesting possibilities. We can ingest arbitrary media with WHIP, and then decode and process, or forward it, depending on what the application requires. We expect that the server API will grow over time, based on the different kinds of use-cases we wish to support.

This is all pretty exciting, as we have all the pieces to create flexible pipelines for routing media between WebRTC-based endpoints without having to worry about service-specific signalling.

If you’re looking for help realising WHIP/WHEP based endpoints, or other media streaming pipelines, don’t hesitate to reach out to us!

March 19, 2024

Asymptotic: A 2023 Review

It’s been a busy ~~few~~ several months, but now that we have some breathing room, I wanted to take stock of what we have done over the last year or so.

This is a good thing for most people and companies to do of course, but being a scrappy, (questionably) young organisation, it’s doubly important for us to introspect. This allows us to both recognise our achievements and ensure that we are accomplishing what we have set out to do.

One thing that is clear to me is that we have been lagging in writing about some of the interesting things that we have had the opportunity to work on, so you can expect to see some more posts expanding on what you find below, as well as some of the newer work that we have begun.

(note: I write about our open source contributions below, but needless to say, none of it is possible without the collaboration, input, and reviews of members of the community)

WHIP/WHEP client and server for GStreamer

If you’re in the WebRTC world, you likely have not missed the excitement around standardisation of HTTP-based signalling protocols, culminating in the WHIP and WHEP specifications.

Tarun has been driving our client and server implementations for both these protocols, and in the process has been refactoring some of the webrtcsink and webrtcsrc code to make it easier to add more signaller implementations. You can find out more about this work in his talk at GstConf 2023 and we’ll be writing more about the ongoing effort here as well.

Low-latency embedded audio with PipeWire

Some of our work involves implementing a framework for very low-latency audio processing on an embedded device. PipeWire is a good fit for this sort of application, but we have had to implement a couple of features to make it work.

It turns out that doing timer-based scheduling can be more CPU intensive than ALSA period interrupts at low latencies, so we implemented an IRQ-based scheduling mode for PipeWire. This is now used by default when a pro-audio profile is selected for an ALSA device.
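To put rough numbers on why low latencies are demanding, here is a minimal sketch of the arithmetic that links a period size to its latency and wakeup rate. The 48 kHz rate and the period sizes are illustrative assumptions, not values from our setup, and `period_stats` is a hypothetical helper rather than anything in PipeWire.

```python
def period_stats(period_frames, rate_hz):
    """Return (latency of one period in ms, wakeups per second)."""
    latency_ms = period_frames / rate_hz * 1000.0
    wakeups_per_s = rate_hz / period_frames
    return latency_ms, wakeups_per_s


# Illustrative period sizes at an assumed 48 kHz sample rate.
for frames in (1024, 256, 64, 32):
    latency_ms, wakeups = period_stats(frames, 48_000)
    print(f"{frames:5d} frames: {latency_ms:6.2f} ms per period, "
          f"{wakeups:7.1f} wakeups/s")
```

This says nothing about the relative cost of a timer wakeup versus a period interrupt. It only shows that at small periods the graph wakes up thousands of times a second, so any per-wakeup overhead adds up quickly.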

In addition to this, we also implemented rate adaptation for USB gadget devices using the USB Audio Class “feedback control” mechanism. This allows USB gadget devices to adapt their playback/capture rates to the graph’s rate without having to perform resampling on the device, saving valuable CPU and latency.
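As a rough illustration of what "feedback control" carries: the feedback endpoint reports the desired number of samples per (micro)frame as a fixed-point number. The sketch below assumes the formats described in the USB 2.0 specification (10.14 per 1 ms frame at full speed, 16.16 per 125 µs microframe at high speed). The function names are hypothetical, and this is not the PipeWire or kernel gadget code.

```python
def encode_feedback(rate_hz, high_speed):
    """Encode a sample rate as a fixed-point feedback value."""
    if high_speed:
        # 16.16 fixed point, samples per 125 us microframe (8000 per second)
        return round(rate_hz / 8000.0 * (1 << 16))
    # 10.14 fixed point, samples per 1 ms frame (1000 per second)
    return round(rate_hz / 1000.0 * (1 << 14))


def decode_feedback(value, high_speed):
    """Recover the sample rate in Hz from a feedback value."""
    if high_speed:
        return value / (1 << 16) * 8000.0
    return value / (1 << 14) * 1000.0


# A graph running very slightly faster than a nominal 48 kHz (assumed numbers).
nudged = encode_feedback(48_004.8, high_speed=True)
print(nudged, decode_feedback(nudged, high_speed=True))
```

The point is that a small change in the reported value nudges the effective rate, which is what lets the two sides stay in step without a resampler.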

There is likely still some room to optimise things, so expect to hear more on this front soon.

Compress offload in PipeWire

Sanchayan has written about the work we did to add support in PipeWire for offloading compressed audio. This is something we explored in PulseAudio (there’s even an implementation out there), but it’s a testament to the PipeWire design that we were able to get this done without any protocol changes.

This should be useful in various embedded devices that have both the hardware and firmware to make use of this power-saving feature.

GStreamer LC3 encoder and decoder

Tarun wrote a GStreamer plugin implementing the LC3 codec using the liblc3 library. This is the primary codec for next-generation wireless audio devices implementing the Bluetooth LE Audio specification. The plugin is upstream and can be used to encode and decode LC3 data already, but will likely be more useful when the existing Bluetooth plugins to talk to Bluetooth devices get LE audio support.

QUIC plugins for GStreamer

Sanchayan implemented a QUIC source and sink plugin in Rust, allowing us to start experimenting with the next generation of network transports. For the curious, the plugins sit on top of the Quinn implementation of the QUIC protocol.

There is a merge request open that should land soon, and we’re already seeing folks using these plugins.

AWS S3 plugins

We’ve been fleshing out the AWS S3 plugins over the years, and we’ve added a new awss3putobjectsink. This provides a better way to push small or sparse data to S3 (subtitles, for example), without potentially losing data in case of a pipeline crash.

We also expect this to eventually look a little more like multifilesink, allowing us to arbitrarily split up data and write it to S3 directly as multiple objects.

Update to webrtc-audio-processing

We also updated the webrtc-audio-processing library, based on more recent upstream libwebrtc. This is one of those things that becomes surprisingly hard as you get into it — packaging an API-unstable library correctly, while supporting a plethora of operating system and architecture combinations.

Clients

We can’t always speak publicly of the work we are doing with our clients, but there have been a few interesting developments we can speak about (and have).

Both Sanchayan and I spoke a bit about our work with the WebRTC-as-a-service provider Daily. My talk at the GStreamer Conference summarised what I had written about previously: what we learned while building Daily’s live streaming, recording, and other backend services. We had similar experiences with other clients we worked with during the year.

Sanchayan spoke about the interesting approach to building SIP support that we took for Daily. This was a pretty fun project, allowing us to build a modern server-side SIP client with GStreamer and SIP.js.

An ongoing project we are working on is building AES67 support using GStreamer for FreeSWITCH, which essentially allows bridging low-latency network audio equipment with existing SIP and related infrastructure.

As you might have noticed from previous sections, we are also working on a low-latency audio appliance using PipeWire.

Retrospective

All in all, we’ve had a reasonably productive 2023. There are things I know we can do better in our upstream efforts to help move merge requests and issues, and I hope to address this in 2024.

We have ideas for larger projects that we would like to take on. Some of these we might be able to find clients who would be willing to pay for. For the ideas that we think are useful but may not find any funding, we will continue to spend our spare time to push forward.

If you made it this far, thank you, and look out for more updates!

August 18, 2023

To Conference Organisers Everywhere…

(well, not exactly everywhere …)

This is not an easy post for me to write, being a bit of a criticism / “you can do better” note for organisers of conferences that cater to a global community.

It’s not easy because most of the conferences I attend are community driven, and I have helped organise community conferences in the past. It is a thankless job, a labour of love, and you mostly do not get to enjoy the fruits of that labour as others do.

The problem is that these conferences end up inadvertently excluding members who live in, for lack of a better phrase, the Global South.

Visas

It always surprises me when I meet someone who doesn’t realise that I can’t just book tickets to go anywhere in the world. Not because this is information that everyone should be aware of, but because this is such a basic aspect of travel for someone like me. As a holder of an Indian passport, I need to apply for a visa to travel to … well most countries.

The list of countries that require a visa is clearly defined by post-colonial geopolitics. This is just a fact of life and not something I can do very much about.

Getting a Visa

Applying for a visa is a cumbersome and intrusive process that I am now used to. The process varies from country to country, but it’s usually something like:

Get an invitation letter from conference organisers
Book all the tickets and accommodation for the trip
Provide bank statements for 3-6 months, income tax returns for ~3 years (in India, those statements need attestation by the bank)
Maybe provide travel and employment history for the past few years (how many years depends on the country)
Get an appointment from the embassy of the country you’re traveling to (or their service provider)
Submit your passport and application
Maybe provide documentation that was not listed on the provider’s website
Wait for your passport with visa (if granted) to be mailed back to you

The duration of the visa (that is, how long you can stay in the country) depends on the country.

In the EU, I am usually granted a visa for the exact dates of travel (so there is no flexibility to change plans). The UK allows you to pay more for a longer visa.

The US and Canada grant multi-year visas that allow one to visit for up to 6 months by default (in the US, whether you are permitted to enter and how long you may stay are determined by the person at the border).

Timelines

Now we get to the crux of the problem: this process can take anywhere from a few days (if you are very lucky) to a few months (if you are not).

Appointments are granted by the embassy or the third party that countries delegate application collection to, and these may or may not be easily available. Post-pandemic, I’ve seen that several embassies just aren’t accepting visitor visa appointments or have a multi-month wait.

If you do get an appointment, the processing time can vary again. Sometimes, it’s a matter of a few days, sometimes a few weeks. A lot of countries I have applied to recommend submitting your application at least 6 weeks in advance (this is from the date of your visa appointment which might be several weeks in the future).

Conference Schedules

If you’re organising a conference, there are a few important dates:

When the conference dates are announced
When the call for participation goes out
When it ends
When speakers are notified
The conference itself

These dates are based on a set of complex factors — venue availability and confirmation, literally writing and publishing all the content of the website, paper committee availability, etc.

But if you’re in my position, you need at least 2-3 months between the first and the last step. If your attendance is conditional on speaking at the conference (for example, if your company will only sponsor you if you’re speaking), then you need a minimum of 2-3 months between when speakers are notified and the conference starts.

From what I see, this is not something that is top-of-mind for conference organisers. That may happen for a host of perfectly understandable reasons, but it also has a cost to the community and individuals who might want to participate.

Other Costs

Applying for a visa costs money. This can be anything from a few hundred to over a thousand US dollars.

It also costs you time — filling in the application, getting all the documentation in place, getting a physical visa photo (must be no older than 6 months), traveling to an appointment, waiting in line, etc. This can easily be a matter of a day if not more.

Finally, there is an emotional cost to all this — there is constant uncertainty during the process, and a visa rejection means every visa you apply for thereafter needs you to document that rejection and reason. And you may find out just days before your planned travel whether you get to travel or not.

What Can One Do?

All of this clearly sucks, but the problem of visas is too big and messy for any of us to have any real impact on, at least in the short term. But if you’re organising a conference, and you want a diverse audience, here are a few things you can do:

Announce the dates of the conference as early as possible (allows participants to book travel, visa appointments, maybe club multiple conferences)
Provide invitation letters in a timely manner
Call for participation as early as possible
Notify speakers as soon as you can

I know of conferences that do some if not all of these things — you know who you are and you have my gratitude for it.

If you made it this far, thank you for reading.

August 24, 2022

GStreamer for your backend services

For the last year and a half, we at Asymptotic have been working with the excellent team at Daily. I’d like to share a little bit about what we’ve learned.

Daily is a real time calling platform as a service. One standard feature that users have come to expect in their calls is the ability to record them, or to stream their conversations to a larger audience. This involves mixing together all the audio/video from each participant and then storing it, or streaming it live via YouTube, Twitch, or any other third-party service.

As you might expect, GStreamer is a good fit for building this kind of functionality, where we consume a bunch of RTP streams, composite/mix them, and then send them out to one or more external services (Amazon’s S3 for recordings and HLS, or a third-party RTMP server).

I’ve written about how we implemented this feature elsewhere, but I’ll summarise briefly.

This is a slightly longer post than usual, so grab a cup of your favourite beverage, or jump straight to the summary section for the tl;dr.

October 31, 2018

Update from the PipeWire hackfest

As the third and final day of the PipeWire hackfest draws to a close, I thought I’d summarise some of my thoughts on the goings-on and the future.

Thanks

Before I get into the details, I want to send out a big thank you to:

Christian Schaller for all the hard work of organising the event and Wim Taymans for the work on PipeWire so far (and in the future)
The GNOME Foundation, for sponsoring the event as a whole
Qualcomm, who are funding my presence at the event
Collabora, for sponsoring dinner on Monday
Everybody who attended and participated, for their time and thoughtful comments

Background

For those of you who are not familiar with it, PipeWire (previously Pinos, previously PulseVideo) was Wim’s effort at providing secure, multi-program access to video devices (like webcams, or the desktop for screen capture). As he went down that rabbit hole, he wrote SPA, a lightweight general-purpose framework for representing a streaming graph, and this led to the idea of expanding the project to include support for low latency audio.

The Linux userspace audio story has, for the longest time, consisted of two top-level components: PulseAudio which handles consumer audio (power efficiency, wide range of arbitrary hardware), and JACK which deals with pro audio (low latency, high performance). Consolidating this into a good out-of-the-box experience for all use-cases has been a long-standing goal for myself and others in the community that I have spoken to.

An Opportunity

From a PulseAudio perspective, it has been hard to achieve the 1-to-few millisecond latency numbers that would be absolutely necessary for professional audio use-cases. A lot of work has gone into improving this situation, most recently with David Henningsson’s shared-ringbuffer channels that made client/server communication more efficient.

At the same time, application sandboxing frameworks such as Flatpak have added security requirements of us that were not accounted for when PulseAudio was written. Examples include choosing which devices an application has access to (or can even know of) or which applications can act as control entities (set routing etc., enable/disable devices). Some work has gone into this: Ahmed Darwish did some key work to get memfd support in PulseAudio, and Wim has prototyped an access-control mechanism module to enable a Flatpak portal for sound.

All this said, there are still fundamental limitations in architectural decisions in PulseAudio that would require significant plumbing to address. With Wim’s work on PipeWire and his extensive background with GStreamer and PulseAudio itself, I think we have an opportunity to revisit some of those decisions with the benefit of a decade’s worth of learning from deploying PulseAudio in various domains, from desktops/laptops to phones, cars, robots, home audio, telephony systems and a lot more.

Key Ideas

There are some core ideas of PipeWire that I am quite excited about.

The first of these is the graph. Like JACK, the entities that participate in the data flow are represented by PipeWire as nodes in a graph, and routing between nodes is very flexible — you can route applications to playback devices and capture devices to applications, but you can also route applications to other applications, and this is notionally the same thing.

The second idea is a bit more radical — PipeWire itself only “runs” the graph. The actual connections between nodes are created and managed by a “session manager”. This allows us to completely separate the data flow from policy, which means we could write completely separate policy for desktop use cases vs. specific embedded use cases. I’m particularly excited to see this be scriptable in a higher-level language, which is something Bastien has already started work on!
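A toy model may help make this separation concrete. In the sketch below, the `Graph` class only records nodes and links, and two interchangeable policy functions decide what gets linked. Every name here is hypothetical; this is an illustration of the idea, not the PipeWire or session manager API.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy stand-in for the graph: it stores nodes and links, nothing more."""
    nodes: dict = field(default_factory=dict)  # name -> kind
    links: set = field(default_factory=set)    # (source, sink) pairs

    def add_node(self, name, kind):
        self.nodes[name] = kind

    def link(self, source, sink):
        self.links.add((source, sink))


def desktop_policy(graph):
    """Hypothetical desktop policy: link every stream to the first sink."""
    sinks = [name for name, kind in graph.nodes.items() if kind == "sink"]
    for name, kind in graph.nodes.items():
        if kind == "stream" and sinks:
            graph.link(name, sinks[0])


def kiosk_policy(graph):
    """Hypothetical embedded policy: only the 'alerts' stream may play."""
    if "alerts" in graph.nodes and "speaker" in graph.nodes:
        graph.link("alerts", "speaker")


def build_graph():
    graph = Graph()
    graph.add_node("speaker", "sink")
    graph.add_node("music", "stream")
    graph.add_node("alerts", "stream")
    return graph


desktop = build_graph()
desktop_policy(desktop)
kiosk = build_graph()
kiosk_policy(kiosk)
print(sorted(desktop.links))
print(sorted(kiosk.links))
```

The same graph code serves both cases. Only the policy function changes, which is the property that makes a scriptable session manager attractive.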

A powerful idea in PulseAudio was rewinding — the ability to send out huge buffers to the device, but the flexibility to rewind that data when things changed (a new stream got added, or the stream moved, or the volume changed). While this is great for power saving, it is a significant amount of complexity in the code. In addition, with some filters in the data path, rewinding can break the algorithm by introducing non-linearity. PipeWire doesn’t support rewinds, and we will need to find a good way to manage latencies to account for low power use cases. One example is that we could have the session manager bump up the device latency when we know latency doesn’t matter (Android does this when the screen is off).
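To illustrate what rewinding buys, here is a deliberately simplified model: a buffer is filled well ahead of the playback position, and a volume change either waits for the queued data to drain or rewrites it in place. The function names and numbers are made up for illustration, and this is not how PulseAudio implements it.

```python
def render(source, start, volume):
    """Render every sample of `source` from `start` onwards at `volume`."""
    return [sample * volume for sample in source[start:]]


def rewind(buffered, source, play_pos, volume):
    """Keep what the device already consumed and re-render the rest."""
    return buffered[:play_pos] + render(source, play_pos, volume)


source = [1.0] * 8                 # a constant test signal
buffered = render(source, 0, 1.0)  # queue all 8 samples at full volume
play_pos = 2                       # the device has played 2 samples so far

# The volume drops to 0.5 while 6 samples are still queued.
without_rewind = buffered          # queued samples keep the old volume
with_rewind = rewind(buffered, source, play_pos, 0.5)

print(without_rewind)
print(with_rewind)
```

Without the rewind, the change only becomes audible once the queued data has played out, so large buffers would mean sluggish volume changes. The price is that everything in the data path must be able to re-render from an earlier position.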

There are a bunch of other things that are in the process of being fleshed out, like being able to represent the hardware as a graph as well, to have a clearer idea of what is going on within a node. More updates as these things become more concrete.

The Way Forward

There is a good summary by Christian of our discussion about what is missing and how we can go about making a smooth transition for PulseAudio users. There is, of course, a lot to do, and my ideal outcome is that we one day flip a switch and nobody knows that we have done so.

In practice, we’ll need to figure out how to make this transition seamless for most people, while folks with custom setups will need to be given a long runway and clear documentation to know what to do. It’s way too early to talk about this in more specifics, however.

Configuration

One key thing that PulseAudio does right (I know there are people who disagree!) is having a custom configuration that automagically works on a lot of Intel HDA-based systems. We’ve been wondering how to deal with this in PipeWire, and the path we think makes sense is to transition to ALSA UCM configuration. This is not as flexible as we need it to be, but I’d like to extend it for that purpose if possible. This would ideally also help consolidate the various methods of configuration being used by the various Linux userspaces.

To that end, I’ve started trying to get a UCM setup on my desktop that PulseAudio can use, and be functionally equivalent to what we do with our existing configuration. There are missing bits and bobs, and I’m currently focusing on the ones related to hardware volume control. I’ll write about this in the future as the effort expands out to other hardware.

Onwards and upwards

The transition to PipeWire is unlikely to be quick, completely painless, or free of contention. For those who are worried about the future, know that any switch is still a long way away. In the meantime, however, constructive feedback and comments are welcome.
