AudioServer to have function to access microphone buffer directly by goatchurchprime · Pull Request #113288 · godotengine/godot

goatchurchprime · 2025-11-28T18:01:19Z

Direct access to the audio data in the microphone buffer is necessary to overcome the design flaw of connecting the Audio Input stream (microphone) to the Audio Output stream under the mistaken assumption that they will always proceed at exactly the same rate for all time on every platform.

A diagram of the current design and the improved situation can be seen here.

The following issues are illustrative of the flaw in the current design.

Issue Microphone Garbling #112836 (which has been replicated on 3 platforms, not only Android)
Issue Microphone builds up delay over time #80173
Issue Microphone plays its buffer in an infinite loop on Android when re-enabled #95120
Issue Audio input gets muted after a while on android #86428 -- which works by packing zero values into the audio stream when the buffer under-runs, causing a degradation of the sound quality (such as Garbling)
(Observed Godot3) Issue AudioEffectRecord has more latency when using an USB microphone compared to an analog microphone #54544

This PR is a re-implementation of three previous attempts to find an appropriate place for a get_microphone_frames() function:

This also resolves godotengine/godot-proposals#11347 (proposal).

A comprehensive demo is at:
https://github.com/goatchurchprime/godot-demo-projects/tree/gtch/mic_input/audio/mic_input

This PR adds four new functions to AudioServer:

set_input_device_active(active : bool) -> int
get_input_buffer_length_frames() -> int
get_input_frames_available() -> int
get_input_frames(frames : int) -> PackedVector2Array

In the class AudioEffectCapture, the function that is equivalent to get_input_frames() is known as get_buffer().

The function get_buffer_length_frames() is necessary to predict when the overflow is likely to occur.

The function set_input_device_active() provides direct control of exactly when the microphone stream is turned on and off, rather than it currently being done by the existence of an active AudioStreamMicrophone playback object.

This could mean that we never have to call AudioServer.set_input_device() while the input device is active, which leads to problematic situations where the error case of a failure to start the new device cannot be properly handled. Instead you would call set_input_device_active(false), then change the device, before calling set_input_device_active(true) which can return an error if it doesn't work.

Stuff to do

~~Review the function names and whether it needs get_input_device_active() and member variable input_device_active and functions to be const.~~
~~Write the docs~~
~~Make AudioStreamMicrophonePlayback use the input_device_active interface, and check it still works as before (if none of these new functions are used).~~

Ivorforce

Looks like a great start! This is about the amount of complexity I was hoping to see :)

servers/audio/audio_server.cpp

servers/audio/audio_server.h

servers/audio/audio_server.cpp

platform/android/audio_driver_opensl.cpp

Ivorforce

I think this might be ready to merge (though I want to test before approving). Any reason it's still in draft?

servers/audio/audio_server.cpp

platform/android/audio_driver_opensl.cpp

Ivorforce

I think this PR is as ready to merge as it will ever get. Great job!
I'm not in the audio team, so it would be nice to get confirmation on its mergability as well. I'm hoping to fit it into 4.6, since it's a pretty riskless, potentially high impact change.

Since this PR has quite some history, I want to lay out my reasons for endorsing this form of the feature.
Full disclosure, I have been part of a few audio meetings discussing this, and further had dedicated meetings with @goatchurchprime discussing this feature in detail. This PR is the outcome of these discussions.

First off, why is this feature needed?
To cut it short, the current way to accept microphone data is inherently flawed. It is done through AudioEffectCapture, after it has experienced a lot of travel through the audio systems.
The main problem with this is that audio input is accepted asynchronously to the audio server's clock. Even when they have the same theoretical sampling rate, they inevitably drift apart. This can lead to either too many samples being produced to consume, or too few. In our codebase, this leads to audio stutters and gaps (previously, it led to capture ending abruptly), making audio capturing like this problematic. A way to fix it in place might be to resample on the fly, but this is expensive and difficult, and more importantly, it is not needed.

The above example shows that handling input data on the audio system designed for playback is inherently error prone. On the bright side, most games do not need to play back the audio. They just need to capture it, and send it elsewhere, also known as voice over IP (or, use it as an input, ala One Hand Clapping). So a system is needed to capture the audio before it enters the server at an incorrect sampling rate.

The three previous PRs were various attempts at this.
The first PR injected an alternative usage path into AudioStreamPlaybackMicrophone, which is the class that handles the microphone input for playback in Godot's audio playback system. I can see the reasoning, but in my opinion it makes the class ambiguous, and adds unnecessary complexity to it.

The second PR was created after maintainer feedback which suggested the perspective to handle the microphone input more like other control inputs, like mouse, keyboard and controllers, and add it to the Input class. That PR ended up similar to this PR. It's not bad, but the kind of data microphones generate is very different to the data Input usually deals with For that reason, I believe AudioServer is a more suitable class to working with it appropriately.

The third PR was created after feedback that the system proposed in Input could not deal with multiple microphones at once, and was therefore underdesigned.
However, I believe this feedback was misguided. By our contribution guidelines, we should strive to only solve problems that people actually have. Using multiple microphones at once is not a feature that has been requested by users. Even dedicated audio software does not support capturing input from multiple audio inputs simultaneously a lot of the time, probably because it's so rarely needed in actual projects. @goatchurchprime's PR indulging the feedback, and adding a microphone server and feeds, ended up way more complicated than this one, with multiple new classes exposed, and functionality skeletons added for functions that are unlikely to be needed by anyone anytime soon. An alternative implementation by @adamscott attempted to implement the feature wholesale, but ended up even more complicated. In contrast to this PR, which adds and exposes barely any features, and solves the microphone needs of most games perfectly, I think this clearly shows why we should add demand-driven infrastructure, rather than infrastructure designed for a potential future case that is unlikely to be needed by most Godot engine users. (on that note, the addition of AudioStreamPlaybackMicrophone idea was probably premature in the first place, since it's unlikely many games want to play back the player's voice to them from their own speaker).
Even if people run into limitations with the system added here, we can deprecate this system and add a better one that solves whatever problem users run into at that time. Since this is an extremely small solution, it's easy to sculpt or remove completely when it outlives its designed purpose.

All in all, I believe this PR is the appropriate solution for the problem at hand.
The PR is low risk, especially with @goatchurchprime's rigorous testing with his bespoke tool.
It is also potentially high-impact, since for the first time in Godot's history, it will be possible to have stable voice-over-ip technology in Godot made games. A demand for this is evident by the popularity of @goatchurchprime's extension two-voip, which even in the current state of the godot interface is attracting users.

Due to the reasons given above, I'd like to restate my initial comment that I think this should be merged into 4.6.

doc/classes/AudioServer.xml

goatchurchprime · 2025-12-01T13:31:32Z

I just want to add that there are a number of users of my twovoip plugin who are waiting on this issue (they've submitted tickets about the sound quality on my repo that are actually due to this Godot problem), and who would probably begin using this feature as soon as it's on a development release.

Some have already tested that it's good with an earlier PR, but not everyone is set up for compiling PRs very easy -- especially the export templates.

The important point is that I expect this is going to get tested by users on different pieces of hardware right away, so we're going to find out about any problems I've not anticipated very quickly, and I am committed to getting on top of them asap.

lyuma

The code looks good. the only risk is this api is designed for single-consumer since it has only one global offset variable, and modifies with the same start/stop state as AudioStreamMicrophone.

Some day, I would prefer a higher level object oriented API which would also be used by AudioStreamMicrophone, that holds the offset internally and and can increment/decrement a "active counter" that will control start / stop so you can have two consumers of the audio data without interference. For the object oriented idea, we'd need a Vector version of the get_input_frames function (which AudioStreamPlaybackMicrophone::_mix_internal would call).

As for my review, the code should do what it says so I'll approve.

I consider this to be a low level API which ideally users wouldn't use directly. If there was more time I'd really prefer an object oriented api that will have exactly the same performance but avoids the single global offset variable stored in AudioServer which limits our ability to improve things in the future.

Ivorforce · 2025-12-02T12:58:04Z

The code looks good. the only risk is this api is designed for single-consumer since it has only one global offset variable, and modifies with the same start/stop state as AudioStreamMicrophone.

Right, this was also my primary reservation.
However, after some consideration, I think we don't need to address this concern now, at least not until people actually raise it as a limitation. I think the vast majority of games will only need a single consumer of this microphone data. Especially, AudioStreamMicrophone will probably not be used much at all when this API is available, because playing the player's voice back to them from their own speakers is not very useful for most games.

Some day, I would prefer a higher level object oriented API which would also be used by AudioStreamMicrophone, that holds the offset internally and and can increment/decrement a "active counter" that will control start / stop so you can have two consumers of the audio data without interference. For the object oriented idea, we'd need a Vector version of the get_input_frames function (which AudioStreamPlaybackMicrophone::_mix_internal would call).
I consider this to be a low level API which ideally users wouldn't use directly. If there was more time I'd really prefer an object oriented api that will have exactly the same performance but avoids the single global offset variable stored in AudioServer which limits our ability to improve things in the future.

I also think it's likely we'll have some more comprehensive API eventually, and this code will be deprecated.
However, as mentioned above, I don't think we're at a state yet where we can confidently design this API. If we were to design it now, it would probably look like #108773, and add a ton of complexity most people don't need.
As players will start using the microphone input for voip or game controls, I'm sure they'll raise the limitations they'll run into, and we can design the future API around those concerns.

allenwp · 2025-12-02T14:05:51Z

I also think it's likely we'll have some more comprehensive API eventually, and this code will be deprecated.

With multiple maintainers expecting this to be deprecated in the future: should we mark the API as experimental? This way users can expect to need to update their code to a new API in the future (and if they don’t need to, then that’s just a bonus to them).

adamscott

There's a few changes to make, but I'll give my approval anyway right now.

This saga has been long time coming.

First of all, I tip my hat to @goatchurchprime's tenacity and to @Ivorforce for taking this over.

If it was just by me, I wouldn't merge this PR to be honest. While the code is well made, it exposes a buffer that wasn't made to be exposed. All of this from an API that was built intending audio input to be used as an audio device (instead of as data entry).

We need a refactor. I created a branch that implements a prototype of a MicrophoneServer. While still flawed, I was able to record from two distinct inputs at a time. While games may not need multiple entries, we must still understand that Godot is used more and more as a general purpose engine. I showed @goatchurchprime that Web apps like Jitsi use multiple inputs to display VU meters of the available input devices to facilitate selection:

It's also more in line with the current Godot way of dealing with input devices. See CameraServer for more details. (even if I found multiple flaws of CameraServer while experimenting on my MicrophoneServer)

But a refactor is costly and takes time. And people needing some features cannot wait.

The arguments made by @Ivorforce are convincing. Even if I'm not fully convinced, I'm enough to give this PR a go.

doc/classes/AudioServer.xml

drivers/pulseaudio/audio_driver_pulseaudio.cpp

servers/audio/audio_server.cpp

adamscott · 2025-12-03T12:54:03Z

platform/android/audio_driver_opensl.cpp

-	if (!recordItf || !recordBufferQueueItf) {
-		return ERR_CANT_OPEN;
+	if (recordItf) {
+		(*recordItf)->SetRecordState(recordItf, SL_RECORDSTATE_STOPPED);
+		recordItf = nullptr;
 	}

-	SLuint32 state;
-	SLresult res = (*recordItf)->GetRecordState(recordItf, &state);
-	ERR_FAIL_COND_V(res != SL_RESULT_SUCCESS, ERR_CANT_OPEN);
-
-	if (state != SL_RECORDSTATE_STOPPED) {
-		res = (*recordItf)->SetRecordState(recordItf, SL_RECORDSTATE_STOPPED);
-		ERR_FAIL_COND_V(res != SL_RESULT_SUCCESS, ERR_CANT_OPEN);
-
-		res = (*recordBufferQueueItf)->Clear(recordBufferQueueItf);
-		ERR_FAIL_COND_V(res != SL_RESULT_SUCCESS, ERR_CANT_OPEN);
+	if (recordBufferQueueItf) {
+		(*recordBufferQueueItf)->Clear(recordBufferQueueItf);
+		recordBufferQueueItf = nullptr;
+	}
+	if (recorder) {
+		(*recorder)->Destroy(recorder);
+		recorder = nullptr;
 	}

 	return OK;


Why did the errors checks were removed? I understand that we want to return OK even if the driver wasn't recording, but removing every error check doesn't help either.

It is not useful for input_stop() to have an error condition, because you couldn't assume anything (is the input still running?).

I've written this function to clean up the incomplete states init_input_device() could leave the classes in with its heavy use of the ERR_FAIL_COND_V macro to quit early.

For these three specific functions the docs say:

The only possible error for SetRecordState() is SL_RESULT_PARAMETER_INVALID, which we can rule out.

SLBufferQueueItf::Clear() has no error case

Destroy() has no return value.

It would be possible to collect the errors and return them at the end, after making sure as much as possible is cleaned up.
Anyway, I agree with Julian that for now trying to clean up as much as possible is preferable to exiting early and leaving behind a broken state. We can try to make this even more robust later. Edit: Or not, if there's no error state to begin with!

servers/audio/audio_server.cpp

allenwp · 2025-12-03T15:50:02Z

I’m going to go out on a limb and speak on behalf of @goatchurchprime, who may not be available to respond quickly today:

I suspect that any change requests can be applied, including reintroducing error checking, and these can be applied directly to this branch, squashed, and merged to get this in before feature freeze. Marking it as experimental makes this acceptable to me.

Let me know if you need me to come in and do this; I may have enough knowledge to do so without risk.

Ivorforce · 2025-12-03T15:59:32Z

I suspect that any change requests can be applied, including reintroducing error checking, and these can be applied directly to this branch, squashed, and merged to get this in before feature freeze. Marking it as experimental makes this acceptable to me.

Agreed.

Let me know if you need me to come in and do this; I may have enough knowledge to do so without risk.

Yea, that would be appreciated. It doesn't look like @goatchurchprime is online today.
Edit: Seems like we were wrong :)

goatchurchprime · 2025-12-03T16:21:41Z

I'm leery of making any code changes when I don't have time to retest them properly (typos happen).

If this gets merged there will be a reasonable amount of testing across all platforms very soon, so we can count on there something needing to be fixed early on (if there is a bug, which there almost always is), and that would be a better time to add these changes in.

allenwp · 2025-12-03T16:25:11Z

OK, I’ll stay out of the way!

…le starts

Repiteo · 2025-12-03T22:56:31Z

Thanks!

goatchurchprime mentioned this pull request Nov 28, 2025

Very choppy/bad/unintelligible audio from a particular machine goatchurchprime/two-voip-godot-4#66

Closed

Ivorforce added enhancement topic:audio labels Nov 28, 2025

Ivorforce added this to the 4.x milestone Nov 28, 2025

Ivorforce reviewed Nov 28, 2025

View reviewed changes

Ivorforce reviewed Nov 30, 2025

View reviewed changes

servers/audio/audio_server.cpp Outdated Show resolved Hide resolved

platform/android/audio_driver_opensl.cpp Show resolved Hide resolved

goatchurchprime marked this pull request as ready for review November 30, 2025 11:57

goatchurchprime requested review from a team as code owners November 30, 2025 11:57

goatchurchprime force-pushed the gtch/audioservermic branch from e9e4d3f to 0bc133a Compare November 30, 2025 17:26

Ivorforce approved these changes Nov 30, 2025

View reviewed changes

Ivorforce modified the milestones: 4.x, 4.6 Nov 30, 2025

AThousandShips reviewed Dec 1, 2025

View reviewed changes

doc/classes/AudioServer.xml Outdated Show resolved Hide resolved

doc/classes/AudioServer.xml Outdated Show resolved Hide resolved

doc/classes/AudioServer.xml Outdated Show resolved Hide resolved

goatchurchprime force-pushed the gtch/audioservermic branch from 5f60c23 to 68a73aa Compare December 1, 2025 13:15

lyuma approved these changes Dec 2, 2025

View reviewed changes

akien-mga requested a review from adamscott December 2, 2025 13:13

zodiepupper approved these changes Dec 2, 2025

View reviewed changes

adamscott approved these changes Dec 3, 2025

View reviewed changes

Access microphone buffer from AudioServer and prevent microphone doub…

3e8bf3b

…le starts

Ivorforce force-pushed the gtch/audioservermic branch from 68a73aa to 3e8bf3b Compare December 3, 2025 18:52

Repiteo merged commit 5a7e1bf into godotengine:master Dec 3, 2025
20 checks passed

This was referenced Dec 4, 2025

Expose functions from AudioStreamPlaybackMicrophone to access the microphone AudioDriver::input_buffer independently of the rest of the Audio system #100508

Closed

Add microphone access to Input #105244

Closed

goatchurchprime mentioned this pull request Dec 8, 2025

AudioServer.get_input_frames() demo godotengine/godot-demo-projects#1285

Open

goatchurchprime mentioned this pull request Dec 18, 2025

Fix bad handling of buffer over and underruns in AudioStreamPlaybackMicrophone #114094

Open

RevoluPowered mentioned this pull request Dec 22, 2025

Godot Updated AudioServer RevoluPowered/one-voip-godot-4#25

Open

Uh oh!

Conversation

goatchurchprime commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Stuff to do

Uh oh!

Ivorforce left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Ivorforce left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Ivorforce left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

goatchurchprime commented Dec 1, 2025

Uh oh!

lyuma left a comment

Choose a reason for hiding this comment

Uh oh!

Ivorforce commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

allenwp commented Dec 2, 2025

Uh oh!

adamscott left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

adamscott Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

goatchurchprime Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Ivorforce Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

allenwp commented Dec 3, 2025

Uh oh!

Ivorforce commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

goatchurchprime commented Dec 3, 2025

Uh oh!

allenwp commented Dec 3, 2025

Uh oh!

Uh oh!

Repiteo commented Dec 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

goatchurchprime commented Nov 28, 2025 •

edited

Loading

Ivorforce left a comment •

edited

Loading

Ivorforce left a comment •

edited

Loading

Ivorforce commented Dec 2, 2025 •

edited

Loading

goatchurchprime Dec 3, 2025 •

edited

Loading

Ivorforce Dec 3, 2025 •

edited

Loading

Ivorforce commented Dec 3, 2025 •

edited

Loading