[WIP] src: implement a prototype for uv coroutines #62494

jasnell wants to merge 1 commit into nodejs:main
@addaleax ... very curious in your thoughts around this. good idea? bad idea? really terrible idea? feasible? etc
(comment on diff)

```cpp
// ---------------------------------------------------------------------------
template <typename Fn, typename... Args>
class UvFsAwaitable {
```
This initial POC uses a different Uv*Awaitable impl for each kind of libuv req handle. It's likely possible to make it a bit more generic but this works well enough for the prototype.
Force-pushed 4bf88eb to 6e6b092
Asked opencode/opus to generate a description of the relative profiles to help with evaluation... The details here really break down why this would be a benefit. To be clear, the callback case is still the fastest and most efficient; that wouldn't change. However, using the coroutine approach here instead of the …

Performance characteristics: C++ coroutine libuv integration

Allocation profile

Single operation (e.g., …)
| | Callback (FSReqCallback) | Promise (FSReqPromise) | Coroutine, no hooks | Coroutine, hooks active |
|---|---|---|---|---|
| C++ heap allocs | 1 (500 bytes) | 3 (~970 bytes) | 0 (free-list hit) | 0 (free-list hit) |
| V8 heap objects | 1 (JS wrapper) | 7 (wrapper + Resolver + Promise + 2 ArrayBuffer + 2 TypedArray) | 2 (Resolver + Promise) | 3 (+ resource Object) |
| Total allocs | 2 | 10 | 2 | 3 |
The FSReqPromise path unconditionally allocates two AliasedBufferT typed array pairs
(for stat and statfs fields) even for operations that never use them. The coroutine path
has no wasted allocations.
In steady state the coroutine frame is served from a thread-local free-list with 256-byte
size class granularity (up to 4096 bytes, 32 frames per bucket). No malloc/free calls
occur after the first coroutine of each size class completes.
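The free-list just described can be sketched roughly as below. This is a standalone approximation: the constants mirror the text, but `frame_alloc`/`frame_free` are illustrative names, and the prototype wires the equivalent logic into `promise_type::operator new/delete`.

```cpp
// Hedged sketch of the thread-local size-class free-list described above.
// Constants mirror the text; the real implementation may differ in detail.
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

constexpr std::size_t kGranularity = 256;   // size-class step
constexpr std::size_t kMaxSize = 4096;      // largest cached size class
constexpr std::size_t kBuckets = kMaxSize / kGranularity;
constexpr std::size_t kMaxCachedPerBucket = 32;

// One free-list per size class, per thread.
thread_local std::vector<void*> buckets[kBuckets];

std::size_t bucket_index(std::size_t size) {
  if (size == 0) size = 1;                  // guard the 0-byte edge case
  return (size + kGranularity - 1) / kGranularity - 1;  // round up to class
}

void* frame_alloc(std::size_t size) {
  if (size > kMaxSize) return std::malloc(size);  // too large: plain malloc
  auto& bucket = buckets[bucket_index(size)];
  if (!bucket.empty()) {                    // free-list hit: no malloc
    void* p = bucket.back();
    bucket.pop_back();
    return p;
  }
  // Allocate the rounded-up class size so the block is reusable later.
  return std::malloc((bucket_index(size) + 1) * kGranularity);
}

void frame_free(void* p, std::size_t size) {
  if (size > kMaxSize) { std::free(p); return; }
  auto& bucket = buckets[bucket_index(size)];
  if (bucket.size() < kMaxCachedPerBucket)
    bucket.push_back(p);                    // cache for the next coroutine
  else
    std::free(p);                           // bucket full: really free
}
```

Two frame sizes that round up to the same 256-byte class share a bucket, so the second allocation is served without touching `malloc`.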
Multi-step operation (open + stat + read + close)

| | 4x Callback | 4x Promise | 1 Coroutine, no hooks | 1 Coroutine, hooks active |
|---|---|---|---|---|
| C++ heap allocs | 4 | 12 | 0 (free-list) | 0 (free-list) |
| V8 heap objects | 4 (+ 3 if async/await) | 28 | 2 | 3 |
| Total allocs | 7-8 | 40 | 2 | 3 |
| malloc/free pairs | 4 | ~12 | 0 (steady state) | 0 (steady state) |
The multi-step case is where the coroutine pattern provides the largest benefit. A single
coroutine frame serves all four libuv operations. The intermediate steps (open result
dispatches stat, stat result dispatches read, etc.) are pure C++ with no V8 involvement.
Per-operation overhead

Dispatch path (JS binding call to libuv dispatch)

| | Callback | Promise | Coroutine (no hooks) |
|---|---|---|---|
| HandleScopes | 1 | 1 | 2 (init + on_resume) |
| V8 API calls (req setup) | 2 | ~8 | 3 |
| InternalCallbackScope | 0 | 0 | 1 (initial segment) |
| Object::New | 0 | 1 | 0 |
| malloc | 0 | 1 | 0 (free-list) |
The coroutine pays one extra InternalCallbackScope cycle for the initial segment (from
Start() to the first co_await). When no hooks are registered and no ticks are
scheduled, this is roughly 10 cheap operations (integer increments, pointer swaps,
branch-not-taken checks).
Completion path (libuv callback to JS notification)

| | Callback | Promise | Coroutine |
|---|---|---|---|
| HandleScopes | 3-4 | 3 | 2 |
| InternalCallbackScope | 1 | 1 | 1 |
| V8 property lookups | 1 ("oncomplete") | 1 ("promise") | 0 |
| MakeCallback chain | 3 levels | N/A | N/A |
| BaseObjectPtr ref counting | ~3 ops | ~3 ops | 0 |
| Weak ref cycles | 1 | 1 | 0 |
| free | 1 | 1 | 0 (returned to free-list) |
The coroutine avoids the MakeCallback indirection chain (AsyncWrap::MakeCallback
string overload -> Name overload -> Function overload -> InternalMakeCallback), the V8
property lookup for the callback function, all BaseObjectPtr reference counting, weak
reference management, and the BaseObject destructor chain.
Multi-step totals

| | 4x Callback | 4x Promise | 1 Coroutine |
|---|---|---|---|
| InternalCallbackScope entries | 4 | 4 | 5 |
| Microtask drains (non-trivial) | 4 | 7 | 1 |
| JS/C++ boundary crossings | 8 | 4 | 1 |
| MakeCallback chains | 4 | 0 | 0 |
| V8 property lookups | 4 | 4 | 0 |
| Ref counting ops | ~12 | ~12 | 0 |
| Weak ref cycles | 4 | 4 | 0 |
| Promise .then() reactions | 3 (if async/await) | 3 | 0 |
For the 4-step readFile coroutine, only the final resolve crosses the JS/C++ boundary.
The three intermediate steps are handle_.resume() -> coroutine body -> dispatch next
libuv call -> suspend. Each intermediate resume segment goes through an
InternalCallbackScope for correctness (async_hooks, context frame, draining), but the
drain finds empty queues and returns after a few comparisons.
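A toy RAII sketch of that per-segment scope, under stated assumptions: this is not Node's actual `InternalCallbackScope`; the queue and counter are stand-ins used only to show why an empty drain is cheap.

```cpp
// Toy RAII sketch (not Node's InternalCallbackScope): each resume-to-suspend
// segment is wrapped in a scope whose destructor drains the pending queue.
// With nothing queued, the drain is just an empty-check, which is what keeps
// the intermediate segments inexpensive.
#include <cassert>
#include <functional>
#include <queue>

static std::queue<std::function<void()>> microtasks;  // stand-in queue
static int drained = 0;                               // tasks actually run

struct SegmentScope {
  // The constructor would push the async context; omitted in this sketch.
  ~SegmentScope() {       // plays the role of InternalCallbackScope::Close()
    while (!microtasks.empty()) {
      auto task = std::move(microtasks.front());
      microtasks.pop();
      ++drained;
      task();
    }
  }
};
```

An empty-queue segment runs zero tasks on close, while a queued microtask is drained exactly when the segment's scope ends.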
Per-resume segment cost (intermediate steps, no hooks)

Each intermediate step (e.g., "open completed, dispatch stat") costs:

| Operation | Cost |
|---|---|
| handle_.resume() | indirect jump |
| DecreaseWaitingRequestCounter | integer decrement |
| HandleScope (persistent across segment) | V8 handle limit push |
| InternalCallbackScope constructor | ~8 ops (SetIdle, exchange context frame, push async context) |
| Coroutine body: process result, dispatch next | application-specific |
| InternalCallbackScope::Close | ~6 ops (EmitAfter skipped, pop context, PerformCheckpoint no-op, tick check no-op) |
| HandleScope destroy | V8 handle limit pop |
| IncreaseWaitingRequestCounter | integer increment |
| Inner await_suspend: dispatch libuv call | the actual I/O |
Total: roughly 20-25 cheap operations per intermediate step. No heap allocations, no JS
calls, no string operations, no object creation.
Optimization details

Four specific optimizations reduce overhead compared to a naive coroutine implementation:

- Lazy resource object: Object::New(isolate) (the most expensive single operation in init_tracking) is only called when async_hooks listeners are active (kInit > 0 or kUsesExecutionAsyncResource > 0). InternalCallbackScope was updated to accept a null Global<Object>* for the resource, with push_async_context already handling null correctly by skipping the native_execution_async_resources_ store.
- Thread-local frame allocator: Coroutine frames are allocated from a thread-local free-list with 256-byte size class granularity via promise_type::operator new/delete. Each bucket holds up to 32 frames (bounded by kMaxCachedPerBucket). After the first coroutine of a given size class completes, subsequent ones have zero malloc overhead.
- Cached type name string: The V8 string for the async resource type name is cached per Isolate in IsolateData::static_str_map (the same mechanism used by HTTP/2 header name caching). Only the first coroutine of a given type per Isolate pays the String::NewFromUtf8 cost. Safe with Worker threads since each Worker has its own IsolateData.
- No AdjustAmountOfExternalAllocatedMemory: Removed. The inaccurate fixed estimate (1024 bytes) was giving V8 misleading GC pressure signals, the API is deprecated in favor of ExternalMemoryAccounter, and short-lived coroutine frames don't benefit from GC heuristic adjustments.
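The lazy-resource optimization can be sketched minimally as follows. The names here are illustrative only: in the real code the expensive call is Object::New(isolate) and the flags are the async_hooks kInit / kUsesExecutionAsyncResource counters.

```cpp
// Minimal sketch of the lazy-resource idea (illustrative names only): the
// expensive allocation runs only when an async_hooks counter says someone
// can actually observe the resource.
#include <cassert>
#include <memory>

struct Resource {};                       // stands in for the v8 object

static int kInit = 0;                     // async_hooks init listener count
static int kUsesExecutionAsyncResource = 0;
static int expensive_allocs = 0;          // times "Object::New" actually ran

// Callers (like a null-tolerant InternalCallbackScope) must accept nullptr.
std::unique_ptr<Resource> maybe_make_resource() {
  if (kInit == 0 && kUsesExecutionAsyncResource == 0)
    return nullptr;                       // no listeners: skip the alloc
  ++expensive_allocs;                     // the "Object::New(isolate)" cost
  return std::make_unique<Resource>();
}
```

With no listeners registered the function returns null and allocates nothing; flipping a counter makes the very next call pay the allocation.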
Node.js integration

The coroutine implementation provides the same integration as the existing ReqWrap pattern:

| Feature | Status |
|---|---|
| async_hooks init/before/after/destroy | Full support via InternalCallbackScope + EmitAsyncInit/EmitDestroy |
| AsyncLocalStorage | Full support via async_context_frame save/restore in InternalCallbackScope |
| executionAsyncResource() | Supported (resource object created when hooks are active) |
| Microtask draining | Every resume-to-suspend segment drains via InternalCallbackScope::Close |
| process.nextTick draining | Same mechanism as microtask draining |
| Event loop liveness | request_waiting_ incremented on dispatch, decremented on completion |
| Environment teardown | Coroutine task list iterated in CleanupHandles(), uv_cancel() called on each |
| can_call_into_js() guard | Checked in resolve/reject helpers |
| Trace events | TRACE_EVENT_NESTABLE_ASYNC_BEGIN/END emitted in init/destroy |
| Unhandled exceptions | Detached coroutines with captured exceptions call std::terminate() |
Force-pushed 6e6b092 to 3859c94
/cc @nodejs/libuv
Signed-off-by: James M Snell <jasnell@gmail.com>
Assisted-by: Opencode/Open 4.6
Force-pushed 3859c94 to 7cdb3fb
It's an interesting idea. My main questions/concerns are:
- How do existing HandleScopes (and other scopes) interact with this if we were to want to call into some coroutines thing when we're already in a scope?
- What do the benchmarks say? It'd be great to have concrete numbers for memory use at load.
- Could we perhaps design the dispatch to also support carrying callbacks through to completion, so we could have symmetrical support for callbacks and promises everywhere? It'd be nice if we could get both free promise support for everything and also eliminate the unnecessary overhead from wrapping promises around callbacks.
- Could this enable writing everything as coroutines, with a translation layer to promises/callbacks, such that sequencing of native-to-native things could skip the JS transition but still be operated through the same interfaces? It'd be super nice if we could compose calls together and have their linkage avoid barrier crossings when not necessary. You have the FS example of that as a distinct function, but I'm wondering about having the individual steps usable as their own calls from JS, so that we can put all the business logic in composable pieces on the C++ side that translate trivially to JS.
- Presumably the C++20 coroutines are just their own separately managed stacks and don't influence the other non-coroutines as fiber systems would? Just want to be sure we aren't getting into situations like swapping out libuv memory and then blowing up when a signal handler triggers and is not in the state it expects.
It really Just Works. If you take a look at the example uses in node_file.c++, specifically the two v8 callback functions that accept the …
I haven't run any yet. Before I went through the trouble of crafting benchmarks I wanted to make sure there weren't any obvious reasons why we shouldn't do this at all.
Potentially, yes. The callback version currently is still going to be a bit faster because the coroutine setup itself has some overhead. Coalescing is worth exploring, however, we'd just need to make sure there's not too much of a perf cost.
That's something we need to fully test but I don't believe it'll be a problem.
The only concern I would really have in that regard is the significant chance of behaviour changes. This is interacting with quite a bit of scheduling machinery, so there's a lot of potential for edge cases. I haven't spotted anything on a cursory glance, but obviously that'd need a whole lot more review before such a change should be landed. I think what we need for this is a lot more research/validation than the actual build itself. Confidence of correctness is hard on something like this.
This is just a proof-of-concept, NOT MEANT TO BE CONSIDERED FOR MERGE YET. I'm opening this to get early feedback on the ideas / approach.
What this does is implement coroutines for libuv calls. There are tests and example bindings that demonstrate the feasibility. The intent here is to demonstrate viability, solicit feedback on the implementation approach, determine if this is something we'd want to do, allow experimentation, etc.
This proves that coroutine support is possible. The next consideration is whether it would be desirable. There are a few notable gaps in the POC currently:
- No async context tracking (push_async_context/pop_async_context). This also means async context frames are not propagated.
- The typical microtask scheduling is not fully enabled.
- V8 idle time notifications would need to be worked through.

This is not meant to land as-is, so "Request Changes" or "Approvals" are unnecessary at this point. I'm looking solely for feedback on the idea.