[WIP] src: implement a prototype for uv coroutines #62494

Draft
jasnell wants to merge 1 commit into nodejs:main from jasnell:jasnell/uv-coroutine

Conversation

@jasnell jasnell commented Mar 29, 2026

This is just a proof-of-concept, NOT MEANT TO BE CONSIDERED FOR MERGE YET. I'm opening this to get early feedback on the ideas/approach.

What this does is implement coroutines for libuv calls. There are tests and example bindings that demonstrate feasibility. The intent here is to demonstrate viability, solicit feedback on the implementation approach, determine whether this is something we'd want to do, allow experimentation, etc.

This proves that coroutine support is possible. The next consideration is whether it would be desirable. There are a few notable gaps in the POC currently:

  1. No async context tracking (push_async_context/pop_async_context), which also means async context frames are not propagated.
  2. The typical microtask scheduling is not fully enabled.
  3. V8 idle time notifications would need to be worked through.

This is not meant to land as is, so "Request Changes" or "Approvals" are unnecessary at this point. I'm looking solely for feedback on the idea.

@nodejs-github-bot

Review requested:

  • @nodejs/gyp

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Mar 29, 2026

jasnell commented Mar 29, 2026

@addaleax ... very curious about your thoughts on this. good idea? bad idea? really terrible idea? feasible? etc

```cpp
template <typename Fn, typename... Args>
class UvFsAwaitable {
```

This initial POC uses a different Uv*Awaitable impl for each kind of libuv req handle. It's likely possible to make it a bit more generic but this works well enough for the prototype.

@jasnell jasnell force-pushed the jasnell/uv-coroutine branch 3 times, most recently from 4bf88eb to 6e6b092 (March 30, 2026 02:35)

jasnell commented Mar 30, 2026

Asked opencode/opus to generate a description of the relative profiles to help with evaluation...

The details here really break down why this would be a benefit. To be clear, the callback case is still the fastest and most efficient; that wouldn't change. However, using the coroutine approach instead of FSReqPromise, for instance, is much more efficient for the promise-based APIs.

Details

Performance characteristics: C++ coroutine libuv integration

Allocation profile

Single operation (e.g., fsPromises.access)

|  | Callback (FSReqCallback) | Promise (FSReqPromise) | Coroutine, no hooks | Coroutine, hooks active |
|---|---|---|---|---|
| C++ heap allocs | 1 (500 bytes) | 3 (~970 bytes) | 0 (free-list hit) | 0 (free-list hit) |
| V8 heap objects | 1 (JS wrapper) | 7 (wrapper + Resolver + Promise + 2 ArrayBuffer + 2 TypedArray) | 2 (Resolver + Promise) | 3 (+ resource Object) |
| Total allocs | 2 | 10 | 2 | 3 |

The FSReqPromise path unconditionally allocates two AliasedBufferT typed array pairs
(for stat and statfs fields) even for operations that never use them. The coroutine path
has no wasted allocations.

In steady state the coroutine frame is served from a thread-local free-list with 256-byte
size class granularity (up to 4096 bytes, 32 frames per bucket). No malloc/free calls
occur after the first coroutine of each size class completes.

Multi-step operation (open + stat + read + close)

|  | 4x Callback | 4x Promise | 1 Coroutine, no hooks | 1 Coroutine, hooks active |
|---|---|---|---|---|
| C++ heap allocs | 4 | 12 | 0 (free-list) | 0 (free-list) |
| V8 heap objects | 4 (+ 3 if async/await) | 28 | 2 | 3 |
| Total allocs | 7-8 | 40 | 2 | 3 |
| malloc/free pairs | 4 | ~12 | 0 (steady state) | 0 (steady state) |

The multi-step case is where the coroutine pattern provides the largest benefit. A single
coroutine frame serves all four libuv operations. The intermediate steps (open result
dispatches stat, stat result dispatches read, etc.) are pure C++ with no V8 involvement.

Per-operation overhead

Dispatch path (JS binding call to libuv dispatch)

|  | Callback | Promise | Coroutine (no hooks) |
|---|---|---|---|
| HandleScopes | 1 | 1 | 2 (init + on_resume) |
| V8 API calls (req setup) | 2 | ~8 | 3 |
| InternalCallbackScope | 0 | 0 | 1 (initial segment) |
| Object::New | 0 | 1 | 0 |
| malloc | 0 | 1 | 0 (free-list) |

The coroutine pays one extra InternalCallbackScope cycle for the initial segment (from
Start() to the first co_await). When no hooks are registered and no ticks are
scheduled, this is roughly 10 cheap operations (integer increments, pointer swaps,
branch-not-taken checks).

Completion path (libuv callback to JS notification)

|  | Callback | Promise | Coroutine |
|---|---|---|---|
| HandleScopes | 3-4 | 3 | 2 |
| InternalCallbackScope | 1 | 1 | 1 |
| V8 property lookups | 1 ("oncomplete") | 1 ("promise") | 0 |
| MakeCallback chain | 3 levels | N/A | N/A |
| BaseObjectPtr ref counting | ~3 ops | ~3 ops | 0 |
| Weak ref cycles | 1 | 1 | 0 |
| free | 1 | 1 | 0 (returned to free-list) |

The coroutine avoids the MakeCallback indirection chain (AsyncWrap::MakeCallback
string overload -> Name overload -> Function overload -> InternalMakeCallback), the V8
property lookup for the callback function, all BaseObjectPtr reference counting, weak
reference management, and the BaseObject destructor chain.

Multi-step totals

|  | 4x Callback | 4x Promise | 1 Coroutine |
|---|---|---|---|
| InternalCallbackScope entries | 4 | 4 | 5 |
| Microtask drains (non-trivial) | 4 | 7 | 1 |
| JS/C++ boundary crossings | 8 | 4 | 1 |
| MakeCallback chains | 4 | 0 | 0 |
| V8 property lookups | 4 | 4 | 0 |
| Ref counting ops | ~12 | ~12 | 0 |
| Weak ref cycles | 4 | 4 | 0 |
| Promise .then() reactions | 3 (if async/await) | 3 | 0 |

For the 4-step readFile coroutine, only the final resolve crosses the JS/C++ boundary.
The three intermediate steps are handle_.resume() -> coroutine body -> dispatch next
libuv call -> suspend. Each intermediate resume segment goes through an
InternalCallbackScope for correctness (async_hooks, context frame, draining), but the
drain finds empty queues and returns after a few comparisons.

Per-resume segment cost (intermediate steps, no hooks)

Each intermediate step (e.g., "open completed, dispatch stat") costs:

| Operation | Cost |
|---|---|
| handle_.resume() | indirect jump |
| DecreaseWaitingRequestCounter | integer decrement |
| HandleScope (persistent across segment) | V8 handle limit push |
| InternalCallbackScope constructor | ~8 ops (SetIdle, exchange context frame, push async context) |
| Coroutine body: process result, dispatch next | application-specific |
| InternalCallbackScope::Close | ~6 ops (EmitAfter skipped, pop context, PerformCheckpoint no-op, tick check no-op) |
| HandleScope destroy | V8 handle limit pop |
| IncreaseWaitingRequestCounter | integer increment |
| inner await_suspend: dispatch libuv call | the actual I/O |

Total: roughly 20-25 cheap operations per intermediate step. No heap allocations, no JS
calls, no string operations, no object creation.

Optimization details

Four specific optimizations reduce overhead compared to a naive coroutine implementation:

  1. Lazy resource object: Object::New(isolate) (the most expensive single operation
    in init_tracking) is only called when async_hooks listeners are active (kInit > 0
    or kUsesExecutionAsyncResource > 0). InternalCallbackScope was updated to accept a
    null Global<Object>* for the resource, with push_async_context already handling
    null correctly by skipping the native_execution_async_resources_ store.

  2. Thread-local frame allocator: Coroutine frames are allocated from a thread-local
    free-list with 256-byte size class granularity via promise_type::operator new/delete.
    Each bucket holds up to 32 frames (bounded by kMaxCachedPerBucket). After the first
    coroutine of a given size class completes, subsequent ones have zero malloc overhead.

  3. Cached type name string: The V8 string for the async resource type name is cached
    per Isolate in IsolateData::static_str_map (the same mechanism used by HTTP/2 header
    name caching). Only the first coroutine of a given type per Isolate pays the
    String::NewFromUtf8 cost. Safe with Worker threads since each Worker has its own
    IsolateData.

  4. No AdjustAmountOfExternalAllocatedMemory: Removed. The inaccurate fixed estimate
    (1024 bytes) was giving V8 misleading GC pressure signals, the API is deprecated in
    favor of ExternalMemoryAccounter, and short-lived coroutine frames don't benefit from
    GC heuristic adjustments.

Node.js integration

The coroutine implementation provides the same integration as the existing ReqWrap
pattern:

| Feature | Status |
|---|---|
| async_hooks init/before/after/destroy | Full support via InternalCallbackScope + EmitAsyncInit/EmitDestroy |
| AsyncLocalStorage | Full support via async_context_frame save/restore in InternalCallbackScope |
| executionAsyncResource() | Supported (resource object created when hooks are active) |
| Microtask draining | Every resume-to-suspend segment drains via InternalCallbackScope::Close |
| process.nextTick draining | Same mechanism as microtask draining |
| Event loop liveness | request_waiting_ incremented on dispatch, decremented on completion |
| Environment teardown | Coroutine task list iterated in CleanupHandles(), uv_cancel() called on each |
| can_call_into_js() guard | Checked in resolve/reject helpers |
| Trace events | TRACE_EVENT_NESTABLE_ASYNC_BEGIN/END emitted in init/destroy |
| Unhandled exceptions | Detached coroutines with captured exceptions call std::terminate() |

@jasnell jasnell force-pushed the jasnell/uv-coroutine branch from 6e6b092 to 3859c94 (March 30, 2026 15:01)
@jasnell jasnell requested review from Qard, addaleax and mcollina March 30, 2026 15:11

jasnell commented Mar 30, 2026

/cc @nodejs/libuv

Signed-off-by: James M Snell <jasnell@gmail.com>
Assisted-by: Opencode/Open 4.6
@jasnell jasnell force-pushed the jasnell/uv-coroutine branch from 3859c94 to 7cdb3fb (March 30, 2026 16:14)

@Qard Qard left a comment


It's an interesting idea. My main questions/concerns are:

  • How do existing HandleScopes (and other scopes) interact with this if we were to want to call into some coroutines thing when we're already in a scope?
  • What do the benchmarks say? It'd be great to have concrete numbers for memory use at load.
  • Could we perhaps design the dispatch to also support carrying callbacks through to completion so we could have symmetrical support for callbacks and promises everywhere? It'd be nice if we could get both free promise support for everything and also eliminate the unnecessary overhead of wrapping promises around callbacks.
  • Could this enable writing everything as coroutines, with a translation layer to promises/callbacks, so that sequencing of native-to-native operations could skip the JS transition while being operated through the same interfaces? It'd be super nice if we could compose calls together and have their linkage avoid barrier crossings when not necessary. You have the FS example of that as a distinct function, but I'm wondering about making the individual steps usable as their own calls from JS, so we could put all the business logic in composable pieces on the C++ side that are trivially translated to JS.
  • Presumably the C++20 coroutines are just their own separately managed stacks and don't influence the other non-coroutines as fiber systems would? Just want to be sure we aren't getting into situations like swapping out libuv memory and then blowing up when a signal handler triggers and is not in the state it expects.


jasnell commented Mar 30, 2026

How do existing HandleScopes (and other scopes) interact with this if we were to want to call into some coroutines thing when we're already in a scope?

It really Just Works. If you take a look at the example uses in node_file.cc, specifically the two v8 callback functions that accept a v8::FunctionCallbackInfo, you'll see a specific case that illustrates this. The callbacks run with an active HandleScope and we call into the coroutines with no problems.

What do the benchmarks say? It'd be great to have concrete numbers for memory use at load.

I haven't run any yet. Before I went through the trouble of crafting benchmarks I wanted to make sure there weren't any obvious reasons why we shouldn't do this at all.

Could we perhaps design the dispatch to also support carrying callbacks through to completion so we could have symmetrical support for callbacks and promises everywhere? It'd be nice if we could get both free promise support for everything and also eliminate the unnecessary overhead of wrapping promises around callbacks.
...
Could this enable writing everything as coroutines, with a translation layer to promises/callbacks, so that sequencing of native-to-native operations could skip the JS transition while being operated through the same interfaces? It'd be super nice if we could compose calls together and have their linkage avoid barrier crossings when not necessary. You have the FS example of that as a distinct function, but I'm wondering about making the individual steps usable as their own calls from JS, so we could put all the business logic in composable pieces on the C++ side that are trivially translated to JS.

Potentially, yes. The callback version is currently still going to be a bit faster because the coroutine setup itself has some overhead. Coalescing is worth exploring; we'd just need to make sure there isn't too much of a perf cost.

Presumably the C++20 coroutines are just their own separately managed stacks and don't influence the other non-coroutines as fiber systems would? Just want to be sure we aren't getting into situations like swapping out libuv memory and then blowing up when a signal handler triggers and is not in the state it expects.

That's something we need to fully test but I don't believe it'll be a problem.


Qard commented Mar 30, 2026

I wanted to make sure there weren't any obvious reasons why we shouldn't do this at all.

The only concern I would really have in that regard is the significant chance of behaviour changes. This interacts with quite a bit of scheduling machinery, and there's a lot of potential for edge cases in there. I haven't spotted anything on a cursory glance, but obviously it would need a whole lot more review before such a change could land. I think what this needs is a lot more research/validation than the actual build itself. Confidence of correctness is hard on something like this.
