Core Maintainer Meeting - February 4th #2204

dsp-ant · 2026-02-04T20:02:00Z

dsp-ant
Feb 4, 2026
Maintainer

MCP Core Maintainer Meeting — February 4, 2026

Attendance:

David Soria Parra (Lead Core Maintainer)
Den Delimarsky (Core Maintainer)
Che Liu (Core Maintainer)
Peter Alexander (Core Maintainer)
Paul Carleton (Core Maintainer)
Caitie McCaffrey (Core Maintainer)
Nick Aldridge (Core Maintainer)
Tapan Chugh
Cliff Hall
Marcelo Trylesinski

Quorum was reached with 7 out of 9 Core Maintainers

Meeting Summary

The MCP Core Maintainers met to review three Specification Enhancement Proposals (SEPs). Below is a summary of the decisions, rationale, and what contributors should know going forward.

SEPs Reviewed

SEP-2084: Primitive Grouping — Not Accepted at This Time

PR: #2084

What it proposes: A new server capability called "Groups" that organizes MCP primitives (tools, prompts, resources, tasks) into named, hierarchical collections. The goal is to let clients filter and present primitives by group, reducing context overload in LLM interactions.

Decision: After discussion, the core maintainers proceeded to an asynchronous vote on this proposal. The overall sentiment leaned against accepting it at this time. While the design is sound and solves a real organizational challenge, the group concluded that the AI tooling landscape is evolving too quickly to commit to a specific grouping mechanism right now. Tool search and skills have recently emerged as alternative approaches, and it isn't yet clear whether explicit grouping will remain the best path forward.

Key reasoning:

Adding a capability to the protocol is effectively permanent — deprecation and removal are extremely difficult once adoption begins.
Major client implementers have not yet signaled a strong, unmet need for this specific feature.
The group wants to see more ecosystem experimentation (e.g., via extensions or standalone projects) before standardizing a grouping primitive.

What contributors should know: The door is not closed on this idea. Contributors interested in grouping are encouraged to prototype it as an MCP extension or standalone project. If it gains traction in the ecosystem, it becomes a strong candidate for future standardization.

Vote:

🟢 0 Accept
🟡 0 Accept with changes
🔴 6 rejects

Proposal was rejected.

SEP-414: OpenTelemetry Trace Context Propagation — Supported, Proceeding to Vote

PR: #414

What it proposes: Formalizes the convention of propagating OpenTelemetry trace context, using the W3C Trace Context and Baggage fields (traceparent, tracestate, baggage), through the _meta parameter in MCP requests. This documents a pattern that is already in use across several SDKs, including the C# implementation.

Decision: The core maintainers expressed broad support and moved the proposal to an asynchronous vote. The group also agreed to make a namespacing exception: while MCP normally requires DNS-style prefixes for meta keys, the OTel fields will be allowed without prefixes to maintain compatibility with the OpenTelemetry standard.
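As a rough illustration of the convention described above, the sketch below attaches trace context to the _meta field of a JSON-RPC request using the un-prefixed key names. The helper names (make_traceparent, with_trace_context) and the tool name are hypothetical and not taken from any SDK; the traceparent layout (version, trace id, parent id, flags) follows the W3C Trace Context format.

```python
import json
import secrets
from typing import Optional


def make_traceparent(sampled: bool = True) -> str:
    """Build a W3C `traceparent` value: version-traceid-parentid-flags."""
    trace_id = secrets.token_hex(16)  # 32 lowercase hex characters
    parent_id = secrets.token_hex(8)  # 16 lowercase hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def with_trace_context(request: dict, traceparent: str,
                       tracestate: Optional[str] = None) -> dict:
    """Return a copy of `request` with trace context under params._meta.

    Hypothetical helper for illustration only.
    """
    params = dict(request.get("params", {}))
    meta = dict(params.get("_meta", {}))
    meta["traceparent"] = traceparent  # un-prefixed key, per the namespacing exception
    if tracestate is not None:
        meta["tracestate"] = tracestate
    params["_meta"] = meta
    return {**request, "params": params}


request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "example_tool", "arguments": {}},  # illustrative tool name
}
traced = with_trace_context(request, make_traceparent())
print(json.dumps(traced, indent=2))
```

The original request is left untouched, so a caller could add the context at the transport boundary without mutating application-level objects.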

Key reasoning:

This follows MCP's core philosophy of standardizing de facto usage patterns rather than inventing new ones.
Multiple SDKs already implement this pattern, making formal documentation valuable for interoperability.
OpenTelemetry is a widely adopted industry standard, justifying the prefix exception.

Open investigation: The maintainers asked for further research into how existing SDKs and major client implementers handle OTel context today (headers vs. meta fields) and what the deprecation path would look like if the approach needs to change later.

Vote:

🟢 5 Accept
🟡 1 Accept with changes
🔴 0 rejects

Proposal was accepted.

SEP-1938: Agency Hint Tool Annotation — Not Accepted

PR: #1938

What it proposes: An optional agencyHint boolean annotation on tools to signal that a tool performs autonomous, multi-step AI reasoning rather than a simple atomic operation. The intent was to help clients apply appropriate UX patterns (confirmation flows, progress monitoring) for agentic tools.

Decision: The core maintainers did not support this proposal.

Key reasoning:

The concept of "agency" lacks a clear, shared definition — the boundary between an agentic tool and a non-agentic one is blurry.
No major client implementers have demonstrated a need to distinguish tools this way in practice.
The proposal had not been validated through ecosystem adoption before being brought to the core group.

What contributors should know: The agents working group is encouraged to continue exploring how to surface tool complexity to clients, but future proposals in this area should be grounded in demonstrated ecosystem need and clearer definitions.

Vote: Still ongoing

Broader Takeaways for Contributors

MCP's standardization philosophy

A key theme of the meeting was reinforcing MCP's approach to protocol evolution: standardize patterns that have already emerged and proven themselves in the ecosystem, rather than inventing new ones. MCP Apps was cited as a success story — it was built outside the core spec, gained community adoption, and was then brought into the protocol.

Extensions are the right starting point

If you have an idea for a new capability, the recommended path is to build it as an MCP extension first. Extensions let you ship, iterate, and gather real-world feedback without the permanence of a core protocol addition. If your extension gains widespread adoption, it becomes a natural candidate for standardization.

What the maintainers are prioritizing

The core maintainers committed to publishing an updated roadmap within the coming weeks to give the community clearer guidance on priorities. Topics like streaming, transport improvements, and protocol stability are high on the list. Contributors are encouraged to watch for that roadmap and align their efforts accordingly.

Improved proposal process

The maintainers are working on improvements to the SEP review process, including better pre-screening of proposals before they reach the core group, clearer documentation of decisions, and a published vision-and-values document for the protocol. The goal is to help contributors understand what makes a strong proposal and avoid investing effort in directions that aren't aligned with protocol priorities.

scottslewis · 2026-02-04T22:28:45Z

Improved proposal process

The maintainers are working on improvements to the SEP review process, including better pre-screening of proposals before they reach the core group, clearer documentation of decisions, and a published vision-and-values document for the protocol. The goal is to help contributors understand what makes a strong proposal and avoid investing effort in directions that aren't aligned with protocol priorities.

With due respect, I think that centralizing decision making in a statically-defined, exclusive, very small set of core maintainers simply does not scale to a broadly-used protocol (and/or API(s)/sdks for that matter). Although better process docs, clearer and more documentation of decisions, contributor education, are all good things...based upon my experience they will still eventually create a bottleneck of decision making...especially in something like AI/ML where tooling and integration are just beginning.

I think the only way to scale something in the open source world is to depend upon and trust the dev community, encourage/enable greater open participation...both in terms of technical contributions, experiments, working group participation, etc, as well as the decision-making process/project mgmt/inter-project coordination (e.g. conformance testing). Of course there have to be ways of creating and establishing community trust continuously.

I don't mean to discount that this can be a very difficult challenge, specifically because of backward compatibility requirements to assure that protocol consumers/products are not continuously broken. My only observation is that diversity of participation and expertise can give both innovation and protocol stability, if a participatory architecture is adopted and democratic, community-based decision making (trust) is enabled. In many respects, protocol stability is similar to maintaining API backward compatibility, which has been successfully done over the long term in several existing open source organizations including the ones I've participated in over many years.

0 replies

bdoyle0182 · 2026-02-05T02:37:59Z

Respect the decision to wait on groups as it is a one way door if the group feels things are still moving too fast to adopt. Would appreciate a couple clarification points if possible:

The goal is to let clients filter and present primitives by group, reducing context overload in LLM interactions.

I thought this was one of several goals, no more important than the others, and that the proposal was no longer all about context management. For my use cases, the need for groups comes from server-side organization and access controls for things like resources / prompts in a very large organization.

Tool search and skills have recently emerged as alternative approaches, and it isn't yet clear whether explicit grouping will remain the best path forward.

These alternatives handle dynamic context discovery client side, but I've yet to see anything that would solve for server side organization and access without something being in the protocol itself. Without an organization concept, it inhibits our ability to move forward with MCP for some of the things that we want to do with enterprise resource / prompt libraries.

It would be great for my understanding to get more insight from the core maintainers' perspective on the importance of working towards an organization concept beyond just free-floating primitives as a protocol priority. Even if the answer is that it's a priority but it's too soon to coalesce on an implementation, that's helpful for me.

If groups can be accomplished with an extension first, I'm perfectly happy with that, and I better understand the reasoning now that extensions are coming, but I'm struggling conceptually to understand how that would work. (I also have a limited understanding of extensions at this point, or of how to even get started; I only skimmed the SEP a couple of months ago and see that it just got merged. I didn't realize it ultimately wasn't included in the November release.)

Edit: Okay I think I grasp extensions now and agree with the philosophy here.

2 replies

dsp-ant Feb 9, 2026
Maintainer Author

I thought this was one of several goals and this wasn't more important than the other goals that this was no longer all about context management. For my use cases, the need of groups comes from server side organization and access controls to things like resources / prompts for a very large organization.

I would be interested to hear very specific use cases here. I might not fully grasp the challenge here and would love to learn about it (also happy to do that via email if you prefer). The use cases I am aware of would mostly be solved by exposing OAuth scopes directly at a primitive level (tools/resources/prompts) and handling access levels this way. After access is provided based on scopes, the underlying selection problem remains the same.
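To make the scope idea concrete, here is a minimal sketch of per-primitive scope checks. Everything in it is an assumption for illustration: the required_scopes field, the scope strings, and the primitive names are hypothetical and are not part of the current spec.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Primitive:
    kind: str  # "tool", "resource", or "prompt"
    name: str
    # Hypothetical field: scopes a session must hold to see this primitive.
    required_scopes: frozenset = field(default_factory=frozenset)


def visible_primitives(primitives, granted_scopes):
    """Keep primitives whose required scopes are all granted to the session."""
    granted = frozenset(granted_scopes)
    return [p for p in primitives if p.required_scopes <= granted]


catalog = [
    Primitive("tool", "billing_refund", frozenset({"billing:write"})),
    Primitive("resource", "billing_invoices", frozenset({"billing:read"})),
    Primitive("prompt", "onboarding_guide"),  # no scopes required
]

names = [p.name for p in visible_primitives(catalog, {"billing:read"})]
print(names)  # ['billing_invoices', 'onboarding_guide']
```

Whatever survives the filter still has to be selected from by the client or model, which is the remaining selection problem mentioned above.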

scottslewis Feb 9, 2026

From the working group public discussion

scottslewis · 2026-02-05T21:16:08Z

Extensions are the right starting point

If you have an idea for a new capability, the recommended path is to build it as an MCP extension first. Extensions let you ship, iterate, and gather real-world feedback without the permanence of a core protocol addition. If your extension gains widespread adoption, it becomes a natural candidate for standardization.

Observation: If 'widespread adoption' is defined too narrowly, it can effectively turn 'standardization' over to large players only.

At the schema spec level, what is meant by extensions? Is this something more/other than using _meta? If so, I would appreciate pointers to documentation and/or existing code examples of using this mechanism.

0 replies

renatomarinho · 2026-03-04T08:40:32Z

We built MCP Fusion, a grouping and routing layer over the MCP SDK, following the guidance in these notes. Building it exposed specific, verifiable problems in the current protocol that grouping would resolve. I want to put these on record precisely, because they are spec-level issues - not preference.

The core problem: tool annotations become semantically incoherent under consolidation

MCP tool annotations - destructiveHint, readOnlyHint, idempotentHint - are defined at the tool level. This is fine when tools are atomic. But in the absence of a protocol-level grouping primitive, the ecosystem is independently converging on tool consolidation as the primary strategy for managing large action surfaces: multiple related operations collapsed behind one MCP tool, dispatched via a discriminator enum.

When that happens, the annotation system breaks in a way that is not a framework bug - it is a protocol constraint.

Consider a consolidated platform tool with these actions:

users.list       → readOnly: true,  destructive: false
users.create     → readOnly: false, destructive: false
users.ban        → readOnly: false, destructive: true
billing.invoices → readOnly: true,  destructive: false
billing.refund   → readOnly: false, destructive: true

The only safe annotation resolution for this tool under the current spec is:

destructiveHint: true - because at least one action is destructive
readOnlyHint: false - because not all actions are read-only
idempotentHint: false - because not all actions are idempotent

This is the correct conservative resolution. It is also semantically wrong for 3 out of 5 actions.
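The resolution above can be reproduced with a small sketch. The Action type and aggregate function are illustrative names, not part of any SDK, and idempotentHint is left out because the action table above does not list idempotency per action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    read_only: bool
    destructive: bool


ACTIONS = [
    Action("users.list", read_only=True, destructive=False),
    Action("users.create", read_only=False, destructive=False),
    Action("users.ban", read_only=False, destructive=True),
    Action("billing.invoices", read_only=True, destructive=False),
    Action("billing.refund", read_only=False, destructive=True),
]


def aggregate(actions):
    """Conservative tool-level hints: destructive if any action is,
    read-only only if every action is."""
    return {
        "destructiveHint": any(a.destructive for a in actions),
        "readOnlyHint": all(a.read_only for a in actions),
    }


hints = aggregate(ACTIONS)

# Actions whose real behavior differs from the tool-level hints.
mismatched = [
    a.name
    for a in ACTIONS
    if a.destructive != hints["destructiveHint"]
    or a.read_only != hints["readOnlyHint"]
]

print(hints)       # {'destructiveHint': True, 'readOnlyHint': False}
print(mismatched)  # ['users.list', 'users.create', 'billing.invoices']
```

Only users.ban and billing.refund match the aggregated hints, which is where the 3 out of 5 figure comes from.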

The annotation degradation is not just a UX inconvenience - it is a security concern. readOnlyHint and destructiveHint are used by security middlewares to gate execution: read-only tools bypass confirmation flows, destructive tools trigger audit logging or require elevated entitlements. When the protocol forces these fields to be imprecise, two failure modes follow:

False positives: every users.list call triggers destructive-tool safeguards - audit entries, confirmation prompts, rate limits designed for write operations. This creates alert fatigue that operators learn to ignore, which is itself a security risk.
False negatives: if a team resolves annotations optimistically (marking the tool readOnly: true because most actions are read-only), a destructive action silently bypasses the safeguards designed to catch it.
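Both failure modes can be shown with a toy gate. The needs_confirmation rule below is a hypothetical middleware policy written for this example, not the behavior of any particular product.

```python
def needs_confirmation(tool_hints: dict) -> bool:
    """Hypothetical rule: skip confirmation only for tools that are
    declared read-only and non-destructive."""
    return tool_hints["destructiveHint"] or not tool_hints["readOnlyHint"]


def outcome(tool_hints: dict, action_is_destructive: bool) -> str:
    """Compare the tool-level gate decision with what the action really does."""
    gated = needs_confirmation(tool_hints)
    if gated and not action_is_destructive:
        return "false positive"
    if not gated and action_is_destructive:
        return "false negative"
    return "correct"


# Two ways a team might annotate the same consolidated tool.
conservative = {"destructiveHint": True, "readOnlyHint": False}
optimistic = {"destructiveHint": False, "readOnlyHint": True}

# Per-action ground truth taken from the action table above.
ACTION_IS_DESTRUCTIVE = {"users.list": False, "users.ban": True}

results = {
    (label, action): outcome(hints, destructive)
    for label, hints in [("conservative", conservative), ("optimistic", optimistic)]
    for action, destructive in ACTION_IS_DESTRUCTIVE.items()
}
for (label, action), result in results.items():
    print(f"{label:12} {action:10} -> {result}")
```

With a single set of hints per tool, each annotation choice gets exactly one of the two actions wrong.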

This is not a solvable problem at the framework level. It requires a protocol primitive that allows behavioral semantics to be scoped below the tool level.

Why consolidation is happening: the alternative is worse

This is a forced tradeoff between two protocol-level constraints that currently cannot both be satisfied:

Option A - Flat individual tools: Clean annotation semantics, clear per-action schemas, optimal LLM tool-use quality. Cost: tools/list token pressure grows linearly with action count. At 50+ actions, the model starts losing track. At 200+, context exhaustion is a real operational problem.

Option B - Consolidated tools with discriminators: Controlled tools/list token budget. Cost: annotation semantics degrade, the LLM must navigate a compound discriminator schema (group + action + variant-specific parameters), and framework authors must build significant compensating machinery - 4-tier per-field annotation systems, conservative aggregation logic, structured error messages to help the model self-correct after wrong discriminator selection.

Neither option is correct. Both exist because the protocol has no way to express "these tools are structurally related, share behavioral scope, and should be reasoned about as a domain" without collapsing them into one endpoint.

There is a second-order cost to Option B that compounds over time. When the model selects the wrong discriminator or sends parameters for the wrong action variant, the error returns attributed to a generic tool name - platform - not to the specific operation that failed. The model's self-correction loop must then reason about which action within that tool was intended, from a validation error that carries no action-level context. In a flat tool model, the error returns attributed to users.ban and correction is immediate. Under consolidation, that precision is lost at the protocol boundary.
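A stripped-down sketch of the attribution difference follows. The dispatcher functions and the ToolError type are hypothetical and far simpler than a real framework; they only show which name the failure is reported against.

```python
class ToolError(Exception):
    """Error tagged with the tool name the client sees."""

    def __init__(self, tool: str, message: str):
        super().__init__(f"{tool}: {message}")
        self.tool = tool


def users_ban(args: dict) -> str:
    if "user_id" not in args:
        raise ValueError("missing required field 'user_id'")
    return f"banned {args['user_id']}"


HANDLERS = {"users.ban": users_ban}


def call_flat(tool_name: str, args: dict) -> str:
    """Flat model: each action is its own tool, so errors carry its name."""
    try:
        return HANDLERS[tool_name](args)
    except ValueError as exc:
        raise ToolError(tool_name, str(exc)) from exc


def call_consolidated(args: dict) -> str:
    """Consolidated model: one 'platform' tool dispatching on an 'action' field."""
    payload = {k: v for k, v in args.items() if k != "action"}
    try:
        return HANDLERS[args.get("action")](payload)
    except (KeyError, ValueError) as exc:
        # The protocol-visible tool is 'platform', whichever action failed.
        raise ToolError("platform", str(exc)) from exc


def failing_tool(call):
    """Return the tool name an error is attributed to, or None on success."""
    try:
        call()
    except ToolError as err:
        return err.tool
    return None


print(failing_tool(lambda: call_flat("users.ban", {})))                  # users.ban
print(failing_tool(lambda: call_consolidated({"action": "users.ban"})))  # platform
```

A framework can of course put the action name into the error message, but that is compensating machinery on top of the protocol rather than something the protocol carries itself.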

Protocol-level grouping would decouple the organizational concern from the exposure concern. Tools could remain individual and well-typed - preserving annotation semantics and LLM self-correction precision - while being declared as members of a namespace that clients can use for filtering, governance, and presentation.

On the framing of grouping vs. tool search

The meeting notes cited tool search and skills as alternative approaches. These solve different problems and the distinction matters for enterprise deployments.

Tool search operates at inference time on whatever is already in tools/list. It helps the model select the right tool from the visible set. It is a client-side, runtime concern.

Namespace grouping operates at server configuration time and determines what enters tools/list in the first place. In a large organization where billing.* is governed by the finance team and must not be visible to sessions without the appropriate entitlement, the requirement is structural - which domains are exposed to which clients, at configuration time, based on server-side policy. Tool search has no surface to act on until that boundary is already drawn.
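A minimal sketch of that boundary, assuming a dotted-prefix naming convention and a per-session set of entitled namespaces. Both are illustrative assumptions, not anything defined by the spec or by SEP-2084.

```python
def namespace_of(tool_name: str) -> str:
    """Treat the text before the first dot as the namespace (illustrative convention)."""
    return tool_name.split(".", 1)[0]


def list_tools_for_session(all_tools, entitled_namespaces):
    """Apply server-side policy before anything reaches the client or the model."""
    return [t for t in all_tools if namespace_of(t) in entitled_namespaces]


ALL_TOOLS = [
    "users.list",
    "users.create",
    "users.ban",
    "billing.invoices",
    "billing.refund",
]

support_session = list_tools_for_session(ALL_TOOLS, {"users"})
finance_session = list_tools_for_session(ALL_TOOLS, {"users", "billing"})

print(support_session)  # ['users.list', 'users.create', 'users.ban']
print(finance_session)  # all five tools
```

Any client-side search or ranking would then run only over the list each session actually received.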

These are orthogonal. Solving discovery does not solve governance. Both are needed for enterprise-grade deployments, and conflating them in the rejection rationale leaves the governance problem unaddressed.

The specific question

The annotation incoherence described above is reproducible from the current spec and observable in any implementation that attempts consolidation. I am not asking to reopen SEP-2084.

The specific question is: how does the core group see the annotation semantics problem being resolved without a sub-tool scoping primitive?

If the answer is that full hierarchical grouping is still premature, we understand - but the problem does not go away. A lighter-weight path might be worth considering: the protocol could allow the tool name field itself to carry structured metadata (e.g. finance/invoices.get) as a formally specified convention, or CallToolResult could support execution-branch metadata that restores per-action behavioral context at the response level. Neither requires the full Groups capability from SEP-2084.
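For the first option, a parser for such a naming convention could be as small as the sketch below. The namespace/resource.action grammar is inferred from the single finance/invoices.get example and is an assumption, not a specified format.

```python
from typing import NamedTuple, Optional


class ToolName(NamedTuple):
    namespace: Optional[str]
    resource: str
    action: Optional[str]


def parse_tool_name(name: str) -> ToolName:
    """Split 'namespace/resource.action'; namespace and action are optional."""
    namespace, slash, rest = name.partition("/")
    if not slash:
        # No namespace present: the whole string is the resource part.
        namespace, rest = None, name
    resource, dot, action = rest.partition(".")
    return ToolName(namespace, resource, action if dot else None)


print(parse_tool_name("finance/invoices.get"))
print(parse_tool_name("search"))
```

Names without a slash or dot still parse, so existing flat tool names would remain valid under this kind of convention.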

If there is an intended solution within the current spec - or a direction already planned - we will build toward that. If there is not, we believe the annotation semantics problem is concrete enough to warrant a scoped re-engagement, independent of the broader grouping discussion.

Happy to provide a minimal reproducible example against the SDK if that would be useful.

2 replies

SamMorrowDrums Mar 4, 2026
Collaborator

@renatomarinho this is an option being considered: #1862

If the Annotations working group is established, runtime annotations are definitely at least going to be discussed.

renatomarinho Mar 4, 2026

@SamMorrowDrums Yes, I've been following #1862 closely - which is exactly why I raised this.

tools/resolve operates at invocation time: the client already knows the intended arguments and requests refined annotations before tools/call. It solves that layer precisely.

The problem I described is upstream of invocation entirely. Consolidation isn't a framework preference - it's the only viable response to the absence of a namespace primitive in tools/list. Flat tools don't scale past ~50 actions: token pressure grows linearly, and context exhaustion is a real operational problem at 200+. That's precisely the tradeoff SEP-2084 was trying to resolve. Without a grouping or namespace mechanism, consolidation is forced on every framework operating at scale.

It is tempting to point to modern 1M+ token context windows as a workaround for this, but treating massive context as an architectural crutch is an anti-pattern. Dumping hundreds of flat tools into a prompt inflates TTFT and input costs at every inference call, and more importantly, it entirely sidesteps the security boundary. Context size cannot solve a governance problem. If a session lacks entitlements, those tools shouldn't reach the inference layer at all, regardless of how many tokens the model can theoretically digest.

Once consolidated to avoid these scale issues, static annotations in tools/list must represent the worst case across all actions. At that point tools/resolve hasn't been - and cannot be - called, because no arguments exist yet. Every system that reads the list operates on those static annotations: security middleware deciding what to expose to a session, clients rendering tool affordances, access control systems gating entitlements. All of this happens before any inference, before the model selects a tool, before tools/resolve is in scope.

These are two distinct protocol problems at two distinct layers:

Invocation layer → SEP-1862: Tool Resolution #1862 solves this.
List layer → static annotation coherence at declaration time, a direct consequence of the absence of a namespace primitive. Still open.

The governance concern lives at this list layer: for deployments where billing.* must not be visible to sessions without the appropriate entitlement, tools/resolve has no surface to act on. That boundary must be drawn at tools/list time.

The Annotations working group will naturally cover runtime/invocation annotations, but is there a separate track where the discovery/list layer primitives should be raised, or would the working group scope extend to cover structural issues like namespaces?
