
gh-87613: Argument Clinic vectorcall decorator #145381

Open
cmaloney wants to merge 5 commits into python:main from cmaloney:ac_add_vectorcall

Conversation

@cmaloney
Contributor

@cmaloney cmaloney commented Mar 1, 2026

Add @vectorcall as a decorator to Argument Clinic (AC) which emits a Vectorcall Protocol (https://docs.python.org/3/c-api/call.html#the-vectorcall-protocol) argument-parsing C function named {type}_vectorcall. This is currently only supported for __new__ and __init__, to simplify the implementation.

The generated code performs on par with or better than the existing hand-written cases for list, float, str, tuple, enumerate, reversed, and int. Using the decorator on bytearray, which has no hand-written case, made construction 1.09x faster. For more benchmark details see #87613 (comment).
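The micro-benchmarks referenced above use pyperf; a minimal stdlib-only sketch for spot-checking constructor call overhead locally looks like this (the expressions and iteration counts are illustrative, and absolute numbers depend on the build and machine; the interesting signal is the delta between interpreters with and without a vectorcall path):

```python
import timeit

# Spot-check constructor call overhead for a few builtin types.
results = {}
for expr in ("bytearray()", "bytearray(16)", "tuple()", "int()"):
    total = min(timeit.repeat(expr, number=200_000, repeat=3))
    results[expr] = total / 200_000 * 1e9  # ns per call
    print(f"{expr:<14}{results[expr]:7.1f} ns/call")
```

pyperf adds process isolation and calibration on top of this, so its numbers are more trustworthy for small deltas.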

The @vectorcall decorator has two options:

  • zero_arg={C_FUNC}: Some types, like int, can be called with zero arguments and return an immortal object in that case. A shortcut is needed to match existing hand-written performance; it gives an over-10% speedup in those cases.
  • exact_only: If the type is not an exact match, delegate to the existing non-vectorcall implementation. Needed for str to match the hand-written performance while ensuring correct behavior.
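Taken together, the options amount to extra checks at the top of the generated entry point. Sketched in Python as executable pseudocode for the emitted C (make_vectorcall, fallback_tp_call, and parse_and_init are illustrative names, not identifiers the generator emits):

```python
def make_vectorcall(cls, *, zero_arg_result=None, exact_only=False,
                    fallback_tp_call=None, parse_and_init=None):
    """Executable pseudocode for the C function @vectorcall generates."""
    def vectorcall(type_, args, kwnames):
        # exact_only: subclasses delegate to the slower tp_call path.
        if exact_only and type_ is not cls:
            return fallback_tp_call(type_, args, kwnames)
        # zero_arg: a no-argument call returns a shared (immortal) constant.
        if zero_arg_result is not None and not args and not kwnames:
            return zero_arg_result
        # Otherwise parse arguments and run the *_new_impl/*_init_impl.
        return parse_and_init(type_, args, kwnames)
    return vectorcall
```

For example, with zero_arg_result=() a no-argument tuple() call returns the shared empty tuple without touching argument parsing at all.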

Implementation details:

  • Adds support for the new decorator with arguments in the AC DSL Parser
  • Moves keyword-argument parsing generation from inline code into a function so the vectorcall (vc_) and existing paths can share code generation.
  • Adds an emit helper that slightly simplifies code in existing AC cases

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

@cmaloney cmaloney changed the title from "gh-87613: Argument Cliic @vectorcall decorator" to "gh-87613: Argument Clinic @vectorcall decorator" Mar 1, 2026
@cmaloney cmaloney added and removed the performance (Performance or resource usage) label Mar 1, 2026
@cmaloney cmaloney changed the title from "gh-87613: Argument Clinic @vectorcall decorator" to "gh-87613: Argument Clinic vectorcall decorator" Mar 1, 2026
Member

@corona10 corona10 left a comment


Could you replace the current hand-written implementations with your new DSL?

Let's see how it handles them.

@cmaloney
Contributor Author

cmaloney commented Mar 1, 2026

I have commits to do that in my draft branch (https://github.com/python/cpython/compare/main...cmaloney:cpython:ac_vectorcall_v1?expand=0); can pull them into this branch if that would be easier / better to review. This generally produces code that is as fast or faster than the hand-written ones currently (full benchmarking in: #87613 (comment))

@cmaloney
Contributor Author

cmaloney commented Mar 1, 2026

Added commits moving enum.c (reversed, enumerate) and tuple to the new decorator. enum.c had comments pointing to this issue and covers positional + keyword arguments. tuple uses the "zero arg" optimization and has no keyword args. None of those cases use the __init__ code; the only cases of that are the new bytearray and list, which is otherwise very similar to tuple. Hoping those serve as a good sample of what the code generation looks like relative to the hand-written versions while iterating; happy to include more in this PR if desired.

#undef KWTUPLE
PyObject *argsbuf[2];
Py_ssize_t noptargs = nargs + (kwnames ? PyTuple_GET_SIZE(kwnames) : 0) - 1;
args = _PyArg_UnpackKeywords(args, nargs, NULL, kwnames,
Contributor Author

Evaluating direct keyword-argument parsing for these, i.e. what code change it takes relative to the performance it buys.

For most of the hand-written vectorcall implementations there aren't a lot of keyword arguments, which I think is part of why performance is equal to the existing code, making this a simplifying refactor. Wondering if explicit keyword-argument parsing would get quite a bit faster.
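For reference, the vectorcall convention puts keyword values in the same flat args array, after the positionals, with their names in the kwnames tuple; the generated code reorders them into declared-parameter slots via _PyArg_UnpackKeywords. A simplified Python model of that reordering (the function and its error handling are illustrative; the real C also interns names and handles unknown keywords with proper errors):

```python
def unpack_keywords(args, nargs, kwnames, param_names):
    """Order vectorcall-style (args, kwnames) into declared parameter slots.

    Positionals fill the first slots; keyword values, stored after the
    positionals in *args*, fill the slot matching their name.
    """
    buf = [None] * len(param_names)
    buf[:nargs] = args[:nargs]
    for i, name in enumerate(kwnames or ()):
        slot = param_names.index(name)  # real code raises TypeError on unknown names
        if slot < nargs or buf[slot] is not None:
            raise TypeError(f"argument given by name ({name!r}) and position")
        buf[slot] = args[nargs + i]
    return buf
```

An enumerate-style call like enumerate([1], start=5) arrives as args=([1], 5), nargs=1, kwnames=("start",), and unpacks to [[1], 5].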

Contributor Author

@cmaloney cmaloney Mar 2, 2026

The generated code is a little cleaner to read, but the performance change is negative, if anything, in my first attempt here; I can pull it in if needed, but my leaning is to focus on iterative improvement.

Also found another optimization in str: doing a one-arg override, much like zero_arg, rather than its generic dispatch does make a bit of a perf difference; but I think that is a good additional step to add later when expanding to the str type.

@corona10
Member

corona10 commented Mar 2, 2026

I will take a look at this PR by the end of this week.

@skirpichev
Member

happy to include more in this PR if desired.

I would like to suggest that you at least think about complexobject.c. Based on benchmarks for the float PR (#22432) I would expect a good performance boost (maybe not 1.5x, but more than from the freelist addition).

Yes, this case seems to be already covered by the enum.c example (kwargs).

On the other hand, the complex class has special hacks to support multiple signatures (complex('123') is allowed, while complex(real='123') is not). Maybe it's not the only case, but I can't quickly find others across the CPython codebase. I suspect that AC magic will not work in this case and we will need some workarounds somewhere (well, maybe just one hand-written case). Though, it would be great if you disprove this hypothesis.

This generally produces code that is as fast or faster than the hand-written ones currently (full benchmarking in: #87613 (comment))

Still, there are some regressions, e.g. int(str). Could you explain this difference?

I also suggest trying pyperformance on this.

@cmaloney
Contributor Author

cmaloney commented Mar 2, 2026

I would like to suggest you at least to think about complexobject.c. Based on benchmarks for the float pr (#22432) I would expect a good performance boost (maybe not 1.5x, but more than from freelist addition).

Will implement it in my draft branch this week. As part of developing this PR I added Vectorcall Protocol support to bytes (cmaloney@f5c7b7c) and bytearray (cmaloney@7de2ab7). With just two small changes (1. add @vectorcall, 2. set .tp_vectorcall), construction is 1.09x to 1.23x faster. Multiply that speedup across the many AC-implemented types without vectorcall construction and I definitely get excited.

Still, there are some regressions, e.g. int(str). Could you explain this difference?

The hand-written int vectorcall implementation, long_vectorcall, is a particularly elegant switch:

cpython/Objects/longobject.c

Lines 6539 to 6559 in c9a5d9a

long_vectorcall(PyObject *type, PyObject * const*args,
                size_t nargsf, PyObject *kwnames)
{
    Py_ssize_t nargs = PyVectorcall_NARGS(nargsf);
    if (kwnames != NULL) {
        PyThreadState *tstate = PyThreadState_GET();
        return _PyObject_MakeTpCall(tstate, type, args, nargs, kwnames);
    }
    switch (nargs) {
        case 0:
            return _PyLong_GetZero();
        case 1:
            return PyNumber_Long(args[0]);
        case 2:
            return long_new_impl(_PyType_CAST(type), args[0], args[1]);
        default:
            return PyErr_Format(PyExc_TypeError,
                                "int expected at most 2 arguments, got %zd",
                                nargs);
    }
}

That switch specializes the 1-argument case to call PyNumber_Long instead of long_new_impl, which matches a very similar performance delta I investigated yesterday in the hand-written vectorcall for str. I worry the hand-written version is faster because the compiler's optimizer is doing clever things around the switch form. Adding support for a one_arg special case will need more code in the AC implementation. Overall, not sure it's actually worth replacing the hand-written int vectorcall with an AC-generated version.
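A hypothetical one_arg option would slot into the same pre-parse dispatch as zero_arg. Modeled in Python as a function that returns which path handles a given call shape (one_arg is not part of this PR, and the path names are illustrative):

```python
def dispatch(nargs, kwnames, *, zero_arg=None, one_arg=None,
             generic=None, tp_call=None):
    """Model of the long_vectorcall switch: keyword calls punt to
    tp_call, small positional arities get dedicated fast paths."""
    if kwnames:
        return tp_call           # e.g. int(x, base=16) -> _PyObject_MakeTpCall
    if nargs == 0 and zero_arg is not None:
        return zero_arg          # e.g. int() -> _PyLong_GetZero()
    if nargs == 1 and one_arg is not None:
        return one_arg           # e.g. int(x) -> PyNumber_Long(x)
    return generic               # full argument parsing + *_new_impl
```

With only zero_arg (as in this PR) every 1-argument call takes the generic path, which is where the int(str) delta comes from.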

I'm comparing to the hand-written versions because I want @vectorcall to be as good as I can get it when people try it out on a type they care about. Adding it to two types without vectorcall construction, bytes and bytearray, provided a measurable speedup for a two-line code change. I think correctness of generated code, maintainability of the new decorator, and providing a speedup for types with no vectorcall today is a lot of benefit even if it's not quite as fast as hand-written expert code. If adopting the new decorator on an AC type is really low-cost for a significant performance gain, that will lead to speedy adoption and a speedier CPython.

I also suggest you to try pyperformance on this.

Will run on this PR as it exists currently.

I can also run it on my draft branch, but I'm not sure that will give a clear signal, as it migrates every hand-written vectorcall even if that makes them slower. Ideally I would be able to figure out which types are commonly constructed in pyperformance benchmarks so I can make a draft branch adding vectorcall support to those. Not sure what would be the most important set of types to migrate before running pyperformance.

@erlend-aasland
Contributor

Did you consider adding this implicitly if supported, instead of making it opt-in? Disclaimer: I didn't take a look at the implementation yet.

Comment on lines +305 to +307
self.vectorcall = False
self.vectorcall_exact_only = False
self.vectorcall_zero_arg = ''
Contributor

I wonder if we should collect these in a "vectorcall config dataclass". The stuff in this file is already so cluttered with tons of class members and local variables.

Contributor Author

I'm not sure it would help readability much, for the cost of introducing a pattern that is new to this bit of code.

It would be really nice to refactor the decorator-parsing argument functions (at_*); they feel really repetitive to me at the moment. Would be nice to not have to do custom key=value parsing in the new at_vectorcall.

@skirpichev
Member

skirpichev commented Mar 2, 2026

Multiply that speedup across the many AC implemented types without vectorcall construction and I definitely get excited.

Yes, I hope that can mitigate the speed regression with Decimal since v3.13. Edit: no, this doesn't help too much.

Overall not sure it's actually worth replacing the hand written int vectorcall with an AC generated version for.

In any case, this will happen on a case-by-case basis (this PR should include a minimal set of such examples). BTW, here are my results for this PR + c14c173.

Benchmark code: cmaloney@ddcd3b6, run with --rigorous
Configure options: ./configure --enable-optimizations --with-lto --with-static-libpython
Host: 64-bit Debian (Trixie)
Compiler: GCC 14.2.0

Benchmark ref-lto-pr-bench patch-lto-pr-bench
list(tuple) 2.23 us 2.20 us: 1.01x faster
list(range) 9.69 us 9.65 us: 1.00x faster
float(int) 1.04 us 1.04 us: 1.00x faster
str() 557 ns 563 ns: 1.01x slower
str(int) 2.16 us 2.13 us: 1.01x faster
str(bytes,enc) 2.19 us 2.21 us: 1.01x slower
bytes() 832 ns 910 ns: 1.09x slower
bytes(int) 2.59 us 2.58 us: 1.00x faster
bytearray(bytes) 3.46 us 3.44 us: 1.01x faster
bytearray(int) 1.73 us 1.76 us: 1.02x slower
tuple() 541 ns 531 ns: 1.02x faster
int(str) 1.35 us 1.37 us: 1.02x slower
int(str,base) 1.78 us 1.77 us: 1.01x faster
enumerate(list,start) 2.70 us 2.79 us: 1.03x slower
Geometric mean (ref) 1.00x slower

Benchmark hidden because not significant (9): list(), list_subclass, float(), float(str), bytearray(), tuple(list), int(), reversed(list), enumerate(list)

With default ./configure:

Benchmark ref-pr-bench patch-pr-bench
list(tuple) 2.74 us 2.66 us: 1.03x faster
list_subclass 4.42 us 4.22 us: 1.05x faster
float(str) 2.61 us 2.57 us: 1.02x faster
str(int) 3.05 us 2.98 us: 1.02x faster
bytes() 991 ns 992 ns: 1.00x slower
bytes(int) 3.08 us 3.13 us: 1.02x slower
bytearray() 1.93 us 1.88 us: 1.03x faster
bytearray(bytes) 4.52 us 4.24 us: 1.07x faster
bytearray(int) 2.20 us 2.08 us: 1.06x faster
tuple() 559 ns 544 ns: 1.03x faster
int() 556 ns 538 ns: 1.03x faster
int(str) 1.80 us 1.78 us: 1.01x faster
int(str,base) 2.44 us 2.45 us: 1.00x slower
enumerate(list) 2.96 us 2.97 us: 1.00x slower
enumerate(list,start) 3.32 us 3.27 us: 1.01x faster
Geometric mean (ref) 1.01x faster
BTW, I wonder how noisy your benchmarks are; here is an alternative approach with bench_func().
# vectorcall-bench.py

import pyperf

runner = pyperf.Runner()
bench_cases = ['1<<7', '1<<38', '1<<300', '1<<3000']

for c in bench_cases:  # XXX: bigger sample
    i = eval(c)
    bn = f'int({c})'
    runner.bench_func(bn, int, i)
for c in bench_cases:
    i = eval(c)
    s = str(i)
    bn = f'int({c!r})'
    runner.bench_func(bn, int, s)

As before, with all optimizations:

Benchmark ref-lto patch-lto
int(1<<7) 122 ns 125 ns: 1.03x slower
int(1<<38) 123 ns 126 ns: 1.02x slower
int(1<<300) 123 ns 126 ns: 1.02x slower
int(1<<3000) 123 ns 126 ns: 1.02x slower
int('1<<7') 197 ns 199 ns: 1.01x slower
int('1<<38') 319 ns 301 ns: 1.06x faster
int('1<<300') 922 ns 902 ns: 1.02x faster
int('1<<3000') 18.5 us 18.5 us: 1.00x faster
Geometric mean (ref) 1.00x slower

Default:

Benchmark ref patch
int(1<<7) 121 ns 128 ns: 1.05x slower
int(1<<38) 124 ns 129 ns: 1.04x slower
int(1<<300) 123 ns 129 ns: 1.06x slower
int(1<<3000) 123 ns 129 ns: 1.05x slower
int('1<<7') 229 ns 246 ns: 1.08x slower
int('1<<38') 364 ns 368 ns: 1.01x slower
Geometric mean (ref) 1.04x slower

I think correctness of generated code, maintainability of the new decorator, and providing a speedup for types with no vectorcall today is a lot of benefit even if it's not quite as fast as hand written expert code.

Agreed. But if the auto-generated code catches the major patterns of the current hand-written functions, it will be great.

I can also run on my draft branch

No, I don't think it makes much sense with a lot of conversions to AC magic in one shot.

@cmaloney
Contributor Author

cmaloney commented Mar 4, 2026

Did you consider adding this implicitly if supported, instead of making it opt-in? Disclaimer: I didn't take a look at the implementation yet.

My leaning is explicit at least to start. I think that provides a good path for gradual adoption / testing / rollout (hopefully in the 3.15 timeframe). I'd really like at least an alpha which reaches wider testing with a couple common types (ex. bytes) moved to make sure there aren't unanticipated tradeoffs or issues.

Member

@vstinner vstinner left a comment

Would it be possible to add a @vectorcall test to Modules/_testclinic.c?

@vstinner
Member

This PR is very promising! Great work.

@cmaloney
Contributor Author

Added a vectorcall test to the _testclinic module and a new VectorcallFunctionalTest which exercises the emitted __init__ and __new__.

Debating whether a Hypothesis test of "does this parse the right args + kwargs" would provide enough value for the complexity.
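For comparison, a table-driven stdlib test can cover much of the same parse matrix without the Hypothesis dependency. A sketch using builtins this PR migrates (enumerate, tuple) as the types under test; the class and test names are illustrative:

```python
import unittest

class VectorcallParseMatrix(unittest.TestCase):
    """Sketch: call migrated constructors with each positional/keyword
    split and check results (or raised TypeErrors) stay consistent."""

    def test_enumerate_splits(self):
        data = ["a", "b"]
        expected = [(3, "a"), (4, "b")]
        # start passed positionally and as a keyword must agree.
        self.assertEqual(list(enumerate(data, 3)), expected)
        self.assertEqual(list(enumerate(data, start=3)), expected)
        self.assertEqual(list(enumerate(iterable=data, start=3)), expected)

    def test_bad_keywords_raise(self):
        with self.assertRaises(TypeError):
            tuple(sequence=[1])        # tuple takes no keyword arguments
        with self.assertRaises(TypeError):
            enumerate([1], begin=0)    # unknown keyword name

unittest.main(exit=False, argv=["x"], verbosity=0)
```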

@cmaloney
Contributor Author

Full pyperformance numbers on the current PR below. No cases stand out to me; all seem to be within noise for my machine. That is, enumerate, reversed, and tuple don't seem to change performance going from hand-written to auto-generated, as expected.

Not sure what will make this easier to review. I am happy to resolve the merge conflict anytime but also don't want to disturb in-progress reviews. I am thinking of scoping this down to just _testclinic + the core implementation to reduce risk. Then there should be no performance (or behavior) changes in existing code, as this just adds support and tests to AC with no user-visible usage. With that setup I can add it to one type at a time in very focused PRs with micro-benchmarks + pyperformance.

pyperformance comparison: 3484ef6 (just before) vs HEAD (vectorcall clinic)

Platform: Linux-6.19.9-arch1-1-x86_64-with-glibc2.43 | 32 logical CPUs
Baseline: 3484ef60 (2026-03-21 19:46–20:31) | Changed: HEAD 3698a32d (2026-03-21 20:34–21:20)

Benchmark Baseline HEAD Change Significance
2to3 136 ms 135 ms 1.01x faster Not significant
async_generators 195 ms 193 ms 1.01x faster Not significant
async_tree_cpu_io_mixed 273 ms 275 ms 1.01x slower Not significant
async_tree_cpu_io_mixed_tg 279 ms 278 ms 1.00x faster Not significant
async_tree_eager 50.3 ms 50.1 ms 1.00x faster Not significant
async_tree_eager_cpu_io_mixed 217 ms 218 ms 1.00x slower Not significant
async_tree_eager_cpu_io_mixed_tg 255 ms 255 ms 1.00x slower Not significant
async_tree_eager_io 337 ms 331 ms 1.02x faster Not significant
async_tree_eager_io_tg 358 ms 356 ms 1.00x faster Not significant
async_tree_eager_memoization 123 ms 123 ms 1.00x slower Not significant
async_tree_eager_memoization_tg 166 ms 165 ms 1.01x faster Not significant
async_tree_eager_tg 120 ms 122 ms 1.01x slower Not significant
async_tree_io 347 ms 356 ms 1.03x slower Not significant
async_tree_io_tg 358 ms 361 ms 1.01x slower Not significant
async_tree_memoization 180 ms 180 ms 1.00x slower Not significant
async_tree_memoization_tg 187 ms 189 ms 1.01x slower Not significant
async_tree_none 146 ms 145 ms 1.00x faster Not significant
async_tree_none_tg 147 ms 150 ms 1.02x slower Significant (t=-2.03)
asyncio_tcp 162 ms 162 ms 1.00x faster Not significant
asyncio_tcp_ssl 573 ms 574 ms 1.00x slower Not significant
asyncio_websockets 344 ms 343 ms 1.00x faster Not significant
bench_mp_pool 6.27 ms 6.30 ms 1.00x slower Not significant
bench_thread_pool 835 us 840 us 1.01x slower Not significant
bpe_tokeniser 2.10 sec 2.10 sec 1.00x slower Not significant
chameleon 6.97 ms 6.96 ms 1.00x faster Not significant
chaos 27.0 ms 26.6 ms 1.02x faster Not significant
comprehensions 7.35 us 7.30 us 1.01x faster Not significant
connected_components 308 ms 311 ms 1.01x slower Not significant
coroutines 12.0 ms 12.1 ms 1.01x slower Not significant
coverage 33.2 ms 33.6 ms 1.01x slower Not significant
create_gc_cycles 1.18 ms 1.17 ms 1.01x faster Not significant
crypto_pyaes 35.0 ms 34.9 ms 1.00x faster Not significant
dask 417 ms 418 ms 1.00x slower Not significant
deepcopy 109 us 108 us 1.00x faster Not significant
deepcopy_memo 12.6 us 12.6 us 1.00x slower Not significant
deepcopy_reduce 1.28 us 1.29 us 1.01x slower Not significant
deltablue 1.58 ms 1.57 ms 1.01x faster Not significant
django_template 17.4 ms 17.4 ms 1.00x slower Not significant
docutils 1.18 sec 1.18 sec 1.00x faster Not significant
dulwich_log 19.5 ms 19.4 ms 1.00x faster Not significant
fannkuch 180 ms 178 ms 1.01x faster Not significant
float 34.7 ms 34.4 ms 1.01x faster Not significant
gc_traversal 3.00 ms 2.84 ms 1.06x faster Significant (t=7.11)
generators 14.3 ms 14.1 ms 1.02x faster Not significant
genshi_text 11.1 ms 11.0 ms 1.00x faster Not significant
genshi_xml 24.6 ms 24.6 ms 1.00x slower Not significant
go 55.2 ms 54.3 ms 1.02x faster Not significant
hexiom 2.83 ms 2.86 ms 1.01x slower Not significant
html5lib 23.4 ms 23.4 ms 1.00x slower Not significant
json_dumps 4.53 ms 4.48 ms 1.01x faster Not significant
json_loads 11.5 us 11.4 us 1.01x faster Not significant
k_core 1.38 sec 1.38 sec 1.00x faster Not significant
logging_format 3.30 us 3.25 us 1.02x faster Not significant
logging_silent 45.0 ns 45.0 ns 1.00x faster Not significant
logging_simple 3.03 us 3.01 us 1.01x faster Not significant
mako 6.18 ms 6.29 ms 1.02x slower Not significant
many_optionals 347 us 343 us 1.01x faster Not significant
mdp 570 ms 566 ms 1.01x faster Not significant
meteor_contest 48.1 ms 47.8 ms 1.01x faster Not significant
nbody 47.1 ms 46.4 ms 1.01x faster Not significant
nqueens 39.8 ms 39.9 ms 1.00x slower Not significant
pathlib 9.32 ms 9.41 ms 1.01x slower Not significant
pickle 5.72 us 5.67 us 1.01x faster Not significant
pickle_dict 12.4 us 12.4 us 1.00x slower Not significant
pickle_list 1.92 us 1.87 us 1.03x faster Significant (t=7.32)
pickle_pure_python 144 us 143 us 1.01x faster Not significant
pidigits 112 ms 112 ms 1.00x slower Not significant
pprint_pformat 722 ms 722 ms 1.00x slower Not significant
pprint_safe_repr 357 ms 356 ms 1.00x faster Not significant
pyflate 210 ms 212 ms 1.01x slower Not significant
python_startup 7.63 ms 7.65 ms 1.00x slower Not significant
python_startup_no_site 4.66 ms 4.66 ms 1.00x slower Not significant
raytrace 125 ms 123 ms 1.02x faster Not significant
regex_compile 49.5 ms 49.1 ms 1.01x faster Not significant
regex_dna 94.0 ms 93.0 ms 1.01x faster Not significant
regex_effbot 1.72 ms 1.58 ms 1.08x faster Significant (t=19.11)
regex_v8 12.2 ms 11.6 ms 1.06x faster Significant (t=8.31)
richards 20.9 ms 20.9 ms 1.00x slower Not significant
richards_super 24.1 ms 24.3 ms 1.01x slower Not significant
scimark_fft 154 ms 150 ms 1.02x faster Significant (t=3.69)
scimark_lu 55.5 ms 54.4 ms 1.02x faster Significant (t=4.09)
scimark_monte_carlo 31.2 ms 31.5 ms 1.01x slower Not significant
scimark_sor 54.8 ms 54.8 ms 1.00x slower Not significant
scimark_sparse_mat_mult 2.63 ms 2.58 ms 1.02x faster Not significant
shortest_path 322 ms 324 ms 1.01x slower Not significant
spectral_norm 44.8 ms 45.0 ms 1.00x slower Not significant
sphinx 454 ms 454 ms 1.00x slower Not significant
sqlalchemy_declarative 51.3 ms 52.5 ms 1.02x slower Significant (t=-6.55)
sqlalchemy_imperative 5.07 ms 5.09 ms 1.00x slower Not significant
sqlglot_v2_normalize 49.7 ms 49.4 ms 1.01x faster Not significant
sqlglot_v2_optimize 24.2 ms 24.2 ms 1.00x slower Not significant
sqlglot_v2_parse 565 us 559 us 1.01x faster Not significant
sqlglot_v2_transpile 706 us 704 us 1.00x faster Not significant
sqlite_synth 1.12 us 1.12 us 1.00x slower Not significant
subparsers 4.96 ms 4.91 ms 1.01x faster Not significant
sympy_expand 186 ms 184 ms 1.01x faster Not significant
sympy_integrate 8.64 ms 8.64 ms 1.00x slower Not significant
sympy_str 106 ms 105 ms 1.00x faster Not significant
sympy_sum 56.4 ms 57.0 ms 1.01x slower Not significant
telco 3.34 ms 3.44 ms 1.03x slower Significant (t=-5.11)
tomli_loads 976 ms 981 ms 1.00x slower Not significant
tornado_http 58.2 ms 58.2 ms 1.00x faster Not significant
typing_runtime_protocols 73.5 us 74.9 us 1.02x slower Not significant
unpack_sequence 24.6 ns 25.0 ns 1.02x slower Not significant
unpickle 7.23 us 7.26 us 1.00x slower Not significant
unpickle_list 2.16 us 2.24 us 1.03x slower Significant (t=-8.00)
unpickle_pure_python 105 us 102 us 1.04x faster Significant (t=12.01)
xdsl_constant_fold 17.2 ms 17.3 ms 1.01x slower Not significant
xml_etree_generate 38.0 ms 38.6 ms 1.02x slower Not significant
xml_etree_iterparse 45.4 ms 46.0 ms 1.01x slower Not significant
xml_etree_parse 76.3 ms 78.3 ms 1.03x slower Significant (t=-7.06)
xml_etree_process 28.1 ms 28.2 ms 1.00x slower Not significant

tuple_subtype_new(PyTypeObject *type, PyObject *iterable);

/*[clinic input]
@vectorcall zero_arg=(PyObject*)&_Py_SINGLETON(tuple_empty)
Contributor

I am not sure that it is worth adding things like zero_arg just for tuple; how about leaving the tuple code as is for now and focusing on other classes which don't have such invariants?
