This directory contains microbenchmarks for PySpark using ASV (Airspeed Velocity).
Install ASV:
pip install asv
For running benchmarks with isolated environments (without --python=same), you need an environment manager.
The default configuration uses virtualenv, but ASV also supports conda, mamba, uv, and some others. See the official docs for details.
Run benchmarks using your current Python environment (fastest for development):
cd python/benchmarks
asv run --python=same --quick
You can also specify the test class to run:
cd python/benchmarks
asv run --python=same --quick -b 'bench_arrow.LongArrowToPandasBenchmark'
Run benchmarks in an isolated virtualenv (builds pyspark from source):
cd python/benchmarks
asv run master^! # Run on latest master commit
asv run v3.5.0^! # Run on a specific tag
asv run abc123^! # Run on a specific commit
Compare current branch against upstream/main with 10% threshold:
asv continuous -f 1.1 upstream/main HEAD
asv check # Validate benchmark syntax
Benchmarks are Python classes with methods prefixed by:
time_* - Measure execution time
peakmem_* - Measure peak memory usage
mem_* - Measure memory usage of returned object
Example:
class MyBenchmark:
    params = [[1000, 10000], ["option1", "option2"]]
    param_names = ["n_rows", "option"]

    def setup(self, n_rows, option):
        # Called before each benchmark method
        self.data = create_test_data(n_rows, option)

    def time_my_operation(self, n_rows, option):
        # Benchmark timing
        process(self.data)

    def peakmem_my_operation(self, n_rows, option):
        # Benchmark peak memory
        process(self.data)
See ASV documentation for more details.
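The example above does not show a mem_* method or how the params lists combine, so here is a minimal, self-contained sketch of both. ListBuildBenchmark and parameter_combinations are hypothetical names invented for illustration and are not part of this directory or of ASV. The helper only mimics the way each benchmark is run once per combination of parameter values; ASV's real runner is more involved.

```python
import itertools


class ListBuildBenchmark:
    # Hypothetical benchmark class, for illustration only.
    params = [[1000, 10000], ["int", "str"]]
    param_names = ["n_rows", "kind"]

    def setup(self, n_rows, kind):
        # Build the input data for one parameter combination.
        convert = int if kind == "int" else str
        self.data = [convert(i) for i in range(n_rows)]

    def time_copy(self, n_rows, kind):
        # A time_* method: the call itself is what gets timed.
        list(self.data)

    def mem_copy(self, n_rows, kind):
        # A mem_* method: the returned object is what gets measured.
        return list(self.data)


def parameter_combinations(benchmark_cls):
    """Return the Cartesian product of the class's ``params`` lists.

    Assumption: this is a stand-in for illustration, not ASV's runner.
    """
    return list(itertools.product(*benchmark_cls.params))


if __name__ == "__main__":
    for combo in parameter_combinations(ListBuildBenchmark):
        bench = ListBuildBenchmark()
        bench.setup(*combo)
        result = bench.mem_copy(*combo)
        # Show which parameter values were used and how many rows came back.
        print(dict(zip(ListBuildBenchmark.param_names, combo)), len(result))
```

Running the file directly prints one line per parameter combination, which is a quick way to sanity-check setup logic before handing the class to asv run.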