Motivation
Mergers spend a significant fraction of their time (30%-70%) inside the Builder,
which includes rkyv serialization, Bloom filter construction, compression,
and writing batches to disk. This work can be done in parallel with other
work done by the merger thread, effectively splitting the merger into two
concurrent pipelined tasks:
- Input: read, decompress, deserialize, merge input batches
- Output: serialize, Bloom, compress, write.
The input task feeds (k, v, t, w) tuples to the output task.
This parallelization can potentially improve the pipeline performance in
scenarios where the merger becomes a bottleneck.
Design
This parallelization should be relatively easy to implement by providing a new
implementation of trait Builder (let's call it OffloadBuilder) that
forwards (k, v, t, w) tuples to the builder thread and retrieves the
constructed batch on done.
This offloading is not free and will introduce an extra deep copy of data
to a message queue (which may not be terrible, since our file-based cursors
require a deep copy anyway).
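A minimal sketch of the OffloadBuilder idea, using only the standard library. The Builder trait shown here is a simplified, hypothetical stand-in (the real trait has a different interface), VecBuilder is a toy inner builder, and chunk_size and queue_depth are illustrative parameters. For simplicity it spawns one thread per builder, which sidesteps the scheduling question discussed next.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

type Tuple = (u64, u64, u32, i64); // (k, v, t, w) with placeholder types

/// Hypothetical, simplified stand-in for the real `Builder` trait.
trait Builder {
    type Output;
    fn push(&mut self, k: u64, v: u64, t: u32, w: i64);
    fn done(self) -> Self::Output;
}

/// Toy inner builder that only collects tuples. A real builder would
/// serialize, build the Bloom filter, compress, and write to disk.
#[derive(Default)]
struct VecBuilder(Vec<Tuple>);

impl Builder for VecBuilder {
    type Output = Vec<Tuple>;
    fn push(&mut self, k: u64, v: u64, t: u32, w: i64) {
        self.0.push((k, v, t, w));
    }
    fn done(self) -> Self::Output {
        self.0
    }
}

/// Forwards tuples in chunks to an inner builder running on another thread.
struct OffloadBuilder<B: Builder> {
    tx: SyncSender<Vec<Tuple>>,
    chunk: Vec<Tuple>,
    chunk_size: usize,
    worker: JoinHandle<B::Output>,
}

impl<B> OffloadBuilder<B>
where
    B: Builder + Send + 'static,
    B::Output: Send + 'static,
{
    fn new(mut inner: B, chunk_size: usize, queue_depth: usize) -> Self {
        // Bounded queue: the input side blocks if the output side falls behind.
        let (tx, rx) = sync_channel::<Vec<Tuple>>(queue_depth);
        let worker = thread::spawn(move || {
            // Loop ends once the sender is dropped; then finalize the batch.
            for chunk in rx {
                for (k, v, t, w) in chunk {
                    inner.push(k, v, t, w);
                }
            }
            inner.done()
        });
        Self { tx, chunk: Vec::with_capacity(chunk_size), chunk_size, worker }
    }

    /// Sends the buffered chunk (if any) to the builder thread.
    fn flush(&mut self) {
        if !self.chunk.is_empty() {
            let full = std::mem::replace(&mut self.chunk, Vec::with_capacity(self.chunk_size));
            self.tx.send(full).expect("builder thread exited early");
        }
    }
}

impl<B> Builder for OffloadBuilder<B>
where
    B: Builder + Send + 'static,
    B::Output: Send + 'static,
{
    type Output = B::Output;

    fn push(&mut self, k: u64, v: u64, t: u32, w: i64) {
        self.chunk.push((k, v, t, w)); // this is the extra deep copy
        if self.chunk.len() >= self.chunk_size {
            self.flush();
        }
    }

    fn done(mut self) -> Self::Output {
        self.flush();
        let Self { tx, worker, .. } = self;
        drop(tx); // closing the channel tells the worker to finish
        worker.join().expect("builder thread panicked")
    }
}
```

In this sketch the merger would wrap its builder as OffloadBuilder::new(VecBuilder::default(), 1024, 4) and call push and done as before. Sending tuples in chunks amortizes the per-message channel cost, and the bounded channel caps how much copied data can sit in the queue.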
We need to figure out how to schedule the offloaded work. I see two options:
Option 1. Using threads
- Spawn another std::thread (OffloadThread) for each background thread.
- When an OffloadBuilder is created, it creates a new channel and sends the
OffloadThread a closure that pulls (k, v, t, w) batches from the channel and
does the actual building.
- OffloadThread must run many of these closures concurrently, because the
background thread context switches between multiple merge tasks. One way to
do this is to create a single-threaded tokio runtime inside OffloadThread and
have each closure run as an async task within that runtime.
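A rough, standard-library-only sketch of how one OffloadThread could interleave several builds. Note the substitution: instead of a single-threaded tokio runtime, it uses a hand-rolled polling loop over non-blocking try_recv calls, which is cruder but shows the same structure (one channel per build, one closure per build, all driven by one thread). Step, Job, start_build, and shutdown are hypothetical names, and a plain Vec stands in for the real builder.

```rust
use std::sync::mpsc::{channel, Receiver, Sender, TryRecvError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

type Tuple = (u64, u64, u32, i64); // (k, v, t, w) with placeholder types

/// Outcome of polling one build job once.
enum Step { Progress, Idle, Finished }

/// A build job: a closure the offload thread polls repeatedly.
type Job = Box<dyn FnMut() -> Step + Send>;

/// Hypothetical handle to the offload thread.
struct OffloadThread {
    jobs: Sender<Job>,
    handle: JoinHandle<()>,
}

impl OffloadThread {
    fn spawn() -> Self {
        let (jobs, new_jobs) = channel::<Job>();
        let handle = thread::spawn(move || {
            let mut active: Vec<Job> = Vec::new();
            let mut open = true;
            while open || !active.is_empty() {
                // Pick up newly submitted jobs without blocking.
                loop {
                    match new_jobs.try_recv() {
                        Ok(job) => active.push(job),
                        Err(TryRecvError::Empty) => break,
                        Err(TryRecvError::Disconnected) => { open = false; break; }
                    }
                }
                // Poll every job once and drop the ones that finished.
                let mut progressed = false;
                active.retain_mut(|job| match job() {
                    Step::Progress => { progressed = true; true }
                    Step::Idle => true,
                    Step::Finished => false,
                });
                if !progressed {
                    // Crude backoff; an async runtime would park instead.
                    thread::sleep(Duration::from_micros(100));
                }
            }
        });
        Self { jobs, handle }
    }

    /// Registers a new build. Returns the sender for tuple chunks and the
    /// receiver on which the finished batch arrives.
    fn start_build(&self) -> (Sender<Vec<Tuple>>, Receiver<Vec<Tuple>>) {
        let (tx, rx) = channel::<Vec<Tuple>>();
        let (done_tx, done_rx) = channel::<Vec<Tuple>>();
        let mut batch: Vec<Tuple> = Vec::new(); // stand-in for a real builder
        let job: Job = Box::new(move || match rx.try_recv() {
            Ok(chunk) => { batch.extend(chunk); Step::Progress }
            Err(TryRecvError::Empty) => Step::Idle,
            Err(TryRecvError::Disconnected) => {
                // Sender dropped: the merger called done, so hand back the batch.
                let _ = done_tx.send(std::mem::take(&mut batch));
                Step::Finished
            }
        });
        self.jobs.send(job).expect("offload thread exited");
        (tx, done_rx)
    }

    /// Stops accepting jobs and waits for the active ones to finish.
    fn shutdown(self) {
        let Self { jobs, handle } = self;
        drop(jobs);
        handle.join().expect("offload thread panicked");
    }
}
```

The sleep-based backoff is the main weakness of this stand-in; with an async runtime each closure would instead await its channel and be woken only when data arrives.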
Option 2. Using tokio
This is a more radical change. Instead of running mergers in dedicated
background threads, use a multithreaded tokio runtime, where every merger
and every builder runs as a task.
Follow-up
As a follow-up we can further parallelize the builder by offloading Bloom
filter construction to another thread.