[RFC][dbsp] Offload Builder to a separate thread #5754

@ryzhyk

Motivation

Mergers spend a significant fraction of their time (30%-70%) inside Builders,
whose work includes rkyv serialization, Bloom filter construction, compression,
and writing batches to disk. This work can run in parallel with the rest of the
work done by the merger thread, effectively splitting the merger into two
concurrent pipelined tasks:

  • Input: read, decompress, deserialize, merge input batches
  • Output: serialize, Bloom, compress, write.

The input task feeds (k, v, t, w) tuples to the output task.

This parallelization can potentially improve the pipeline performance in
scenarios where the merger becomes a bottleneck.

Design

This parallelization should be relatively easy to implement by providing a new
implementation of trait Builder (let's call it OffloadBuilder) that
forwards (k, v, t, w) tuples to the builder thread and retrieves the
constructed batch on done.
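
A minimal sketch of this idea, under stated assumptions: the Builder trait below is a hypothetical, simplified stand-in (the real trait is generic over key, value, time and weight types), VecBuilder is a toy inner builder, and the chunk and queue_depth parameters are illustrative. It uses one plain thread per builder and a bounded std channel, and leaves the scheduling question discussed below open.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

// Concrete tuple type used only to keep the sketch short.
type Tuple = (u64, u64, u32, i64);

// Hypothetical, simplified stand-in for the real Builder trait.
trait Builder {
    type Output;
    fn push(&mut self, key: u64, val: u64, time: u32, weight: i64);
    fn done(self) -> Self::Output;
}

// Toy inner builder that only collects tuples; a real one would
// serialize, build the Bloom filter, compress and write to disk.
struct VecBuilder(Vec<Tuple>);

impl Builder for VecBuilder {
    type Output = Vec<Tuple>;
    fn push(&mut self, key: u64, val: u64, time: u32, weight: i64) {
        self.0.push((key, val, time, weight));
    }
    fn done(self) -> Self::Output {
        self.0
    }
}

// Forwards tuples in chunks to a thread that owns the inner builder.
struct OffloadBuilder<B: Builder> {
    tx: SyncSender<Vec<Tuple>>,
    pending: Vec<Tuple>,
    chunk: usize,
    worker: JoinHandle<B::Output>,
}

impl<B> OffloadBuilder<B>
where
    B: Builder + Send + 'static,
    B::Output: Send + 'static,
{
    fn new(mut inner: B, chunk: usize, queue_depth: usize) -> Self {
        // Bounded queue: the merger blocks if the builder falls behind.
        let (tx, rx) = sync_channel::<Vec<Tuple>>(queue_depth);
        let worker = thread::spawn(move || {
            // The loop ends once the sending side has been dropped.
            for batch in rx {
                for (k, v, t, w) in batch {
                    inner.push(k, v, t, w);
                }
            }
            inner.done()
        });
        Self { tx, pending: Vec::with_capacity(chunk), chunk, worker }
    }

    // Sends the buffered tuples, if any, to the builder thread.
    fn flush(&mut self) {
        if !self.pending.is_empty() {
            let batch = std::mem::replace(&mut self.pending, Vec::with_capacity(self.chunk));
            self.tx.send(batch).expect("builder thread exited early");
        }
    }
}

impl<B> Builder for OffloadBuilder<B>
where
    B: Builder + Send + 'static,
    B::Output: Send + 'static,
{
    type Output = B::Output;

    fn push(&mut self, key: u64, val: u64, time: u32, weight: i64) {
        self.pending.push((key, val, time, weight));
        if self.pending.len() >= self.chunk {
            self.flush();
        }
    }

    fn done(mut self) -> Self::Output {
        self.flush();
        let OffloadBuilder { tx, worker, .. } = self;
        drop(tx); // closing the channel lets the builder thread finish
        worker.join().expect("builder thread panicked")
    }
}
```

Because OffloadBuilder implements the same trait as the builder it wraps, a merger written against the trait could use it as a drop-in replacement, for example OffloadBuilder::new(VecBuilder(Vec::new()), 1024, 4). Sending tuples in chunks rather than one at a time is one way to amortize the per-message queue cost.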

This offloading is not free and will introduce an extra deep copy of data
to a message queue (which may not be terrible, since our file-based cursors
require a deep copy anyway).

We need to figure out how to schedule the offloaded work. I see two options:

Option 1. Using threads

  • Spawn another std::thread (OffloadThread) for each background thread.
  • When an OffloadBuilder is created, it creates a new channel and sends the
    OffloadThread a closure that pulls batches of (k, v, t, w) tuples from the
    channel and does the actual building.
  • OffloadThread must run many of these closures concurrently, since the
    background thread context-switches between multiple merge tasks. One way to
    do this is to create a single-threaded tokio runtime inside OffloadThread and
    have each closure run as an async task within that runtime.
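
To illustrate the multiplexing requirement in Option 1, here is a rough, std-only sketch. It deliberately swaps the proposed single-threaded tokio runtime for a hand-rolled round-robin polling loop over non-blocking try_recv calls, so that it has no external dependencies; with tokio, each job would be an async task awaiting its channel instead. The names Step, Job, start_build and shutdown are hypothetical, and a plain Vec stands in for the real builder.

```rust
use std::sync::mpsc::{channel, Receiver, Sender, TryRecvError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

type Tuple = (u64, u64, u32, i64);

// Outcome of polling one build job once.
enum Step {
    Progress,
    Idle,
    Finished,
}

// A build job: the offload thread calls it repeatedly until Finished.
type Job = Box<dyn FnMut() -> Step + Send>;

struct OffloadThread {
    jobs: Sender<Job>,
    handle: JoinHandle<()>,
}

impl OffloadThread {
    fn spawn() -> Self {
        let (jobs, incoming) = channel::<Job>();
        let handle = thread::spawn(move || {
            let mut active: Vec<Job> = Vec::new();
            let mut open = true;
            while open || !active.is_empty() {
                // Accept newly registered jobs without blocking.
                loop {
                    match incoming.try_recv() {
                        Ok(job) => active.push(job),
                        Err(TryRecvError::Empty) => break,
                        Err(TryRecvError::Disconnected) => {
                            open = false;
                            break;
                        }
                    }
                }
                // Give every active job one turn; drop finished ones.
                let mut progressed = false;
                active.retain_mut(|job| match (*job)() {
                    Step::Progress => {
                        progressed = true;
                        true
                    }
                    Step::Idle => true,
                    Step::Finished => {
                        progressed = true;
                        false
                    }
                });
                // Crude back-off; an async runtime would park instead.
                if !progressed {
                    thread::sleep(Duration::from_millis(1));
                }
            }
        });
        Self { jobs, handle }
    }

    // Registers a build; returns the tuple sender and the result receiver.
    fn start_build(&self) -> (Sender<Vec<Tuple>>, Receiver<Vec<Tuple>>) {
        let (tx, rx) = channel::<Vec<Tuple>>();
        let (done_tx, done_rx) = channel();
        let mut built: Vec<Tuple> = Vec::new(); // stands in for a real builder
        let job: Job = Box::new(move || match rx.try_recv() {
            Ok(batch) => {
                built.extend(batch);
                Step::Progress
            }
            Err(TryRecvError::Empty) => Step::Idle,
            Err(TryRecvError::Disconnected) => {
                // Sender dropped: the build is complete, hand back the result.
                let _ = done_tx.send(std::mem::take(&mut built));
                Step::Finished
            }
        });
        self.jobs.send(job).expect("offload thread exited");
        (tx, done_rx)
    }

    fn shutdown(self) {
        drop(self.jobs);
        self.handle.join().expect("offload thread panicked");
    }
}
```

In this sketch several builds make progress on one OffloadThread even when the merger interleaves its pushes across them. The sleep-based back-off is the main weakness of the polling approach and is the part an async runtime would replace.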

Option 2. Using tokio

This is a more radical change. Instead of running mergers in dedicated
background threads, use a multithreaded tokio runtime, where every merger
and every builder runs as a task.

Follow-up

As a follow-up we can further parallelize the builder by offloading Bloom
filter construction to another thread.

Labels

  • DBSP core: Related to the core DBSP library
  • performance
  • storage: Persistence for internal state in DBSP operators
