X Tutup
Skip to content

feat(compression): add MemorySlice comparator and support LookupStoreFactory for SST file#172

Open
lxy-9602 wants to merge 4 commits intoalibaba:mainfrom
lxy-9602:add-slice-comparator
Open

feat(compression): add MemorySlice comparator and support LookupStoreFactory for SST file#172
lxy-9602 wants to merge 4 commits intoalibaba:mainfrom
lxy-9602:add-slice-comparator

Conversation

@lxy-9602
Copy link
Collaborator

@lxy-9602 lxy-9602 commented Mar 10, 2026

Purpose

Linked issue: #93

  1. Add MemorySlice comparator - Implements a comparator for MemorySlice to enable ordered comparison of binary data.
  2. Support SortLookupStoreFactory for creating SST readers/writers.

Tests

RowCompactedSerializerTest.TestSliceComparator
SortLookupStoreTest

API and Format

Add options for lookup in def.h.

@lucasfang lucasfang requested a review from Copilot March 10, 2026 08:44
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds ordered binary comparison support (via MemorySlice/variant comparators) and introduces a LookupStoreFactory implementation backed by SST files, along with new lookup-cache related options (bloom filter, compression, and page size).

Changes:

  • Added Java-compatible floating-point ordering and variant-based comparators to support consistent ordering across components.
  • Introduced SST-backed lookup store factory (reader/writer) and updated SST reader/iterator APIs to return Result<> for error propagation.
  • Added new core options for lookup cache bloom filter, spill compression, and cache page size (plus corresponding tests).

Reviewed changes

Copilot reviewed 37 out of 37 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/paimon/core/utils/fields_comparator.h Adds floating-point ordering helper and variant comparator API.
src/paimon/core/utils/fields_comparator.cpp Implements variant comparator selection by Arrow type.
src/paimon/core/options/compress_options.h Introduces compression option struct used by lookup spill compression.
src/paimon/core/core_options_test.cpp Adds assertions and parsing coverage for new lookup/cache options.
src/paimon/core/core_options.h Exposes new lookup/cache option getters.
src/paimon/core/core_options.cpp Parses and stores lookup/cache bloom filter, compression, and page size options.
src/paimon/common/utils/bloom_filter.cpp Tightens integer types for consistency and avoids implicit narrowing.
src/paimon/common/sst/sst_file_writer.h Documents null bloom filter handle behavior when disabled.
src/paimon/common/sst/sst_file_writer.cpp Returns null bloom filter handle instead of failing when bloom filter is disabled.
src/paimon/common/sst/sst_file_utils.h Makes compression-type parsing fallible (Result<>) with clear error.
src/paimon/common/sst/sst_file_reader.h Refactors construction to use InputStream, upgrades Lookup/Seek APIs to Result<>, adds Close().
src/paimon/common/sst/sst_file_reader.cpp Propagates errors via Result<>, supports optional bloom filter handle, adds reader close path.
src/paimon/common/sst/sst_file_io_test.cpp Updates tests for new Result<> APIs and new reader creation signature.
src/paimon/common/sst/block_reader.h Makes BlockReader creation fallible and uses MemorySlice::SliceComparator.
src/paimon/common/sst/block_reader.cpp Updates creation path to return Result<> and modernizes comparator exposure.
src/paimon/common/sst/block_iterator.h Changes SeekTo to return Result<bool> and updates ctor signature.
src/paimon/common/sst/block_iterator.cpp Propagates comparator/positioning errors with Result<>.
src/paimon/common/sst/block_entry.h Renames fields and adjusts constructor to const-ref shared_ptrs.
src/paimon/common/sst/block_cache.h Adds explicit Close() to invalidate cached pages.
src/paimon/common/memory/memory_slice.h Adds SliceComparator alias and exposes underlying segment via GetSegment().
src/paimon/common/memory/memory_slice.cpp Implements GetSegment() and makes accessors const.
src/paimon/common/lookup/sort/sort_lookup_store_test.cpp Adds test coverage for SST-backed lookup store factory and new options.
src/paimon/common/lookup/sort/sort_lookup_store_factory.h Introduces SST-backed lookup store reader/writer and factory.
src/paimon/common/lookup/sort/sort_lookup_store_factory.cpp Implements reader/writer creation and close/flush flows.
src/paimon/common/lookup/lookup_store_factory.h Adds lookup store factory interfaces and bloom filter generator.
src/paimon/common/lookup/lookup_store_factory.cpp Creates SST-backed factory based on options; generates bloom filters per options.
src/paimon/common/io/cache/cache_manager.h Makes InvalidPage accept const-ref shared_ptr.
src/paimon/common/io/cache/cache_manager.cpp Updates InvalidPage implementation signature accordingly.
src/paimon/common/file_index/rangebitmap/dictionary/key_factory.cpp Reuses centralized floating-point ordering from FieldsComparator.
src/paimon/common/defs.cpp Registers new option keys for lookup cache and page size.
src/paimon/common/data/serializer/row_compacted_serializer_test.cpp Adds slice comparator tests across types, nulls, and float/double edge cases.
src/paimon/common/data/serializer/row_compacted_serializer.h Adds API to create a MemorySlice comparator for compacted rows.
src/paimon/common/data/serializer/row_compacted_serializer.cpp Implements slice comparator creation using field readers + variant comparators.
src/paimon/common/compression/block_compression_factory.h Adds Create(const CompressOptions&) overload.
src/paimon/common/compression/block_compression_factory.cpp Implements string-driven compression selection and improves error message formatting.
src/paimon/CMakeLists.txt Wires new lookup factory sources and test into build.
include/paimon/defs.h Documents new lookup cache and cache page size options in public API.
Comments suppressed due to low confidence (4)

src/paimon/core/utils/fields_comparator.cpp:1

  • arrow::Type::INT8 is being extracted as char, whose signedness is implementation-defined. This can break ordering for negative values on platforms where char is unsigned. Use int8_t consistently (and ensure the underlying VariantType stores INT8 as int8_t).
    src/paimon/core/utils/fields_comparator.cpp:1
  • FieldsComparator now provides CompareFloatingPoint() with Java-compatible ordering, but CompareField() for FLOAT/DOUBLE still documents (and likely implements) non-Java behavior. This can produce inconsistent ordering between InternalRow comparators and the new slice/variant comparators. Consider switching FLOAT/DOUBLE in CompareField() to use CompareFloatingPoint() and update/remove the TODO/comment accordingly.
    src/paimon/core/core_options.cpp:1
  • cache_page_size is stored as int64_t but exposed as int32_t and narrowed via static_cast. This risks overflow for legitimate configurations (e.g., cache sizes > 2GB) and makes the API misleading. Prefer returning int64_t (or size_t) from GetCachePageSize() and keep the type consistent end-to-end.
    src/paimon/core/core_options.cpp:1
  • cache_page_size is stored as int64_t but exposed as int32_t and narrowed via static_cast. This risks overflow for legitimate configurations (e.g., cache sizes > 2GB) and makes the API misleading. Prefer returning int64_t (or size_t) from GetCachePageSize() and keep the type consistent end-to-end.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

X Tutup