feat: support fixed-bucket partial-update merge engine#263
feat: support fixed-bucket partial-update merge engine#263QuakeWang wants to merge 5 commits intoapache:mainfrom
Conversation
JingsongLi
left a comment
There was a problem hiding this comment.
Can we simplify some of the code? There's no need for so much validation; we just need to focus on E2E and functionality.
…-groundwork # Conflicts: # crates/paimon/src/table/table_scan.rs # crates/paimon/src/table/table_write.rs
|
|
||
| impl MergeRow { | ||
| #[allow(dead_code)] | ||
| fn source_batch<'a>( |
There was a problem hiding this comment.
If it's test-only infrastructure, consider moving it to #[cfg(test)] or removing the allow and using it in production code. Dead code annotations tend to accumulate.
| let total_buckets = core_options.bucket(); | ||
| let has_primary_keys = !schema.primary_keys().is_empty(); | ||
| let table_name = table.identifier().full_name(); | ||
| let partial_update_mode = PartialUpdateConfig::new(schema.options()) |
There was a problem hiding this comment.
This validation should happened in KeyValueFileWriter.
And should new merge_function, validation should happened in new.
| pub(super) fn can_push_down_limit_hint_for_scan( | ||
| data_predicates: &[Predicate], | ||
| row_ranges: Option<&[RowRange]>, | ||
| partial_update_enabled: bool, |
There was a problem hiding this comment.
I cannot understand why here need to limit partial_update_enabled.
| MergeEngine::PartialUpdate => { | ||
| self.read_pk_partial_update(data_splits, &core_options) | ||
| } | ||
| MergeEngine::FirstRow => { |
There was a problem hiding this comment.
Do not modify first row. You can just judge Deduplicate and PartialUpdate to read_pk.
|
@JingsongLi PTAL, thanks |
Purpose
Linked issue: part of #256
This PR delivers a minimum usable
merge-engine=partial-updateimplementation for primary-key tables with fixed buckets.It shifts the earlier groundwork-heavy approach toward end-to-end functionality, while keeping only the validation that is still necessary for correctness.
This PR does not close #256 yet. Support for
bucket=-1partial-update, including cross-partition existing-key behavior, remains follow-up work.Brief change log
merge-engine=partial-updateend to end for primary-key tablesbucket=-1anddeletion-vectors.enabled=trueexplicitly unsupported in this PRTests
API and Format
Documentation