feat(datafusion): add TRUNCATE TABLE and DROP PARTITION SQL support#292
JingsongLi merged 2 commits into apache:main
Conversation
Wire existing TableCommit::truncate_table() and truncate_partitions() APIs to the DataFusion SQL layer, supporting:
- TRUNCATE TABLE db.t
- TRUNCATE TABLE db.t PARTITION (col = val, ...)
- ALTER TABLE db.t DROP PARTITION (col = val, ...)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jerry-024 left a comment
Three must-fix items, posted inline. The HIGH one (partial partition spec) is silent wrong-data deletion, not just a behavior nuance.
```rust
let field = field_map.get(col_name.as_str()).ok_or_else(|| {
    DataFusionError::Plan(format!("Column '{col_name}' not found in table schema"))
```
HIGH: silent wrong-partition deletion on incomplete specs.
This builds `partition` only from the keys the user supplied. The map flows through `TableCommit::truncate_partitions` → `partitions_to_bytes`:

```rust
let datum = p.get(key).cloned().flatten(); // missing key → None
```

`None` is encoded into the partition filter as IS NULL. So on a (dt, region) partitioned table, DROP PARTITION (dt = '2026-04-28') becomes the filter dt='2026-04-28' AND region IS NULL — it does not delete all dt='2026-04-28' partitions like Hive/Spark prefix semantics would. If a NULL-region partition exists, it gets deleted (silent wrong target). If not, silent no-op. Either way, user intent is mishandled with no error.
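The ambiguity can be reproduced in isolation. A minimal sketch (a plain `HashMap` standing in for the resolved partition spec; `resolve` is a hypothetical helper, not the PR's code) showing that an omitted key and an explicit NULL collapse to the same value:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the spec map built from the user's
// DROP PARTITION clause: value None models an explicit NULL.
fn resolve(spec: &HashMap<String, Option<String>>, key: &str) -> Option<String> {
    // Mirrors `p.get(key).cloned().flatten()`: a key the user never
    // supplied and a key explicitly set to NULL are indistinguishable.
    spec.get(key).cloned().flatten()
}

fn main() {
    let mut spec = HashMap::new();
    spec.insert("dt".to_string(), Some("2026-04-28".to_string()));
    // `region` was never supplied by the user.
    assert_eq!(resolve(&spec, "dt"), Some("2026-04-28".to_string()));
    assert_eq!(resolve(&spec, "region"), None); // encoded as `region IS NULL`
    // Explicit NULL produces the same value; this is the root of the bug.
    spec.insert("region".to_string(), None);
    assert_eq!(resolve(&spec, "region"), None);
}
```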
Two acceptable fixes:
- Reject incomplete specs explicitly: after the loop, if `partition.len() != partition_keys.len()`, return an `Err` listing the missing keys.
- Implement true prefix-match deletion: requires a partition-filter primitive that matches on a subset of keys; not just a downstream-API tweak.
For reference, Spark's Java-side PaimonPartitionManagement.toPaimonPartitions does `require(partitionFieldCount <= partitionKeys.length)` and projects the partition row type, so prefix specs work as Hive expects.
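As a sketch of the first fix (explicit rejection), under assumed names: `PlanError` stands in for `DataFusionError::Plan`, and `check_complete` is a hypothetical helper, not code from this PR:

```rust
use std::collections::BTreeMap;

// Stand-in for DataFusionError::Plan, so the sketch is self-contained.
#[derive(Debug, PartialEq)]
struct PlanError(String);

// Reject a partition spec that does not cover every partition key,
// instead of letting the missing keys silently become IS NULL filters.
fn check_complete(
    spec: &BTreeMap<String, String>,
    partition_keys: &[&str],
) -> Result<(), PlanError> {
    let missing: Vec<&str> = partition_keys
        .iter()
        .filter(|k| !spec.contains_key(**k))
        .copied()
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(PlanError(format!(
            "DROP PARTITION spec must name every partition key; missing: {}",
            missing.join(", ")
        )))
    }
}

fn main() {
    let mut spec = BTreeMap::new();
    spec.insert("dt".to_string(), "2026-04-28".to_string());
    // (dt, region) table, only dt supplied: rejected with an error
    // naming `region`, rather than silently narrowed to region IS NULL.
    assert!(check_complete(&spec, &["dt", "region"]).is_err());
    spec.insert("region".to_string(), "us-east".to_string());
    assert!(check_complete(&spec, &["dt", "region"]).is_ok());
}
```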
```rust
    .map_err(to_datafusion_error)?;

let wb = table.new_write_builder();
let commit = wb.new_commit();
```
MEDIUM: empty PARTITION () clause falls through to a full-table truncate.
If truncate.partitions is Some(empty_vec) (whatever AST shape sqlparser produces for a degenerate parser edge case), this branch is skipped and execution drops into commit.truncate_table() below — the entire table is wiped despite the user writing a PARTITION clause. Opposite of intent.
Suggest tightening:
```rust
if let Some(partitions) = &truncate.partitions {
    if partitions.is_empty() {
        return Err(DataFusionError::Plan(
            "PARTITION clause requires at least one column = value".to_string(),
        ));
    }
    // ... existing branch
    return ok_result(...);
}
commit.truncate_table().await?;
```

This way only TRUNCATE TABLE t (no PARTITION clause at all, i.e. `truncate.partitions` is `None`) reaches `truncate_table()`.
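The three-way split can be checked in isolation. A minimal model (hypothetical `plan_truncate` with simplified types, not the PR's real AST or handler) verifying that `Some(vec![])` is rejected instead of falling through to the full-table path:

```rust
// Hypothetical model of the dispatch: None = no PARTITION clause,
// Some(non-empty) = partition-scoped, Some(empty) = degenerate clause.
fn plan_truncate(
    partitions: &Option<Vec<(String, String)>>,
) -> Result<&'static str, String> {
    match partitions {
        Some(p) if p.is_empty() => {
            Err("PARTITION clause requires at least one column = value".to_string())
        }
        Some(_) => Ok("truncate_partitions"), // partition-scoped path
        None => Ok("truncate_table"), // full-table truncate only with no clause at all
    }
}

fn main() {
    assert_eq!(plan_truncate(&None), Ok("truncate_table"));
    assert_eq!(
        plan_truncate(&Some(vec![("dt".to_string(), "2026-04-28".to_string())])),
        Ok("truncate_partitions")
    );
    // A degenerate PARTITION () no longer falls through to a full-table wipe.
    assert!(plan_truncate(&Some(vec![])).is_err());
}
```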
```rust
    }
}
// DropPartitions is a data operation (not a schema change), so we handle it
// separately and return early — it cannot be combined with schema changes.
```
MEDIUM: if_exists dropped at three levels — none honored.
- Inner `AlterTableOperation::DropPartitions { if_exists: _ }` (this match arm): partition-level IF EXISTS is explicitly bound and discarded. The doc comment claims this "matches IF EXISTS semantics" because the underlying overwrite is a no-op on missing partitions (verified — `truncate_partitions` early-returns on empty resolved entries). But that makes plain DROP PARTITION behave identically to DROP PARTITION IF EXISTS. Spark's AlterTableDropPartitionCommand errors when IF EXISTS is omitted and the partition doesn't exist — this PR diverges silently.
- Outer `Statement::AlterTable { if_exists, .. }`: returning early from this branch into `handle_drop_partitions` skips whatever outer IF EXISTS handling the rest of `handle_alter_table` does. ALTER TABLE IF EXISTS missing_table DROP PARTITION (...) will fail with a hard error from `catalog.get_table(...)` instead of being a silent no-op.
- `Statement::Truncate { if_exists, .. }` (handler at line 570): sqlparser parses the flag (confirmed in `sqlparser::ast::Truncate`), but `handle_truncate_table` never reads it. TRUNCATE TABLE IF EXISTS missing_table errors instead of no-oping.
Suggested fix: at each layer, read the flag; when `get_table` returns NotFound (or when a `list_tables` pre-check misses), short-circuit to `ok_result` if `if_exists` is set, otherwise propagate the error.
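A sketch of that short-circuit, with hypothetical names (`CatalogError`, `get_table`, `resolve_table`) modeling the catalog lookup rather than the crate's real API:

```rust
// Hypothetical catalog error type; NotFound is the only variant
// that IF EXISTS is allowed to suppress.
#[derive(Debug, PartialEq)]
enum CatalogError {
    NotFound,
    Other(String),
}

// Toy catalog lookup standing in for `catalog.get_table(...)`.
fn get_table(name: &str) -> Result<String, CatalogError> {
    if name == "existing" {
        Ok(name.to_string())
    } else {
        Err(CatalogError::NotFound)
    }
}

// Ok(None) means "IF EXISTS suppressed a missing table: emit ok_result
// and stop"; all other errors still propagate unchanged.
fn resolve_table(name: &str, if_exists: bool) -> Result<Option<String>, CatalogError> {
    match get_table(name) {
        Ok(t) => Ok(Some(t)),
        Err(CatalogError::NotFound) if if_exists => Ok(None), // silent no-op
        Err(e) => Err(e),
    }
}

fn main() {
    assert_eq!(resolve_table("existing", false), Ok(Some("existing".to_string())));
    assert_eq!(resolve_table("missing", true), Ok(None));
    assert_eq!(resolve_table("missing", false), Err(CatalogError::NotFound));
}
```

The same three-line match works at each of the three layers; only the lookup and the flag's source differ.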