[feat] add lance by morningman · Pull Request #3564 · apache/doris-website

morningman · 2026-04-22T06:05:13Z

Versions

dev
4.x
3.x
2.1

Languages

Chinese
English

Docs Checklist

Checked by AI
Test Cases Built

tomz-alt · 2026-04-22T06:50:58Z

+## Limitations
+
+- **TVF only**: Only the `s3` and `local` TVFs are supported. `CREATE CATALOG` is not supported yet.
+- **Single data file per glob**: The `file_path` / `uri` must match exactly one `.lance` data file per dataset. If a glob matches multiple `.lance` files within the same multi-fragment dataset, each scan range will reopen the full dataset and produce duplicate rows.


The merged code (be/src/format/lance/lance_rust_reader.cpp:89-108) already extracts fragment_file from the scan-range path and passes it to the Rust side, which filters to exactly that fragment. Glob-matching multiple .lance files in a multi-fragment dataset therefore works correctly, with no duplication. I verified this end-to-end locally: multi.lance/data/*.lance returned 15 rows (correct), not 45.
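
The arithmetic behind the 15-versus-45 check can be sketched with a toy model. This is not Doris or Lance code: the layout of three fragments with five rows each is an assumption chosen only because it reproduces the 15 and 45 counts, and `FRAGMENTS`, `scan_filtered`, and `scan_unfiltered` are hypothetical names.

```python
# Toy model of scan ranges over a multi-fragment dataset.
# Assumption: 3 fragments x 5 rows each, picked to match the 15-vs-45 counts.
FRAGMENTS = {
    "frag-a.lance": [0, 1, 2, 3, 4],
    "frag-b.lance": [5, 6, 7, 8, 9],
    "frag-c.lance": [10, 11, 12, 13, 14],
}


def scan_filtered(matched_files):
    """Each scan range reads only the fragment named by its own path."""
    rows = []
    for fragment_file in matched_files:
        rows.extend(FRAGMENTS[fragment_file])
    return rows


def scan_unfiltered(matched_files):
    """Each scan range reopens the whole dataset (what the doc text describes)."""
    rows = []
    for _ in matched_files:
        for fragment_rows in FRAGMENTS.values():
            rows.extend(fragment_rows)
    return rows


# Stand-in for the files a glob over data/*.lance would match.
matched = sorted(FRAGMENTS)

print(len(scan_filtered(matched)))    # 15
print(len(scan_unfiltered(matched)))  # 45
```

With per-fragment filtering, the number of rows returned is independent of how many scan ranges the glob produces, which is why the duplicate-row limitation no longer applies.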

tomz-alt · 2026-04-22T06:51:30Z

+    └── ...
+```
+
+When querying via TVF, the `uri` / `file_path` must point to a single `.lance` data file inside the `data/` subdirectory of the dataset. Doris automatically resolves the dataset root from this path and reads all fragments belonging to the dataset.


But the Limitations section says the path must match exactly one .lance file, and that a glob over several makes each scan range reopen the full dataset. These can't both be true. With the merged fragment_file logic, the accurate statement is: each scan range reads exactly one fragment; a glob over data/*.lance assigns one fragment to each scan range, and together the scan ranges produce the full dataset.
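
To make the path handling concrete, here is a minimal sketch of splitting a scan-range path into a dataset root and a fragment file. It only assumes the `<dataset>/data/<file>.lance` layout described in the doc; `split_fragment_path` is a hypothetical helper and not the logic in lance_rust_reader.cpp.

```python
from typing import Tuple


def split_fragment_path(path: str) -> Tuple[str, str]:
    """Split '<dataset_root>/data/<file>.lance' into (dataset_root, fragment_file).

    Assumption: fragment files sit directly inside the dataset's data/ directory.
    """
    root, sep, fragment_file = path.rpartition("/data/")
    if (
        not sep
        or not root
        or "/" in fragment_file
        or not fragment_file.endswith(".lance")
    ):
        raise ValueError(f"not a Lance data file path: {path!r}")
    return root, fragment_file


root, fragment = split_fragment_path(
    "s3://bucket/path/to/large.lance/data/fragment.lance"
)
print(root)      # s3://bucket/path/to/large.lance
print(fragment)  # fragment.lance
```

Under this model, every path matched by a glob over data/*.lance yields the same dataset root but a different fragment file, so each scan range can open the dataset once and restrict itself to its own fragment.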

tomz-alt · 2026-04-22T06:52:38Z

+) ORDER BY id LIMIT 10;
+```
+
+### Aggregation over a Multi-Fragment Dataset


SELECT count(*), min(id), max(id) FROM s3(
"uri" = "s3://bucket/path/to/large.lance/data/fragment.lance",
...

data/fragment.lance points at one specific file, so this query reads one fragment, not the whole dataset. To show multi-fragment aggregation, use a glob:

"uri" = "s3://bucket/path/to/large.lance/data/*.lance",

The same applies to the local example: real Lance datasets have UUID-named fragment files, so pointing at one by name is fragile.

[feat] add lance

2782b23

morningman temporarily deployed to Production April 22, 2026 06:05 — with GitHub Actions Inactive

tomz-alt reviewed Apr 22, 2026

morningman wants to merge 1 commit into apache:master from morningman:v20260421
