| ## Limitations | ||
|
|
||
| - **TVF only**: Only the `s3` and `local` TVFs are supported. `CREATE CATALOG` is not supported yet. | ||
| - **Single data file per glob**: The `file_path` / `uri` must match exactly one `.lance` data file per dataset. If a glob matches multiple `.lance` files within the same multi-fragment dataset, each scan range will reopen the full dataset and produce duplicate rows. |
The merged code (`be/src/format/lance/lance_rust_reader.cpp:89-108`) already extracts `fragment_file` from the scan-range path and passes it to the Rust side, which filters to exactly that fragment. Glob-matching multiple `.lance` files in a multi-fragment dataset therefore works correctly, with no duplication. I verified this end-to-end locally: `multi.lance/data/*.lance` returned 15 rows (correct), not 45.
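The difference between the two behaviors can be illustrated with a toy model. This is only a sketch, not the Doris or Lance code: the `DATASET` dictionary, the fragment file names, and the even split of 5 rows per fragment are assumptions chosen so that the totals line up with the 15 and 45 rows mentioned above.

```python
# Toy model of a 3-fragment dataset: fragment file name -> row ids.
# File names and the 5-rows-per-fragment split are assumptions for illustration.
DATASET = {
    "frag-a.lance": list(range(0, 5)),
    "frag-b.lance": list(range(5, 10)),
    "frag-c.lance": list(range(10, 15)),
}


def scan_full_dataset(scan_ranges):
    """Behavior the Limitations text describes: every scan range reopens the whole dataset."""
    rows = []
    for _ in scan_ranges:
        for frag_rows in DATASET.values():
            rows.extend(frag_rows)
    return rows


def scan_with_fragment_filter(scan_ranges):
    """Behavior the comment describes: each scan range reads only its own fragment."""
    rows = []
    for fragment_file in scan_ranges:
        rows.extend(DATASET[fragment_file])
    return rows


# A glob over data/*.lance yields one scan range per fragment file.
scan_ranges = sorted(DATASET)
duplicated = scan_full_dataset(scan_ranges)
filtered = scan_with_fragment_filter(scan_ranges)
print(len(duplicated), len(filtered))
```

With per-fragment filtering, the scan ranges together cover every row exactly once, so the result does not depend on how many files the glob matches.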
| └── ... | ||
| ``` | ||
|
|
||
| When querying via TVF, the `uri` / `file_path` must point to a single `.lance` data file inside the `data/` subdirectory of the dataset. Doris automatically resolves the dataset root from this path and reads all fragments belonging to the dataset. |
But the limitation says only one fragment is read if multiple are globbed. These can't both be true. With the merged `fragment_file` logic, the accurate statement is: each scan range reads exactly one fragment; a glob over `data/*.lance` assigns one fragment to each scan range, producing the full dataset.
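A minimal sketch of how a scan-range path could be split into a dataset root and a fragment file, assuming the `<dataset>.lance/data/<file>.lance` layout described in the docs. The helper `split_scan_range_path` and the example paths are hypothetical and are not taken from `lance_rust_reader.cpp`.

```python
from typing import Tuple


def split_scan_range_path(path: str) -> Tuple[str, str]:
    """Split '<root>/data/<file>.lance' into (dataset_root, fragment_file).

    Hypothetical helper for illustration; the real reader may parse paths differently.
    """
    root, sep, fragment_file = path.rpartition("/data/")
    if not sep or "/" in fragment_file or not fragment_file.endswith(".lance"):
        raise ValueError(f"not a Lance data file path: {path}")
    return root, fragment_file


# Paths a glob over data/*.lance might expand to (names are made up).
matched = [
    "s3://bucket/path/to/multi.lance/data/frag-a.lance",
    "s3://bucket/path/to/multi.lance/data/frag-b.lance",
    "s3://bucket/path/to/multi.lance/data/frag-c.lance",
]

# One scan range per matched file: same dataset root, a different fragment each time.
assignments = [split_scan_range_path(p) for p in matched]
roots = {root for root, _ in assignments}
fragments = [fragment for _, fragment in assignments]
print(roots, fragments)
```

Every scan range resolves to the same dataset root but carries a distinct fragment file, which is what lets the reader restrict each range to one fragment.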
| ) ORDER BY id LIMIT 10; | ||
| ``` | ||
|
|
||
| ### Aggregation over a Multi-Fragment Dataset |
SELECT count(*), min(id), max(id) FROM s3(
"uri" = "s3://bucket/path/to/large.lance/data/fragment.lance",
...
`data/fragment.lance` points at one specific file, so this reads one fragment, not the whole dataset. To show multi-fragment aggregation, use a glob:
"uri" = "s3://bucket/path/to/large.lance/data/*.lance",
The same applies to the local example: real Lance datasets have UUID-named fragment files, so pointing at one by name is fragile.