QUERY_SPEC.md v0.1 + v0.2 amendments (informed by PQG conformance matrix)#145
Merged
rdhyee merged 4 commits into isamplesorg:main on Apr 24, 2026
Substrate-neutral query contract spanning DuckDB-WASM (web), DuckDB/Ibis (Python), and Apache Solr (legacy). Names mirror the Solr schema vocabulary (authoritative precedent) with substrate-specific aliases provided in §5.

Scope:
- Canonical facet / filter dimensions (§2)
- Abstract filter grammar (§3)
- Full-text search semantics (§3.2, the 16-field Solr searchText target)
- Sample-card projection (§4.2)
- Substrate binding tables (§5)
- Open questions for v0.2 (§7)

Out of scope: PQG graph traversal (see QUERY_COMPARISON.md), bulk export, ingestion.

Refs isamplesorg.github.io#138.
Amendments informed by isamplesorg/pqg#22 (conformance_matrix.md §4-§5), which audited which shipped parquet files actually carry which spec dimensions:

1. Rename `specimen` → `objectType` (§2.2). Every shipped parquet uses `object_type` / `hasSampleObjectType`; adopt the data-side name as canonical, keep `hasSpecimenCategory` as Solr alias.
2. Drop ghosts: `informalClassification` (§2.2) and `resultTimeRange` (§2.3). Both were in Solr but never migrated to any parquet. Also drop `time_range OVERLAPS` from §3.1 grammar and §5.3 Solr binding.
3. Add `thumbnailURL` to §2.1 as optional (ships in `wide` today for OpenContext only; moving to per-source sidecars, issue isamplesorg#131).
4. Update §5.1 `time BETWEEN` binding from "TBD" to real DuckDB cast: `TRY_CAST(result_time AS TIMESTAMP) BETWEEN t1 AND t2`. `result_time` IS in lite (as VARCHAR).
5. Document H3 column availability in §2.4: `wide_h3` and `h3_summary_res{4,6,8}` carry res 4/6/8; `lite` has res 8 only; plain `wide` / `narrow` carry no H3 columns.
6. Pick `tmodified` (INTEGER epoch) over `last_modified_time` (VARCHAR) for `sourceUpdatedTime` in §2.1; alias the VARCHAR as deprecated.
7. Bump version callout to v0.2.
8. §7 open questions: close Q2 (time filter in lite, now resolved); reframe Q1 around the new `objectType` naming.
9. Appendix B: reference conformance_matrix.md and SERIALIZATIONS.md (pqg#143) as companion documents.

Refs isamplesorg/pqg#22, isamplesorg.github.io#138.
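As an illustration of amendment 4, here is a minimal Python sketch that renders the `time BETWEEN` binding as a DuckDB predicate string. The helper name `time_between_predicate` and the choice to inline the bounds as `TIMESTAMP` literals are assumptions made for this example, not part of the spec; only the `TRY_CAST(result_time AS TIMESTAMP) BETWEEN t1 AND t2` form comes from the commit message.

```python
from datetime import datetime


def time_between_predicate(t1: datetime, t2: datetime, column: str = "result_time") -> str:
    """Render the §5.1 `time BETWEEN` binding as a DuckDB predicate string.

    Hypothetical helper: the spec only fixes the TRY_CAST ... BETWEEN form.
    """
    if t1 > t2:
        raise ValueError("t1 must not be later than t2")
    # Inline the bounds as TIMESTAMP literals (an assumption; a real client
    # might prefer bound parameters instead).
    lo = t1.strftime("%Y-%m-%d %H:%M:%S")
    hi = t2.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"TRY_CAST({column} AS TIMESTAMP) "
        f"BETWEEN TIMESTAMP '{lo}' AND TIMESTAMP '{hi}'"
    )


predicate = time_between_predicate(
    datetime(2000, 1, 1), datetime(2010, 12, 31, 23, 59, 59)
)
print(predicate)
```

Because `result_time` is stored as VARCHAR, `TRY_CAST` yields NULL for any value that does not parse as a timestamp, so such rows fail the `BETWEEN` test and drop out of the result instead of raising an error.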
…NS link

Two issues from Codex review:

1. **§2.4 callout wrong about h3_summary schema**: the previous text said the summary tier files carry `h3_res4`, `h3_res6`, `h3_res8`. They don't; they ship `h3_cell` (UBIGINT) + `resolution` (INTEGER) and filter by resolution. Corrected the callout and the §5.1 DuckDB binding row to show the actual form (`h3_cell IN (...) AND resolution = 6`).
2. **Appendix B wrong link target**: the SERIALIZATIONS.md reference pointed at `isamplesorg/pqg/pull/143`, but the catalog PR is `isamplesorg#143`. Fixed.
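The corrected H3 layout can be sketched as a small predicate builder. This is a hedged Python illustration: the function name `h3_predicate`, the table-kind labels, and the assumption that the `h3_res{N}` columns hold integer cell ids are all hypothetical; only the column names and the `h3_cell IN (...) AND resolution = 6` form come from the commit message.

```python
from typing import Sequence

# Resolutions each table kind can filter on, per the corrected §2.4 callout.
# Plain `wide` / `narrow` are absent because they carry no H3 columns.
H3_RESOLUTIONS = {
    "wide_h3": {4, 6, 8},     # direct h3_res4 / h3_res6 / h3_res8 columns
    "h3_summary": {4, 6, 8},  # h3_cell (UBIGINT) + resolution (INTEGER)
    "lite": {8},              # h3_res8 only
}


def h3_predicate(table_kind: str, cells: Sequence[int], resolution: int) -> str:
    """Build a DuckDB predicate for an H3 cell filter (hypothetical helper)."""
    available = H3_RESOLUTIONS.get(table_kind)
    if available is None:
        raise ValueError(f"{table_kind!r} carries no H3 columns")
    if resolution not in available:
        raise ValueError(f"{table_kind!r} has no resolution {resolution}")
    if not cells:
        raise ValueError("at least one H3 cell is required")
    cell_list = ", ".join(str(int(c)) for c in cells)
    if table_kind == "h3_summary":
        # Summary tier: one cell column, filtered by resolution.
        return f"h3_cell IN ({cell_list}) AND resolution = {resolution}"
    # wide_h3 and lite: one column per resolution.
    return f"h3_res{resolution} IN ({cell_list})"


# 101 and 102 are placeholder values, not real H3 cell ids.
print(h3_predicate("h3_summary", [101, 102], 6))
print(h3_predicate("wide_h3", [101], 4))
```

Keeping the availability table separate from the rendering logic means a request such as resolution 6 on `lite` fails early in the client rather than as a missing-column error from DuckDB.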
Codex round-2: §5.1 DuckDB binding claimed `source IN (…)` binds to `source IN (…)` on wide / lite parquet. Wrong for wide: it uses `n` (PQG convention), not `source`. The query as written fails with "Referenced column source not found". Updated the binding row to distinguish:

- wide / narrow: `WHERE n IN (…)`
- lite / sample_facets_v2: `WHERE source IN (…)` (alias already exposed)
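The distinction in the updated binding row can be illustrated with a short Python sketch. The helper `source_predicate`, the lookup table, and the sample source values are assumptions introduced for this example; the column names `n` and `source` and the table names are the ones given in the commit message.

```python
from typing import Sequence

# Which physical column backs the abstract `source` dimension per table kind.
SOURCE_COLUMN = {
    "wide": "n",                   # PQG convention
    "narrow": "n",
    "lite": "source",              # alias already exposed
    "sample_facets_v2": "source",
}


def source_predicate(table_kind: str, sources: Sequence[str]) -> str:
    """Render `source IN (...)` against the right column (hypothetical helper)."""
    try:
        column = SOURCE_COLUMN[table_kind]
    except KeyError:
        raise ValueError(f"no source binding known for {table_kind!r}") from None
    if not sources:
        raise ValueError("at least one source is required")
    # Escape single quotes by doubling them, as in standard SQL string literals.
    quoted = ", ".join("'" + s.replace("'", "''") + "'" for s in sources)
    return f"WHERE {column} IN ({quoted})"


# 'OPENCONTEXT' and 'EXAMPLE_SOURCE' are placeholder values, not confirmed data.
print(source_predicate("wide", ["OPENCONTEXT", "EXAMPLE_SOURCE"]))
print(source_predicate("lite", ["OPENCONTEXT"]))
```

Routing the column choice through one table avoids the "Referenced column source not found" failure described above when the same abstract filter is run against `wide`.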
Summary
This PR (a) introduces `query-spec.qmd` as the v0.1 baseline (written in a previous session but never committed) and (b) applies v0.2 amendments informed by the newly landed PQG conformance matrix, the audit of which shipped parquet files actually carry which QUERY_SPEC dimensions.

The two changes are split across two commits for reviewer sanity:

1. `Add QUERY_SPEC.md v0.1 (draft)`: the baseline (context).
2. `Apply QUERY_SPEC v0.2 amendments from PQG conformance matrix`: the substantive changes.

Reviewers should focus on commit 2; commit 1 is context.
v0.2 amendments (9)
All amendments trace to `conformance_matrix.md` §5:

1. Rename `specimen` → `objectType` (§2.2). Every shipped parquet uses `object_type` / `hasSampleObjectType`; adopt data-side name, keep `hasSpecimenCategory` as Solr alias.
2. Drop ghosts: `informalClassification` (§2.2) and `resultTimeRange` (§2.3), Solr-era remnants never in any shipped parquet. Also drop `time_range OVERLAPS` from §3.1 grammar and §5.3 Solr binding.
3. Add `thumbnailURL` to §2.1 (optional). Ships in `wide` for OpenContext today; moving to per-source sidecars (issue #131).
4. Update §5.1 `time BETWEEN` binding from "TBD" to `TRY_CAST(result_time AS TIMESTAMP) BETWEEN t1 AND t2`. `result_time` IS in `lite` (as VARCHAR).
5. Document H3 column availability in §2.4: `wide_h3` has direct `h3_res4/6/8` columns; `h3_summary_res{4,6,8}` tier files ship `h3_cell` + `resolution` (NOT `h3_res{N}` columns); `lite` has `h3_res8` only; plain `wide` / `narrow` carry no H3 columns.
6. Pick `tmodified` (INTEGER epoch) over `last_modified_time` (VARCHAR) for `sourceUpdatedTime` in §2.1; alias VARCHAR as deprecated.
7. Bump version callout to v0.2.
8. §7 open questions: close Q2 (time filter in lite, now resolved); reframe Q1 around the new `objectType` naming.
9. Appendix B: reference `conformance_matrix.md` and `SERIALIZATIONS.md` (#143).

Links
Test plan
- `quarto render query-spec.qmd` renders cleanly (tables, callouts, cross-refs)

🤖 Generated with Claude Code