
backup: S3 encoder for buckets, objects, and blob reassembly (Phase 0a)#718

Open
bootjp wants to merge 35 commits into feat/backup-phase0a-dynamodb from feat/backup-phase0a-s3

Conversation

@bootjp
Owner

@bootjp bootjp commented Apr 30, 2026

Summary

Stacked on top of #716 (DynamoDB) and #717 (s3keys.ParseBlobKey). Adds the S3 encoder for the Phase 0 logical-backup decoder — the most complex per-adapter piece because it must reassemble multipart object bodies from independent blob chunks.

Snapshot prefixes handled:

  • !s3|bucket|meta|<bucket> → s3/<bucket>/_bucket.json
  • !s3|obj|head|<bucket><gen><object> → s3/<bucket>/<object>.elastickv-meta.json (sidecar)
  • !s3|blob|<bucket><gen><object><uploadID><partNo><chunkNo>[<partVersion>] → spilled to scratch on arrival; concatenated into s3/<bucket>/<object> at Finalize in (partNo, partVersion, chunkNo) order
  • !s3|upload|meta| / !s3|upload|part|: excluded by default; opt in via --include-incomplete-uploads to emit under _incomplete_uploads/records.jsonl
  • !s3|bucket|gen|, !s3|gc|upload|, !s3route|: ignored

How body assembly works

Each blob chunk is written atomically to a per-(bucket, object) scratch directory as it arrives, registered under the (uploadID, partNo, chunkNo, partVersion) routing key. At Finalize, chunks for each object are sorted and concatenated into a single body file via tmp+rename. Body bytes are never held in memory — only the scratch path map. The scratch tree is removed on Finalize.
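
The (partNo, partVersion, chunkNo) ordering can be sketched as follows. The type and helper names mirror ones used in this thread, but the field layout and bodies here are illustrative, not the PR's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// s3ChunkKey mirrors the routing key named above; the field layout is
// an assumption for illustration.
type s3ChunkKey struct {
	uploadID    string
	partNo      uint64
	partVersion uint64
	chunkNo     uint64
}

// sortChunkKeys returns the map's keys in the (partNo, partVersion,
// chunkNo) concatenation order used at Finalize.
func sortChunkKeys(chunkPaths map[s3ChunkKey]string) []s3ChunkKey {
	keys := make([]s3ChunkKey, 0, len(chunkPaths))
	for k := range chunkPaths {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		a, b := keys[i], keys[j]
		if a.partNo != b.partNo {
			return a.partNo < b.partNo
		}
		if a.partVersion != b.partVersion {
			return a.partVersion < b.partVersion
		}
		return a.chunkNo < b.chunkNo
	})
	return keys
}

func main() {
	paths := map[s3ChunkKey]string{
		{uploadID: "u1", partNo: 2, chunkNo: 0}: "/scratch/c",
		{uploadID: "u1", partNo: 1, chunkNo: 1}: "/scratch/b",
		{uploadID: "u1", partNo: 1, chunkNo: 0}: "/scratch/a",
	}
	for _, k := range sortChunkKeys(paths) {
		fmt.Println(paths[k]) // /scratch/a, then /scratch/b, then /scratch/c
	}
}
```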

Reserved-suffix collision handling

A user object key ending in .elastickv-meta.json is rejected with ErrS3MetaSuffixCollision by default. WithRenameCollisions(true) appends .user-data to the body file and records the rename in s3/<bucket>/KEYMAP.jsonl with KindMetaCollision so the dump remains reversible. Per-bucket KEYMAP.jsonl is opened lazily and dropped on Finalize if no records were written.
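
A minimal sketch of that collision policy. resolveCollision and keymapRecord are hypothetical names; only the reserved suffix, the .user-data rename, and the KindMetaCollision record kind come from the PR description.

```go
package main

import (
	"fmt"
	"strings"
)

const metaSuffix = ".elastickv-meta.json"

// keymapRecord sketches one KEYMAP.jsonl line (shape assumed).
type keymapRecord struct {
	Kind     string `json:"kind"`
	Original string `json:"original"`
	Renamed  string `json:"renamed"`
}

// resolveCollision rejects reserved-suffix keys by default; with
// renames enabled it appends ".user-data" and records the mapping.
func resolveCollision(key string, renameCollisions bool) (string, *keymapRecord, error) {
	if !strings.HasSuffix(key, metaSuffix) {
		return key, nil, nil // ordinary key, no rename needed
	}
	if !renameCollisions {
		return "", nil, fmt.Errorf("object key %q collides with reserved sidecar suffix", key)
	}
	renamed := key + ".user-data"
	return renamed, &keymapRecord{Kind: "KindMetaCollision", Original: key, Renamed: renamed}, nil
}

func main() {
	renamed, rec, _ := resolveCollision("logs/app"+metaSuffix, true)
	fmt.Println(renamed, rec.Kind)
}
```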

Test plan

  • go test -race ./internal/backup/... — pass.
  • golangci-lint run ./internal/backup/... — clean.
  • 11 tests covering bucket-meta round-trip, multipart out-of-order chunk reassembly, orphan-chunks warning, sidecar collision (rejected/renamed), JSON-decode rejection on both paths, ignored-prefix no-op, --include-incomplete-uploads round-trip + default skip, versioned blob assembly.

Stacking

Base: feat/backup-phase0a-dynamodb (PR #716). Also depends on feat/s3keys-parse-blob-key (PR #717).

bootjp added 2 commits April 30, 2026 18:52
The Phase 0a logical-backup decoder (internal/backup) needs to route
each !s3|blob| record to its assembled object body without holding a
live cluster. Today the package only exports parsers for object
manifests and upload parts; blob keys are constructable via
BlobKey / VersionedBlobKey but not parseable.

ParseBlobKey decodes the 6-component form (BlobKey output) and the
7-component form (VersionedBlobKey output with partVersion != 0)
into all components: bucket, generation, object, uploadID, partNo,
chunkNo, partVersion. Truncated keys, malformed segments, and trailers
that aren't either zero bytes or exactly one u64 are rejected with
ok=false.

Implementation is split into parseBlobKeyHead (the 6-component head)
and parseOptionalPartVersion (the trailer) so cyclomatic complexity
stays under the package cap without a //nolint marker.

Tests cover the un-versioned round-trip, versioned round-trip,
partVersion=0 fallback to un-versioned shape (matching
VersionedBlobKey's documented behaviour), rejection of non-blob keys
(bucket meta, object manifest, upload part, junk), and rejection of
trailing-garbage keys.
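
The trailer rule described above can be sketched as below. The function name matches the one in the commit message, but the body and the big-endian encoding are assumptions.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// parseOptionalPartVersion: an empty trailer is the un-versioned
// 6-component shape (partVersion 0); exactly one u64 carries the
// version; anything else is rejected with ok=false.
func parseOptionalPartVersion(trailer []byte) (partVersion uint64, ok bool) {
	switch len(trailer) {
	case 0:
		return 0, true
	case 8:
		return binary.BigEndian.Uint64(trailer), true
	default:
		return 0, false
	}
}

func main() {
	v, ok := parseOptionalPartVersion([]byte{0, 0, 0, 0, 0, 0, 0, 9})
	fmt.Println(v, ok) // 9 true
	_, ok = parseOptionalPartVersion([]byte{1, 2, 3})
	fmt.Println(ok) // false: malformed trailer rejected
}
```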
Builds on PR #716 (DynamoDB) and PR #717 (s3keys.ParseBlobKey).
Adds the S3 encoder for the Phase 0 logical-backup decoder.

Snapshot prefixes handled:
- !s3|bucket|meta|<bucket> -> s3/<bucket>/_bucket.json (live
  s3BucketMeta projected into the dump-format s3PublicBucket; cluster-
  internal fields like CreatedAtHLC and Generation stripped).
- !s3|obj|head|<bucket><gen><object> -> s3/<bucket>/<object>.elastickv-meta.json
  (live s3ObjectManifest projected into the dump-format
  s3PublicManifest; UploadID and per-part chunk arrays stripped).
- !s3|blob|<bucket><gen><object><uploadID><partNo><chunkNo>[<partVersion>]:
  spilled to a per-(bucket, object) scratch directory as it arrives;
  assembled at Finalize by walking sortChunkKeys output (partNo,
  partVersion, chunkNo) and concatenating into the final body file
  via writeFileAtomic-style tmp+rename. Memory: O(num_objects); body
  bytes never held in memory.
- !s3|upload|meta|, !s3|upload|part|: excluded by default;
  --include-incomplete-uploads emits them as opaque {prefix, key,
  value} JSONL records under s3/<bucket>/_incomplete_uploads/.
- !s3|bucket|gen|, !s3|gc|upload|, !s3route|: HandleIgnored no-op so
  the master pipeline can dispatch all !s3|* prefixes uniformly.

Reserved-suffix collision:
- A user object key ending in .elastickv-meta.json is rejected with
  ErrS3MetaSuffixCollision by default; --rename-collisions appends
  .user-data and records the rename in s3/<bucket>/KEYMAP.jsonl with
  KindMetaCollision so the on-disk reverse map is sound.
- Per-bucket KEYMAP.jsonl is opened lazily and dropped if Finalize
  finds zero records (the spec's "omit empty file" rule).

Body assembly:
- Each blob chunk is written atomically to scratch as it arrives;
  the chunk path is registered in the object's chunkPaths map keyed
  by (uploadID, partNo, chunkNo, partVersion).
- Finalize sorts the chunk keys by (partNo, partVersion, chunkNo)
  and concatenates the matching scratch files into a single
  <bodyPath> via tmp+rename.
- The scratch tree is rm -rf'd on Finalize via deferred RemoveAll,
  so a successful run leaves nothing behind.

Tests: bucket meta + single-chunk object round-trip, multipart
ordering verification (chunks emitted out of order, body assembled
in part/chunk order), orphan-chunks warning, .elastickv-meta.json
collision rejection (default) and rename (with flag) including
KEYMAP entry, malformed JSON rejection on both manifest and bucket
paths, ignored-prefix no-op, --include-incomplete-uploads round-trip,
default no _incomplete_uploads dir, versioned blob assembly.
@coderabbitai

coderabbitai Bot commented Apr 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an S3Encoder to process S3 backup records, including bucket metadata, object manifests, and blob chunks, with final assembly performed during the Finalize step. It also adds a ParseBlobKey utility for decoding S3 blob keys. Feedback focuses on critical improvements for data integrity and performance, such as filtering blobs by uploadID during assembly to avoid corruption from multiple upload attempts, optimizing file system operations by caching directory creation status and file handles, and correctly implementing the includeOrphans flag. Additionally, the reviewer advised against ignoring errors from resource cleanup methods.

Comment thread internal/backup/s3.go
Comment on lines +99 to +106
type s3ObjectState struct {
	bucket     string
	generation uint64
	object     string
	manifest   *s3PublicManifest
	// chunkPaths maps (uploadID, partNo, chunkNo) -> scratch path.
	chunkPaths map[s3ChunkKey]string
}
Contributor


high

The s3ObjectState should store the uploadID from the object manifest to ensure that only blobs belonging to the active upload are assembled. Additionally, adding a dirCreated flag can optimize directory creation in HandleBlob.

type s3ObjectState struct {
	bucket     string
	generation uint64
	object     string
	uploadID   string
	dirCreated bool
	manifest   *s3PublicManifest
	// chunkPaths maps (uploadID, partNo, chunkNo) -> scratch path.
	chunkPaths map[s3ChunkKey]string
}

Comment thread internal/backup/s3.go Outdated
Comment on lines +314 to +318
jl, err := openJSONL(filepath.Join(dir, "records.jsonl"))
if err != nil {
	return err
}
defer func() { _ = closeJSONL(jl) }()
Contributor


high

Opening and closing the records.jsonl file for every incomplete upload record is inefficient. Consider lazily opening the file and caching the writer in s3BucketState to amortize the I/O cost. Furthermore, if openJSONL uses os.Create, this logic will incorrectly truncate the file on every call, preserving only the last record. Avoid silently dropping entries during serialization to ensure state consistency.

References
  1. Avoid silently dropping entries during serialization or normalization. If placeholder or incomplete entries are valid in some contexts, provide a specific, more lenient handling path for them to ensure they are preserved on disk and avoid state inconsistencies.

Comment thread internal/backup/s3.go Outdated
Comment on lines +472 to +476
func (st *s3ObjectState) attachManifestParts(_ string, _ []s3LivePart) {
	// Intentionally empty: assembleObjectBody consumes the manifest
	// directly via st.manifest at Finalize. Kept as a hook so the
	// callsite reads symmetrically with HandleBlob.
}
Contributor


high

Capture the uploadID from the manifest so it can be used to filter blobs during assembly.

func (st *s3ObjectState) attachManifestParts(uploadID string, _ []s3LivePart) {
	st.uploadID = uploadID
}

Comment thread internal/backup/s3.go Outdated
Comment on lines +497 to +504
chunks := sortChunkKeys(obj.chunkPaths)
for _, k := range chunks {
	path := obj.chunkPaths[k]
	if err := appendFile(tmp, path); err != nil {
		_ = tmp.Close()
		return err
	}
}
Contributor


high

Filter the chunks by the uploadID specified in the manifest. This prevents data corruption if the snapshot contains blobs from multiple upload attempts for the same object key. Additionally, ensure that errors from tmp.Close() are not silently ignored.

	chunks := sortChunkKeys(obj.chunkPaths)
	for _, k := range chunks {
		if k.uploadID != obj.uploadID {
			continue
		}
		path := obj.chunkPaths[k]
		if err := appendFile(tmp, path); err != nil {
			if closeErr := tmp.Close(); closeErr != nil {
				// Log the error as per repository rules
			}
			return err
		}
	}
References
  1. Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.

Comment thread internal/backup/s3.go Outdated
Comment on lines +282 to +285
dir := filepath.Join(s.scratchRoot, EncodeSegment([]byte(bucket)), EncodeSegment([]byte(object)))
if err := os.MkdirAll(dir, 0o755); err != nil { //nolint:mnd // 0755 == standard dir mode
	return errors.WithStack(err)
}
Contributor


medium

Calling os.MkdirAll for every blob chunk is inefficient. Use the dirCreated flag in s3ObjectState to perform this operation only once per object.

	if !st.dirCreated {
		dir := filepath.Join(s.scratchRoot, EncodeSegment([]byte(bucket)), EncodeSegment([]byte(object)))
		if err := os.MkdirAll(dir, 0o755); err != nil { //nolint:mnd // 0755 == standard dir mode
			return errors.WithStack(err)
		}
		st.dirCreated = true
	}

Comment thread internal/backup/s3.go Outdated
if err != nil {
	return err
}
defer func() { _ = closeJSONL(jl) }()
Contributor


medium

The error from closeJSONL is silently ignored. For a writable resource, failing to close properly might mean data was not flushed to disk. Per general rules, errors from Close() methods should not be silently ignored.

References
  1. Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.

Comment thread internal/backup/s3.go
Comment on lines +379 to +386
if obj.manifest == nil {
	s.emitWarn("s3_orphan_chunks",
		"bucket", b.name,
		"object", obj.object,
		"chunks", len(obj.chunkPaths),
		"hint", "blob chunks present but no !s3|obj|head record matched")
	return nil
}
Contributor


medium

The includeOrphans flag is currently ignored. The implementation only emits a warning but does not surface the orphan chunks under _orphans/ as described in the documentation and the WithIncludeOrphans method.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 554dd42276

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/s3.go
// vs `path/to/sub`) are deferred until the master pipeline
// detects them across multiple manifests; this PR's per-object
// flush trusts the caller's collision detection.
return obj.object, "", nil


P1 Badge Sanitize object keys before deriving output paths

resolveObjectFilename returns the raw S3 object key and flushObject then does filepath.Join(bucketDir, objectName), so object names like ../x or /tmp/x will escape the bucket directory and write outside the dump root. Because S3 object keys are user-controlled, this enables path traversal/arbitrary file overwrite during backup decode and can corrupt host files rather than just the export tree.
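
The containment guard this finding calls for can be sketched as a Clean-plus-prefix check. The name matches the fix discussed later in this thread, but the body is illustrative; a later review round also shows why Clean-based normalization alone mishandles literal dot segments in S3 keys.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

var errEscapesRoot = errors.New("object key escapes bucket directory")

// safeJoinUnderRoot joins rel under root, cleans the result, and
// asserts it is still rooted under root.
func safeJoinUnderRoot(root, rel string) (string, error) {
	cleanRoot := filepath.Clean(root)
	joined := filepath.Clean(filepath.Join(cleanRoot, rel))
	if joined != cleanRoot && !strings.HasPrefix(joined, cleanRoot+string(filepath.Separator)) {
		return "", errEscapesRoot
	}
	return joined, nil
}

func main() {
	p, _ := safeJoinUnderRoot("/dump/bucket", "a/b.txt")
	fmt.Println(p) // /dump/bucket/a/b.txt
	_, err := safeJoinUnderRoot("/dump/bucket", "../../etc/passwd")
	fmt.Println(err != nil) // true: traversal rejected
}
```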


Comment thread internal/backup/s3.go Outdated
Comment on lines +497 to +500
chunks := sortChunkKeys(obj.chunkPaths)
for _, k := range chunks {
	path := obj.chunkPaths[k]
	if err := appendFile(tmp, path); err != nil {


P1 Badge Restrict body assembly to the manifest-selected chunks

assembleObjectBody concatenates every entry in obj.chunkPaths after sorting, but never filters by the manifest's upload_id/declared parts, so stale chunks from a different upload attempt for the same (bucket,generation,object) can be merged into the output body. This produces corrupted object bytes whenever old or concurrent multipart chunks are still present in the blob keyspace.

Useful? React with 👍 / 👎.

Comment thread internal/backup/s3.go Outdated
Comment on lines +314 to +318
jl, err := openJSONL(filepath.Join(dir, "records.jsonl"))
if err != nil {
	return err
}
defer func() { _ = closeJSONL(jl) }()


P2 Badge Append incomplete-upload records instead of truncating

HandleIncompleteUpload reopens records.jsonl on every record via openJSONL(...), and that helper uses create/truncate semantics, so each call overwrites prior lines. With --include-incomplete-uploads, only the last upload-meta/part record survives per bucket, defeating the stated “one JSON line per record” behavior and losing forensic data.


bootjp added 2 commits April 30, 2026 19:38
Three correctness bugs and several quality issues, all valid.

CRITICAL: path traversal (Codex P1 #425).
S3 object keys are user-controlled. A key like "../../../etc/passwd"
would, under the prior filepath.Join-and-write code, escape the
bucket directory and overwrite host files. Added safeJoinUnderRoot
which Cleans the joined path and asserts it stays rooted under the
bucket dir; ".." traversal is rejected with ErrS3MalformedKey.
Absolute paths (leading "/") are normalised under the bucket dir by
filepath.Join, which is the safest outcome. Tests:
TestS3_PathTraversalAttemptRejected,
TestS3_AbsolutePathObjectKeyConfinedUnderBucket.

CRITICAL: stale upload-id chunks merged into body (Codex P1 #500,
Gemini HIGH #106/#476/#504).
A snapshot mid-delete-and-recreate or mid-retry can carry blob
chunks for multiple upload attempts under the same (bucket, gen,
object). The prior assembleObjectBody concatenated every chunk
regardless of upload_id, producing corrupted bytes. Now:
- s3ObjectState gains uploadID; HandleObjectManifest sets it.
- New filterChunksForManifest takes the chunkPaths map and the
  manifest's uploadID, returns only matching chunks sorted by
  (partNo, partVersion, chunkNo). Stale-uploadID chunks never
  enter the assembled body.
Test: TestS3_StaleUploadIDChunksFilteredFromAssembledBody.

CRITICAL: incomplete-uploads file truncated per record (Codex P2
#318, Gemini HIGH+MEDIUM #318).
HandleIncompleteUpload re-opened records.jsonl on every call;
openJSONL uses os.Create which truncates. Only the last record
survived per bucket. Now:
- s3BucketState carries an incompleteUploadsJL *jsonlFile lazily
  opened on the first record and cached.
- flushBucket closes it and surfaces the error (was silently
  ignored).
Test: TestS3_IncompleteUploadsAppendsAcrossCalls (3 records, 3
lines on disk).

QUALITY:
- Gemini MEDIUM #285 (MkdirAll per blob): s3ObjectState gains a
  scratchDirCreated bool; HandleBlob runs MkdirAll once.
- Gemini MEDIUM #318 (closeJSONL error ignored): keymap and
  incomplete-uploads writers now surface close errors via
  closeBucketKeymap / explicit closeJSONL return-check.
- Gemini MEDIUM #386 (includeOrphans flag ignored): orphan chunks
  for objects without manifests now write to
  <bucket>/_orphans/<encoded-object>/<chunk>.bin under
  WithIncludeOrphans(true). Test:
  TestS3_OrphanChunksWrittenWhenIncludeOrphans.

Also removed the now-unused attachManifestParts placeholder and
sortChunkKeys helper; their logic moved into
filterChunksForManifest which combines uploadID filtering with the
sort.
@bootjp
Owner Author

bootjp commented Apr 30, 2026

3 critical correctness bugs + 3 quality issues addressed in 19ae3289. All driven by reviewer findings.

Critical

Codex P1 #425 — Path traversal. A user-supplied key like ../../../etc/passwd would have escaped the bucket dir and overwritten host files. Added safeJoinUnderRoot which filepath.Cleans the joined path and asserts it stays rooted under the bucket dir; .. traversal returns ErrS3MalformedKey. Absolute paths (leading /) collapse safely under the bucket dir via filepath.Join, which I assert with TestS3_AbsolutePathObjectKeyConfinedUnderBucket.

Codex P1 #500 + Gemini HIGH #106/#476/#504 — Stale upload-id chunks. A snapshot mid-delete-and-recreate or mid-retry can carry blob chunks for multiple upload attempts under the same (bucket, gen, object). The prior assembleObjectBody concatenated every chunk regardless of upload_id → corrupted bytes. s3ObjectState.uploadID is now set by HandleObjectManifest; new filterChunksForManifest returns only matching chunks. TestS3_StaleUploadIDChunksFilteredFromAssembledBody.

Codex P2 #318 + Gemini HIGH+MEDIUM #318 — Truncated records.jsonl. HandleIncompleteUpload reopened the file on every call (os.Create → truncate); only the last record survived. Now s3BucketState.incompleteUploadsJL is opened lazily once per bucket, cached, closed in flushBucket (with error surfaced). TestS3_IncompleteUploadsAppendsAcrossCalls.

Quality

Gemini MEDIUM #285 — MkdirAll per blob. s3ObjectState.scratchDirCreated makes the call once per object.

Gemini MEDIUM #318 — closeJSONL error. closeBucketKeymap and the incompleteUploadsJL close path now both surface the error.

Gemini MEDIUM #386 — includeOrphans unimplemented. Orphan chunks (chunks present without matching manifest) now write to <bucket>/_orphans/<encoded-object>/<chunk>.bin under WithIncludeOrphans(true). TestS3_OrphanChunksWrittenWhenIncludeOrphans.

Removed the now-unused attachManifestParts placeholder; sortChunkKeys's logic moved into filterChunksForManifest.

@claude review

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 4m 18s


Review — 19ae3289 post-reviewer-feedback pass

  • Gather context and read changed files
  • Review path traversal fix (safeJoinUnderRoot)
  • Review stale upload-id chunk filtering (filterChunksForManifest)
  • Review records.jsonl truncation fix (lazy open + cached writer)
  • Review MkdirAll per-blob optimization (scratchDirCreated)
  • Review closeJSONL error surfacing
  • Review includeOrphans implementation
  • Check test coverage for each fix
  • Check for any remaining issues

All six reviewer-flagged defects are correctly addressed. Here's the per-fix verdict plus a few remaining observations.

Fix 1 — Path traversal: safeJoinUnderRoot

The guard (filepath.Clean + HasPrefix(joined, cleanRoot+sep)) correctly rejects ../../../etc/passwd-style keys. The test TestS3_PathTraversalAttemptRejected covers the canonical attack; TestS3_AbsolutePathObjectKeyConfinedUnderBucket documents the deliberate "absolute path → confined under bucket dir" semantics.

Minor — object key "." passes the guard. filepath.Join(cleanRoot, ".") cleans to cleanRoot, satisfying the joined == cleanRoot branch. flushObject would then attempt to write a body file at the bucket directory path, which the OS would reject with EISDIR. The error surfaces, so there's no silent data corruption — but the error message would point to os.Rename rather than key validation, making it hard to diagnose. Worth catching explicitly:

if rel == "." || rel == ".." {
    return "", errors.Wrapf(ErrS3MalformedKey, "object name %q is a directory alias", rel)
}

Fix 2 — Stale upload-ID chunks: filterChunksForManifest

st.uploadID is correctly captured from live.UploadID in HandleObjectManifest (line 288) and consumed in assembleObjectBody (line 589). The filter correctly rejects chunks whose uploadID doesn't match. Test TestS3_StaleUploadIDChunksFilteredFromAssembledBody covers the multi-attempt scenario.

Minor — empty uploadID bypass is silent. The escape hatch (manifestUploadID != "" && k.uploadID != manifestUploadID) is documented in the comment. However, if live.UploadID is empty (malformed snapshot or very old format), the filter is a no-op and all chunks merge regardless of upload attempt — potentially corrupting the output. Consider adding a structured warning when live.UploadID == "" in HandleObjectManifest:

if live.UploadID == "" {
    s.emitWarn("s3_manifest_missing_upload_id",
        "bucket", bucket, "object", object,
        "hint", "stale-chunk filtering disabled for this object")
}

Fix 3 — records.jsonl truncation: lazy open + cached writer ✅

The lazy-open pattern (b.incompleteUploadsJL == nil check in HandleIncompleteUpload) is correct: openJSONL uses os.Create (truncating), but since it's called exactly once per bucket and the file handle is cached, no subsequent call can truncate the file. The flushBucket path closes the writer and surfaces the error. Test TestS3_IncompleteUploadsAppendsAcrossCalls asserting lines == 3 would catch any regression.

Note: openJSONL also calls os.MkdirAll, but HandleIncompleteUpload already calls os.MkdirAll(dir, ...) on the parent before calling openJSONL. The double MkdirAll is harmless but redundant.

Fix 4 — MkdirAll per-blob: scratchDirCreated flag ✅

s3ObjectState.scratchDirCreated gates the MkdirAll call correctly. The scratch dir path is still recomputed on every HandleBlob call even after the gate is set — computing filepath.Join(...) is cheap, but caching dir on the struct would eliminate the repeated allocation and the three EncodeSegment calls. Not blocking, just worth noting for very blob-heavy objects.

Fix 5 — closeJSONL error surfacing ✅

Both closeBucketKeymap (surfacing b.keymap.Close() errors) and flushBucket (surfacing closeJSONL(b.incompleteUploadsJL) errors) now return errors from Close. The closeJSONL implementation correctly flushes the bufio.Writer before closing — flush errors take precedence over close errors.

Fix 6 — includeOrphans implementation ✅

Orphan chunks are written to <bucketDir>/_orphans/<EncodeSegment(obj.object)>/ as individual .bin files (named via blobScratchName). Using EncodeSegment here is correct: it percent-encodes /, .., NUL, etc., so the encoded object key produces a safe directory name. Test TestS3_OrphanChunksWrittenWhenIncludeOrphans confirms the directory is created.

Minor — orphan chunks read into memory. flushOrphanObject uses os.ReadFile(scratchPath) (entire chunk in RAM), whereas the main assembly path streams via appendFile/io.Copy. For a diagnostic recovery path this is acceptable, but a very large chunk would spike memory. Low priority.


Unrelated: s3keys.ParseBlobKey

parsedBlobHead correctly factors the 6-component head; parseOptionalPartVersion handles the exact-u64Bytes trailer for versioned keys and treats excess bytes as a parse failure. Tests cover unversioned, versioned, version=0 (falls back to unversioned), and non-blob key rejection.


Summary

The three Codex P1/P2 bugs (path traversal, stale-upload corruption, records truncation) and three Gemini items (MkdirAll per-blob, closeJSONL error, includeOrphans unimplemented) are all fixed and tested. The remaining observations are minor and none is blocking:

Remaining items (severity; location):

  • "." object key hits EISDIR instead of ErrS3MalformedKey (Low; s3.go:safeJoinUnderRoot)
  • Silent bypass when manifest upload_id is empty (Low; s3.go:HandleObjectManifest)
  • Scratch dir recomputed on every blob after gate set (Negligible; s3.go:HandleBlob)
  • Orphan chunks loaded into memory via os.ReadFile (Low; s3.go:flushOrphanObject)


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19ae3289ee


Comment thread internal/backup/s3.go
Comment on lines +616 to +619
if manifestUploadID != "" && k.uploadID != manifestUploadID {
	continue
}
keys = append(keys, k)


P1 Badge Assemble only manifest-declared part versions

Fresh evidence: this revision now filters by upload_id, but filterChunksForManifest still appends every (partNo, partVersion, chunkNo) for that upload without checking the manifest’s declared parts. In S3, overwritten parts can leave older blob versions present until async cleanup runs, so this logic can concatenate stale part versions (or silently miss required chunks) and produce corrupted object bodies even when the manifest points to a single canonical part version.
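
The declared-parts allow-list the author lands in response can be sketched as below. Types and names mirror ones used in the thread, but their layout and the function body are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// Illustrative shapes; the real types live in internal/backup/s3.go.
type s3PartKey struct {
	partNo, partVersion uint64
}

type s3ChunkKey struct {
	uploadID    string
	partNo      uint64
	partVersion uint64
	chunkNo     uint64
}

// filterChunksForManifest keeps only chunks from the manifest's upload
// attempt whose (partNo, partVersion) the manifest declares, sorted in
// assembly order.
func filterChunksForManifest(chunks map[s3ChunkKey]string, uploadID string, declared map[s3PartKey]struct{}) []s3ChunkKey {
	keys := make([]s3ChunkKey, 0, len(chunks))
	for k := range chunks {
		if uploadID != "" && k.uploadID != uploadID {
			continue // stale upload attempt
		}
		if _, ok := declared[s3PartKey{k.partNo, k.partVersion}]; !ok {
			continue // part version not declared by the manifest
		}
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		a, b := keys[i], keys[j]
		if a.partNo != b.partNo {
			return a.partNo < b.partNo
		}
		if a.partVersion != b.partVersion {
			return a.partVersion < b.partVersion
		}
		return a.chunkNo < b.chunkNo
	})
	return keys
}

func main() {
	chunks := map[s3ChunkKey]string{
		{uploadID: "u1", partNo: 1, partVersion: 9, chunkNo: 0}: "fresh",
		{uploadID: "u1", partNo: 1, partVersion: 7, chunkNo: 0}: "stale-part-version",
		{uploadID: "u0", partNo: 1, partVersion: 9, chunkNo: 0}: "stale-upload",
	}
	declared := map[s3PartKey]struct{}{{partNo: 1, partVersion: 9}: {}}
	got := filterChunksForManifest(chunks, "u1", declared)
	fmt.Println(len(got), chunks[got[0]]) // 1 fresh
}
```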


Comment thread internal/backup/s3.go Outdated
Comment on lines +495 to +497
joined := filepath.Clean(filepath.Join(cleanRoot, rel))
rootSep := cleanRoot + string(filepath.Separator)
if joined != cleanRoot && !strings.HasPrefix(joined, rootSep) {


P2 Badge Avoid normalizing dot segments in object keys

safeJoinUnderRoot uses filepath.Clean(filepath.Join(...)), which collapses keys like a/../b to b and . to the bucket root. Because S3 object keys treat these bytes literally, this can make distinct keys collide/overwrite each other, and keys that clean to the root can cause finalize-time write errors instead of emitting the object data.
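
The follow-up rule the author adopts (reject rather than normalize) can be sketched as a per-segment check. The function name is hypothetical; the rejected forms and the NUL rule come from the thread.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errMalformedKey = errors.New("object key contains reserved path segment")

// rejectDotSegments: rather than letting filepath.Clean fold "a/../b"
// into "b" (silently merging two distinct S3 keys), reject any key
// whose '/'-separated segments are "." or "..", and any key containing
// a NUL byte.
func rejectDotSegments(key string) error {
	if strings.ContainsRune(key, 0) {
		return errMalformedKey
	}
	for _, seg := range strings.Split(key, "/") {
		if seg == "." || seg == ".." {
			return errMalformedKey
		}
	}
	return nil
}

func main() {
	fmt.Println(rejectDotSegments("a/../b") != nil) // true: rejected, not collapsed
	fmt.Println(rejectDotSegments("a/b") == nil)    // true: ordinary key passes
}
```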


Comment thread internal/backup/s3.go
// flush trusts the caller's collision detection. Path-traversal
// sanitisation runs in safeJoinUnderRoot, downstream of this
// function, where the bucket-directory root is in scope.
return obj.object, "", nil


P2 Badge Prevent cross-generation objects from sharing one output path

Object state is tracked per (object,generation), but resolveObjectFilename drops generation and returns only obj.object, so two generations of the same bucket/object can write to the same output path and whichever map iteration runs last wins. Since old-generation keys are intentionally isolated rather than guaranteed absent, this creates nondeterministic exports when stale generations coexist in the snapshot.


bootjp added 2 commits April 30, 2026 20:01
Three follow-up issues from Codex on top of round 1.

#619 Codex P1 -- assemble only manifest-declared parts.
filterChunksForManifest previously matched on uploadID only; the
manifest declares specific (partNo, partVersion) tuples, and S3's
overwrite-then-async-cleanup window can leave older partVersion
chunks present under the same (bucket, generation, object,
uploadID). Mixing them produces corrupted bytes.

s3ObjectState gains declaredParts map[s3PartKey]struct{}, populated
by HandleObjectManifest. filterChunksForManifest takes the set and
restricts emission to declared (partNo, partVersion) tuples. Test
TestS3_StalePartVersionExcludedFromAssembledBody asserts a stale
partVersion=7 cannot leak when the manifest declares partVersion=9.

#497 Codex P2 -- preserve dot segments.
safeJoinUnderRoot used filepath.Clean which collapses "a/../b" to
"b". S3 treats those bytes literally; "a/../b" and "b" are distinct
keys that would have silently merged into one output file. Now
explicitly rejects any object key whose segments are "." or ".."
with ErrS3MalformedKey. NUL bytes also rejected. Test
TestS3_DotSegmentObjectKeyRejected covers the four forms ("a/../b",
"a/./b", "..", ".").

#521 Codex P2 -- cross-generation collision.
s3BucketState gains activeGen captured from the bucket-meta record.
flushBucket suppresses objects whose generation differs from
activeGen (under --include-orphans, those flow to _orphans/; by
default they're dropped with an s3_stale_generation_objects
warning). Test TestS3_StaleGenerationObjectExcluded.

flushBucket cyclomatic complexity stayed under the cap by extracting
flushBucketObjects.
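The declaredParts allow-list plus (partNo, partVersion, chunkNo) assembly ordering described in this commit can be sketched roughly as follows. Type and function names here are illustrative placeholders, not the PR's actual definitions:

```go
package main

import (
	"fmt"
	"sort"
)

// s3PartKey identifies one manifest-declared part (illustrative).
type s3PartKey struct {
	PartNo, PartVersion int
}

// chunkKey routes one spilled blob chunk (illustrative).
type chunkKey struct {
	UploadID                     string
	PartNo, PartVersion, ChunkNo int
}

// filterChunks keeps only chunks whose uploadID matches and whose
// (partNo, partVersion) tuple appears in the declared set, returned in
// (partNo, partVersion, chunkNo) assembly order. A nil declared map
// means "no filter", mirroring the backward-compat escape hatch.
func filterChunks(chunks []chunkKey, uploadID string, declared map[s3PartKey]struct{}) []chunkKey {
	var out []chunkKey
	for _, c := range chunks {
		if c.UploadID != uploadID {
			continue
		}
		if declared != nil {
			if _, ok := declared[s3PartKey{c.PartNo, c.PartVersion}]; !ok {
				continue // stale partVersion left by overwrite-then-async-cleanup
			}
		}
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool {
		a, b := out[i], out[j]
		if a.PartNo != b.PartNo {
			return a.PartNo < b.PartNo
		}
		if a.PartVersion != b.PartVersion {
			return a.PartVersion < b.PartVersion
		}
		return a.ChunkNo < b.ChunkNo
	})
	return out
}

func main() {
	declared := map[s3PartKey]struct{}{{PartNo: 1, PartVersion: 9}: {}}
	chunks := []chunkKey{
		{UploadID: "u1", PartNo: 1, PartVersion: 7, ChunkNo: 0}, // stale version
		{UploadID: "u1", PartNo: 1, PartVersion: 9, ChunkNo: 0},
	}
	kept := filterChunks(chunks, "u1", declared)
	fmt.Println(len(kept), kept[0].PartVersion) // 1 9
}
```

The dual filter order (uploadID first, then the declared-part set) matches the review's description; only the surviving keys are then used to look up scratch-file paths for concatenation.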
@bootjp
Owner Author

bootjp commented Apr 30, 2026

3 Codex follow-ups addressed in 92ee22ef:

P1 #619 — Assemble only manifest-declared parts. filterChunksForManifest previously matched on uploadID only; the manifest declares specific (partNo, partVersion) tuples, and S3's overwrite-then-async-cleanup window can leave older partVersion chunks present. Mixing them produced corrupted bytes. s3ObjectState.declaredParts is now populated from manifest.parts and used as a strict allow-list. TestS3_StalePartVersionExcludedFromAssembledBody.

P2 #497 — Preserve dot segments. safeJoinUnderRoot used filepath.Clean which collapsed a/../b to b. S3 treats those bytes literally; two distinct keys would have silently merged. Now rejects any object key whose path segments are . or .. with ErrS3MalformedKey. NUL bytes also rejected. TestS3_DotSegmentObjectKeyRejected.

P2 #521 — Cross-generation collision. s3BucketState.activeGen captures the bucket-meta record's generation. Objects from older incarnations of the same bucket name are now suppressed (with s3_stale_generation_objects warning) or routed to _orphans/ under --include-orphans. TestS3_StaleGenerationObjectExcluded.

flushBucket complexity kept under the cap by extracting flushBucketObjects.

@claude review

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 49s


Review — 92ee22ef Codex follow-ups (round 2)


All three Codex findings are correctly addressed. Per-fix verdict follows.

Fix 1 — Manifest-declared part filtering: declaredParts allow-list ✅

s3PartKey{partNo, partVersion} is populated from live.Parts in HandleObjectManifest (line 315-317) and consumed in filterChunksForManifest (line 716-719). The dual-filter (uploadID first, then declaredParts) is correctly ordered. TestS3_StalePartVersionExcludedFromAssembledBody covers the canonical overwrite scenario: manifest declares partVersion=9, stale chunk at partVersion=7 is excluded, only "OKBYE" assembles.

Minor — empty Parts list creates a non-nil empty map. When a manifest JSON omits or nulls the parts array, live.Parts is nil and the for range loop over it produces make(map[s3PartKey]struct{}, 0) — a non-nil empty map. In filterChunksForManifest, declaredParts != nil fires, and since no keys match, every chunk is filtered → the assembled body is empty. For a zero-byte object (SizeBytes=0) that is correct. For a malformed manifest with SizeBytes>0 but empty parts it is silently wrong, but that is a pre-existing manifest-format invariant, not a regression here.

Note — nil vs non-nil backward-compat path. The comment correctly documents that nil declaredParts means "no filter" for tests that pre-date the feature. All production callers go through HandleObjectManifest, which always sets a non-nil map. The escape hatch is safe.

Fix 2 — Dot-segment key rejection ✅

strings.Split(rel, "/") followed by a switch "." / ".." guard (lines 577-583) correctly rejects the four test cases (a/../b, a/./b, .., .). The comment in safeJoinUnderRoot explains why rejection beats normalization (S3 treats dot-segment bytes literally; filepath.Clean would merge distinct keys).

After the guard, filepath.Join(cleanRoot, rel) — which calls filepath.Clean internally — can only perform harmless normalization (double-slash collapse, trailing-slash strip) since no ./.. segments remain. The residual joined != cleanRoot check still catches any remaining escape attempts. ✓

The "." object-key issue I flagged in the prior round is now resolved by the segment guard (strings.Split(".", "/") = ["."] → caught). ✓

Minor — empty segment from trailing slash ("a/") passes the guard. strings.Split("a/", "/") = ["a", ""]; the "" segment matches neither "." nor "..". filepath.Join normalizes "a/" to "a", so the output path is <bucketDir>/a — a valid file. This is fine behaviour (trailing-slash keys are legal in S3 and produce a file at the key's literal path), but worth being aware of: "a/" and "a" would collide at the same output file. That's a path-collision concern, documented as deferred to the master pipeline.

Fix 3 — Cross-generation collision: activeGen guard ✅

b.activeGen is captured from live.Generation in HandleBucketMeta (line 268). flushBucketObjects skips objects whose obj.generation != b.activeGen when b.activeGen != 0 (line 457). The b.activeGen == 0 bypass ("no bucket-meta seen yet") is correct: without a reference generation, all objects flush and the existing orphan-warning path covers them.

TestS3_StaleGenerationObjectExcluded correctly verifies: gen=7 live object appears, gen=6 stale object absent, s3_stale_generation_objects warning fired. ✓

The cyclomatic-complexity reduction via flushBucketObjects extraction is clean; the stale counter accumulates even when --include-orphans routes the object to flushOrphanObject, so the summary warning count is always accurate.

Minor — stale-gen objects with --include-orphans emit two warning types. flushOrphanObject emits s3_orphan_chunks; the parent then emits s3_stale_generation_objects via the stale counter. An operator with --include-orphans set will see both events for the same objects. The semantics are not wrong, but the dual emission could be confusing in log aggregation. A future improvement would be to suppress s3_orphan_chunks for the stale-gen routing path, but it's not blocking.

Minor — no test for --include-orphans + stale-gen routing. TestS3_StaleGenerationObjectExcluded only exercises the suppression path. There is no test verifying that stale-gen objects land under _orphans/ when --include-orphans is set. Low priority but worth adding.

Minor — generation=0 bucket identity. If a bucket's live Generation field is 0 (e.g., very old data or a schema that omits it), HandleBucketMeta sets b.activeGen = 0, which the guard treats as "no filter". All objects flush regardless of their generation. This is the correct safe fallback — no object is incorrectly suppressed — but it does mean generation-0 buckets cannot use the stale-object suppression path. Acceptable given the server presumably never writes generation=0 as an active bucket generation.


Remaining carry-overs from prior review

| Item | Status |
| --- | --- |
| Silent bypass when manifest.upload_id == "" | Still present (by design — documented in comment) |
| Scratch dir recomputed after scratchDirCreated gate set | Still present (negligible) |
| Orphan chunks loaded via os.ReadFile in flushOrphanObject | Still present (low priority) |

Summary

All three Codex P1/P2 findings from round 2 (declaredParts allow-list, dot-segment rejection, generation guard) are correctly implemented and tested. No new blocking issues found. The PR is ready to merge once the stacked bases land.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 92ee22efad

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/s3.go
// flush trusts the caller's collision detection. Path-traversal
// sanitisation runs in safeJoinUnderRoot, downstream of this
// function, where the bucket-directory root is in scope.
return obj.object, "", nil

P1 Badge Handle file-vs-directory S3 key collisions before flush

When a bucket contains both path/to and path/to/sub, resolveObjectFilename returns the raw object key for both objects, so flushObject later tries to materialize mutually incompatible filesystem paths (one as a file, one as a parent directory). On POSIX this causes MkdirAll/Rename failures during finalize, so valid S3 keysets can make the backup fail instead of applying the documented leaf-data rename + KEYMAP record strategy.


bootjp added 2 commits April 30, 2026 20:17
Codex P1 #615: when a bucket contains both `path/to` and
`path/to/sub`, the natural-path mapping puts the shorter key at a
file path and the longer key requires that path to be a parent
directory. POSIX cannot represent both; the prior code would fail
with EEXIST/ENOTDIR at MkdirAll/Rename during finalize, breaking
the documented leaf-data-suffix rename strategy.

Pre-flush computeDirPrefixes() walks every active-gen object key
and accumulates each parent prefix (a key "a/b/c" contributes "a"
and "a/b"). flushBucketObjects checks whether each object's key is
itself a prefix consumed by another active key; if so,
flushObjectWithCollision is called with needsLeafDataRename=true,
which routes resolveObjectFilename onto the existing
S3LeafDataSuffix path. KEYMAP.jsonl records the rename with
KindS3LeafData so restore can reverse it.

flushObject (the no-collision wrapper) was inlined into
flushObjectWithCollision since it was unused after the call sites
moved to the explicit-flag form.

Test TestS3_FileVsDirectoryKeyCollisionGetsLeafDataRename emits
"path/to" and "path/to/sub", asserts:
  - "path/to/sub" lands at its natural path with body "CHILD"
  - "path/to" is renamed to "path/to.elastickv-leaf-data" with
    body "LEAF"
  - KEYMAP.jsonl carries one record with KindS3LeafData and
    original "path/to"
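The parent-prefix accumulation described above ("a/b/c" contributes "a" and "a/b") can be sketched minimally as follows. Names are illustrative; the PR's computeDirPrefixes may differ in detail:

```go
package main

import "fmt"

// dirPrefixes returns every parent prefix implied by the given object
// keys: "a/b/c" contributes "a" and "a/b". An object whose own key is
// in the returned set cannot be a plain file at its natural path and
// needs the leaf-data rename described above.
func dirPrefixes(keys []string) map[string]bool {
	set := make(map[string]bool)
	for _, k := range keys {
		for i := range k {
			if k[i] == '/' {
				set[k[:i]] = true // everything before this slash is a required directory
			}
		}
	}
	return set
}

func main() {
	prefixes := dirPrefixes([]string{"path/to", "path/to/sub"})
	// "path/to" is needed as a directory by "path/to/sub", so the
	// object named "path/to" gets the leaf-data rename.
	fmt.Println(prefixes["path/to"], prefixes["path/to/sub"]) // true false
}
```

Because the set is fully built before the flush loop runs, the rename decision is independent of Go's map iteration order, which is the determinism property the review verifies.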
@bootjp
Owner Author

bootjp commented Apr 30, 2026

Codex P1 #615 addressed in 2c44292f. When a bucket holds both path/to and path/to/sub the natural-path mapping requires path/to to be a regular file AND a directory; POSIX can't, so the prior code failed with EEXIST/ENOTDIR at finalize.

flushBucketObjects now pre-computes the set of "directory prefixes" required by the union of active-gen object keys (object a/b/c contributes a and a/b). Each object whose own key is a member of that set is flagged for the documented leaf-data rename: path/to → path/to.elastickv-leaf-data + KEYMAP.jsonl record with KindS3LeafData. The longer key flushes at its natural path unchanged.

TestS3_FileVsDirectoryKeyCollisionGetsLeafDataRename covers the round-trip end-to-end (both files exist, KEYMAP record has KindS3LeafData and original path/to).

@claude review

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 30s

  • Gather context and read changed files
  • Review file-vs-directory collision fix (computeDirPrefixes + flushBucketObjects)
  • Review leaf-data rename path and KEYMAP record emission
  • Review test coverage (TestS3_FileVsDirectoryKeyCollisionGetsLeafDataRename)
  • Check for remaining issues from prior rounds

Review — 2c44292f file-vs-directory collision fix (round 4)

The Codex P1 #615 fix is correct and clean. Detailed per-component verdict follows.

Fix: computeDirPrefixes + flushBucketObjects

computeDirPrefixes (s3.go:489) builds the prefix set by walking every / position in each active-gen object key. For "a/b/c" it emits "a" and "a/b". Stale-gen objects are correctly excluded (the b.activeGen != 0 && obj.generation != b.activeGen guard matches the one in flushBucketObjects), so a stale object whose key is a prefix of an active object does not incorrectly trigger a rename.

flushBucketObjects (s3.go:447) pre-computes the set before the flush loop, then consults it per-object via needsLeafDataRename := dirPrefixes[obj.object]. Since the prefix set is fixed before iteration begins, the non-deterministic Go map iteration order cannot affect which objects get renamed — the result is deterministic regardless of flush order. ✓

Fix: resolveObjectFilename + leaf-data rename ✅

When needsLeafDataRename is true, resolveObjectFilename (s3.go:641) returns obj.object + S3LeafDataSuffix with kind KindS3LeafData. This goes through safeJoinUnderRoot, so the renamed filename also passes the dot-segment and NUL guard. ✓

The sidecar for the renamed object becomes path/to.elastickv-leaf-data.elastickv-meta.json — a distinct path from the child's sidecar path/to/sub.elastickv-meta.json. No naming conflict. ✓

The meta-suffix check comes before the leaf-data check in resolveObjectFilename. For a key that ends in .elastickv-meta.json (handled first) AND is a directory prefix, only the meta-suffix rename fires — the leaf-data POSIX conflict is not resolved for that specific key. This is a pathological double-collision and the resulting ENOTDIR error would surface correctly (no silent data loss).

Fix: KEYMAP recording ✅

recordKeymap (s3.go:658) is called with encodedFilename = "path/to.elastickv-leaf-data" and original = []byte("path/to"). The KEYMAP.jsonl entry records both the renamed filesystem path and the original S3 key, which is the information restore needs to reverse the rename. closeBucketKeymap still removes a zero-count KEYMAP file, which is safe since any leaf-data rename always writes at least one record. ✓

Test: TestS3_FileVsDirectoryKeyCollisionGetsLeafDataRename

The test (s3_test.go:523) covers:

  • Body of the longer key (path/to/sub) at its natural path ✓
  • Body of the shorter key (path/to) at the leaf-data path ✓
  • KEYMAP Kind == KindS3LeafData
  • KEYMAP Original == "path/to" (round-trip) ✓

One gap worth noting: readKeymapFirstRecord assumes the only KEYMAP record is the leaf-data one. For the 2-object scenario this is correct. A 3-object scenario ("a", "a/b", "a/b/c") would produce two KEYMAP records in non-deterministic order; readKeymapFirstRecord would be unreliable there. This isn't a current bug but is worth a future readKeymapAllRecords-style helper if multi-rename tests are added.

Minor observations

| Observation | Severity | Location |
| --- | --- | --- |
| Renamed path could itself be a directory prefix for another user object (e.g., path/to, path/to/sub, path/to.elastickv-leaf-data/x): the renamed path/to writes path/to.elastickv-leaf-data as a file, then path/to.elastickv-leaf-data/x needs it as a directory → ENOTDIR. Error surfaces correctly but is opaque (no ErrS3MalformedKey mention). | Low | s3.go:flushObjectWithCollision |
| No test for 3-object multi-rename ("a", "a/b", "a/b/c") — the algorithm handles it but the test suite doesn't verify both KEYMAP records are emitted. | Low | s3_test.go |
| emitObjectAtGen fixture emits part_version as 0 (default); the new declaredParts filter for {partNo:1, partVersion:0} passes correctly via ParseBlobKey → unversioned branch (partVersion=0). The filter is exercised but at the same version as the "no filter" escape hatch. | Negligible | s3_test.go:emitObjectAtGen |

Carry-overs from prior rounds

| Item | Status |
| --- | --- |
| Silent bypass when manifest.upload_id == "" | Still present (by design, documented) |
| Scratch dir recomputed after scratchDirCreated gate set | Still present (negligible) |
| Orphan chunks loaded via os.ReadFile in flushOrphanObject | Still present (low priority) |

The fix is correct and ready to merge once stacked bases land. None of the remaining observations are blocking.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c44292fba


Comment thread internal/backup/s3.go
Comment on lines +726 to +729
chunks := filterChunksForManifest(obj.chunkPaths, obj.uploadID, obj.declaredParts)
for _, k := range chunks {
path := obj.chunkPaths[k]
if err := appendFile(tmp, path); err != nil {

P1 Badge Validate manifest chunk completeness before writing body

assembleObjectBody filters/sorts whatever chunk files happen to exist and writes them, but it never verifies that every manifest-declared chunk is present (e.g., via parts[].chunk_count/chunk_sizes). If one or more blob keys are missing in the snapshot (partial data, race, or corruption), finalize still succeeds and emits a truncated object body without any error signal, which silently corrupts the backup output.


Comment thread internal/backup/s3.go Outdated
Comment on lines +611 to +614
for _, seg := range strings.Split(rel, "/") {
switch seg {
case ".", "..":
return "", errors.Wrapf(ErrS3MalformedKey,

P1 Badge Reject empty slash segments in object keys

The path guard rejects . and .. segments, but it allows empty segments, and filepath.Join then normalizes keys like a/, a, and a//b onto the same filesystem path. Since S3 treats these keys as distinct, this causes collisions/last-write-wins overwrites during finalize for valid object names containing repeated or trailing / characters.


bootjp added 2 commits April 30, 2026 20:32
…718, round 4)

Two Codex P1 follow-ups.

#729 -- chunk completeness check.
assembleObjectBody filtered + sorted whatever chunk files happened
to exist and wrote them, but never verified the manifest's
declared chunk_count was actually present. A partial / racy /
corrupted snapshot would silently emit a truncated body. Added
verifyChunkCompleteness which counts chunks per (partNo,
partVersion) and asserts the count matches manifest.parts[].chunk_count
AND the highest chunkNo equals chunk_count-1. Mismatch surfaces as
ErrS3IncompleteBlobChunks before any bytes are written. The
declaredParts map's value type changed from struct{} to
s3DeclaredPart{chunkCount} to carry the contract.

#614 -- empty slash segments.
safeJoinUnderRoot rejected `.` and `..` but allowed empty segments
("a//b", "a/", trailing "/"). filepath.Join collapses these, so
distinct S3 keys would silently overwrite each other at finalize.
The dot-segment guard now also rejects "" segments anywhere except
the leading position (where a leading "/" produces an initial
empty segment that filepath.Join handles safely under the
already-tested absolute-path-confined-under-bucket behaviour).

Tests: TestS3_IncompleteChunksRejected,
TestS3_EmptySlashSegmentsRejected (covers a//b, a/, /a//, x/).
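The dual invariant — count equals chunk_count AND highest chunkNo equals chunk_count-1 — can be sketched as follows. Names are illustrative; this is not the PR's exact verifyChunkCompleteness:

```go
package main

import "fmt"

type partKey struct{ PartNo, PartVersion int }

// declaredPart carries the manifest's chunk contract for one part.
type declaredPart struct{ ChunkCount int }

// verifyChunkCompleteness checks, per declared part, that (a) the
// number of chunks present equals chunk_count and (b) the highest
// chunkNo is chunk_count-1. Either check alone is insufficient:
// N duplicates of chunkNo=0 satisfy the count but not the max index.
func verifyChunkCompleteness(present map[partKey][]int, declared map[partKey]declaredPart) error {
	for pk, want := range declared {
		if want.ChunkCount == 0 {
			continue // zero-chunk parts skip validation (edge case noted in review)
		}
		maxNo := -1
		for _, n := range present[pk] {
			if n > maxNo {
				maxNo = n
			}
		}
		if len(present[pk]) != want.ChunkCount || maxNo+1 != want.ChunkCount {
			return fmt.Errorf("incomplete blob chunks for part %+v: have %d (max chunkNo %d), manifest declares %d",
				pk, len(present[pk]), maxNo, want.ChunkCount)
		}
	}
	return nil
}

func main() {
	declared := map[partKey]declaredPart{{1, 1}: {ChunkCount: 3}}
	fmt.Println(verifyChunkCompleteness(map[partKey][]int{{1, 1}: {0, 2}}, declared) != nil)    // true: chunk 1 missing
	fmt.Println(verifyChunkCompleteness(map[partKey][]int{{1, 1}: {0, 0, 0}}, declared) != nil) // true: duplicates of chunkNo=0
	fmt.Println(verifyChunkCompleteness(map[partKey][]int{{1, 1}: {0, 1, 2}}, declared))        // <nil>
}
```

Running this check before any bytes are appended is what lets the mismatch surface as a finalize-time error instead of a silently truncated body.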
@bootjp
Owner Author

bootjp commented Apr 30, 2026

2 Codex P1 follow-ups addressed in ba33df8e:

#729 — Chunk completeness. assembleObjectBody previously appended whatever chunk files happened to exist; a partial/racy/corrupted snapshot would silently emit a truncated body. New verifyChunkCompleteness counts chunks per (partNo, partVersion) and asserts both the count equals manifest's chunk_count AND maxChunkNo + 1 == chunk_count. Mismatch returns ErrS3IncompleteBlobChunks before any bytes hit disk. declaredParts value type changed from struct{} to s3DeclaredPart{chunkCount}. Test TestS3_IncompleteChunksRejected (manifest declares 3 chunks, snapshot has chunkNo 0 + 2; chunk 1 missing → error).

#614 — Empty slash segments. safeJoinUnderRoot rejected . and .. but allowed "" segments ("a//b", "a/", trailing "/"). filepath.Join collapses these; distinct S3 keys would silently overwrite each other. Now rejects "" segments anywhere except the leading position (where the leading "/" → initial empty segment is handled safely by the already-tested absolute-path-confined-under-bucket behaviour). Test TestS3_EmptySlashSegmentsRejected covers "a//b", "a/", "/a//", "x/".

@claude review

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 43s


Review — ba33df8e round 5 (chunk completeness + empty slash segments)

  • Gather context and read changed files
  • Review chunk completeness validation (verifyChunkCompleteness)
  • Review empty slash segment rejection in safeJoinUnderRoot
  • Check test coverage for each fix
  • Check for remaining issues from prior rounds

Both Codex P1 findings are correctly addressed. Detailed verdict follows.

Fix 1 — verifyChunkCompleteness: dual invariant ✅

The function is called between filterChunksForManifest and the assembly loop (s3.go:803), so ErrS3IncompleteBlobChunks fires before any bytes reach the tmp file. The dual invariant — o.count == want.chunkCount and o.maxIndex + 1 == want.chunkCount — is correctly motivated by the comment: a snapshot with N duplicates of chunkNo=0 satisfies the count check alone but fails on maxIndex. Tracing the four interesting cases:

| chunks for part | declared count | count check | maxIndex check | outcome |
| --- | --- | --- | --- | --- |
| chunkNo=0, chunkNo=2 (missing 1) | 3 | 2 ≠ 3 → fail | n/a | error ✅ |
| chunkNo=0 × 3 (dup of 0) | 3 | 3 = 3 | maxIndex+1=1 ≠ 3 → fail | error ✅ |
| chunkNo=0, chunkNo=1, chunkNo=2 | 3 | 3 = 3 | maxIndex+1=3 = 3 | pass ✅ |
| no chunks | N > 0 | 0 ≠ N → fail | n/a | error ✅ |

The nil escape hatch is safe and correctly limits backward-compat scope to tests that pre-date the manifest-parts feature.

Minor — want.chunkCount == 0 bypass. The guard want.chunkCount > 0 skips validation for zero-chunk declared parts. If a snapshot contains stale chunks for a part declared with chunk_count=0, filterChunksForManifest passes them through (key is in declaredParts), but verifyChunkCompleteness skips the check — those bytes are silently assembled into the output. A chunk_count=0 part is pathological in the first place (an S3 part with no data) so this is an extreme edge case, not a regression.

Fix 2 — safeJoinUnderRoot empty segment rejection ✅

The guard now rejects "" at i != 0 (s3.go:688). The leading-"" carve-out for rel starting with / is correct: filepath.Join(cleanRoot, "/a/b") = filepath.Clean(cleanRoot + "/" + "/a/b") = cleanRoot/a/b on POSIX — the absolute-path is absorbed under the bucket dir, which is covered by TestS3_AbsolutePathObjectKeyConfinedUnderBucket from round 1. The residual boundary check (strings.HasPrefix(joined, rootSep)) still guards against any future join anomaly.

All four test cases traced:

| key | segs | first rejected "" at | result |
| --- | --- | --- | --- |
| "a//b" | ["a","","b"] | i=1 | ErrS3MalformedKey |
| "a/" | ["a",""] | i=1 | ErrS3MalformedKey |
| "/a//" | ["","a","",""] | i=2 | ErrS3MalformedKey |
| "x/" | ["x",""] | i=1 | ErrS3MalformedKey |

The flushOrphanObject path is exempt (it uses EncodeSegment([]byte(obj.object)) for the directory name, not the raw key), so no gap there.

The error propagates cleanly: safeJoinUnderRootflushObjectWithCollisionflushBucketObjectsFinalize with no swallowing — TestS3_EmptySlashSegmentsRejected asserts errors.Is(err, ErrS3MalformedKey) on the Finalize call. ✓

Carry-overs from prior rounds

| Item | Status |
| --- | --- |
| Silent bypass when manifest.upload_id == "" | Still present (by design, documented) |
| Scratch dir recomputed after scratchDirCreated gate set | Still present (negligible) |
| Orphan chunks loaded via os.ReadFile in flushOrphanObject | Still present (low priority) |
| No test for --include-orphans + stale-gen routing | Still present (low priority) |

Both P1 findings are correctly fixed and tested. No new blocking issues. The PR is ready to merge once the stacked bases (feat/backup-phase0a-dynamodb, feat/s3keys-parse-blob-key) land.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ba33df8e5a


Comment thread internal/backup/s3.go Outdated
Comment on lines +688 to +690
if i != 0 {
return "", errors.Wrapf(ErrS3MalformedKey,
"object name has empty path segment %q", rel)

P1 Badge Reject leading-slash keys to avoid object overwrite collisions

safeJoinUnderRoot intentionally permits an empty first segment, so object keys like /a are accepted and then normalized by filepath.Join to the same output path as a. Because S3 treats /a and a as distinct keys, a bucket containing both will silently collide and whichever object flushes last wins, corrupting backup output without any keymap record. This is triggered whenever a bucket has leading-slash keys alongside equivalent non-leading-slash keys.


bootjp added 2 commits April 30, 2026 20:43
Codex P1 round 5 (commit 2f87b84):

`safeJoinUnderRoot` permitted an empty first segment so leading-slash
keys like `/a` were accepted and then normalised by `filepath.Join`
to the same output path as `a`. S3 treats `/a` and `a` as two
distinct keys (the literal byte '/' is part of the key), so a
bucket containing both produced last-flush-wins corruption with no
KEYMAP record. The "absolute path collapses safely under bucket
dir" comfort the previous behaviour leaned on was false comfort:
the collapse silently merged distinct user data.

Now empty segments are refused everywhere — leading, mid-path, and
trailing — with ErrS3MalformedKey. Operators with leading-slash
keys must rename them in S3 first.

The previous test `TestS3_AbsolutePathObjectKeyConfinedUnderBucket`
(which asserted the wrong behaviour) is replaced by
`TestS3_LeadingSlashObjectKeyRejected`.
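A sketch of the final segment guard as described — illustrative only; the real safeJoinUnderRoot additionally joins the sanitised key under the bucket root and re-checks the boundary:

```go
package main

import (
	"fmt"
	"strings"
)

// validateObjectKey rejects any key containing a NUL byte, a "." or
// ".." segment, or an empty segment — leading, mid-path, or trailing —
// so "/a", "a//b", and "a/" never reach filepath.Join, where they
// would collapse onto the same output path as other distinct keys.
func validateObjectKey(key string) error {
	if strings.ContainsRune(key, 0) {
		return fmt.Errorf("malformed object key %q: NUL byte", key)
	}
	for _, seg := range strings.Split(key, "/") {
		switch seg {
		case "":
			return fmt.Errorf("malformed object key %q: empty path segment", key)
		case ".", "..":
			return fmt.Errorf("malformed object key %q: dot segment", key)
		}
	}
	return nil
}

func main() {
	for _, k := range []string{"/a", "a//b", "a/", "a/../b", ".", "a/b/c"} {
		fmt.Println(k, validateObjectKey(k) == nil)
	}
	// only "a/b/c" prints true
}
```

After this guard, the string handed to filepath.Join contains only ordinary path components, so the join can only collapse redundant separators and can never escape or land exactly on the bucket root.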
@bootjp
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 5 fix for Codex P1 on commit 2f87b84:

safeJoinUnderRoot previously permitted an empty first segment, so a leading-slash S3 key like /a would be accepted and then collapsed by filepath.Join onto the same output path as a. S3 treats /a and a as distinct keys (the literal byte / is part of the key), so a bucket containing both produced last-flush-wins corruption with no KEYMAP record.

Now empty segments are rejected everywhere — leading, mid-path, and trailing — with ErrS3MalformedKey. The previous test TestS3_AbsolutePathObjectKeyConfinedUnderBucket (which asserted the wrong-by-design "confine and merge" behaviour) is replaced by TestS3_LeadingSlashObjectKeyRejected. Operators with such keys must rename them in S3 first.

go test -race ./internal/backup/... and golangci-lint run clean.

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 2m 5s


Review — 2febd42 round 5 (leading-slash rejection)

  • Gather context
  • Read safeJoinUnderRoot change in s3.go
  • Read replacement test TestS3_LeadingSlashObjectKeyRejected
  • Verify correctness of the fix and edge cases
  • Post final review

The fix is correct. Detailed verdict follows.

Fix: remove i != 0 leading-segment exemption ✅

The old guard (if i != 0 { return error }) exempted segs[0] == "" (the empty segment produced by a leading /), so /a was accepted and then filepath.Join(cleanRoot, "/a") normalised to cleanRoot/a — identical to filepath.Join(cleanRoot, "a"). S3 stores /a and a as two distinct keys, so a bucket containing both would silently overwrite whichever flushed first with no KEYMAP record.

The new guard drops the index check entirely: any "" segment — leading, mid-path, or trailing — returns ErrS3MalformedKey. That makes safeJoinUnderRoot fully symmetric with the round-4 dot-segment rejection, and the invariant is now clean: filepath.Join receives a rel string containing no ., .., empty, or NUL segments — only ordinary path components.

Interaction with the round-4 test TestS3_EmptySlashSegmentsRejected

The round-4 test covers ["a//b", "a/", "/a//", "x/"]. Under the old code, /a// was rejected because of the i=2 empty segment (not i=0), so that test would have passed even before this fix. The new test TestS3_LeadingSlashObjectKeyRejected correctly isolates the specifically broken case: strings.Split("/etc/host-attack", "/") produces ["", "etc", "host-attack"]; only segs[0] is "", which the old code exempted. Coverage is complementary. ✓

Test replacement ✅

TestS3_AbsolutePathObjectKeyConfinedUnderBucket asserted the wrong behavior (that /etc/host would be "safely confined" under the bucket dir). Replacing it with TestS3_LeadingSlashObjectKeyRejected asserting ErrS3MalformedKey is the right call — it locks down the fix against regression and removes the misleading documentation of a now-rejected design choice.

Minor: stale comment at s3.go:694–697

The comment was not updated alongside the code change:

// Use filepath.Join here — its only behavioural change vs. raw
// concatenation after the dot-segment guard above is normalising
// a leading "/" off `rel` (which is what we want: absolute-path
// keys collapse safely under bucketDir).

After this fix, no leading / can reach filepath.Join — it is caught by the empty-segment loop as segs[0] == "". The phrase "which is what we want: absolute-path keys collapse safely under bucketDir" now actively contradicts the design: the old "confine and merge" behaviour was removed precisely because it was wrong. The comment should drop that clause. Something like:

// Use filepath.Join here — its only behavioural change on the
// sanitised `rel` (no dot segments, no empty segments) is collapsing
// redundant separators; it cannot escape the root.

Not blocking, but a reader of the comment and the test will see a direct contradiction.

Minor: joined == cleanRoot branch is now dead code

With all empty, ., and .. segments rejected, filepath.Join(cleanRoot, rel) can never equal cleanRoot (a non-empty, validated rel always produces a strictly longer path). The joined != cleanRoot part of the boundary check:

if joined != cleanRoot && !strings.HasPrefix(joined, rootSep) {

is dead. The check is harmless defense-in-depth — strings.HasPrefix(joined, rootSep) is what does the actual work for any reachable input — but the dead branch could confuse a reader wondering when joined == cleanRoot could occur. Low priority.


Carry-overs from prior rounds

| Item | Status |
| --- | --- |
| Silent bypass when manifest.upload_id == "" | Still present (by design, documented) |
| Scratch dir recomputed after scratchDirCreated gate | Still present (negligible) |
| Orphan chunks loaded via os.ReadFile | Still present (low priority) |
| No test for --include-orphans + stale-gen routing | Still present (low priority) |

The Codex P1 finding (leading-slash collision) is correctly fixed and tested. One stale comment worth cleaning up but nothing blocking.


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2febd423ef

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/s3.go
Comment on lines +682 to +685
segs := strings.Split(rel, "/")
for _, seg := range segs {
switch seg {
case ".", "..":

P1: Treat backslashes as separators during key sanitization

In safeJoinUnderRoot, validation only splits object keys on /, but filepath.Join treats \ as a path separator on Windows. That means keys like a\\..\\b bypass the dot-segment check and normalize to the same output path as b, causing distinct S3 keys to collide/overwrite silently when the decoder runs on Windows. This should validate path segments using OS-aware separators (or reject \ in keys) before joining.
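One portable way to address this finding is a pre-check that rejects the backslash byte outright, rather than teaching the segment scan about OS-specific separators. This is a hypothetical sketch (not the repo's code):

```go
package main

import (
	"fmt"
	"strings"
)

// rejectBackslash rejects keys containing '\' before any segment
// validation runs. Splitting only on '/' misses that filepath.Join
// treats '\' as a separator on Windows, so a key like `a\..\b` would
// sneak past a '/'-based dot-segment scan there.
func rejectBackslash(key string) error {
	if strings.ContainsRune(key, '\\') {
		return fmt.Errorf("object key %q contains a backslash", key)
	}
	return nil
}

func main() {
	fmt.Println(rejectBackslash(`a\..\b`)) // rejected
	fmt.Println(rejectBackslash("a/b"))    // <nil>
}
```

Rejecting rather than encoding keeps behaviour identical across platforms: a key accepted on Linux is never silently remapped on Windows.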


@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19d33a6511


Comment thread internal/backup/s3.go Outdated
Comment on lines +748 to +752
f, err := os.Create(path) //nolint:gosec // path composed from output root
if err != nil {
return errors.WithStack(err)
}
b.keymap = NewKeymapWriter(f)

P1: Close KEYMAP file descriptors after writing

recordKeymap opens KEYMAP.jsonl with os.Create and only stores a KeymapWriter; later closeBucketKeymap calls KeymapWriter.Close(), which flushes buffers but does not close the underlying file descriptor. In dumps that produce keymaps for many buckets (e.g., repeated collision renames), descriptors accumulate until EMFILE and subsequent bucket flushes fail, leaving backup output incomplete.


Comment thread internal/backup/s3.go Outdated
func (s *S3Encoder) recordKeymap(b *s3BucketState, bucketDir, encodedFilename string, original []byte, kind string) error {
if b.keymap == nil {
path := filepath.Join(bucketDir, "KEYMAP.jsonl")
f, err := os.Create(path) //nolint:gosec // path composed from output root

P2: Open KEYMAP sidecar with nofollow protections

This path uses os.Create directly for KEYMAP.jsonl, which follows symlinks. If an attacker (or stale prior run) places a symlink at that location, finalize can truncate/overwrite an arbitrary target file when the first keymap record is written. Other backup sidecars already use openSidecarFile to prevent this class of overwrite; KEYMAP should use the same guarded open path.


bootjp added 3 commits April 30, 2026 21:27
Codex round 9 raised two issues on commit ab38eb0:

1. P1: closeBucketKeymap leaked file descriptors. recordKeymap
   stored only the *KeymapWriter; closeBucketKeymap called
   KeymapWriter.Close() which flushes the bufio buffer but does
   NOT close the underlying *os.File. A dump producing keymaps
   for many buckets accumulated descriptors until EMFILE, after
   which subsequent bucket flushes failed and the dump output
   was incomplete. Track the *os.File on s3BucketState and
   close it from closeBucketKeymap alongside the KeymapWriter
   flush.

2. P2: recordKeymap used os.Create for KEYMAP.jsonl, which
   follows symlinks and clobbers hard links. The redis encoder
   already routes through openSidecarFile for the same kind of
   sidecar; mirror that path so a stale prior run (or local
   adversary) cannot turn a missing KEYMAP into an
   arbitrary-write primitive against /etc/passwd or similar.

Test: TestS3_KeymapRefusesSymlinkAtFinalize pre-creates
KEYMAP.jsonl as a symlink to a bait file, drives a meta-suffix
rename (so recordKeymap fires), and asserts both that the
finalize returns the symlink-refusal error and that the bait
file is untouched.
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@bootjp
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 7 fixes for Codex round 9 on commit ab38eb0 (commit 46cb56fe):

  1. P1 — KEYMAP file descriptor leak. recordKeymap stored only the *KeymapWriter; closeBucketKeymap called KeymapWriter.Close() which flushes the bufio buffer but does NOT close the underlying *os.File. A dump producing keymaps for many buckets accumulated descriptors until EMFILE and subsequent flushes failed. Now s3BucketState tracks the *os.File and closeBucketKeymap closes it explicitly.

  2. P2 — KEYMAP open followed symlinks. recordKeymap was using os.Create directly, which follows symlinks and clobbers hard links. The redis encoder already routes through openSidecarFile (atomic O_NOFOLLOW + nlink check); s3 now uses the same helper. Test: TestS3_KeymapRefusesSymlinkAtFinalize pre-places a symlink at the KEYMAP path and asserts the bait file is untouched.

go test -race and golangci-lint clean.

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 2m 29s (View job)


Review — 46cb56f round 7 (KEYMAP fd leak + symlink protection)

  • Gather context and read changed files
  • Review P1: KEYMAP file descriptor leak fix
  • Review P2: KEYMAP symlink protection via openSidecarFile
  • Check test coverage (TestS3_KeymapRefusesSymlinkAtFinalize)
  • Check for remaining issues from prior rounds

Both Codex round-9 findings are correctly addressed. Detailed verdict follows.

Fix 1 — KEYMAP file descriptor leak: keymapFile *os.File

s3BucketState now tracks the *os.File separately from the *KeymapWriter (s3.go:156). closeBucketKeymap (s3.go:578) calls both in sequence: b.keymap.Close() flushes the bufio buffer, b.keymapFile.Close() closes the OS descriptor.

The error-priority logic is correct: if flushErr != nil, it takes precedence regardless of closeErr; if only closeErr is set, it's wrapped and surfaced. This is consistent with how closeJSONL in redis_string.go:550 handles the same two-phase close (flush→close).

Minor — b.keymapFile != nil guard is always true when b.keymap != nil. Both fields are set together in recordKeymap (s3.go:771-772) and neither is reset to nil independently, so the guard at s3.go:584 is technically dead. Harmless defense-in-depth, but a future reader may wonder when keymapFile could be nil while keymap is non-nil. Not blocking.

Minor — os.Remove runs after both closes. When Count() == 0, the empty KEYMAP file is removed via _ = os.Remove(...) (s3.go:591). This happens after both keymap.Close() and keymapFile.Close(), which is the correct order (close-then-remove avoids removing a file with open descriptors). ✓

Fix 2 — Symlink protection: openSidecarFile in recordKeymap

recordKeymap (s3.go:767) now calls openSidecarFile instead of os.Create. This is the same helper used by openJSONL (redis_string.go:540), making the two S3 sidecar files consistent:

| File | Opener |
| --- | --- |
| KEYMAP.jsonl | openSidecarFile (this fix) |
| _incomplete_uploads/records.jsonl | openSidecarFile (via openJSONL) |

The Unix path uses O_NOFOLLOW atomically — no TOCTOU race. The Windows path uses Lstat-then-OpenFile with the documented acceptable residual window. Both produce the "refusing to overwrite symlink at %s" error string the test checks. ✓

return err (not errors.WithStack(err)) at s3.go:769 is correct: openSidecarFile already wraps the error via cockroachdberr.WithStack. ✓

Test: TestS3_KeymapRefusesSymlinkAtFinalize

The test (s3_test.go:292) correctly:

  1. Pre-creates bucketDir/KEYMAP.jsonl as a symlink to a bait file
  2. Triggers recordKeymap via WithRenameCollisions(true) + a .elastickv-meta.json key
  3. Asserts Finalize returns an error containing "refusing to overwrite symlink"
  4. Asserts os.ReadFile(bait) still contains "stay-out" — the bait was not clobbered ✓

The propagation path is clean: Finalize → flushBucket → closeBucketKeymap is called first (line 496), but recordKeymap fires earlier via flushObjectWithCollision → the error surfaces there and bubbles through flushBucketObjects → flushBucket → Finalize. ✓

Stale comment (carry-over from rounds 5 and 6)

s3.go:706–709 still reads:

// Use filepath.Join here — its only behavioural change vs. raw
// concatenation after the dot-segment guard above is normalising
// a leading "/" off `rel` (which is what we want: absolute-path
// keys collapse safely under bucketDir).

Since round 5 (leading-slash rejection), no leading / can reach filepath.Join — it is caught as an empty segment by the guard. The phrase "which is what we want: absolute-path keys collapse safely" actively contradicts the current design. Still low priority, still not blocking.


Carry-overs from prior rounds

| Item | Status |
| --- | --- |
| Silent bypass when manifest.upload_id == "" | Still present (by design, documented) |
| Scratch dir recomputed after scratchDirCreated gate | Still present (negligible) |
| Orphan chunks loaded via os.ReadFile in flushOrphanObject | Still present (low priority) |
| No test for --include-orphans + stale-gen routing | Still present (low priority) |
| Stale comment at safeJoinUnderRoot re: absolute-path confinement | Still present (low priority) |

Both P1 and P2 findings are correctly fixed and tested. No new blocking issues. The PR is ready to merge once the stacked bases (feat/backup-phase0a-dynamodb, feat/s3keys-parse-blob-key) land.


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1dc688429b


Comment thread internal/backup/s3.go
Comment on lines +751 to +753
if needsLeafDataRename {
return obj.object + S3LeafDataSuffix, KindS3LeafData, nil
}

P1: Prevent leaf-data rename from colliding with user keys

When needsLeafDataRename is true, this always rewrites "<object>" to "<object>.elastickv-leaf-data" without checking whether that target is already a real S3 object key in the same bucket. A valid bucket containing both path/to + path/to/sub and an existing path/to.elastickv-leaf-data will have two distinct objects mapped to one output path, so finalize becomes last-write-wins and backup data is silently lost.


Comment thread internal/backup/s3.go
Comment on lines +744 to +750
if strings.HasSuffix(obj.object, S3MetaSuffixReserved) {
if !s.renameCollisions {
return "", "", errors.Wrapf(ErrS3MetaSuffixCollision,
"bucket %q object %q", b.name, obj.object)
}
return obj.object + ".user-data", KindMetaCollision, nil
}

P1: Guard meta-collision rename target against key conflicts

In rename-collision mode, objects ending with .elastickv-meta.json are rewritten to "<object>.user-data" unconditionally. If the bucket already contains a real key with that .user-data suffix, both objects are emitted to the same filesystem path and one overwrites the other, and KEYMAP cannot represent both originals, so restore cannot reconstruct the original keyset.


Comment thread internal/backup/s3.go
Comment on lines +360 to +369
st.manifest = &s3PublicManifest{
FormatVersion: 1,
ETag: live.ETag,
SizeBytes: live.SizeBytes,
ContentType: live.ContentType,
ContentEncoding: live.ContentEncoding,
CacheControl: live.CacheControl,
ContentDisposition: live.ContentDisposition,
UserMetadata: live.UserMetadata,
}

P2: Populate sidecar last_modified from manifest timestamp

The decoded live manifest includes last_modified_hlc, and s3PublicManifest exposes a last_modified field, but this struct literal never sets it. As a result every exported .elastickv-meta.json drops object modification time metadata, reducing backup fidelity for tooling that relies on the sidecar to preserve HEAD-visible timestamps.


bootjp added 4 commits April 30, 2026 21:57
…, round 8)

Codex round 9 raised three issues on commit 1dc6884:

1. P1: Leaf-data rename collided with real user keys.
   `needsLeafDataRename` rewrote `<object>` to
   `<object>.elastickv-leaf-data` without checking whether a real
   key with that suffix already existed in the same bucket. Two
   distinct objects could be mapped to one filesystem path and
   finalize was last-flush-wins. resolveObjectFilename now consults
   a per-bucket set of active-gen object keys via
   computeActiveGenObjectKeys; if the rename target is a real key,
   surface ErrS3MetaSuffixCollision.

2. P1: Meta-suffix rename collided too. Same root cause:
   rename-collisions mode rewrote `<object>.elastickv-meta.json` to
   `<object>.elastickv-meta.json.user-data`. If that suffixed key
   was itself a real object, the rename target collided. The same
   set lookup now refuses the rename and surfaces
   ErrS3MetaSuffixCollision.

3. P2: last_modified was never populated. The decoded
   s3LiveManifest carried last_modified_hlc, but the
   s3PublicManifest projection silently dropped it, so every
   exported sidecar lost the HEAD-visible Last-Modified
   timestamp. formatHLCAsRFC3339Nano extracts the millisecond
   half (HLC >> 16, see kv/hlc.go) and renders RFC3339Nano UTC,
   matching S3 HEAD semantics. HLC zero maps to "" so omitempty
   drops the field rather than emitting "1970-01-01T00:00:00Z".

Tests:
  - TestS3_LeafDataRenameRejectsCollidingUserKey
  - TestS3_MetaSuffixRenameRejectsCollidingUserKey
  - TestS3_LastModifiedSidecarPopulated
@bootjp
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 8 fixes for Codex round 9 on commit 1dc6884 (commit 90d33feb):

  1. P1 — Leaf-data rename collided with real user keys. needsLeafDataRename rewrote <object> to <object>.elastickv-leaf-data without checking whether a real key with that suffix already existed in the same bucket. Two distinct objects mapped to one filesystem path → finalize was last-flush-wins. resolveObjectFilename now consults a per-bucket active-gen object-key set via computeActiveGenObjectKeys; rename targets that collide surface ErrS3MetaSuffixCollision. Test: TestS3_LeafDataRenameRejectsCollidingUserKey.

  2. P1 — Meta-suffix rename collided too. Same fix path: rename-collisions mode rewrites <object>.elastickv-meta.json to <object>.elastickv-meta.json.user-data; the same set lookup now refuses when that suffixed key is itself a real object. Test: TestS3_MetaSuffixRenameRejectsCollidingUserKey.

  3. P2 — last_modified never populated. The decoded s3LiveManifest carried last_modified_hlc, but s3PublicManifest silently dropped it. Added formatHLCAsRFC3339Nano which extracts the millisecond half (HLC >> 16, see kv/hlc.go) and renders RFC3339Nano UTC. HLC zero maps to "" so omitempty drops the field. Test: TestS3_LastModifiedSidecarPopulated.

go test -race and golangci-lint clean.

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 2m 41s (View job)


Review — 90d33fe round 8 (rename-target collision + last_modified)

  • Gather context and read changed files
  • Review P1: leaf-data rename collision with real user keys (computeActiveGenObjectKeys)
  • Review P1: meta-suffix rename collision fix
  • Review P2: last_modified population via formatHLCAsRFC3339Nano
  • Check test coverage for each fix
  • Check for remaining issues from prior rounds

All three Codex round-9 findings are correctly addressed. Detailed per-fix verdict follows.

Fix 1 — Leaf-data rename target collision: computeActiveGenObjectKeys

computeActiveGenObjectKeys (s3.go:581) builds a map[string]bool of all active-gen object keys, applying the same b.activeGen != 0 && obj.generation != b.activeGen guard used in flushBucketObjects. The resulting set is passed through flushBucketObjects → flushObjectWithCollision → resolveObjectFilename.

In resolveObjectFilename (s3.go:809):

target := obj.object + S3LeafDataSuffix
if objectKeys[target] {
    return "", "", errors.Wrapf(ErrS3MetaSuffixCollision,
        "bucket %q object %q leaf-data rename target %q is also a real object key (rename in S3 first)",
        b.name, obj.object, target)
}

The check fires before any bytes reach disk; the error propagates up through flushBucketObjects → Finalize. ✓

TestS3_LeafDataRenameRejectsCollidingUserKey (s3_test.go:297) covers the 3-object scenario: path/to (leaf-data candidate), path/to/sub (forces path/to into dir-prefix set), path/to.elastickv-leaf-data (the colliding real key). Finalize returns ErrS3MetaSuffixCollision. ✓

Fix 2 — Meta-suffix rename target collision ✅

Symmetric fix in the strings.HasSuffix(obj.object, S3MetaSuffixReserved) branch (s3.go:800):

target := obj.object + ".user-data"
if objectKeys[target] {
    return "", "", errors.Wrapf(ErrS3MetaSuffixCollision, ...)
}

TestS3_MetaSuffixRenameRejectsCollidingUserKey (s3_test.go:315) covers the 2-object scenario: evil.elastickv-meta.json + evil.elastickv-meta.json.user-data as a real key. Finalize returns ErrS3MetaSuffixCollision with WithRenameCollisions(true). ✓

Note — objectKeys includes orphan-only objects (those with chunks but no manifest). Orphans are in b.objects, and computeActiveGenObjectKeys includes their .object names. This is conservative-correct: if a rename target matches an orphan's key, the rename is refused rather than silently overwriting. The behavior is sound.

Minor — three separate O(n) scans over b.objects in flushBucketObjects. computeDirPrefixes, computeActiveGenObjectKeys, and the flush loop each iterate b.objects once. All three share identical activeGen guards. Merging them into a single pass would reduce constant factor but the correctness is unaffected. Not blocking.

Fix 3 — last_modified via formatHLCAsRFC3339Nano

formatHLCAsRFC3339Nano (s3.go:262):

func formatHLCAsRFC3339Nano(hlc uint64) string {
    if hlc == 0 { return "" }
    ms := int64(hlc >> hlcLogicalBitsForBackupS3)
    return time.UnixMilli(ms).UTC().Format(time.RFC3339Nano)
}

The 16-bit right-shift matches kv/hlc.go:hlcLogicalBits. HLC=0 → "" so omitempty drops the field (avoids emitting the misleading epoch timestamp). HandleObjectManifest wires it at s3.go:387. ✓

TestS3_LastModifiedSidecarPopulated (s3_test.go:332): constructs hlc = 1_745_884_800_000 << 16, pushes it through the encoder, reads the sidecar and asserts last_modified == time.UnixMilli(int64(ms)).UTC().Format(time.RFC3339Nano). Perfect round-trip coverage. ✓
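The shift-and-render step is easy to verify standalone. A minimal sketch assuming the layout the review cites from kv/hlc.go (low 16 bits a logical counter, the rest Unix milliseconds); the function name here is illustrative:

```go
package main

import (
	"fmt"
	"time"
)

// Assumed HLC layout per kv/hlc.go as described above: low 16 bits
// are a logical counter, the remaining high bits Unix milliseconds.
const hlcLogicalBits = 16

func formatHLC(hlc uint64) string {
	if hlc == 0 {
		return "" // with omitempty the sidecar drops the field entirely
	}
	ms := int64(hlc >> hlcLogicalBits)
	return time.UnixMilli(ms).UTC().Format(time.RFC3339Nano)
}

func main() {
	hlc := uint64(1_745_884_800_000) << hlcLogicalBits
	fmt.Println(formatHLC(hlc)) // 2025-04-29T00:00:00Z
	fmt.Printf("zero HLC: %q\n", formatHLC(0))
}
```

Mapping zero to the empty string is what keeps "1970-01-01T00:00:00Z" out of the sidecar: RFC3339Nano would happily render the epoch, so the guard has to come before the format call.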

Minor — comment placement (s3.go:250–262). The doc comment for formatHLCAsRFC3339Nano opens with:

// s3LiveManifest mirrors the live adapter/s3.go s3ObjectManifest
// just enough to decode the JSON value. ...
// formatHLCAsRFC3339Nano renders the millisecond half of an HLC

The first four lines describe s3LiveManifest (the struct defined at line 276), but because there is no blank-line separation, Go doc tools attach the entire block to formatHLCAsRFC3339Nano and s3LiveManifest is left undocumented. The function comment should start at "formatHLCAsRFC3339Nano renders..." with a blank line above it, and the s3LiveManifest comment should be moved directly above that type. Not blocking.

Minor — no test for hlc=0 → omitted field. The HLC=0 → "" path is exercised indirectly (every test that calls emitObject without a last_modified_hlc key gets a zero-valued field and the sidecar omits it), but no test explicitly asserts the last_modified key is absent when last_modified_hlc=0. Low priority.


Carry-overs from prior rounds

| Item | Status |
| --- | --- |
| Stale comment at safeJoinUnderRoot:763 re: "absolute-path keys collapse safely" | Still present (low priority) |
| joined == cleanRoot dead branch at safeJoinUnderRoot:769 | Still present (low priority) |
| Silent bypass when manifest.upload_id == "" | By design (documented) |
| Scratch dir recomputed per-blob after scratchDirCreated gate | Negligible |
| Orphan chunks loaded via os.ReadFile in flushOrphanObject | Low priority |
| No test for --include-orphans + stale-gen routing | Low priority |

All three Codex round-9 findings are correctly fixed and tested. No new blocking issues. The PR is ready to merge once the stacked bases (feat/backup-phase0a-dynamodb, feat/s3keys-parse-blob-key) land.


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63959370d2


Comment thread internal/backup/s3.go Outdated
return errors.Wrapf(ErrS3MalformedKey, "blob key: %q", key)
}
st := s.objectState(bucket, gen, object)
dir := filepath.Join(s.scratchRoot, EncodeSegment([]byte(bucket)), EncodeSegment([]byte(object)))

P1: Sanitize blob object key before building scratch path

HandleBlob writes chunk files under filepath.Join(s.scratchRoot, ..., EncodeSegment(object)), but EncodeSegment intentionally preserves / and .. for S3 keys, so keys like ../x or ../../tmp/pwn can escape the scratch subtree and write outside the decoder’s controlled directory before Finalize runs. The later safeJoinUnderRoot check only protects final object output paths, not this early spill-to-disk step, so a crafted snapshot/object key can still trigger arbitrary file writes on the host during decode.
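The root cause is visible in the encoding rule itself. The sketch below is a hypothetical stand-in for EncodeSegment as the finding (and the later fix commit) describe it: percent-encode every byte outside the RFC 3986 unreserved set (ALPHA / DIGIT / "-" / "." / "_"). "/" becomes "%2F", but "." is unreserved, so a literal ".." segment survives and can steer a later filepath.Join out of the scratch tree.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeSegment percent-encodes everything outside the unreserved
// set. Note the asymmetry: '/' is escaped (multi-segment keys collapse
// into one safe filename) while '.' passes through (a bare ".." is
// preserved verbatim).
func encodeSegment(b []byte) string {
	var sb strings.Builder
	for _, c := range b {
		switch {
		case c >= 'a' && c <= 'z', c >= 'A' && c <= 'Z',
			c >= '0' && c <= '9', c == '-', c == '.', c == '_':
			sb.WriteByte(c)
		default:
			fmt.Fprintf(&sb, "%%%02X", c)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(encodeSegment([]byte("a/../b"))) // a%2F..%2Fb: one safe filename
	fmt.Println(encodeSegment([]byte("..")))     // ..: escapes a filepath.Join
}
```

This is why the eventual fix guards the literal bucket/object values (".", "..", "") rather than changing the encoding: multi-segment dot keys are already neutralized by the "%2F" escaping, and only the bare dot literals can escape.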


bootjp added 2 commits April 30, 2026 22:09
…d 9)

Codex P1 round 11 (commit 9a63e32):

`HandleBlob` composed scratch paths with `filepath.Join(s.scratchRoot,
EncodeSegment([]byte(bucket)), EncodeSegment([]byte(object)))`.
EncodeSegment uses the RFC3986 unreserved set
(ALPHA/DIGIT/-/./_) — `/` is percent-encoded but `.` is preserved,
so the literal segment `..` survives unchanged. A crafted
`bucket=".."` and/or `object=".."` would resolve to
`<scratchRoot>/../...`, letting `writeFileAtomic` land outside the
decoder's controlled directory before `safeJoinUnderRoot` runs at
output time.

Add `scratchDirForBlob` which rejects `.` / `..` / "" bucket and
object literals at the encoder boundary so the spill-to-disk
step inherits the same containment invariant the final output
path enforces. Apply the same guard to `flushOrphanObject` which
shared the failure mode under `--include-orphans`.

(Multi-segment dot keys like `a/../b` continue to be caught at
Finalize via `safeJoinUnderRoot` because EncodeSegment keeps the
whole key in one filename segment that splits cleanly there.)

Tests:
  - TestS3_HandleBlobRejectsScratchEscape: 5 sub-cases covering
    bucket/object/both variants of `.`/`..` literals.
  - TestS3_DotSegmentObjectKeyRejected updated to allow either
    HandleBlob or Finalize to surface ErrS3MalformedKey, since
    sole-dot keys are now caught earlier.
@bootjp
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 9 fix for Codex P1 round 11 on commit 9a63e32 (commit d016ea71):

HandleBlob composed scratch paths with filepath.Join(scratchRoot, EncodeSegment(bucket), EncodeSegment(object)). EncodeSegment uses the RFC3986 unreserved set (ALPHA/DIGIT/-/./_) — / is percent-encoded but . is preserved, so the literal segment .. survives unchanged. A crafted bucket=".." and/or object=".." would resolve to <scratchRoot>/../..., letting writeFileAtomic land outside the decoder's controlled directory before safeJoinUnderRoot ever ran at output time.

Added scratchDirForBlob which rejects . / .. / "" bucket and object literals at the encoder boundary so the spill-to-disk step inherits the same containment invariant the final output path enforces. Same guard applied to flushOrphanObject (which shares the failure mode under --include-orphans). Multi-segment dot keys like a/../b continue to be caught at Finalize via safeJoinUnderRoot (EncodeSegment keeps the whole key as one filename segment).

Tests: TestS3_HandleBlobRejectsScratchEscape (5 sub-cases covering bucket/object/both variants of ./..); TestS3_DotSegmentObjectKeyRejected updated to accept either HandleBlob or Finalize as the rejection point. go test -race and golangci-lint clean.

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 2m 53s (View job)


Review — d016ea71 round 9 (scratch-path escape via dot-segment bucket/object names)

  • Gather context and read changed files
  • Review scratchDirForBlob guard (. / .. / "" bucket+object rejection)
  • Review flushOrphanObject guard application
  • Review updated TestS3_DotSegmentObjectKeyRejected
  • Review new TestS3_HandleBlobRejectsScratchEscape (5 sub-cases)
  • Check for remaining issues from prior rounds
  • Post final review

The Codex P1 round-11 finding is correctly addressed. Detailed verdict follows.

Fix: scratchDirForBlob

The guard (s3.go:455–466) rejects ., .., and "" for both the bucket and object arguments before any filepath.Join or EncodeSegment call:

for _, seg := range [...]string{bucket, object} {
    switch seg {
    case ".", "..":
        return "", errors.Wrapf(ErrS3MalformedKey, ...)
    case "":
        return "", errors.Wrapf(ErrS3MalformedKey, ...)
    }
}

The design choice is correct: EncodeSegment percent-encodes / (so multi-segment keys like a/../b collapse to a single filename a%2F..%2Fb and are safe at the scratch level), but . is RFC3986-unreserved and survives unchanged, so bucket=".." would produce filepath.Join(scratchRoot, "..", …) and escape the scratch tree before safeJoinUnderRoot ever ran. Catching it here closes the window. Multi-segment dot keys (a/../b) still reach Finalize and are caught there by safeJoinUnderRoot's segment scan. ✓

The guard fires at the start of HandleBlob, before objectState mutation or any MkdirAll — so no side-effects accumulate for a rejected blob. ✓

Fix: flushOrphanObject guard ✅

s3.go:717–720 applies the same literal check to obj.object before constructing the orphan directory:

if obj.object == "." || obj.object == ".." || obj.object == "" {
    return errors.Wrapf(ErrS3MalformedKey,
        "orphan object key %q is a dot segment (would escape orphan dir)", obj.object)
}

The guard is in the right place: it only fires when s.includeOrphans == true (earlier return at line 711 when false) and before any MkdirAll. The _orphans/ directory uses EncodeSegment([]byte(obj.object)) so multi-segment names are already safe. ✓

Minor — flushOrphanObject doesn't guard the bucket name. bucketDir is passed in pre-computed by flushBucket as filepath.Join(s.outRoot, "s3", EncodeSegment([]byte(b.name))). Since EncodeSegment preserves ., a bucket named .. would produce outRoot/s3/.. = outRoot, so the orphan dir would land at outRoot/_orphans/… instead of under outRoot/s3/. This is contained within outRoot (no host escape), so it's not a security issue, but it would pollute the output root rather than the intended s3/ subdirectory. In practice the orphan path only runs for objects that already have chunk paths, meaning HandleBlob succeeded at least once with that bucket name — and HandleBlob now rejects bucket=".." before objectState is mutated. Low severity, defense-in-depth concern only.

Test: TestS3_HandleBlobRejectsScratchEscape

The 5 sub-cases cover:

| case | bucket | object | scratchDirForBlob check |
| --- | --- | --- | --- |
| object_dotdot | "b" | ".." | object == ".." → reject |
| object_dot | "b" | "." | object == "." → reject |
| bucket_dotdot | ".." | "x" | bucket == ".." → reject |
| bucket_dot | "." | "x" | bucket == "." → reject |
| both_dotdot | ".." | ".." | bucket == ".." → reject (loop exits on first match) |

All assert errors.Is(err, ErrS3MalformedKey) directly on the HandleBlob return — no Finalize required. ✓

Test: Updated TestS3_DotSegmentObjectKeyRejected

The updated test (s3_test.go:621) correctly handles the split-rejection design: for ".." and "." object keys, HandleBlob fires the guard and the test returns early; for "a/../b" and "a/./b", HandleBlob succeeds (object is multi-segment, safely encoded by EncodeSegment in scratch path) and safeJoinUnderRoot catches them at Finalize. The added t.Parallel() is a minor improvement.

The logic paths for each key:

| key | HandleBlob result | rejection point |
| --- | --- | --- |
| ".." | ErrS3MalformedKey | scratchDirForBlob (round 11) |
| "." | ErrS3MalformedKey | scratchDirForBlob (round 11) |
| "a/../b" | success (object is multi-segment) | safeJoinUnderRoot at Finalize |
| "a/./b" | success | safeJoinUnderRoot at Finalize |

Stale comment (carry-over from rounds 5 and 6 — still present)

safeJoinUnderRoot at s3.go:799–802:

// Use filepath.Join here — its only behavioural change vs. raw
// concatenation after the dot-segment guard above is normalising
// a leading "/" off `rel` (which is what we want: absolute-path
// keys collapse safely under bucketDir).

Since round 5, leading-slash keys are rejected (empty leading segment), so the phrase "absolute-path keys collapse safely" contradicts the current design. Still low priority, still not blocking.


Carry-overs from prior rounds

| Item | Status |
| --- | --- |
| Stale comment at safeJoinUnderRoot re: "absolute-path keys collapse safely" | Still present (low priority) |
| joined == cleanRoot dead branch | Still present (low priority) |
| Silent bypass when manifest.upload_id == "" | By design (documented) |
| Scratch dir recomputed per-blob after scratchDirCreated gate | Negligible |
| Orphan chunks loaded via os.ReadFile in flushOrphanObject | Low priority |
| No test for --include-orphans + stale-gen routing | Low priority |

The Codex P1 round-11 finding (scratch-path escape via literal ./.. bucket or object names) is correctly fixed at the right enforcement boundary and tested. No new blocking issues. The PR is ready to merge once the stacked bases (feat/backup-phase0a-dynamodb, feat/s3keys-parse-blob-key) land.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8881bf3fa7


Comment thread internal/backup/s3.go Outdated
Comment on lines +94 to +95
if want.chunkCount > 0 && (o.count != want.chunkCount || o.maxIndex+1 != want.chunkCount) {
return errors.Wrapf(ErrS3IncompleteBlobChunks,

P1 Badge Validate per-part chunk indexes, not just count and max

Fresh evidence: verifyChunkCompleteness can still accept missing manifest chunks because it only checks o.count and o.maxIndex. For a declared chunk_count=3, observed chunk numbers {0,2,2} satisfy count==3 and maxIndex+1==3, so this branch does not raise ErrS3IncompleteBlobChunks even though chunk 1 is absent; finalize then assembles corrupted object bytes silently.



Codex P1 round 12 (commit e91f086):

`verifyChunkCompleteness` only checked `count` and `maxIndex` per
(partNo, partVersion). For declared `chunk_count=3`, observed
chunk numbers `{0, 2, 2}` satisfy `count == 3` and
`maxIndex+1 == 3` while chunk_no=1 is absent — assembleObjectBody
would then emit a corrupted body silently.

Replace the dual-threshold heuristic with a set-membership
check: track every observed chunk index per (partNo,
partVersion) and verify the set is exactly
`{0, 1, …, chunk_count-1}`. Both the unique-count guard and a
per-index lookup surface ErrS3IncompleteBlobChunks before the
assembler runs.

Test:
  - TestS3_DuplicateChunksWithMissingIndexRejected: drives the
    `{0, 2, 2}` shape, asserts the new validator rejects it.
  - TestS3_IncompleteChunksRejected (existing) still passes
    against the new code path.
@bootjp
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 10 fix for Codex P1 round 12 on commit e91f086 (commit 4505df35):

verifyChunkCompleteness only checked count and maxIndex per (partNo, partVersion). For declared chunk_count=3, observed chunk numbers {0, 2, 2} satisfied both thresholds (count==3, maxIndex+1==3) while chunk_no=1 was absent — assembleObjectBody would then emit a corrupted body silently.

Replaced the dual-threshold heuristic with a set-membership check: track every observed chunk index per (partNo, partVersion) and verify the set is exactly {0, 1, …, chunk_count-1}. Both the unique-count guard and a per-index lookup surface ErrS3IncompleteBlobChunks before the assembler runs.

Test: TestS3_DuplicateChunksWithMissingIndexRejected drives the {0, 2, 2} shape; existing TestS3_IncompleteChunksRejected still passes against the new code path. go test -race and golangci-lint clean.

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 4m 10s


Review — 4505df35 round 10 (set-based chunk completeness)

  • Gather context and read changed files
  • Review verifyChunkCompleteness set-based replacement
  • Review TestS3_DuplicateChunksWithMissingIndexRejected
  • Verify TestS3_IncompleteChunksRejected still covers the new code path
  • Check for remaining issues from prior rounds

The Codex P1 round-12 fix is correct and the implementation is clean. One subtle observation about the reachability of the scenario worth understanding.

Fix: set-based verifyChunkCompleteness

The replacement tracks a map[s3PartKey]map[uint64]struct{} of observed chunk indices per (partNo, partVersion), then checks both:

  1. len(seen) == want.chunkCount — unique-count guard
  2. seen[i] exists for every i in [0, chunkCount) — per-index guard

These two checks are complementary. The unique-count check alone fails when {0, 1, 5} is observed for chunkCount=3 (count=3 matches, but index 2 is absent and index 5 is out of range). The per-index loop closes that gap.

Subtle observation: the {0, 2, 2} scenario is unreachable at verifyChunkCompleteness ℹ️

The Codex finding describes: observed {0, 2, 2} produces count=3, maxIndex=2, both pass the old guard while chunkNo=1 is absent. In the actual production call chain:

HandleBlob → chunkPaths[s3ChunkKey{...}] = path   // map: deduplicates by key
↓
filterChunksForManifest(obj.chunkPaths, ...)        // iterates the map → no duplicates
↓
verifyChunkCompleteness(chunks []s3ChunkKey, ...)   // chunks has no duplicate s3ChunkKeys

Two calls to HandleBlob with the same (uploadID, partNo, chunkNo, partVersion) — what the test does — write to the same chunkPaths map entry. The second overwrites the first. By the time filterChunksForManifest is called, only 2 unique keys exist: chunkNo=0 and chunkNo=2. The old code's count check (2 != 3) would have caught this just as well.

For non-duplicate, no-gap inputs, the old dual-threshold was also correct: n unique non-negative integers with maximum n-1 must be exactly {0, 1, …, n-1}. The only scenario the old code admitted incorrectly required duplicate chunkNo values in the chunks slice — which the map prevents.

This does not make the fix wrong. The set-based approach is strictly better:

  • verifyChunkCompleteness is now correct as a pure function regardless of how chunks is constructed
  • It future-proofs against any refactoring that might pass a slice without deduplication guarantees
  • The semantics are clearer (explicit set membership vs. indirect heuristic)

The test TestS3_DuplicateChunksWithMissingIndexRejected is still a valid regression test — it just exercises the unique-count path (guard 1) rather than the per-index path (guard 2), because the duplicate is eliminated by chunkPaths before the validator runs. The test title overstates what's happening at the verifyChunkCompleteness level, but the behavior under test (incomplete chunk set → error) is correctly verified.

Test: TestS3_IncompleteChunksRejected against new code path ✅

The existing test submits chunkNo=0 and chunkNo=2 (no duplicate). chunkPaths has 2 entries. len(seen)=2 != chunkCount=3 → fires on the unique-count check. Correct. ✓

//nolint:gosec // bounded

uint64(len(seen)): len() returns a non-negative int, so the cast is safe on all platforms (int ≤ 64 bits). The comment is accurate. ✓

Minor: per-iteration inner-map allocation

verifyChunkCompleteness allocates a make(map[uint64]struct{}) for every unique (partNo, partVersion) seen in chunks, whereas the old code stored two counter fields on a value type. For objects with many parts, this adds allocation pressure. This is bounded by O(num_parts × chunks_per_part) and only runs at Finalize, not on the hot HandleBlob path — negligible in practice.


Remaining carry-overs from prior rounds

| Item | Status |
| --- | --- |
| Stale comment at safeJoinUnderRoot re: "absolute-path keys collapse safely" | Still present (low priority) |
| joined == cleanRoot dead branch | Still present (low priority) |
| Silent bypass when manifest.upload_id == "" | By design (documented) |
| Scratch dir recomputed per-blob after scratchDirCreated gate | Negligible |
| Orphan chunks loaded via os.ReadFile in flushOrphanObject | Low priority |
| No test for --include-orphans + stale-gen routing | Low priority |

The set-based replacement is correct and future-proof. No new blocking issues. The PR is ready to merge once the stacked bases (feat/backup-phase0a-dynamodb, feat/s3keys-parse-blob-key) land.
