PARQUET-3261: Fix integer overflow in CapacityByteArrayOutputStream #3525

Open
yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:fix/3261-capacity-overflow

@yadavay-amzn

What changes were made?

Fix an integer overflow in CapacityByteArrayOutputStream.addSlab() that causes an ArithmeticException when writing large ARRAY<STRING> columns (issue #3261).

Root cause (identified by @Kimahriman)

The overflow check in addSlab() used bytesUsed to detect overflow, but bytesUsed is not updated until after addSlab() returns in write(). Additionally, nextSlabSize can be larger than minimumSize (due to the doubling strategy), so checking only bytesUsed + minimumSize was insufficient.

This meant bytesAllocated = Math.addExact(this.bytesAllocated, nextSlabSize) could overflow without being caught by the guard, throwing an uncaught ArithmeticException instead of the intended OutOfMemoryError.
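
To make the failure mode concrete, here is a minimal standalone sketch of the arithmetic described above. It is not the Parquet source: the class `OverflowGuardSketch`, its helper methods, and all numeric values are hypothetical and chosen only to illustrate how a guard on `bytesUsed + minimumSize` can pass while `Math.addExact(bytesAllocated, nextSlabSize)` still overflows.

```java
final class OverflowGuardSketch {

    /** Models a guard that only checks bytesUsed + minimumSize (hypothetical helper). */
    static boolean oldGuardPasses(int bytesUsed, int minimumSize) {
        try {
            Math.addExact(bytesUsed, minimumSize);
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    /** Returns true when bytesAllocated + nextSlabSize does not fit in an int. */
    static boolean allocationOverflows(int bytesAllocated, int nextSlabSize) {
        try {
            Math.addExact(bytesAllocated, nextSlabSize);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // All values below are illustrative assumptions, not taken from the issue.
        int bytesUsed = 1_200_000_000;      // stale: not yet updated by write()
        int bytesAllocated = 1_500_000_000; // already reserved across slabs
        int minimumSize = 64;               // bytes the caller needs right now
        int nextSlabSize = 1_000_000_000;   // larger than minimumSize due to doubling

        // The guard passes, but the later addExact on bytesAllocated would throw.
        System.out.println("old guard passes: " + oldGuardPasses(bytesUsed, minimumSize));
        System.out.println("allocation overflows: " + allocationOverflows(bytesAllocated, nextSlabSize));
    }
}
```

With these assumed values the guard sees 1,200,000,064, which fits in an `int`, while the allocation total would be 2,500,000,000, which does not.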

Fix

  1. Use bytesAllocated instead of bytesUsed for the overflow check — bytesAllocated is always up to date when addSlab() is called.
  2. Cap nextSlabSize when it would cause bytesAllocated to overflow Integer.MAX_VALUE, preventing the uncaught ArithmeticException on the Math.addExact call.
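
The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `SlabSizeCapSketch` and `cappedSlabSize` are hypothetical names, and the sketch assumes `bytesAllocated` and `minimumSize` are non-negative.

```java
final class SlabSizeCapSketch {

    /**
     * Hypothetical helper: picks a slab size that never pushes the running
     * total past Integer.MAX_VALUE. Assumes bytesAllocated >= 0 and minimumSize >= 0.
     */
    static int cappedSlabSize(int bytesAllocated, int minimumSize, int proposedSlabSize) {
        // Room left before the total would exceed Integer.MAX_VALUE.
        // Cannot overflow because bytesAllocated is assumed non-negative.
        int headroom = Integer.MAX_VALUE - bytesAllocated;

        // Step 1: guard on bytesAllocated (always current), not on bytesUsed.
        if (minimumSize > headroom) {
            throw new OutOfMemoryError("Size of data exceeded Integer.MAX_VALUE");
        }

        // The slab must hold at least minimumSize bytes.
        int size = Math.max(proposedSlabSize, minimumSize);

        // Step 2: cap the slab so bytesAllocated + size stays within int range.
        return Math.min(size, headroom);
    }

    public static void main(String[] args) {
        // Illustrative values only: a 1,000,000,000-byte proposal is trimmed
        // to the remaining headroom instead of overflowing.
        int size = cappedSlabSize(1_500_000_000, 64, 1_000_000_000);
        System.out.println("capped slab size: " + size);
        System.out.println("new total: " + Math.addExact(1_500_000_000, size));
    }
}
```

In this sketch the request fails with `OutOfMemoryError` only when even `minimumSize` cannot fit; otherwise the slab is trimmed to the remaining headroom, so the later `Math.addExact` on the total cannot throw.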

Tests

Added TestCapacityByteArrayOutputStreamOverflow with two tests:

  • Verifies that slab allocation near Integer.MAX_VALUE succeeds (previously threw ArithmeticException)
  • Verifies that a true overflow still throws OutOfMemoryError as intended

The overflow check in addSlab() used bytesUsed which is not updated until
after addSlab() returns in write(). This caused the overflow guard to miss
cases where bytesAllocated + nextSlabSize exceeds Integer.MAX_VALUE.

Fix:
- Use bytesAllocated instead of bytesUsed for the overflow check, since
  bytesAllocated is always up to date when addSlab() is called.
- Cap nextSlabSize when it would cause bytesAllocated to overflow, instead
  of letting Math.addExact throw an uncaught ArithmeticException.
@yadavay-amzn force-pushed the fix/3261-capacity-overflow branch from 2dab1ed to 29bbc79 on April 22, 2026 01:42
Contributor

@steveloughran steveloughran left a comment

LGTM
