
RATIS-2514. Fix flaky TestReadOnlyRequestWithGrpc.testReadAfterWrite. #1446

Open

slfan1989 wants to merge 1 commit into apache:RATIS-1931_grpc-zero-copy from slfan1989:RATIS-2514



slfan1989 (Contributor) commented May 5, 2026

What changes were proposed in this pull request?

This pull request fixes a flaky assertion in TestReadOnlyRequestWithGrpc.testReadAfterWrite.

The previous test sent multiple async writes without waiting for them to complete, then issued a linearizable read and a read-after-write request concurrently. It asserted that the read-after-write result should be greater than or equal to the concurrent linearizable read result.

That assumption is unreliable because the two reads may have different linearization points and may observe different subsets of concurrent writes.

This patch keeps the original async write workload and both read paths, but makes the test deterministic by:

  • waiting for the async writes to complete before issuing the reads;
  • verifying that both linearizable read and read-after-write observe the completed prior writes;
  • waiting for followers to catch up before cluster shutdown to avoid shutdown-time leak detection races.
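A minimal sketch of the "wait for the async writes, then read" pattern from the first bullet, in plain Java. The `AtomicInteger` counter and thread pool are hypothetical stand-ins for the replicated state machine and the Ratis async client; this is not the code in the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a local counter stands in for the Ratis state machine. */
public class AwaitWritesSketch {
    /** Issues n async increments, waits for all of them, then reads the counter. */
    static int writeThenRead(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            AtomicInteger counter = new AtomicInteger();
            List<CompletableFuture<Void>> writes = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                // Stand-in for one async write (e.g. an increment request).
                writes.add(CompletableFuture.runAsync(counter::incrementAndGet, pool));
            }
            // Block until every write has completed before issuing the read.
            CompletableFuture.allOf(writes.toArray(new CompletableFuture[0])).join();
            // Every completed write is now visible to the read.
            return counter.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("read = " + writeThenRead(10)); // prints "read = 10"
    }
}
```

Blocking on the writes this way makes both reads deterministic, but it is also what the review comment on `client.async().send(incrementMessage).get()` objects to: once every write has completed, a read-after-write request has no in-flight write left to wait for.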

What is the link to the Apache JIRA

RATIS-2514. Fix flaky TestReadOnlyRequestWithGrpc.testReadAfterWrite.

How was this patch tested?

Tested with:

./mvnw -pl ratis-test -am -Pgrpc-tests -Dtest=TestReadOnlyRequestWithGrpc#testReadAfterWrite test

Result:

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.ratis.grpc.TestReadOnlyRequestWithGrpc
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.746 s - in org.apache.ratis.grpc.TestReadOnlyRequestWithGrpc
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------

szetszwo left a comment


@slfan1989, this problem has probably already been fixed in the master branch. Since the two branches differ significantly, I suggest doing a merge/rebase first.

client.async().sendReadAfterWrite(queryMessage).thenAccept(reply -> {
  Assertions.assertEquals(2, retrieve(reply));
});
client.async().send(incrementMessage).get();

The sendReadAfterWrite feature is for two async calls, (1) writeAsync and then (2) readAfterWriteAsync, such that (2) must be able to see the change made by (1). So we cannot call get() here; otherwise, the test becomes useless. For example:

  1. initial: X = 1
  2. writeAsync: set X = 3
  3. readAfterWriteAsync: must see X = 3 (it is a bug if it can see X = 1)
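The ordering requirement in this example can be sketched with a toy client, assuming a single-threaded applier that applies operations in submission order. `ReadAfterWriteSketch`, `writeAsync`, and `readAfterWriteAsync` are hypothetical names for illustration, not the Ratis client API; the point is that the read chains on the pending write instead of the caller blocking on the write with `get()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical toy client illustrating read-after-write; not the Ratis API. */
public class ReadAfterWriteSketch {
    // Single thread, so writes are applied in the order they are submitted.
    private final ExecutorService applier = Executors.newSingleThreadExecutor();
    private final AtomicInteger x = new AtomicInteger(1); // initial: X = 1
    // Future of the most recent write sent by this client.
    private CompletableFuture<Void> lastWrite = CompletableFuture.completedFuture(null);

    synchronized CompletableFuture<Void> writeAsync(int value) {
        lastWrite = CompletableFuture.runAsync(() -> x.set(value), applier);
        return lastWrite;
    }

    synchronized CompletableFuture<Integer> readAfterWriteAsync() {
        // Serve the read only after this client's earlier writes are applied;
        // the caller never has to block on the write itself.
        return lastWrite.thenApply(ignored -> x.get());
    }

    void close() {
        applier.shutdown();
    }

    public static void main(String[] args) throws Exception {
        ReadAfterWriteSketch client = new ReadAfterWriteSketch();
        client.writeAsync(3); // writeAsync: set X = 3, with no get()
        int seen = client.readAfterWriteAsync().get(); // must see X = 3
        System.out.println("X = " + seen); // prints "X = 3"
        client.close();
    }
}
```

With a single applier thread, waiting on the latest write also covers every earlier write from the same client; a replicated implementation needs some other way to track a client's prior writes, which this sketch does not model.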

This may be a bug that has already been fixed in the master branch (where the test was moved to LinearizableReadTests).
