Skip to content

RATIS-2512. Fix unreleased zero-copy messages during gRPC service shutdown.#1445

Open
slfan1989 wants to merge 1 commit intoapache:RATIS-1931_grpc-zero-copyfrom
slfan1989:RATIS-2512
Open

RATIS-2512. Fix unreleased zero-copy messages during gRPC service shutdown.#1445
slfan1989 wants to merge 1 commit intoapache:RATIS-1931_grpc-zero-copyfrom
slfan1989:RATIS-2512

Conversation

@slfan1989
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This pull request fixes a zero-copy request leak during gRPC service shutdown.

ZeroCopyMessageMarshaller may still hold pending or in-flight zero-copy request streams when the gRPC service is being closed. Previously, closing these tracked streams did not trigger the release logic for the corresponding zero-copy messages, which could cause ZeroCopyMetrics to report unreleased messages during cluster shutdown.

The proposed changes:

  • Release tracked zero-copy messages when ZeroCopyMessageMarshaller.close() is called.
  • Add close logic for gRPC protocol services that own zero-copy request marshallers.
  • Explicitly close these protocol services in GrpcServicesImpl.closeImpl() before unregistering zero-copy metrics.
  • Ensure the same GrpcServerProtocolService instance bound to the gRPC server is also closed during shutdown.

What is the link to the Apache JIRA

RATIS-2512. Fix unreleased zero-copy messages during gRPC service shutdown

How was this patch tested?

  • This patch was tested with the following commands:
./mvnw -pl ratis-test -am -Pgrpc-tests -Dtest=TestGrpcZeroCopy,TestRaftAsyncWithGrpc#testWithLoadAsync test
  • Both tests passed successfully.
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.ratis.grpc.TestRaftAsyncWithGrpc
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.141 s - in org.apache.ratis.grpc.TestRaftAsyncWithGrpc
[INFO] Running org.apache.ratis.grpc.util.TestGrpcZeroCopy
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.272 s - in org.apache.ratis.grpc.util.TestGrpcZeroCopy
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------

Copy link
Copy Markdown
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

}

void close() {
zeroCopyRequestMarshaller.close();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good catch! We never call zeroCopyRequestMarshaller.close() except for tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants