Fix outdated code references in message passing documentation#8584
Fix outdated code references in message passing documentation#8584Saurabh16-s wants to merge 2 commits intoapache:masterfrom
Conversation
|
Thx for the PR. The current version is 2.8.7 - any reason for 2.6.7? |
|
My bad — I saw v2.6.2 in the issue, so I used that version. I’ll update it to v2.8.7. |
|
Updated the documentation to reflect Storm v2.8.7. Please let me know if any further changes are needed. |
rzo1
left a comment
There was a problem hiding this comment.
Thanks for the update! A few suggestions inline. Main themes: (1) the Disruptor sentence is outdated (2.x uses JCQueue), (2) several links use /blob/ for directories which 404 on GitHub, (3) some link pairs collapse to the same URL — the original doc had distinct line anchors.
| --- | ||
| (Note: this walkthrough is out of date as of 0.8.0. 0.8.0 revamped the message passing infrastructure to be based on the Disruptor) | ||
|
|
||
| (Note: this walkthrough has been updated for v2.8.7. As of 0.8.0, the message passing infrastructure has been based on the Disruptor) |
There was a problem hiding this comment.
The Disruptor reference is outdated — 2.x replaced it with JCQueue (JCTools), see storm-client/src/jvm/org/apache/storm/utils/JCQueue.java.
| (Note: this walkthrough has been updated for v2.8.7. As of 0.8.0, the message passing infrastructure has been based on the Disruptor) | |
| (Note: this walkthrough has been updated for v2.8.7. The message passing infrastructure was rewritten in 0.8.0 (originally Disruptor-based) and later moved to JCQueue.) |
| - The implementation for local mode uses in memory Java queues (so that it's easy to use Storm locally without needing to get ZeroMQ installed) [code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/messaging/local.clj) | ||
| - Connection management is handled by the `WorkerState` class which manages connections to other workers and maintains a mapping from task -> worker. Connection refresh is triggered every "task.refresh.poll.secs" or whenever assignment in ZK changes. [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java) | ||
| - Provides a "transfer function" that is used by tasks to send tuples to other tasks. The transfer function takes in a task id and a tuple, and it serializes the tuple and puts it onto a "transfer queue". There is a single transfer queue for each worker. [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerTransfer.java) | ||
| - The serializer is thread-safe [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java) |
There was a problem hiding this comment.
The original 0.7.1 source had a comment backing this claim; the modern KryoTupleSerializer holds a Kryo field, and Kryo itself is not thread-safe (instances are scoped per executor/thread). Could you verify, or drop "thread-safe"?
| - The serializer is thread-safe [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java) | ||
| - The worker drains the transfer queue and sends the messages to other workers [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java) | ||
| - Message sending happens through this interface: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/IConnection.java) | ||
| - The implementation for distributed mode uses Netty [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/) |
There was a problem hiding this comment.
GitHub /blob/ URLs don't resolve to directories — this should be /tree/, or better, point at the entry-point file:
| - The implementation for distributed mode uses Netty [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/) | |
| - The implementation for distributed mode uses Netty [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/Context.java) |
| - The worker drains the transfer queue and sends the messages to other workers [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java) | ||
| - Message sending happens through this interface: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/IConnection.java) | ||
| - The implementation for distributed mode uses Netty [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/) | ||
| - The implementation for local mode uses in memory Java queues (so that it's easy to use Storm locally without needing external messaging dependencies) [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/) |
There was a problem hiding this comment.
Same /blob/ issue, and the directory only contains one file:
| - The implementation for local mode uses in memory Java queues (so that it's easy to use Storm locally without needing external messaging dependencies) [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/) | |
| - The implementation for local mode uses in memory Java queues (so that it's easy to use Storm locally without needing external messaging dependencies) [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/Context.java) |
| - Tasks listen on an in-memory ZeroMQ port for messages from the virtual port [code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L201) | ||
| - Bolts listen here: [code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L489) | ||
| - Spouts listen here: [code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L382) | ||
| - In local mode, the tuple is sent directly to an in-memory queue for the receiving task [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/) |
There was a problem hiding this comment.
Same /blob/ directory issue:
| - In local mode, the tuple is sent directly to an in-memory queue for the receiving task [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/) | |
| - In local mode, the tuple is sent directly to an in-memory queue for the receiving task [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/Context.java) |
| - Spouts listen here: [code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L382) | ||
| - In local mode, the tuple is sent directly to an in-memory queue for the receiving task [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/) | ||
| - In distributed mode, each worker listens on a single TCP port for incoming messages and then routes those messages in-memory to tasks. The TCP port receives [task id, message] and then routes it to the actual task. [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/StormServerHandler.java) | ||
| - The message routing implementation is here: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/) |
There was a problem hiding this comment.
Same /blob/ directory issue; original linked a specific file. Suggest pointing at Server.java (or removing this bullet — line 21 already covers Netty routing):
| - The message routing implementation is here: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/) | |
| - The message routing implementation is here: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/Server.java) |
| - In local mode, the tuple is sent directly to an in-memory queue for the receiving task [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/) | ||
| - In distributed mode, each worker listens on a single TCP port for incoming messages and then routes those messages in-memory to tasks. The TCP port receives [task id, message] and then routes it to the actual task. [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/StormServerHandler.java) | ||
| - The message routing implementation is here: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/) | ||
| - Executors listen on an in-memory connection for messages [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/executor/Executor.java) |
There was a problem hiding this comment.
In 2.x, bolt/spout executors consume from their JCQueue receive queue rather than listening on an in-memory connection:
| - Executors listen on an in-memory connection for messages [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/executor/Executor.java) | |
| - Executors consume from their receive queue (a `JCQueue`) [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/executor/Executor.java) |
| - Tasks have a routing map from {stream id} -> {component id} -> {stream grouping function} [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/Task.java) | ||
| - Stream grouping functions determine the task ids to send the tuples to for either regular stream emit or direct stream emit [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/Task.java) |
There was a problem hiding this comment.
Both bullets link to the same Task.java root. The original 0.7.1 doc had distinct line anchors. Either add #L<line> anchors at the relevant fields/methods, or merge into one bullet:
| - Tasks have a routing map from {stream id} -> {component id} -> {stream grouping function} [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/Task.java) | |
| - Stream grouping functions determine the task ids to send the tuples to for either regular stream emit or direct stream emit [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/Task.java) | |
| - Tasks have a routing map from {stream id} -> {component id} -> {stream grouping function}; the grouping functions determine the task ids to send the tuples to for either regular stream emit or direct stream emit. [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/Task.java) |
| - Bolt transfer code here: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltExecutor.java) | ||
| - Spout transfer code here: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutExecutor.java) |
There was a problem hiding this comment.
The "Bolts listen here" (line 24) and "Bolt transfer code here" (line 30) bullets both link to BoltExecutor.java; same for SpoutExecutor.java on lines 25/31. Please add #L<line> anchors so each link lands on a distinct method, or merge the receive/transfer bullets — the right anchors are your call.
Description
Updated Message Passing Implementation documentation with current code links (v2.8.7 instead of v0.7.1).
Changes
Related Issues
Closes #7845