Skip to content

Fix outdated code references in message passing documentation#8584

Open
Saurabh16-s wants to merge 2 commits intoapache:masterfrom
Saurabh16-s:fix-message-passing-doc
Open

Fix outdated code references in message passing documentation#8584
Saurabh16-s wants to merge 2 commits intoapache:masterfrom
Saurabh16-s:fix-message-passing-doc

Conversation

@Saurabh16-s
Copy link
Copy Markdown

@Saurabh16-s Saurabh16-s commented May 4, 2026

Description

Updated Message Passing Implementation documentation with current code links (v2.8.7 instead of v0.7.1).

Changes

  • Updated all GitHub links from 0.7.1 → v2.8.7
  • Replaced Clojure references with current Java file paths
  • Changed ZeroMQ references to Netty
  • Kept original documentation structure and wording

Related Issues

Closes #7845

@rzo1
Copy link
Copy Markdown
Contributor

rzo1 commented May 4, 2026

Thx for the PR. The current version is 2.8.7 - any reason for 2.6.7?

@Saurabh16-s
Copy link
Copy Markdown
Author

My bad — I saw v2.6.2 in the issue, so I used that version. I’ll update it to v2.8.7.

@Saurabh16-s
Copy link
Copy Markdown
Author

Updated the documentation to reflect Storm v2.8.7. Please let me know if any further changes are needed.

Copy link
Copy Markdown
Contributor

@rzo1 rzo1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! A few suggestions inline. Main themes: (1) the Disruptor sentence is outdated (2.x uses JCQueue), (2) several links use /blob/ for directories which 404 on GitHub, (3) some link pairs collapse to the same URL — the original doc had distinct line anchors.

---
(Note: this walkthrough is out of date as of 0.8.0. 0.8.0 revamped the message passing infrastructure to be based on the Disruptor)

(Note: this walkthrough has been updated for v2.8.7. As of 0.8.0, the message passing infrastructure has been based on the Disruptor)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Disruptor reference is outdated — 2.x replaced it with JCQueue (JCTools), see storm-client/src/jvm/org/apache/storm/utils/JCQueue.java.

Suggested change
(Note: this walkthrough has been updated for v2.8.7. As of 0.8.0, the message passing infrastructure has been based on the Disruptor)
(Note: this walkthrough has been updated for v2.8.7. The message passing infrastructure was rewritten in 0.8.0 (originally Disruptor-based) and later moved to JCQueue.)

- The implementation for local mode uses in memory Java queues (so that it's easy to use Storm locally without needing to get ZeroMQ installed) [code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/messaging/local.clj)
- Connection management is handled by the `WorkerState` class which manages connections to other workers and maintains a mapping from task -> worker. Connection refresh is triggered every "task.refresh.poll.secs" or whenever assignment in ZK changes. [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java)
- Provides a "transfer function" that is used by tasks to send tuples to other tasks. The transfer function takes in a task id and a tuple, and it serializes the tuple and puts it onto a "transfer queue". There is a single transfer queue for each worker. [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerTransfer.java)
- The serializer is thread-safe [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original 0.7.1 source had a comment backing this claim; the modern KryoTupleSerializer holds a Kryo field, and Kryo itself is not thread-safe (instances are scoped per executor/thread). Could you verify, or drop "thread-safe"?

- The serializer is thread-safe [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java)
- The worker drains the transfer queue and sends the messages to other workers [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java)
- Message sending happens through this interface: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/IConnection.java)
- The implementation for distributed mode uses Netty [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub /blob/ URLs don't resolve to directories — this should be /tree/, or better, point at the entry-point file:

Suggested change
- The implementation for distributed mode uses Netty [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/)
- The implementation for distributed mode uses Netty [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/Context.java)

- The worker drains the transfer queue and sends the messages to other workers [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java)
- Message sending happens through this interface: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/IConnection.java)
- The implementation for distributed mode uses Netty [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/)
- The implementation for local mode uses in memory Java queues (so that it's easy to use Storm locally without needing external messaging dependencies) [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same /blob/ issue, and the directory only contains one file:

Suggested change
- The implementation for local mode uses in memory Java queues (so that it's easy to use Storm locally without needing external messaging dependencies) [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/)
- The implementation for local mode uses in memory Java queues (so that it's easy to use Storm locally without needing external messaging dependencies) [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/Context.java)

- Tasks listen on an in-memory ZeroMQ port for messages from the virtual port [code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L201)
- Bolts listen here: [code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L489)
- Spouts listen here: [code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L382)
- In local mode, the tuple is sent directly to an in-memory queue for the receiving task [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same /blob/ directory issue:

Suggested change
- In local mode, the tuple is sent directly to an in-memory queue for the receiving task [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/)
- In local mode, the tuple is sent directly to an in-memory queue for the receiving task [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/Context.java)

- Spouts listen here: [code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L382)
- In local mode, the tuple is sent directly to an in-memory queue for the receiving task [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/)
- In distributed mode, each worker listens on a single TCP port for incoming messages and then routes those messages in-memory to tasks. The TCP port receives [task id, message] and then routes it to the actual task. [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/StormServerHandler.java)
- The message routing implementation is here: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same /blob/ directory issue; original linked a specific file. Suggest pointing at Server.java (or removing this bullet — line 21 already covers Netty routing):

Suggested change
- The message routing implementation is here: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/)
- The message routing implementation is here: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/Server.java)

- In local mode, the tuple is sent directly to an in-memory queue for the receiving task [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/local/)
- In distributed mode, each worker listens on a single TCP port for incoming messages and then routes those messages in-memory to tasks. The TCP port receives [task id, message] and then routes it to the actual task. [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/StormServerHandler.java)
- The message routing implementation is here: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/messaging/netty/)
- Executors listen on an in-memory connection for messages [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/executor/Executor.java)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 2.x, bolt/spout executors consume from their JCQueue receive queue rather than listening on an in-memory connection:

Suggested change
- Executors listen on an in-memory connection for messages [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/executor/Executor.java)
- Executors consume from their receive queue (a `JCQueue`) [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/executor/Executor.java)

Comment on lines +27 to +28
- Tasks have a routing map from {stream id} -> {component id} -> {stream grouping function} [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/Task.java)
- Stream grouping functions determine the task ids to send the tuples to for either regular stream emit or direct stream emit [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/Task.java)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both bullets link to the same Task.java root. The original 0.7.1 doc had distinct line anchors. Either add #L<line> anchors at the relevant fields/methods, or merge into one bullet:

Suggested change
- Tasks have a routing map from {stream id} -> {component id} -> {stream grouping function} [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/Task.java)
- Stream grouping functions determine the task ids to send the tuples to for either regular stream emit or direct stream emit [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/Task.java)
- Tasks have a routing map from {stream id} -> {component id} -> {stream grouping function}; the grouping functions determine the task ids to send the tuples to for either regular stream emit or direct stream emit. [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/daemon/Task.java)

Comment on lines +30 to +31
- Bolt transfer code here: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltExecutor.java)
- Spout transfer code here: [code](https://github.com/apache/storm/blob/v2.8.7/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutExecutor.java)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Bolts listen here" (line 24) and "Bolt transfer code here" (line 30) bullets both link to BoltExecutor.java; same for SpoutExecutor.java on lines 25/31. Please add #L<line> anchors so each link lands on a distinct method, or merge the receive/transfer bullets — the right anchors are your call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[STORM-4064] Old code reference in the document

2 participants