feat(blockchain): advance XMSS preparation window in the background #332

Open
conache wants to merge 4 commits into lambdaclass:main from conache:xmss-advance-background

conache commented Apr 30, 2026

🗒️ Description / Motivation

This PR closes #262.

Every 65,536 slots, an XMSS signing key has to precompute its next bottom tree via leansig's advance_preparation. The two most recently computed trees form a sliding "prepared window" of 131,072 slots - the range the key can sign for without doing more work. Once the wall-clock slot crosses out of that window, the precomputation has to run before the next signature.
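The window arithmetic above can be sketched as follows. This is an illustration only: the constants come from the figures in this description, while `prepared_window` and `advances_needed` are hypothetical helpers (not leansig APIs), and the sketch assumes windows are aligned to bottom-tree boundaries starting at slot 0.

```rust
use std::ops::Range;

/// Slots covered by one bottom tree (figure taken from the PR description).
const SLOTS_PER_BOTTOM_TREE: u64 = 65_536;
/// The prepared window spans the two most recently computed bottom trees.
const PREPARED_WINDOW_SLOTS: u64 = 2 * SLOTS_PER_BOTTOM_TREE; // 131,072

/// Hypothetical helper: half-open slot range [start, end) that can be signed
/// when the window begins at bottom tree `first_tree`.
fn prepared_window(first_tree: u64) -> Range<u64> {
    let start = first_tree * SLOTS_PER_BOTTOM_TREE;
    start..start + PREPARED_WINDOW_SLOTS
}

/// Hypothetical helper: number of one-tree advances needed before `slot`
/// falls inside the window. Assumes `slot` is not before the window start.
fn advances_needed(first_tree: u64, slot: u64) -> u64 {
    let window = prepared_window(first_tree);
    if slot < window.end {
        0 // already covered, no precomputation required
    } else {
        (slot - window.end) / SLOTS_PER_BOTTOM_TREE + 1
    }
}
```

Under these assumptions, a key whose window starts at tree 0 can sign slots 0 through 131,071 without extra work, and slot 131,072 is the first one that requires a single advance.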

PR #261 made advance_preparation() run synchronously on the BlockChainServer actor's tick handler. When the window has to slide forward, the actor blocks on the hash work long enough to stall other executions.

This PR moves that work to a background worker. The actor returns immediately; the worker messages the advanced key back when done, and a handler restores it.

What Changed

crates/common/types/src/signature.rs: new ValidatorSecretKey::advance_until_prepared(self, target_slot) -> Option<Self>. It calls advance_preparation repeatedly until the prepared window covers target_slot, and returns None when the key's lifetime is exhausted.
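A minimal sketch of the loop shape, for readers who do not have the diff open. `MockSecretKey` is a hypothetical stand-in for ValidatorSecretKey, and its `advance_preparation` returning a `bool` is an assumption made for the sketch; the real leansig call and the real exhaustion check may look different.

```rust
const SLOTS_PER_BOTTOM_TREE: u64 = 65_536;

/// Hypothetical stand-in for ValidatorSecretKey (the real type wraps a leansig key).
#[derive(Debug, Clone, PartialEq)]
struct MockSecretKey {
    /// Index of the first bottom tree in the prepared window.
    first_tree: u64,
    /// Total number of bottom trees available over the key's lifetime.
    total_trees: u64,
}

impl MockSecretKey {
    /// True when `slot` lies inside the two-tree prepared window.
    fn is_prepared_for(&self, slot: u64) -> bool {
        let start = self.first_tree * SLOTS_PER_BOTTOM_TREE;
        (start..start + 2 * SLOTS_PER_BOTTOM_TREE).contains(&slot)
    }

    /// Slides the window forward by one tree. Returns false when no further
    /// tree exists (assumed signature, for illustration only).
    fn advance_preparation(&mut self) -> bool {
        if self.first_tree + 2 >= self.total_trees {
            return false;
        }
        self.first_tree += 1;
        true
    }

    /// Consumes the key, advances until `target_slot` is covered, and returns
    /// None if the lifetime runs out first.
    fn advance_until_prepared(mut self, target_slot: u64) -> Option<Self> {
        while !self.is_prepared_for(target_slot) {
            if !self.advance_preparation() {
                return None;
            }
        }
        Some(self)
    }
}
```

Taking `self` by value is what lets a worker own the key while it advances; the caller only gets a key back if the advance succeeded.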

crates/blockchain/src/key_manager.rs: ValidatorKeyPair fields are now Option<ValidatorSecretKey>, so a worker can take ownership for the duration of the advance. The signing methods (sign_attestation / sign_block_root) no longer loop on advance_preparation. Instead they return either KeyNotPreparedForSlot { validator_id, role, slot }, in which case the caller should trigger a background advance, or KeyUnavailable(validator_id), meaning the key is in flight or exhausted and the slot is skipped silently. A new KeyRole enum identifies which of a validator's two keys is meant. keys is now pub(crate) to match the existing keys.get_mut(...) shape inside the signing methods.

crates/blockchain/src/lib.rs: new BlockChainServer::prepare_key_for_slot(...) takes the key out and spawns the worker, which sends KeyPreparedForSlot { validator_id, role, key: Option<...> } back when it finishes. A new Handler<KeyPreparedForSlot> restores the field on success or logs at error! level on exhaustion. produce_attestations and propose_block match the new error arms.

bin/ethlambda/src/main.rs: ValidatorKeyPair { ... } construction wraps each loaded key in Some(...).
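The two signing outcomes described for key_manager.rs can be sketched roughly as below. The variant and method names follow this description, but everything else is assumed for illustration: `MockKey`, the `SignError` enum name, the `KeyRole` variant names, the field names on `ValidatorKeyPair`, the `String` signature type, and the choice to map an unknown validator to KeyUnavailable.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Which of a validator's two keys is meant (variant names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq)]
enum KeyRole {
    Attestation,
    Proposal,
}

/// Hypothetical error enum carrying the two new variants.
#[derive(Debug, PartialEq)]
enum SignError {
    /// The key is present but its window does not cover the slot.
    KeyNotPreparedForSlot { validator_id: u64, role: KeyRole, slot: u64 },
    /// The key is checked out by a worker, or exhausted.
    KeyUnavailable(u64),
}

/// Hypothetical stand-in for ValidatorSecretKey: just its prepared window.
struct MockKey {
    window: Range<u64>,
}

struct ValidatorKeyPair {
    attestation_key: Option<MockKey>,
    proposal_key: Option<MockKey>,
}

struct KeyManager {
    keys: HashMap<u64, ValidatorKeyPair>,
}

impl KeyManager {
    /// Signs without ever advancing the key; the caller decides what to do
    /// with each error arm.
    fn sign_attestation(&mut self, validator_id: u64, slot: u64) -> Result<String, SignError> {
        let pair = self
            .keys
            .get_mut(&validator_id)
            .ok_or(SignError::KeyUnavailable(validator_id))?;
        let key = pair
            .attestation_key
            .as_ref()
            .ok_or(SignError::KeyUnavailable(validator_id))?;
        if !key.window.contains(&slot) {
            return Err(SignError::KeyNotPreparedForSlot {
                validator_id,
                role: KeyRole::Attestation,
                slot,
            });
        }
        // Placeholder for the real XMSS signature.
        Ok(format!("sig(validator={validator_id}, slot={slot})"))
    }
}
```

In this shape, a caller that sees KeyNotPreparedForSlot would trigger the background advance, while KeyUnavailable simply means skipping the slot.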

Correctness / Behavior Guarantees

  • The actor never blocks in advance_preparation. The work runs on the blocking thread pool.
  • At a boundary slot, the workers for all loaded keys run in parallel rather than sequentially on the actor.
  • An affected validator misses one attestation while its key is checked out (the slot during which its worker is running). Other validators on the same node, and other slots for the same validator, are unaffected.
  • Exhaustion is permanent: the field stays None, subsequent sign attempts return KeyUnavailable, and the handler emits one error! log line. Operators must rotate keys.
  • Steady-state (in-window) signing path is unchanged.
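The take / advance / restore round trip behind these guarantees could look roughly like the sketch below. It deliberately swaps in `std::thread::spawn` and an `mpsc` channel for the actor framework and blocking thread pool used in the PR, tracks a single key instead of a map of validators, and uses a hypothetical `MockKey`; only the names prepare_key_for_slot, advance_until_prepared and KeyPreparedForSlot come from this description.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical stand-in for a validator key.
#[derive(Debug, PartialEq)]
struct MockKey {
    first_tree: u64,
    total_trees: u64,
}

impl MockKey {
    /// Advances until the two-tree window covers `target_slot`, or returns
    /// None once no further tree is available.
    fn advance_until_prepared(mut self, target_slot: u64) -> Option<Self> {
        let target_tree = target_slot / 65_536;
        while self.first_tree + 1 < target_tree {
            if self.first_tree + 2 >= self.total_trees {
                return None;
            }
            self.first_tree += 1;
        }
        Some(self)
    }
}

/// Message the worker sends back; `key` is None on exhaustion.
struct KeyPreparedForSlot {
    validator_id: u64,
    key: Option<MockKey>,
}

/// Simplified actor holding one key and a sender to its own inbox.
struct Actor {
    key: Option<MockKey>,
    inbox_tx: Sender<KeyPreparedForSlot>,
}

impl Actor {
    /// Takes the key out and hands it to a worker thread; returns immediately.
    fn prepare_key_for_slot(&mut self, validator_id: u64, slot: u64) {
        // If the key is already in flight or exhausted there is nothing to do.
        let Some(key) = self.key.take() else { return };
        let tx = self.inbox_tx.clone();
        thread::spawn(move || {
            let key = key.advance_until_prepared(slot);
            // The receiver may be gone during shutdown; ignore that case.
            let _ = tx.send(KeyPreparedForSlot { validator_id, key });
        });
    }

    /// Restores the key on success; on exhaustion the field stays None.
    fn handle_key_prepared(&mut self, msg: KeyPreparedForSlot) {
        match msg.key {
            Some(key) => self.key = Some(key),
            None => eprintln!("validator {}: XMSS key exhausted", msg.validator_id),
        }
    }
}
```

Because the field is `None` while the worker runs, a second prepare_key_for_slot call for the same key is a no-op, which is the property that keeps repeated ticks from spawning duplicate workers in this sketch.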

Tests Added / Run

  • make fmt clean
  • make lint clean
  • make test passes
  • Boundary-crossing verification on devnet — only fires every 65,536 slots, deferred to operational verification. Log lines Preparing XMSS key for slot in background (start) and XMSS key advance complete (success) signal the path firing.

Related Issues / PRs

conache marked this pull request as ready for review April 30, 2026 18:32

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR moves the blocking XMSS advance_preparation call off the BlockChainServer actor's tick handler into a spawn_blocking worker. The key is taken out of Option<ValidatorSecretKey>, advanced on a background thread, and sent back via a new KeyPreparedForSlot message, preventing the actor from stalling during the ~65,536-slot window advance.

  • The new test_advance_until_prepared_advances_then_detects_exhaustion test calls generate_key_with_three_bottom_trees() — documented as "slow (~minutes)" — without the #[ignore] attribute that the only other test using the same helper already carries. This will add several minutes to every make test run.

Confidence Score: 4/5

Safe to merge after addressing the missing #[ignore] on the slow test

A single P1 finding (missing test attribute that will block CI for minutes); the background-worker design itself is correct and idempotent

crates/common/types/src/signature.rs — the new test needs #[ignore]

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/common/types/src/signature.rs | Adds advance_until_prepared method; new test is missing #[ignore] despite calling the slow key-generation helper (~minutes) |
| crates/blockchain/src/key_manager.rs | Fields changed to Option<ValidatorSecretKey>, new KeyRole enum and KeyNotPreparedForSlot/KeyUnavailable error variants added; logic is clean |
| crates/blockchain/src/lib.rs | Adds prepare_key_for_slot helper and KeyPreparedForSlot message handler; background worker pattern is correct and idempotent |
| bin/ethlambda/src/main.rs | Trivial change: wraps loaded keys in Some(...) to match new Option<ValidatorSecretKey> field type |

Sequence Diagram

sequenceDiagram
    participant Tick as on_tick actor
    participant KM as KeyManager
    participant Worker as spawn_blocking
    participant Handler as KeyPreparedForSlot handler

    Tick->>KM: sign_attestation or sign_block_root
    KM-->>Tick: Err(KeyNotPreparedForSlot)
    Tick->>Tick: prepare_key_for_slot - field.take removes key
    Tick->>Worker: advance_until_prepared(target_slot)
    Note over Tick: returns immediately, key field is None

    Worker-->>Handler: KeyPreparedForSlot message

    alt advance succeeded
        Handler->>KM: restore key field with advanced key
    else key exhausted
        Handler->>Handler: emit error log and field stays None
    end

Reviews (1): Last reviewed commit: "Add formatting fixes"

Comment on lines +202 to +203
fn test_advance_until_prepared_advances_then_detects_exhaustion() {
let key = generate_key_with_three_bottom_trees();
P1 New test missing #[ignore] despite using slow key generation

generate_key_with_three_bottom_trees() is explicitly documented as "slow (~minutes) because it computes 3 bottom trees of 65,536 leaves each", and the only other test that calls it (test_advance_preparation_duration) carries #[ignore = "slow: generates production-size XMSS key (~minutes)"] for exactly this reason. The new test omits that attribute, so make test will now block for several minutes on every CI run.

Suggested change
- fn test_advance_until_prepared_advances_then_detects_exhaustion() {
- let key = generate_key_with_three_bottom_trees();
+ #[test]
+ #[ignore = "slow: generates production-size XMSS key (~minutes)"]
+ fn test_advance_until_prepared_advances_then_detects_exhaustion() {
Reply from the PR author (conache):

Test ignored in 49a232b

Development

Successfully merging this pull request may close these issues.

Advance XMSS preparation window in the background