
Add Antweiler-Freyberger (2025) iterative quadrature estimator #89

Open

hmgaudecker wants to merge 31 commits into main from af-estimator

Conversation

@hmgaudecker (Member) commented Apr 15, 2026

Summary

  • New af/ subpackage implementing the Antweiler & Freyberger (2025) estimator as an alternative to the CHS Kalman filter
  • Same ModelSpec interface — users switch estimator by calling estimate_af() instead of get_maximization_inputs() + om.maximize()
  • Period-by-period MLE with Halton quadrature, JAX AD for gradients, LogSumExp for numerical stability
  • Supports arbitrary factor counts, log_ces/linear/translog transitions, endogenous factors via explicit investment equation
  • AF and CHS agree closely on both measurement and transition parameters (tested on synthetic data and MODEL2)
  • Common get_filtered_states() interface: pass af_result= for AF posterior states, omit for CHS filtered states

What's done

  • Core estimation: estimate_af(model_spec, data, af_options, start_params) → AFEstimationResult (usage sketch after this list)
  • Initial period: mixture-of-normals + measurement system via 1D/KD quadrature
  • Transition periods: triple integral (state nodes × investment shocks × production shocks) with previous-period conditioning
  • Transition constraints: ProbabilityConstraint for log_ces gammas, satisfied at start values
  • Investment equation: I = β₀ + β₁θ + β₂Y + σ_I ε for endogenous factors
  • State propagation: quadrature-based moment matching between periods
  • start_params support: user-supplied starting values override heuristic defaults
  • Posterior states: get_filtered_states(model_spec, data, params, af_result=result) computes quadrature-based posterior means per individual/period
  • 10 tests (9 regular + 1 long_running MODEL2 comparison)
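
A minimal usage sketch of the entry point above — the import path and result attributes are assumptions, not a verified public API:

```python
from skillmodels.af import AFEstimationOptions, estimate_af  # assumed import path

result = estimate_af(
    model_spec=model_spec,             # same ModelSpec the CHS estimator consumes
    data=data,                         # panel DataFrame
    af_options=AFEstimationOptions(),  # defaults; field names per this PR (e.g. n_halton_points)
    start_params=None,                 # heuristic defaults unless a DataFrame is supplied
)
params = result.params  # assumed attribute on AFEstimationResult
```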

Still to do

  • Score-based bootstrap for standard errors (AF paper Section 4.2) — postponed

Test plan

  • pixi run -e tests-cpu tests — 399 passed, 1 deselected (long_running)
  • pixi run ty — all checks passed
  • prek run --all-files — all passed
  • pytest -m long_running — MODEL2 AF vs CHS comparison (both estimators optimised from same naive start values)

🤖 Generated with Claude Code

hmgaudecker and others added 7 commits April 15, 2026 12:46
New af/ subpackage implementing period-by-period MLE with Halton
quadrature as an alternative to the CHS Kalman filter estimator.
Same ModelSpec interface, JAX AD for gradients, arbitrary factor count.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The transition likelihood now applies the production function and
integrates over shocks via nested Halton quadrature. Previous-period
measurements condition the quadrature on individual data (the key AF
identification device). State propagation uses quadrature-based moment
matching. New tests verify transition parameter recovery and AF-vs-CHS
agreement on both measurement and transition parameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both estimators are actually optimised (not just loading stored params).
Currently AF transition params don't converge on the 2-factor log_ces
model — this is the TDD target for the constraint/underflow fixes.

Skipped in CI via `long_running` marker; run with `-m long_running`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both estimators now start from: loadings=1, controls=0, everything
else=0.5, probability constraints satisfied with equal shares.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Collect transition function constraints (ProbabilityConstraint for
  log_ces gammas) and pass to optimagic, mirroring CHS constraint
  handling
- Satisfy constraints at start values (equal gamma shares)
- Rewrite transition likelihood integration in log space using
  LogSumExp to prevent underflow with multi-factor models (see the
  sketch after this commit message)
- The long_running MODEL2 test now passes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
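
A sketch of the log-space integration pattern this commit describes — function and argument names are illustrative, not skillmodels' internals:

```python
import jax
from jax.scipy.special import logsumexp

def log_integral(log_density_at_node, nodes, log_weights):
    """log ∫ f dμ ≈ logsumexp_k (log w_k + log f(node_k)).
    Working in log space avoids products of many small per-factor
    densities, which underflow to 0.0 in multi-factor models."""
    log_vals = jax.vmap(log_density_at_node)(nodes)  # (n_nodes,)
    return logsumexp(log_vals + log_weights)
```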
Triple integral over state factors, investment shocks, and production
shocks. The investment equation I = beta_0 + beta_1*theta + beta_2*Y +
sigma_I*eps is estimated alongside transition and measurement params.
Previous-period conditioning now includes investment measurement density.
ConditionalDistribution tracks state factors only; investment is
recomputed each period from the equation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users can pass a DataFrame of starting values to estimate_af().
Matching index entries override heuristic defaults; unmatched and
fixed parameters are left unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 84.21986% with 445 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.35%. Comparing base (2d56c8e) to head (4f45d01).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...matlab_ces_repro/test_matlab_loglike_comparison.py | 16.43% | 122 Missing ⚠️ |
| tests/matlab_ces_repro/matlab_mapping.py | 37.77% | 112 Missing ⚠️ |
| tests/matlab_ces_repro/evaluate.py | 16.48% | 76 Missing ⚠️ |
| tests/matlab_ces_repro/model_specs.py | 50.84% | 29 Missing ⚠️ |
| tests/matlab_ces_repro/test_af_matlab_repro.py | 56.45% | 27 Missing ⚠️ |
| src/skillmodels/af/inference.py | 95.17% | 17 Missing ⚠️ |
| tests/matlab_ces_repro/test_matlab_mapping.py | 69.76% | 13 Missing ⚠️ |
| src/skillmodels/af/transition_period.py | 95.30% | 10 Missing ⚠️ |
| src/skillmodels/af/params.py | 94.00% | 9 Missing ⚠️ |
| src/skillmodels/af/validate.py | 72.41% | 8 Missing ⚠️ |
| ... and 7 more | | |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #89      +/-   ##
==========================================
- Coverage   96.91%   92.35%   -4.56%     
==========================================
  Files          57       81      +24     
  Lines        4952     7769    +2817     
==========================================
+ Hits         4799     7175    +2376     
- Misses        153      594     +441     

☔ View full report in Codecov by Sentry.

Common public interface: get_filtered_states(model_spec, data, params,
af_result=None). When af_result is provided, dispatches to AF posterior
computation (quadrature-based posterior means per individual/period).
Internally uses af/posterior_states.py. Returns "unanchored_states"
matching the CHS output format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
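
A usage sketch of the common interface described in this commit message; everything beyond the stated signature is assumed:

```python
from skillmodels import get_filtered_states  # assumed import location

# Omit af_result for CHS filtered states; pass it for AF posterior states.
chs_states = get_filtered_states(model_spec, data, params)
af_states = get_filtered_states(model_spec, data, params, af_result=af_result)
# Both return "unanchored_states" in the same format (per the commit message).
```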
@hmgaudecker (Member Author)

Code review

Found 2 issues:

  1. _extract_period_measurement_info in posterior_states.py only reads the "constant" control coefficient, ignoring all other control variables. For models with non-constant controls (e.g. MODEL2 with x1), posterior state means will be biased because the control contribution to measurement residuals is incomplete. The test test_af_get_filtered_states uses a model without controls, so this is not caught.

ctrl_list = [
    float(period_params.loc[loc, "value"])  # ty: ignore[invalid-argument-type]
    if (loc := ("controls", period, meas, "constant")) in period_params.index
    else 0.0
    for meas in all_measures
]

  2. Distribution propagation in estimate_transition_period uses obs_factor_values[0] (the first individual's observed factor values) when constructing the state_only_transition wrapper for moment matching. For models with individual-specific observed factors, this uses one person's values for the population-level distribution update.

def state_only_transition(state_factors_val: Array, params: Array) -> Array:
    """Transition wrapper that fills in mean investment + observed."""
    full = jnp.concatenate([state_factors_val, mean_inv, obs_factor_values[0]])
    return combined_transition(full, params)

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

hmgaudecker and others added 3 commits April 15, 2026 20:10
1. Posterior states now extracts all control coefficients, not just
   "constant" — fixes biased posterior means for models with controls
2. Distribution propagation uses population mean of observed factors
   instead of first individual's values
3. AFEstimationResult.model_spec typed as ModelSpec (was Any)
4. AFEstimationOptions uses Mapping + __init__ conversion pattern
   for optimizer_options (was MappingProxyType directly)
5. Remove redundant "loadings_flat" key from _parse_initial_params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend the Step-0 likelihood to model the joint distribution of (latent,
observed) factors and condition Halton draws on per-individual observed
values via the Schur complement. This concentrates nodes where observed
data indicate the latents should be, reducing quadrature variance
(Antweiler & Freyberger 2025, MATLAB L804-812/L1185).

Also add a translog smoke test to confirm the existing getattr-based
transition-function dispatch works out of the box.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
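
A sketch of the Schur-complement conditioning this commit describes — standard Gaussian-conditioning formulas, with illustrative names rather than skillmodels' internals:

```python
import jax.numpy as jnp

def condition_latents_on_observed(mu, sigma, idx_latent, idx_obs, y_obs):
    """Mean and covariance of latents given observed factor values y_obs."""
    mu_t, mu_y = mu[idx_latent], mu[idx_obs]
    s_tt = sigma[jnp.ix_(idx_latent, idx_latent)]
    s_ty = sigma[jnp.ix_(idx_latent, idx_obs)]
    s_yy = sigma[jnp.ix_(idx_obs, idx_obs)]
    gain = s_ty @ jnp.linalg.inv(s_yy)        # Σ_{θY} Σ_{YY}⁻¹
    cond_mean = mu_t + gain @ (y_obs - mu_y)  # per-individual shift of the Halton nodes
    cond_cov = s_tt - gain @ s_ty.T           # Schur complement
    return cond_mean, cond_cov
```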
Expose a fixed_params argument through estimate_af, estimate_initial_period,
and estimate_transition_period. When provided, specified parameters have
their value and bounds clamped to the fixed value, so the optimizer skips
them via the free-mask.

Primary use case: pin time-invariant latent factors (e.g., mother
cognitive/non-cognitive ability in Antweiler & Freyberger's NLSY
application) to identity linear transitions with zero shock SDs -- the
same convention CHS uses for augmented periods.

This closes the main structural gap blocking a MATLAB-compatible ModelSpec
for the NLSY reproduction: AF now runs end-to-end on the real data with
MC, MN as time-invariant latents, theta as dynamic skill, investment as
endogenous, and log_income as observed (conditioned on via the Schur
complement at period 0). Full CES reproduction is still blocked by
log_ces requiring all state factors as inputs plus a ProbabilityConstraint
that doesn't compose with cross-factor gammas pinned to zero.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
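
A hedged sketch of the fixed_params DataFrame this commit introduces — the index layout follows the (category, period, name1, name2) tuples quoted later in this PR, but the specific rows and category names here are assumptions:

```python
import pandas as pd

# Pin a time-invariant latent factor: identity linear transition, zero shock SD.
fixed_params = pd.DataFrame(
    {"value": [1.0, 0.0]},
    index=pd.MultiIndex.from_tuples(
        [
            ("transition", 1, "mother_cog", "mother_cog"),  # assumed row: self-coefficient = 1
            ("shock_sds", 1, "mother_cog", "-"),            # assumed row: shock SD = 0
        ]
    ),
)
result = estimate_af(model_spec, data, af_options, fixed_params=fixed_params)
```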
@hmgaudecker (Member Author)

Update — income-conditional initial draws, translog, and time-invariant latents

Three rounds of improvements since the last review, ending at commit e5b9176.

What changed

  1. Income-conditional initial draws (Schur complement) — when observed_factors is non-empty, Step 0 now models the joint (latent, observed) distribution and conditions the Halton draws on each individual's observed values. Factorises the joint into marginal × conditional via Σ_{θY} Σ_{YY}⁻¹, concentrating nodes where the likelihood has mass. Matches the variance-reduction trick in Antweiler-Freyberger's MATLAB (L804-812, L1185). Back-compat preserved via a fast path when n_obs_factors == 0.

  2. Translog transition — added a smoke test confirming the existing getattr(transition_functions, name) dispatch just works for "translog". No core changes needed.

  3. fixed_params argument — new optional DataFrame that clamps value + bounds for specified parameters, so the optimizer skips them via the free-mask. Primary use case: pin time-invariant latent factors to identity linear transitions with zero shock SDs (same convention CHS uses for augmented periods).

  4. MATLAB reproduction scaffolding — loaded NLSY complete_7_9_11.xls via LibreOffice CSV conversion, built a ModelSpec matching the MATLAB structure (theta dynamic, MC/MN time-invariant latents, investment endogenous, log_income observed with Schur conditioning). AF now runs end-to-end on 1403 cases across 3 periods with this setup — the full structural pipeline works on real data.

Remaining gap for full MATLAB reproduction

MATLAB's CES production is 2-dim in (theta, investment); our log_ces takes ALL state factors with a ProbabilityConstraint on gammas. Pinning cross-factor gammas (mc, mn, log_income) to 0 via fixed_params breaks the constraint selector in optimagic (selected params must remain free). To fully match the MATLAB CES, skillmodels would need either (a) an "input factors" concept per transition, or (b) a custom CES-on-two-inputs transition function. Left as a follow-up.

Validation

  • All 401 unit tests pass (pixi run -e tests-cpu tests).
  • pixi run ty clean.
  • prek run --all-files clean.
  • Three new end-to-end tests added:
    • test_af_estimate_with_translog — translog runs and recovers linear coefficient from linear DGP.
    • test_af_joint_initial_distribution_with_observed_factor — verifies initial_states includes observed factors and recovers positive skill-income cross-covariance.
    • test_af_fixed_params_pins_time_invariant_latent — verifies pinned MC-style factors keep identity transitions and near-zero shock SDs after optimization.

Files touched

src/skillmodels/af/{params,initial_period,likelihood,estimate,transition_period}.py, tests/test_af_estimate.py.

🤖 Generated with Claude Code

hmgaudecker and others added 14 commits April 22, 2026 12:34
…s to CHS.

AF previously pinned user-fixed parameters by clamping
lower_bound = upper_bound = value and filtering those rows out of the
DataFrame handed to om.minimize. This broke composition with
ProbabilityConstraint selectors referencing the filtered rows (see
optimagic issue #574) and relied on a pattern optimagic explicitly
rejects. Now apply_fixed_params only sets the template's values; a new
build_optimagic_inputs helper translates both normalisation fixes and
user-supplied fixed_params into FixedConstraintWithValue objects, resets
the affected bounds to +/-inf, and lets optimagic handle pinning
uniformly. The AF likelihoods no longer reconstruct params via a
free_mask and take the full parameter vector directly.

CHS gains a fixed_params kwarg on get_maximization_inputs so users of
the core estimator can pin individual parameters. Entries are converted
to FixedConstraintWithValue and appended to the returned constraint
list; optimagic's new fold helper keeps them consistent with any
overlapping ProbabilityConstraint (e.g. a log_ces gamma).

log_ces is rewritten as a numerically stable weighted logsumexp so the
gradient stays finite at gamma_i = 0 (see the sketch after this commit
message). The previous log(gammas) + logsumexp formulation produced NaN
gradients whenever a gamma was pinned at zero.

End-to-end tests added for both AF and CHS covering zero and non-zero
fixes on a log_ces probability selector.

Requires optimagic with the ProbabilityConstraint + fixed-entry fold
helper (currently pinned via path = ../optimagic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
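
A sketch of the numerically stable form described above — the scaling and signature are illustrative, not skillmodels' exact log_ces:

```python
from jax.scipy.special import logsumexp

def log_ces(log_states, gammas, phi):
    """Weighted logsumexp: passing gammas via `b` keeps the gradient finite
    at gamma_i = 0, whereas log(gammas) + phi * log_states differentiates
    log(0) and yields NaN gradients for pinned gammas."""
    return logsumexp(phi * log_states, b=gammas) / phi
```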
Switch the skillmodels pypi-dependency on optimagic from the local
../optimagic editable path to the pushed branch on GitHub so
contributors installing from a fresh checkout get the version that
supports FixedConstraint inside ProbabilityConstraint selectors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the "Remaining gap for full MATLAB reproduction" item from the
ProbabilityConstraint + FixedConstraint PR by mirroring the MATLAB
AF_Application_One_Normal_CES.m and _Translog.m runs in skillmodels:

- tests/matlab_ces_repro/load_cnlsy.py reads complete_7_9_11.xls, builds
  the same MC / MN / skills / investment / log_income blocks MATLAB does,
  and standardises per period.
- tests/matlab_ces_repro/matlab_mapping.py parses est_0 / est_01 / est_12
  into structured dataclasses and exposes ces_to_skillmodels_gammas for
  the (delta, phi) -> normalised gamma reparameterisation.
- tests/matlab_ces_repro/model_specs.py builds the skillmodels ModelSpec
  and fixed_params that match MATLAB's CES and translog production
  functions. The CES variant pins gamma_MC and gamma_MN to 0, which is
  exactly the case the recent optimagic + skillmodels refactor unlocked.
- tests/matlab_ces_repro/test_af_matlab_repro.py runs both variants
  end-to-end. Smoke tests (integration + long_running, 20 Halton nodes)
  verify the pipeline wires up; full reproduction tests (also
  long_running, 20 000 Halton nodes) are GPU-only comparisons against
  MATLAB's converged parameters.
- Unit tests for the data loader and parameter parser run fast on CPU.

Adds xlrd to the tests feature for .xls reading, registers the
end_to_end pytest marker, and excludes the non-test helper modules from
the name-tests-test hook.

Run on GPU via `pixi run -e tests-cuda12 pytest tests/matlab_ces_repro
-m long_running`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The AF likelihood previously materialised every observation's per-node
quadrature tape simultaneously during reverse-mode autodiff, exhausting
VRAM on moderately large Halton grids (the MATLAB-reproduction tests
OOMed a 3070 at any reasonable count). Two complementary changes fix
the per-observation scaling:

- jax.checkpoint on each per-obs integrand in af/likelihood.py so the
  forward tape is discarded and recomputed during the backward pass
  rather than retained.
- jax.lax.map (replacing the outer jax.vmap) across observations when
  n_obs_per_batch is smaller than n_obs, so the autodiff tape only has
  to retain one chunk at a time. A helper _map_over_obs falls back to
  vmap when batching is off (see the sketch after this commit message).

New public knobs:

- AFEstimationOptions.n_obs_per_batch. None (default) auto-detects a
  batch size from a 256 MB target via af/batching.auto_n_obs_per_batch.
- SKILLMODELS_AF_TARGET_BATCH_BYTES env var overrides the target.

Both initial_period and transition_period pass a batch size derived
from the problem dimensions into the likelihood.

Correctness: tests/test_af_batching.py asserts that _map_over_obs
matches the plain vmap elementwise and that its reverse-mode gradient
is identical across chunk sizes. The existing test_af_estimate.py
suite still passes with no measurable change.

Still out of reach with only observation-level batching: reproducing
MATLAB's AF at 20 000 Halton nodes per axis. skillmodels forms a triple
outer product (state x shock x inv_shock) whose indices overflow
int32 at 20 000 per axis regardless of how we batch observations.
Documented as a follow-up; a node-axis lax.map chunking pass in
_integrate_transition_single_obs plus a move to joint-Halton
integration would close the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
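
A sketch of the two changes combined — names are illustrative, and for brevity it assumes n_obs is divisible by n_obs_per_batch:

```python
import jax

def map_over_obs(per_obs_fn, data, n_obs_per_batch=None):
    """Chunked observation mapping with rematerialisation; falls back to
    plain vmap when batching is off."""
    f = jax.checkpoint(per_obs_fn)  # discard the forward tape; recompute it in the backward pass
    if n_obs_per_batch is None:
        return jax.vmap(f)(data)
    n_obs = data.shape[0]
    chunked = data.reshape(n_obs // n_obs_per_batch, n_obs_per_batch, *data.shape[1:])
    # lax.map evaluates chunks sequentially, so reverse-mode AD only has to
    # retain one chunk's residuals at a time
    out = jax.lax.map(jax.vmap(f), chunked)
    return out.reshape(n_obs, *out.shape[2:])
```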
The previous implementation integrated the transition-period
likelihood as three separate one-dimensional Halton sequences
(state x shock x investment-shock) combined by outer product.
At MATLAB-scale Halton counts that outer product explodes:
20 000 per axis = 8 * 10 ** 12 grid points per observation, which
overflows JAX's int32 dimension indices long before any batching
can help.

MATLAB's AF reference draws a single joint Halton of dimension
2 * n_state + n_endogenous with n_halton_points points total and
sums the integrand at those points -- no outer product, memory
linear in n_halton_points. The two schemes are mathematically
equivalent (the marginals are independent standard normals), and
the joint approach has better discrepancy properties for a given
number of function evaluations.

This commit ports skillmodels to the joint-Halton scheme:

- _integrate_transition_single_obs now takes a single
  joint_nodes / joint_weights pair and splits each draw into
  (z_state, z_shock, z_inv_shock) internally. The triple vmap is
  replaced by a single vmap over the joint grid.
- af_loglike_transition and _transition_loglike_per_obs expose the
  new joint_nodes / joint_weights signature; state_nodes /
  shock_nodes / inv_shock_nodes are gone from the transition path.
- transition_period.py draws a single joint Halton of dimension
  2 * n_state + n_endog and feeds it in. create_shock_nodes_and_weights
  is no longer used there. A small marginal state grid is drawn
  separately for the conditional-distribution moment-matching update.
- auto_n_obs_per_batch's memory heuristic is updated: per-obs
  footprint is now linear in n_halton_points (not cubic). Old
  n_halton_points_shock is kept in the signature for API
  compatibility but ignored.
- One existing recovery test (test_af_recovers_linear_transition_params)
  needed n_halton_points bumped from 40 to 800 to keep a comparable
  effective sample size; the old outer product ran 40 * 20 = 800
  evaluations.

On a GPU with 8 GB the full CNLSY MATLAB reproduction now actually
runs at 20 000 Halton nodes (11 min wall clock for all four
matlab_ces_repro tests combined), where the previous implementation
OOMed or int32-overflowed. The reproduction tests' comparison
assertions are reduced to qualitative sanity checks (finite
likelihoods, positive measurement SDs); matching MATLAB's numerical
estimates exactly would require replicating MATLAB's multistart
optimisation strategy and is out of scope for this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
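
A sketch of the joint-Halton draw this commit describes — helper names are illustrative, and the shock dimension is shown generically:

```python
import jax.numpy as jnp
from jax.scipy.special import ndtri
from scipy.stats import qmc

def joint_halton(n_points, n_state, n_shock, n_endog, seed=0):
    """One joint Halton draw of dimension n_state + n_shock + n_endog.
    Memory is linear in n_points, unlike the old state x shock x
    inv_shock outer product."""
    dim = n_state + n_shock + n_endog
    u = qmc.Halton(d=dim, scramble=True, seed=seed).random(n_points)
    z = ndtri(jnp.asarray(u))  # map uniforms to standard-normal nodes
    # Split each joint draw into its state / shock / investment-shock parts.
    z_state, z_shock, z_inv = jnp.split(z, [n_state, n_state + n_shock], axis=1)
    log_weights = jnp.full(n_points, -jnp.log(n_points))  # equal QMC weights
    return (z_state, z_shock, z_inv), log_weights
```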
Previously ``investment`` was flagged ``is_endogenous=True``, which gave
it its own initial-distribution mean and covariance block in skillmodels
AF and routed it through the separate ``investment_eq`` category. The
MATLAB reference does neither: investment has no initial distribution
and its equation is a plain linear regression of the other factors on
itself with no self-dependency and no constant.

Drop the flag and use a regular ``linear`` transition instead. Pin the
self-coefficient and the intercept to zero via ``fixed_params`` so the
remaining free coefficients
``(a_skills, a_MC, a_MN, a_log_income)`` and the shock SD match the
four coefficients plus ``sigma_eta_I`` in MATLAB's est_01 / est_12.
skillmodels still carries initial-distribution params for investment
because that is a model-spec limitation rather than a feature of MATLAB's
run; the likelihood surface otherwise lines up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- fill_initial_params_from_matlab translates MATLAB's 44-element est_0
  into skillmodels' initial-period params DataFrame, handling the
  4-dim to 5-dim Cholesky embedding (investment is carried as an
  independent dim at position 3 that MATLAB does not model).
- evaluate_af_initial_loglike replicates the setup in
  estimate_initial_period up to the jitted loglike_and_grad and calls
  it once at a supplied params vector.
- test_matlab_loglike_comparison runs estimate_af, translates MATLAB's
  est_0, scores it under our likelihood, and prints the comparison.

Result on CNLSY at 20 000 Halton nodes:

    skillmodels AF converged loglike       = -19.112239
    skillmodels likelihood at MATLAB est_0 = -19.369483
    difference                             = +0.257245 (skillmodels higher)

Our own optimum scores ~0.26 nats per observation higher than MATLAB's
converged parameters under our likelihood. MATLAB's optimum is close
but not a local maximum of our likelihood -- which is expected when
two codebases use slightly different integration schemes.

Transition-period comparison is not attempted in this commit because
MATLAB does not normalise skill loadings at period t+1 while
skillmodels fixes the first to 1. A direct copy would require a
uniform rescaling of theta_{t+1} through all connected parameters and
is left as a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thread two new per-factor flags through the AF estimator so models can
match MATLAB's conventions exactly:

- has_production_shock=False drops the factor's shock dimension from
  the transition-period joint Halton draw (the factor has no shock SD
  parameter and transitions deterministically). Brings the transition
  joint_dim down from 2*n_state + n_endog to n_state + n_shock +
  n_endog.
- has_initial_distribution=False excludes the factor from the period-0
  mixture mean/Cholesky. Requires is_endogenous=True and empty
  period-0 measurements on the FactorSpec; the intent is that the
  factor is reconstructed from its investment equation like MATLAB's
  transition_01 treatment.

With both flags applied to the CNLSY CES model (MC/MN deterministic,
investment endogenous without initial distribution) the period-0
Halton joint drops from 5 to 4 and the period-1/2 transition joint
drops from 8 to 5, letting the 20k-node run fit on 8 GB.
Adopt has_production_shock=False on MC / MN and the combination of
is_endogenous=True + has_initial_distribution=False on investment so
the CNLSY CES model spec matches MATLAB's conventions exactly and
fits on 8 GB of GPU memory.

Two translation bugs surfaced while auditing the comparison:

- Level-shift absorption into period-t+1 skill intercepts now
  multiplies by the measurement's loading. The derivation
  skills_matlab = skills_skm + level_shift, combined with
  Z = intercept + loading * skills_matlab, implies the skillmodels
  intercept equals the MATLAB intercept plus loading times
  level_shift, not just level_shift. Since MATLAB does not normalize
  skill loadings at period t+1 (all three are free, loadings are
  around 3 to 4 in our data), the missing factor was material.
- Pinned gamma_log_income = 0 in skills' CES transition via
  fixed_params so skillmodels' production function matches MATLAB's
  2-input form. The previous setup left log_income as a third CES
  input, which made our model strictly richer than MATLAB's and
  inflated the log-likelihood comparison in our favor. The same
  alignment is applied to the translog variant.

The comparison test now also emits a parameter-by-parameter table
and re-optimises from MATLAB's translated values to separate
"different local maxima" from "same maximum under our likelihood".
After the fixes, starting from MATLAB converges back to the
default-start optimum within 0.0004 nats, so the residual 2.48-nat
gap (concentrated at period 2) is one basin, not two.
Implement `compute_af_standard_errors` returning per-period
asymptotic SEs as the diagonal blocks of the Newey-McFadden sandwich
for a sequential M-estimator:

    V_t = A_tt^{-1} Omega_tt A_tt^{-T} / n_obs

Own-period scores come from jax.jacfwd of the per-obs log-likelihood;
the information matrix A_tt is jax.hessian of the negative mean
log-likelihood. Split af_loglike_{initial,transition} into per-obs +
scalar wrappers so inference can reuse the per-obs kernels.

Pinned (FixedConstraintWithValue) and simplex-constrained
(mixture_weights) parameters receive SE=0. Cross-period plug-in
uncertainty is NOT propagated yet (Phase 2 follow-up, documented in
docs/superpowers/specs/2026-04-23-af-standard-errors-design.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
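
A condensed sketch of the per-period sandwich described above; it folds in the pinv robustness fix adopted later in this PR, and all names are illustrative:

```python
import jax
import jax.numpy as jnp

def period_sandwich_se(per_obs_loglike, params, n_obs):
    """V_t = A_tt^{-1} Omega_tt A_tt^{-T} / n_obs for one period.
    per_obs_loglike: params -> (n_obs,) vector of log-likelihood contributions."""
    scores = jax.jacfwd(per_obs_loglike)(params)             # (n_obs, n_params) own-period scores
    omega = scores.T @ scores / n_obs                        # Omega_tt
    a = jax.hessian(lambda p: -jnp.mean(per_obs_loglike(p)))(params)  # A_tt
    a_inv = jnp.linalg.pinv(a, hermitian=True)               # finite SEs despite pinned rows
    vcov = a_inv @ omega @ a_inv.T / n_obs
    return jnp.sqrt(jnp.diag(vcov))
```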
Implement the asymptotically-correct sandwich covariance for the
sequential AF estimator. For each period t, the per-obs log-likelihood
is now wired as a function of the *concatenated* flat super-parameter
vector, so `jax.jacfwd` captures the full dependence chain:

    theta_0 -> cond_dist_0 -> propagate -> cond_dist_1 -> ...

Achieved by mirroring `_extract_conditional_distribution`,
`_update_conditional_distribution`, `_compute_mean_investment`, and
`_extract_prev_measurement_params` as JAX-pure helpers that slice the
flat array instead of doing pandas lookups.

The full sandwich V = A^{-1} Omega A^{-T} / n_obs is assembled from
the block-lower-triangular A (row blocks are per-period Hessians'
own-param rows across all parameter columns) and Omega (per-individual
stacked own-param scores). Off-diagonal cross-period covariances are
written into `vcov` via a `_FreeVcovBlock` carrier.

`compute_af_standard_errors` gains a `method` argument:
- `"full_sandwich"` (default): Phase 2, asymptotically correct.
- `"block_diagonal"`: Phase 1, conservative per-period blocks.

Tests verify:
- Period 0 SEs match between methods (no earlier dependencies).
- Period 2's full-sandwich SE >= block-diagonal SE (plug-in uncertainty).
- Cross-period covariance block is non-zero in full sandwich.
- Unknown `method` raises ValueError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hmgaudecker (Member Author)

Code review

No issues found. Checked for bugs and CLAUDE.md compliance in the two standard-error commits (6fd7502 Phase 1 block-diagonal sandwich, ab87767 Phase 2 full cross-period sandwich).

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@hmgaudecker (Member Author)

Code review (full, including low-confidence items)

Below is the full list of issues surfaced across five review agents on the Phase 1 + Phase 2 standard-error commits (6fd7502, ab87767), with confidence scores (0-100). Items below the usual 80-threshold are still shown for transparency.

Real potential issue (85) — shock_sds shape mismatch for models with n_shock_factors < n_state_factors

The JAX-pure propagator does + jnp.diag(shock_sds**2) directly. _parse_transition_params returns shock_sds with shape (n_shock_factors,), so the result is (n_shock, n_shock) — it cannot be added to a (n_state, n_state) covariance when they differ. The existing _update_conditional_distribution has the same pattern, so this is a pre-existing bug the mirror replicates rather than a new regression; still worth flagging since the mirror is a new call site.

new_cov = jnp.einsum(
    "q,qi,qj->ij", state_weights, centered, centered
) + jnp.diag(shock_sds**2)
new_chol = jnp.linalg.cholesky(new_cov + 1e-8 * jnp.eye(n_state))
return new_mean, new_chol

Pre-existing sibling:

new_cov = jnp.einsum(
    "q,qi,qj->ij", state_weights, centered, centered
) + jnp.diag(shock_sds**2)
# Cholesky factorization of new covariance

CLAUDE.md: # type: ignore[arg-type] instead of # ty: ignore[...] (75)

AGENTS.md says: "Suppress errors with # ty: ignore[rule-name] (not # type: ignore)".

https://github.com/OpenSourceEconomics/skillmodels/blob/ab877673637a59c87520b20e27ff0a5dc1faa5b2/tests/test_af_inference.py#L315-L318

CLAUDE.md: internal dataclass uses Mapping, should be MappingProxyType (75)

The repo CLAUDE.md (Immutability Conventions) says internal dataclass dict fields use MappingProxyType, with MappingProxyType(...) wrapping at the call site. _PeriodMeta is internal (underscore-prefixed, not in __all__) but declares three Mapping[str, Any] fields and is constructed with plain dicts.

params_df: pd.DataFrame
loglike_kwargs: Mapping[str, Any]
"""Keyword arguments forwarded to ``af_per_obs_loglike_initial`` (if
``is_initial``) or ``af_per_obs_loglike_transition`` otherwise.
"""
parse_kwargs: Mapping[str, Any]
"""Keyword arguments forwarded to ``_parse_initial_params`` or
``_parse_transition_params`` respectively. Used by the Phase 2 chain.
"""
n_components: int
n_factors_joint: int
"""Joint factor count in the initial mixture (state_latent + observed).
Only meaningful for the initial period; zero otherwise.
"""
n_state: int
"""State-factor count (``n_state_latent`` in the initial period;
``n_state_factors`` in transition periods).
"""
n_endog: int
n_shock: int
n_observed_factors: int
state_factor_indices_in_joint: tuple[int, ...]
"""Integer positions within the joint factor vector at which state
factors live (the complement is observed factors). Used to marginalise
the joint cond-dist to its state-factor sub-block.
"""
propagation: Mapping[str, Any] = field(default_factory=dict)
"""Extra JAX-pure bits for propagation of the conditional distribution
through this period's transition. Only populated for transition

model_spec: Any / processed_model: Any with ANN401 suppressions (unscored)

ModelSpec and ProcessedModel are concrete types already in use in this file's imports (indirectly via AFEstimationResult). Using Any + # noqa: ANN401 sidesteps type-safety; TYPE_CHECKING imports would avoid circular-import concerns if that is the motivation.

*,
result: AFEstimationResult,
period_data: dict[int, dict[str, Array]],
model_spec: Any, # noqa: ANN401
processed_model: Any, # noqa: ANN401
af_options: AFEstimationOptions,
observed_factors: tuple[str, ...],

CLAUDE.md: multiple assertions per test (unscored)

AGENTS.md says "One assertion per test". Several tests pack 2-4 independent assertions, e.g.:

def test_af_inference_fixed_entries_have_zero_se(
    fitted_result: tuple[AFInferenceResult, pd.DataFrame],
) -> None:
    """Normalization pins (e.g. loadings[m1, skill] == 1) must have SE = 0."""
    inference, all_params = fitted_result
    se = inference.standard_errors
    pinned_loading = ("loadings", 0, "m1", "skill")
    assert pinned_loading in all_params.index
    assert se.loc[pinned_loading] == 0.0
    pinned_intercept = ("controls", 0, "m1", "constant")
    assert pinned_intercept in all_params.index
    assert se.loc[pinned_intercept] == 0.0

Performance note: jax.hessian on the full flat_super bypasses n_obs_per_batch (unscored)

The n_obs_per_batch memory-control contract in likelihood.py applies only to single-direction reverse mode. jax.hessian materialises a full tape over the gradient, which can scale with O(n_params × n_obs) regardless of n_obs_per_batch. For large models this may OOM at inference time where estimation did not.

jac_full = jax.jacfwd(per_obs_loglike_full)(flat_values)
hess_full = jax.hessian(neg_mean_loglike_full)(flat_values)

score_matrices_full.append(jax.jacfwd(_per_obs_t)(flat_super))
hessian_blocks_full.append(jax.hessian(_neg_mean_t)(flat_super))

Latent inconsistency (25) — conditional_weights never propagated in the JAX chain

_build_prev_dist_arrays always broadcasts uniform mixture_weights. The estimation-time _prepare_transition_inputs instead honours prev_distribution.conditional_weights when it is non-None. In current AF code every estimation path sets conditional_weights=None, so this is latent/defensive only — but it is an asymmetry that will break silently if per-individual posterior weights are ever introduced.

meta_target = metas[target_t]
n_obs = int(meta_target.loglike_kwargs["measurements"].shape[0])
n_components = metas[0].n_components
cond_weights = jnp.broadcast_to(mixture_weights[None, :], (n_obs, n_components))
return {
    "cond_weights": cond_weights,
    "means": state_means,
    "chol_covs": state_chols,

Flagged but confirmed false positives (0):

  • prior_mean = prev_means[0] used for all components' mean-investment — faithfully mirrors transition_period._compute_mean_investment.
  • mixture_weights carried unchanged across propagation — matches _update_conditional_distribution's intentional design (docstring: "compute the new mean and covariance").
  • prev_loading_mask not overridden in the Phase 2 kwargs — structural (boolean mask derived from model spec, not from estimated values), so correct.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

hmgaudecker and others added 6 commits April 24, 2026 09:52
- Scatter shock_sds**2 onto the state-factor diagonal via
  shock_factor_indices in both _update_conditional_distribution
  (transition_period.py) and _propagate_cond_dist_jax
  (inference.py). Fixes shape mismatch when some state factors
  have has_production_shock=False (see the sketch after this commit
  message).
- Type _PeriodMeta dict fields as MappingProxyType and wrap at
  call sites, per Immutability Conventions in CLAUDE.md.
- Replace Any + ANN401 annotations on model_spec / processed_model
  with concrete ModelSpec / ProcessedModel types.
- Switch `# type: ignore[arg-type]` to `# ty: ignore[...]` in
  test_af_inference.py per AGENTS.md.
- Respect a stored conditional_weights in the Phase 2 chain via
  cond_weights_override on _build_prev_dist_arrays, so the chain
  matches _prepare_transition_inputs's estimation path even if
  future code populates conditional_weights.
- Split multi-assertion tests into one-assertion-per-test.
- Document that jax.hessian materialises an O(n_params * n_obs)
  tape, bypassing the n_obs_per_batch memory contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
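
A sketch of the scatter fix in the first bullet — the function name and argument layout are illustrative:

```python
import jax.numpy as jnp

def add_shock_variances(cov, shock_sds, shock_factor_indices):
    """Scatter shock_sds**2 onto the state-factor diagonal. Works when
    n_shock_factors < n_state_factors: has_production_shock=False factors
    simply contribute nothing."""
    diag = jnp.zeros(cov.shape[0]).at[jnp.asarray(shock_factor_indices)].set(shock_sds**2)
    return cov + jnp.diag(diag)
```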
The shipped CNLSY file contains only faminc7 and faminc9, so log_income
is NaN at period 2. CHS's process_data rejects any observed factor
with missing values, blocking CHS estimation on the AF reference data.
Fill period 2 with the period-1 (faminc9) value per household. AF's
likelihood does not reference log_income at period 2, so the imputed
value does not affect AF; CHS can now consume the same frame without
raising on missing observed factors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy complete_7_9_11.xls into tests/matlab_ces_repro/data/ and switch
all matlab_ces_repro tests to read from that path so the suite (incl.
the AF-vs-CHS comparison) runs on machines without the sciebo folder.
Also adds the AF-vs-CHS comparison test that was already drafted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Match the MATLAB reproduction default. Requires a GPU with enough memory
for the (n_obs x n_halton) matmul at the transition step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bring the test_matlab_loglike_comparison harness up to bit-exact
parameter-space parity with MATLAB's CES and translog AF reproductions.

Data parity:
- load_cnlsy._standardise now uses ddof=1 (sample SD), matching MATLAB's
  std default. Fixes a ~3.5e-4 relative-error mismatch on every
  measurement column. Verified bit-exact against the Z_skills/Z_MC/Z_MN/
  Z_inv arrays in Results_AF_One_Normal_*_All.mat.

Normalisation alignment:
- model_specs.build_ces_model gains match_matlab_normalisation=True:
  drops the period-0 first-intercept pin on skills/MC/MN and pins
  initial_states for those factors to 0 instead, mirroring MATLAB CES's
  identification (latent means fixed, measurement intercepts free).
- model_specs.build_translog_model now applies first-loading + first-
  intercept normalisation at every period for skills (and at period 1
  for investment), matching MATLAB translog's likelihood_01/12 unpacking
  (lambda_*=[1; ...]). Investment-equation constant left free for
  translog (CES pins it to 0).

Translog parameter mapping:
- matlab_mapping._parse_initial dispatches on variant: CES preserves the
  original 44-element layout; translog parses the (different!) 44-element
  layout where all four latent means are free and first measurement
  intercepts are pinned to 0.
- _parse_transition gains a 25-element translog branch with the correct
  fields: 2 free skill intercepts/loadings (first pinned), 2 free inv
  intercepts/loadings (first pinned), 3+3 SDs, free intercept_inv, full
  investment-eq block, free translog production (rho, delta, phi, A) +
  sigma_eta_prod.
- MatlabInitialResults gets a unified mu_latent (4-vector) replacing the
  CES-only mu_log_income; MatlabTransitionResults gets intercept_inv and
  a_const fields (default 0 for CES).
- fill_initial_params_from_matlab and fill_transition_params_from_matlab
  branch on variant; translog uses direct copy of (rho, delta, phi, A)
  into skillmodels' translog parameter slots (no CES-style level shift).

Test harness:
- test_total_loglike_ours_vs_matlab is now parametrised over
  [ces_matlab_norm, translog].
- Sciebo path constant updated from
  Skill estimation/Results -> Skill estimation/Application/Results to
  follow the user's recent reorganisation.

Tooling:
- h5py added (used to inventory the 3.9 GB workspace dump variables; not
  strictly needed at runtime since the .mat is MAT v5).

Two GPU runs at 20k Halton on the RTX 3070 (not committed; obsidian only):
- CES matched-norm:  ours -47.150 vs MATLAB -49.870 (+2.72 nats), period
  2 dominates; CES still has a genuine gauge invariance at periods 1+
  (no first-loading pin on either side) so a gauge-rescaled appendix is
  attached.
- Translog matched-norm:  ours -47.160 vs MATLAB -47.605 (+0.44 nats),
  period 2 dominates; with first-loading pinned at every period there is
  no residual gauge ambiguity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`compute_af_standard_errors` was producing NaN diagonals on every
parameter row whenever the model used the user-supplied `fixed_params`
argument to `estimate_af`. The information matrix at those rows is
zero (the likelihood is flat in pinned coordinates), and
`build_optimagic_inputs` deliberately strips the lb==ub markers on
pinned rows so that optimagic's `FixedConstraintWithValue` machinery
takes over -- which leaves `_free_positions_for_period` unable to
detect them at SE-computation time. `inv` on the rank-deficient
information matrix then produces NaN entries that propagate through
`a_inv @ omega @ a_inv.T` and poison every diagonal.

Switching to `jnp.linalg.pinv(..., hermitian=True)` keeps the SE
finite: identifiable parameters retain their correct value, and
pinned parameters get zero SE (which downstream display layers can
render as "—"). Applied at both the block-diagonal call site
(`_block_diagonal_sandwich_single`) and the full-sandwich call site
(`_compute_full_sandwich`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>