# Closed-Loop Ablation Architecture
The ablation harness should mirror the ENABOL online training loop, but remain small enough to inspect every tensor and compute exact curvature diagnostics. The first implementation target is software simulation, not HLS synthesis.
## Training Loop
Each experiment follows the same high-level flow:
1. Generate a controlled dataset.
2. Train a small floating-point or high-precision reference model.
3. Quantize or simulate fixed-point training with selected precisions.
4. Apply input drift.
5. Continue online training with one controller variant enabled.
6. Log loss, norms, curvature proxies, throttle, update geometry, saturation, and rails.
The online loop should operate on a flattened global parameter vector:
```
theta = flatten(W1, b1, W2, b2, ...)
G = flatten(dL/dW1, dL/db1, dL/dW2, dL/db2, ...)
```
This makes global controllers easy to implement and lets us measure whether a method preserves the intended update direction.
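The flattening itself is mechanical; a minimal NumPy sketch (the helper names are illustrative, not an existing ENABOL API):

```python
import numpy as np

def flatten_params(tensors):
    """Concatenate parameter tensors into one global vector, recording shapes."""
    shapes = [t.shape for t in tensors]
    theta = np.concatenate([t.ravel() for t in tensors])
    return theta, shapes

def unflatten_params(theta, shapes):
    """Scatter the global vector back into tensors of the recorded shapes."""
    out, i = [], 0
    for shape in shapes:
        n = int(np.prod(shape))
        out.append(theta[i:i + n].reshape(shape))
        i += n
    return out
```

Capturing the shapes once at model build time lets every controller operate on theta and G without knowing the layer structure.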
## Priority Controllers
Implement these first:
| Switch | Meaning |
|---|---|
| controller=none | Baseline online training. |
| controller=dynamic_global_throttle | Compute one scalar alpha(t) and scale the full update vector. |
| controller=global_static_kappa_scale | If the global gain exceeds K_max, scale all layers by one shared scalar. |
| controller=loose_kappa_plus_throttle | Keep loose static rails and apply the dynamic global throttle. |
| precision | Fixed-point format or simulated fixed-point rails. |
Legacy row/column kappa projection can be included later as controller=legacy_row_col_projection if it is already available or cheap to stub. It is not a first implementation requirement.
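One convenient way to expose these switches is a single experiment config. Below is a sketch with field names mirroring the table; the defaults are illustrative assumptions, not values from the spec:

```python
from dataclasses import dataclass

@dataclass
class AblationConfig:
    # none | dynamic_global_throttle | global_static_kappa_scale |
    # loose_kappa_plus_throttle | legacy_row_col_projection (optional, later)
    controller: str = "none"
    precision: str = "float32"  # or a simulated fixed-point format, e.g. "q4.12"
    k_max: float = 4.0          # loose static kappa rail (illustrative default)
    eta: float = 0.01           # learning rate (illustrative default)
```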
## Dynamic Global Throttle
At each online step:
- input: current parameters theta(t), gradient G(t), learning rate eta
- input: previous parameters theta(t-1), previous gradient G(t-1)
- compute the curvature proxy C(t)
- return C(t), S(t), and alpha(t)
Because alpha(t) is a single global scalar, it preserves the raw update direction:

delta_theta(t) = -alpha(t) * eta * G(t), which is proportional to -G(t),

unless fixed-point saturation, projection, or another mechanism distorts the update.
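A minimal sketch of one throttle step. The secant curvature proxy and the EMA/clipping forms below are assumptions; the document only names the three signals C(t), S(t), and alpha(t):

```python
import numpy as np

def throttle_step(theta, G, theta_prev, G_prev, eta, S_prev,
                  ema=0.9, c_target=1.0):
    """Return C(t), S(t), alpha(t) for one dynamic-global-throttle step."""
    # Secant curvature proxy along the last step: ||dG|| / ||dtheta||.
    C = np.linalg.norm(G - G_prev) / max(np.linalg.norm(theta - theta_prev), 1e-12)
    # EMA instability signal smooths the raw proxy.
    S = ema * S_prev + (1.0 - ema) * C
    # Single global scalar in (0, 1]; shrink the step when eta * S runs hot.
    alpha = min(1.0, c_target / max(eta * S, 1e-12))
    return C, S, alpha
```

The caller then applies theta_next = theta - alpha * eta * G, so the update stays proportional to -G.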
## Experiment 001: Single Dense Affine Regression
This is the minimum test case. It isolates closed-loop update stability without inter-layer interactions.
Math:

y_hat = W x + b
L = 0.5 * ||y_hat - y||^2

Backpass:

dL/dW = (y_hat - y) x^T
dL/db = y_hat - y

Drift:

x_drift = alpha x + beta

(Here alpha and beta are drift coefficients, distinct from the throttle alpha(t).)
Primary question:
Can dynamic global throttling keep online fixed-point training stable in a known linear system where the exact solution and Hessian are easy to inspect?
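A self-contained float version of this loop with the baseline controller; the dimensions, learning rate, and drift coefficients are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
W_star, b_star = rng.normal(size=(2, 3)), rng.normal(size=2)  # teacher
W, b = np.zeros((2, 3)), np.zeros(2)                          # student
eta = 0.05

for step in range(2000):
    x = rng.normal(size=3)
    if step >= 1000:                 # switch on input drift halfway through
        x = 1.5 * x + 0.3            # x_drift = alpha x + beta
    y = W_star @ x + b_star          # teacher target
    e = (W @ x + b) - y              # residual, so L = 0.5 * (e @ e)
    dW, db = np.outer(e, x), e       # exact gradients of the affine model
    W -= eta * dW                    # controller=none baseline update
    b -= eta * db
```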
## Experiment 002: Two Dense Layers With ReLU
This introduces an intermediate activation and an inter-layer gradient path while still staying small enough to inspect.
Teacher model:

y = W2* relu(W1* x + b1*) + b2*

Student model:

y_hat = W2 relu(W1 x + b1) + b2

Backpass (with z = W1 x + b1, h = relu(z), e = y_hat - y):

dL/dW2 = e h^T
dL/db2 = e
dL/dW1 = ((W2^T e) * relu'(z)) x^T
dL/db1 = (W2^T e) * relu'(z)

where relu'(z) is 1 where z > 0 and 0 elsewhere, applied elementwise.
Primary question:
When there is an intermediate activation, can global throttling stabilize coupled layer dynamics without changing descent geometry?
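A sketch of the student backpass under the same squared-error loss as Experiment 001; the shapes are illustrative:

```python
import numpy as np

def backpass(W1, b1, W2, b2, x, y):
    """Exact gradients of L = 0.5 * ||y_hat - y||^2 through one ReLU layer."""
    z = W1 @ x + b1
    h = np.maximum(z, 0.0)           # h = relu(z)
    e = (W2 @ h + b2) - y            # output residual
    dW2, db2 = np.outer(e, h), e
    g = (W2.T @ e) * (z > 0.0)       # gradient gated by the ReLU mask
    dW1, db1 = np.outer(g, x), g
    return dW1, db1, dW2, db2
```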
## Comparison Variants
The first comparison matrix should be small and should not require rebuilding the old row/column machinery:
| Variant | Required Now | Purpose |
|---|---|---|
| Floating reference | yes | Establish expected behavior without fixed-point limits. |
| Fixed-point baseline | yes | Find regimes where online learning fails. |
| Dynamic global throttle | yes | Test closed-loop stabilization while preserving update geometry. |
| Loose kappa + throttle | yes | Test static safety rails plus dynamic control. |
| Global static kappa scale | yes | Test global gain control without row/layer direction changes. |
| Legacy row/column projection | optional | Compare against the old mechanism only if available or cheap to stub. |
The key comparison is baseline fixed-point versus dynamic global throttle. Legacy row/column projection is useful for diagnosing direction distortion, but it is secondary.
## Required Logs
Each run should produce machine-readable logs and notebook plots for:
- loss before and after drift,
- output error before and after drift,
- global and per-layer weight norms,
- global and per-layer gradient norms,
- global and per-layer update norms,
- curvature proxy C(t),
- EMA instability signal S(t),
- global throttle alpha(t),
- update cosine between the actual update and -G,
- activation min/max/percentiles per layer,
- gradient min/max/percentiles per layer,
- fixed-point saturation counts per tensor,
- rail pressure fractions per tensor,
- product gain or approximate forward gain,
- optional Hessian diagnostics lambda_max(H), rho(I - eta H), and rho(I - alpha eta H).
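For the machine-readable side, one flat record per step is enough. The key names below are assumptions chosen to mirror the list, and the values are placeholders:

```python
record = {
    "step": 1234,
    "loss": 0.0127,
    "grad_norm": {"global": 0.42, "per_layer": [0.30, 0.29]},
    "update_norm": {"global": 0.0021, "per_layer": [0.0015, 0.0014]},
    "curvature_C": 1.8,
    "instability_S": 1.6,
    "throttle_alpha": 0.55,
    "update_cosine": 0.999,
    "saturation_counts": {"W1": 0, "W2": 3},
    "rail_pressure": {"W1": 0.00, "W2": 0.04},
}
```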
The update cosine is important because it directly measures whether budgeting preserves the descent direction:

cos(t) = <delta_theta(t), -G(t)> / (||delta_theta(t)|| * ||G(t)||)

Values near 1 mean budgeting mostly rescales the update. Lower or negative values mean budgeting has substantially changed the direction.
For dynamic global throttle alone, this value should stay near 1. If it does not, the fixed-point path or saturation logic is changing the update.
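Computing the logged value is a one-liner matching the formula above, where delta is the realized parameter change:

```python
import numpy as np

def update_cosine(delta, G):
    """Cosine between the realized update and the raw descent direction -G."""
    return float(delta @ (-G) / (np.linalg.norm(delta) * np.linalg.norm(G) + 1e-12))
```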