
🧮 Fixed-point quantization

On real hardware, the update must eventually be quantized to a fixed-point format. This can be modeled as:

$$\boxed{\, \theta_{t+1} = Q_\Theta \left[ \theta_t - Q_\Delta(\alpha_t \eta G_t) \right] \,}$$

Where:

  • $Q_\Theta$ quantizes/clips the weights,
  • $Q_\Delta$ quantizes/clips the updates,
  • $\alpha_t$ is the global throttle.

This introduces a quantization error:

$$\xi_t = \theta_{t+1} - (\theta_t - \alpha_t \eta G_t).$$
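
To make the model concrete, here is a minimal NumPy sketch of one quantized step and the error $\xi_t$ it induces. The format parameters (`Q_THETA`, `Q_DELTA`, the clipping ranges) and the round-to-nearest-with-saturation quantizer are illustrative assumptions, not a description of any particular hardware:

```python
import numpy as np

# Illustrative fixed-point parameters (assumptions, not a hardware spec):
Q_THETA = 2.0 ** -13   # weight quantum
Q_DELTA = 2.0 ** -15   # update quantum q_Delta
W_CLIP  = 4.0          # weight saturation range [-W_CLIP, W_CLIP]
D_CLIP  = 2.0 ** -3    # update saturation range

def quantize(x, q, clip):
    """Round-to-nearest fixed-point quantizer with saturation."""
    return np.clip(np.round(x / q) * q, -clip, clip)

def quantized_step(theta, G, alpha, eta):
    """theta_{t+1} = Q_Theta[ theta_t - Q_Delta(alpha * eta * G) ]."""
    update = quantize(alpha * eta * G, Q_DELTA, D_CLIP)
    return quantize(theta - update, Q_THETA, W_CLIP)

rng = np.random.default_rng(0)
theta = rng.standard_normal(8)
G = rng.standard_normal(8)
alpha, eta = 0.5, 1e-2

theta_next = quantized_step(theta, G, alpha, eta)
# Quantization error xi_t relative to the ideal (unquantized) step:
xi = theta_next - (theta - alpha * eta * G)
print("||xi_t|| =", np.linalg.norm(xi))
```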

Stability analysis (Lyapunov) under quantization

The Lyapunov/descent condition becomes roughly:

$$\mathcal{L}_{t+1} - \mathcal{L}_t \lesssim -\alpha_t \eta \left( 1 - \frac{\alpha_t \eta L_t}{2} \right) \|G_t\|^2 + \text{quantization error terms}.$$

This tells us two things. First, descent holds only while the bracketed factor stays positive, so stability requires an upper bound on the throttle:

$$\alpha_t \leq \frac{\chi}{\eta C_t^{\text{ctrl}}}.$$

Second, for fixed-point updates to be useful, they must not underflow to zero. If the update quantum is $q_\Delta$, then approximately:

$$\alpha_t \eta \|G_t\| \gtrsim q_\Delta.$$

So:

$$\alpha_t \gtrsim \frac{q_\Delta}{\eta \|G_t\|}.$$
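
A tiny sketch of the underflow failure mode (all constants assumed for illustration): a throttle below the floor $q_\Delta / (\eta \|G_t\|)$ makes round-to-nearest map the entire update to zero, while one above it survives quantization.

```python
import numpy as np

# Illustrative constants (assumptions, not from this derivation's setup):
q_delta = 2.0 ** -15          # update quantum
eta, G_norm = 1e-2, 0.1       # learning rate and gradient norm

# One alpha below the underflow floor q_delta/(eta*||G||), one above it:
for alpha in (0.4 * q_delta / (eta * G_norm),
              2.0 * q_delta / (eta * G_norm)):
    raw = alpha * eta * G_norm                      # ideal update magnitude
    quantized = np.round(raw / q_delta) * q_delta   # round-to-nearest Q_Delta
    print(f"alpha={alpha:.5f}: raw={raw:.2e} -> quantized={quantized:.2e}")
```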

Therefore, useful, stable fixed-point learning requires that the two bounds leave a nonempty interval:

$$\boxed{\, \frac{q_\Delta}{\eta \|G_t\| + \epsilon} \lesssim \alpha_t \leq \frac{\chi}{\eta C_t^{\text{ctrl}} + \epsilon} \,}$$

Key Insight: Fixed-point precision gives a minimum useful update size. Stability gives a maximum safe update size. Online learning is possible only when these bounds overlap.
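
A minimal sketch of checking that the interval is nonempty before picking $\alpha_t$; every constant here ($q_\Delta$, $\eta$, $\|G_t\|$, $\chi$, $C_t^{\text{ctrl}}$) and the geometric-midpoint choice of $\alpha_t$ are assumed for illustration:

```python
import numpy as np

def alpha_interval(q_delta, eta, G_norm, chi, C_ctrl, eps=1e-12):
    """Return (lower, upper) bounds on alpha_t; learning is feasible
    only if lower <= upper (the interval is nonempty)."""
    lower = q_delta / (eta * G_norm + eps)   # precision floor (no underflow)
    upper = chi / (eta * C_ctrl + eps)       # stability ceiling
    return lower, upper

# Illustrative numbers (assumed):
lo, hi = alpha_interval(q_delta=2.0**-15, eta=1e-2, G_norm=0.1,
                        chi=0.9, C_ctrl=50.0)
if lo <= hi:
    alpha = np.sqrt(lo * hi)   # e.g. geometric midpoint of the feasible band
    print(f"feasible: alpha in [{lo:.3e}, {hi:.3e}], pick {alpha:.3e}")
else:
    print("infeasible: precision floor exceeds stability ceiling")
```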