🌀 1-Lipschitz and stability
1. The original idea of a per-layer Lipschitz budget
The original Lipschitz-budgeting idea is about controlling the network gain. For a network:

$$f_\theta = f_N \circ f_{N-1} \circ \cdots \circ f_1$$

we want a Lipschitz bound:

$$\|f_\theta(x_1) - f_\theta(x_2)\| \le L \, \|x_1 - x_2\|$$

Here:
- $x_1, x_2$ are two inputs,
- $f_\theta(x)$ is the network output,
- $L$ is the Lipschitz constant.

For a purely linear network:

$$f_\theta(x) = W_N W_{N-1} \cdots W_1 x$$

the exact input-output gain is:

$$L = \|W_N W_{N-1} \cdots W_1\|_2$$

Using submultiplicativity of matrix norms:

$$\|W_N W_{N-1} \cdots W_1\|_2 \le \prod_{i=1}^{N} \|W_i\|_2$$

So if we enforce:

$$\|W_i\|_2 \le 1 \quad \text{for every layer } i,$$

then:

$$L \le \prod_{i=1}^{N} \|W_i\|_2 \le 1$$

That is the static Lipschitz-budgeting idea: cap each layer's gain so the end-to-end gain cannot exceed the product of the per-layer budgets.
Key insight: The original Lipschitz-budgeting idea was about controlling the static Lipschitz gain of the network. This is important for inference-time robustness and numerical stability.
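The per-layer budget above can be checked numerically. Below is a minimal sketch (all names are illustrative, not from the original): three random linear layers are projected onto the unit spectral-norm ball, and the end-to-end gain of their composition is verified to stay below 1, exactly as submultiplicativity predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_norm(W):
    # Operator 2-norm of a linear layer = its largest singular value,
    # which is exactly the layer's Lipschitz constant.
    return np.linalg.norm(W, 2)

# A purely linear "network": y = W3 @ W2 @ W1 @ x
layers = [rng.normal(size=(16, 16)) for _ in range(3)]

# Enforce the per-layer budget ||W_i||_2 <= 1 by rescaling (a projection
# onto the unit spectral-norm ball; leaves already-feasible layers alone).
budgeted = [W / max(1.0, spectral_norm(W)) for W in layers]

# Exact end-to-end gain of the composed linear map.
product = budgeted[2] @ budgeted[1] @ budgeted[0]
L = spectral_norm(product)

# Submultiplicativity: L <= prod_i ||W_i||_2 <= 1.
print(f"end-to-end Lipschitz constant L = {L:.4f}")
```

The projection step is the "budgeting": each layer spends at most a gain of 1, so the product cannot blow up regardless of depth.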
2. The problem of online training
But online training introduces a different map: not only the forward map

$$x \mapsto f_\theta(x),$$

but also the update map:

$$\theta_t \mapsto \theta_{t+1}.$$

The learning rule is itself a dynamical system:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \mathcal{L}(\theta_t)$$

So the stability of online learning is governed by the sensitivity of the gradient field:

$$\theta \mapsto \nabla_\theta \mathcal{L}(\theta)$$

A natural smoothness/Lipschitz condition for the gradient is:

$$\|\nabla_\theta \mathcal{L}(\theta_1) - \nabla_\theta \mathcal{L}(\theta_2)\| \le \beta \, \|\theta_1 - \theta_2\|$$

Here $\beta$ is the Lipschitz constant of the gradient field. In smooth optimization, this is related to the Hessian norm:

$$\beta = \sup_\theta \|\nabla^2_\theta \mathcal{L}(\theta)\|_2,$$

where $\nabla^2_\theta \mathcal{L}(\theta)$ is the Hessian of the loss.
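The link between gradient smoothness and update stability can be seen on a quadratic loss, where the Hessian is constant and the smoothness constant is exact. The sketch below (my own illustration, not from the original) uses the classic fact that gradient descent on a beta-smooth quadratic is stable for step sizes below 2/beta and diverges above it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Quadratic loss L(theta) = 0.5 * theta^T A theta with A symmetric PSD,
# so grad L(theta) = A @ theta and the Hessian is the constant matrix A.
M = rng.normal(size=(8, 8))
A = M.T @ M                       # symmetric positive semi-definite
beta = np.linalg.norm(A, 2)       # gradient-Lipschitz constant = ||Hessian||_2

def run_gd(eta, steps=200):
    # The learning rule as a dynamical system: theta_{t+1} = theta_t - eta * grad.
    theta = np.ones(8)
    for _ in range(steps):
        theta = theta - eta * (A @ theta)
    return np.linalg.norm(theta)

# Step sizes just inside and just outside the 2/beta stability threshold.
stable = run_gd(0.9 * 2.0 / beta)
unstable = run_gd(1.1 * 2.0 / beta)
print(f"beta = {beta:.2f}, stable |theta| = {stable:.2e}, unstable |theta| = {unstable:.2e}")
```

Note that the threshold depends only on beta, the Lipschitz constant of the gradient field, not on the network's input-output gain.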
So there are two different Lipschitz ideas:
a. Static network Lipschitzness
This is about input-output gain.
b. Dynamic optimizer Lipschitzness
This is about how violently the gradient changes when the weights move.
Lipschitz budgeting mostly targets the first. The new throttle targets the second.
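The two constants are genuinely decoupled, which the following sketch makes concrete (the setup is my own illustration, assuming a single linear layer trained by least squares): the static gain of the layer is held at 1, yet the gradient-smoothness constant, which is set by the data scale, can be arbitrarily large.

```python
import numpy as np

rng = np.random.default_rng(2)

# One linear layer f_W(x) = W @ x, trained with squared loss on a fixed batch.
n, d = 256, 8
X = 10.0 * rng.normal(size=(n, d))   # large-scale inputs
W = rng.normal(size=(d, d))
W = W / np.linalg.norm(W, 2)         # enforce the static budget: ||W||_2 = 1

# (a) Static network Lipschitzness: gain of the map x -> W @ x.
L_static = np.linalg.norm(W, 2)

# (b) Dynamic optimizer Lipschitzness: for the batch squared loss
#     L(W) = (1/n) * sum_i ||W x_i - y_i||^2, the gradient is affine in W
#     with linear coefficient (2/n) * X^T X, so the gradient field is
#     beta-Lipschitz in W with beta = (2/n) * ||X^T X||_2.
beta = 2.0 * np.linalg.norm(X.T @ X, 2) / n

print(f"static gain L = {L_static:.2f}, gradient smoothness beta = {beta:.2f}")
```

Here the network comfortably satisfies its static budget while the gradient map is stiff, which is exactly why controlling the first quantity does not by itself stabilize online training.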
Summary box
Key insight: We were controlling the static Lipschitz gain of the network, but the instability during online training is governed by the Lipschitzness of the gradient/update field. The new controller targets that closed-loop sensitivity.