A quadratic form is the expression $x^\top A x$ for a fixed matrix $A$ and a random vector $x$ with independent, mean-zero entries. The squared length $\|x\|_2^2$ is the case $A = I$, a sample variance is a quadratic form, and so is the energy $\|Bx\|_2^2$ that a fixed linear map $B$ measures. The question is how tightly $x^\top A x$ concentrates around its mean. The Hanson–Wright inequality is the standard answer, and the answer is not a single Gaussian tail but two regimes.
The mean is the trace
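Take the entries $x_i$ to be independent with mean zero and variance $1$. Then $\mathbb{E}[x_i x_j] = \delta_{ij}$, equal to $1$ when $i = j$ and $0$ otherwise, so only the diagonal of $A$ survives the expectation:
$$\mathbb{E}[x^\top A x] = \sum_{i,j} A_{ij}\,\mathbb{E}[x_i x_j] = \sum_i A_{ii} = \operatorname{tr}(A).$$
So $x^\top A x$ fluctuates around $\operatorname{tr}(A)$.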
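The trace identity is easy to check by simulation. The sketch below is illustrative only: it assumes Rademacher ($\pm 1$) entries as one convenient mean-zero, unit-variance choice, and the matrix `A` and the helper names are made up for the example. The matrix is deliberately non-symmetric, since the identity does not need symmetry.

```python
import random

def quadratic_form(A, x):
    """Return x^T A x for a square matrix A given as a list of rows."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def mean_quadratic_form(A, num_samples=20000, seed=0):
    """Monte Carlo estimate of E[x^T A x] with independent Rademacher entries."""
    rng = random.Random(seed)
    n = len(A)
    total = 0.0
    for _ in range(num_samples):
        x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        total += quadratic_form(A, x)
    return total / num_samples

# Hypothetical example matrix; only its diagonal (2, -1, 3) should matter.
A = [[2.0, 5.0, -1.0],
     [0.5, -1.0, 3.0],
     [4.0, 0.0, 3.0]]
trace_A = sum(A[i][i] for i in range(len(A)))
estimate = mean_quadratic_form(A)
print("trace:", trace_A, "Monte Carlo mean:", estimate)
```

The large off-diagonal entries add noise to each sample but average out, so the estimate should land near the trace.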
The Gaussian case, by rotating to the eigenbasis
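Take $x \sim N(0, I_n)$ and $A$ symmetric. (Any quadratic form sees only the symmetric part $(A + A^\top)/2$, since $x^\top A x = x^\top A^\top x$, so assuming symmetry loses nothing.) Diagonalize $A = U \Lambda U^\top$ with orthonormal $U$ and eigenvalues $\lambda_1, \dots, \lambda_n$. The rotated vector $g = U^\top x$ is again standard Gaussian, since the standard Gaussian is rotation invariant, and in these coordinates the quadratic form is a plain weighted sum of squares:
$$x^\top A x = g^\top \Lambda g = \sum_i \lambda_i g_i^2.$$
Subtracting the mean $\operatorname{tr}(A) = \sum_i \lambda_i$,
$$x^\top A x - \operatorname{tr}(A) = \sum_i \lambda_i (g_i^2 - 1),$$
a sum of independent terms, one per eigenvalue.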
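The change of coordinates can be verified by hand on a small case. The sketch below uses a hypothetical $2 \times 2$ matrix whose eigenvalues ($3$ and $1$) and eigenvectors are known in closed form, so no linear-algebra library is needed; all names are illustrative.

```python
import math
import random

# A = [[2, 1], [1, 2]] has eigenvalues 3 and 1 with unit eigenvectors
# (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
A = [[2.0, 1.0], [1.0, 2.0]]
eigenvalues = [3.0, 1.0]
r = 1.0 / math.sqrt(2.0)
U = [[r, r], [r, -r]]  # column k of U is the k-th eigenvector

def quadratic_form(A, x):
    """Return x^T A x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def rotate(U, x):
    """Return g = U^T x, the coordinates of x in the eigenbasis."""
    n = len(x)
    return [sum(U[i][k] * x[i] for i in range(n)) for k in range(n)]

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(2)]
g = rotate(U, x)

direct = quadratic_form(A, x)
weighted = sum(lam * g_k * g_k for lam, g_k in zip(eigenvalues, g))
print(direct, weighted)  # the two values agree up to rounding
```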
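Each term $g_i^2 - 1$ is a centered chi-square with one degree of freedom. It is sub-exponential, meaning its tail decays like $e^{-ct}$ rather than a Gaussian’s $e^{-ct^2}$, because squaring a Gaussian fattens the tail. Its moment generating function is
$$\mathbb{E}\,e^{s(g_i^2 - 1)} = \frac{e^{-s}}{\sqrt{1 - 2s}} \le e^{2s^2}, \qquad |s| \le \tfrac14.$$
Scaling the $i$-th term by $\lambda_i$ and using independence, the cumulant generating function (the logarithm of the moment generating function) of the whole sum is at most
$$\log \mathbb{E}\exp\Big(s \sum_i \lambda_i (g_i^2 - 1)\Big) \le 2s^2 \sum_i \lambda_i^2 = 2s^2 \|A\|_F^2, \qquad |s| \le \frac{1}{4\|A\|},$$
where $\|A\|_F^2 = \sum_i \lambda_i^2$ is the squared Frobenius norm and the band comes from needing $|s\lambda_i| \le \tfrac14$ for every $i$, with $\|A\| = \max_i |\lambda_i|$ the operator norm (the largest singular value).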
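The single-term bound can be checked numerically. The sketch below assumes the bound is used in the form $\log \mathbb{E}\,e^{s(g^2-1)} \le 2s^2$ on $|s| \le \tfrac14$, which is one convenient choice of constants rather than the sharpest one; the function name is illustrative.

```python
import math

def centered_chi2_log_mgf(s):
    """log E exp(s (g^2 - 1)) for a standard Gaussian g; finite only for s < 1/2."""
    return -s - 0.5 * math.log1p(-2.0 * s)

# Grid of s values covering the band [-1/4, 1/4].
grid = [k / 400.0 for k in range(-100, 101)]

bound_holds = all(centered_chi2_log_mgf(s) <= 2.0 * s * s for s in grid)
# Ratio of the exact cumulant to the quadratic bound; it stays below 1 on the band.
worst_ratio = max(centered_chi2_log_mgf(s) / (2.0 * s * s) for s in grid if s != 0.0)
print(bound_holds, worst_ratio)
```

Outside the band the exact cumulant blows up as $s \to \tfrac12$, which is why no quadratic bound can hold for all $s$.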
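This is the cumulant bound of a Bernstein-type variable: a Gaussian-like term $2s^2\|A\|_F^2$, but valid only inside a band whose width is set by $\|A\|$. Optimizing the Chernoff bound (Markov’s inequality applied to $e^{s(x^\top A x - \operatorname{tr}(A))}$) over $s$ in that band gives the two regimes. For small $t$ the optimal $s$ stays inside the band and the bound is Gaussian; for large $t$ the optimum is pinned at the edge and the bound is exponential. Together,
$$\mathbb{P}\big(|x^\top A x - \operatorname{tr}(A)| \ge t\big) \le 2\exp\Big(-\frac18 \min\Big(\frac{t^2}{\|A\|_F^2},\ \frac{t}{\|A\|}\Big)\Big).$$
That is the Hanson–Wright inequality for Gaussian inputs.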
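The optimization takes only a few lines. The sketch below writes $S = x^\top A x - \operatorname{tr}(A)$ and assumes the cumulant bound in the form $\log \mathbb{E}\,e^{sS} \le 2s^2\|A\|_F^2$ on $|s| \le 1/(4\|A\|)$; those constants are one convenient choice, and other presentations carry different ones.

```latex
\begin{align*}
\mathbb{P}(S \ge t)
  &\le \inf_{0 < s \le 1/(4\|A\|)} \exp\bigl(-st + 2s^2\|A\|_F^2\bigr). \\[4pt]
\text{If } t \le \frac{\|A\|_F^2}{\|A\|}:\quad
  & s^\ast = \frac{t}{4\|A\|_F^2} \le \frac{1}{4\|A\|}
  \;\Longrightarrow\;
  \mathbb{P}(S \ge t) \le \exp\Bigl(-\frac{t^2}{8\|A\|_F^2}\Bigr). \\[4pt]
\text{If } t > \frac{\|A\|_F^2}{\|A\|}:\quad
  & s = \frac{1}{4\|A\|}
  \;\Longrightarrow\;
  \mathbb{P}(S \ge t)
  \le \exp\Bigl(-\frac{t}{4\|A\|} + \frac{\|A\|_F^2}{8\|A\|^2}\Bigr)
  \le \exp\Bigl(-\frac{t}{8\|A\|}\Bigr).
\end{align*}
```

The last step uses $\|A\|_F^2/\|A\| < t$. Applying the same argument to $-A$ handles the lower tail, which is where the factor $2$ in the two-sided bound comes from.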
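For $A = I_n$, $x^\top A x = \|x\|_2^2$, the mean is $n$, and the norms are $\|I_n\|_F^2 = n$ and $\|I_n\| = 1$, so
$$\mathbb{P}\big(\big|\|x\|_2^2 - n\big| \ge t\big) \le 2\exp\Big(-\frac18 \min\Big(\frac{t^2}{n},\ t\Big)\Big).$$
Fluctuations of size $t \le n$ are Gaussian; past $t = n$ the tail turns exponential. This is the concentration of $\|x\|_2^2$ behind the thin shell in high-dimensional Gaussians: the squared length sits at $n$ with fluctuations of order $\sqrt{n}$, so the length sits at $\sqrt{n}$ with fluctuations of order $1$.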
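A short simulation makes the thin shell concrete. This is a sketch with arbitrary illustrative choices (dimension $200$, $2000$ samples, fixed seed); for a standard Gaussian vector the squared length has variance exactly $2n$, so its standard deviation should be near $\sqrt{2n}$, while the length itself should fluctuate by roughly $1/\sqrt{2}$ regardless of $n$.

```python
import math
import random
import statistics

def squared_norm_samples(n, num_samples, seed=0):
    """Draw num_samples values of ||x||^2 for x standard Gaussian in R^n."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
            for _ in range(num_samples)]

n = 200
squared = squared_norm_samples(n, num_samples=2000)
lengths = [math.sqrt(v) for v in squared]

mean_squared = statistics.fmean(squared)   # expected near n
std_squared = statistics.pstdev(squared)   # expected near sqrt(2 n)
std_length = statistics.pstdev(lengths)    # expected near 1 / sqrt(2)
print(mean_squared, std_squared, std_length)
```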
Two norms, two regimes
The Frobenius norm is the total variance: $\operatorname{Var}(x^\top A x) = 2\|A\|_F^2$ for Gaussian $x$. It sets the Gaussian regime near the mean, where the fluctuation is an average over all eigenvalue-terms and the central-limit effect applies.
The operator norm $\|A\|$ is the single largest weight. The heaviest-tailed term in $\sum_i \lambda_i (g_i^2 - 1)$ is the one with the biggest $|\lambda_i|$, and it alone carries a sub-exponential tail that no averaging removes. It sets the far tail.
The crossover between the two sits at $t = \|A\|_F^2/\|A\|$. Below it the form looks Gaussian; above it a single dominant direction takes over and the tail is exponential. The rate that sits in the exponent is $\min\big(t^2/\|A\|_F^2,\ t/\|A\|\big)$: the larger the rate, the smaller the tail. The parabola $t^2/\|A\|_F^2$ governs small deviations, the line $t/\|A\|$ governs large ones, and they switch at $t = \|A\|_F^2/\|A\|$. Changing either norm moves the crossover.
What decides which regime matters in practice is the shape of the spectrum. Normalize the eigenvalues $\lambda_i$ so the largest is $1$, and tilt the spectrum from flat, where every direction counts equally, like $A = I_n$, to spiky, where one direction dominates. A flat spectrum pushes the crossover far out, so the form is Gaussian over a wide range; a spiky spectrum pulls it in, so the exponential tail takes over early. The ratio $\|A\|_F^2/\|A\|^2$ that sets the crossover (in units of $\|A\|$) is the stable rank, an effective count of active directions.
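The effect of the spectrum's shape on the crossover can be computed directly. The sketch below uses a hypothetical one-parameter family $\lambda_i = \rho^i$ as a stand-in for tilting the spectrum; the family, the parameter name `decay`, and the sizes are assumptions made for illustration.

```python
def stable_rank(eigenvalues):
    """||A||_F^2 / ||A||^2 for a symmetric matrix with the given eigenvalues."""
    top = max(abs(lam) for lam in eigenvalues)
    return sum(lam * lam for lam in eigenvalues) / (top * top)

def tilted_spectrum(n, decay):
    """Hypothetical family lambda_i = decay**i: decay=1 is flat, small decay is spiky.

    The largest eigenvalue is always 1, matching the normalization in the text.
    """
    return [decay ** i for i in range(n)]

flat = stable_rank(tilted_spectrum(50, 1.0))   # every direction counts: 50
spiky = stable_rank(tilted_spectrum(50, 0.5))  # one direction dominates: about 4/3
print(flat, spiky)
```

With the largest eigenvalue normalized to $1$, the stable rank is also the crossover point $t$ itself, so the flat spectrum stays Gaussian out to $t = 50$ while the spiky one turns exponential just past $t \approx 1.33$.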
The general theorem
For general independent entries the rotation that handled the Gaussian case is not available: only the Gaussian is rotation invariant, so for any other distribution the rotated coordinates $U^\top x$, while still sub-Gaussian, are no longer independent, and the clean weighted sum of squares falls apart. The inequality holds anyway. For a random vector $x$ with independent, mean-zero, sub-Gaussian entries of sub-Gaussian norm at most $K$ (the scale on which the entries’ own tails decay like $e^{-t^2/K^2}$), and an absolute constant $c > 0$,
$$\mathbb{P}\big(|x^\top A x - \mathbb{E}\,x^\top A x| \ge t\big) \le 2\exp\Big(-c \min\Big(\frac{t^2}{K^4\|A\|_F^2},\ \frac{t}{K^2\|A\|}\Big)\Big),$$
the same two-regime shape, with $K$ tracking how heavy the entries are.
The general proof splits the form along its diagonal,
$$x^\top A x = \sum_i A_{ii} x_i^2 + \sum_{i \ne j} A_{ij} x_i x_j.$$
The diagonal part is a sum of independent variables $A_{ii} x_i^2$; it carries the mean and concentrates by Bernstein’s inequality (the same sub-exponential bound used above, now for a sum rather than after a rotation), no harder than the Gaussian case. The off-diagonal part has mean zero and is the real content of the theorem. Its terms share variables, so they are not independent, and the standard handle is decoupling: replace one copy of $x$ by an independent copy $x'$, so that conditionally on $x'$ the off-diagonal sum $\sum_{i \ne j} A_{ij} x_i x'_j$ is a linear form in the independent variables $x_i$, which sub-Gaussian tools control directly. Carrying that through reproduces the same two norms.
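The split and the decoupling step are both plain algebra, which the following sketch checks on one random draw. It illustrates the bookkeeping only, not the proof: the matrix, the uniform entry distribution on $[-\sqrt{3}, \sqrt{3}]$ (bounded, mean zero, variance one), and all function names are assumptions chosen for the example.

```python
import math
import random

def split_quadratic_form(A, x):
    """Return (diagonal part, off-diagonal part) of x^T A x."""
    n = len(x)
    diagonal = sum(A[i][i] * x[i] ** 2 for i in range(n))
    off_diagonal = sum(A[i][j] * x[i] * x[j]
                       for i in range(n) for j in range(n) if i != j)
    return diagonal, off_diagonal

def decoupled_coefficients(A, x_prime):
    """With x' fixed, return b where b_i = sum over j != i of A_ij * x'_j."""
    n = len(x_prime)
    return [sum(A[i][j] * x_prime[j] for j in range(n) if j != i)
            for i in range(n)]

rng = random.Random(0)
n = 4
A = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
half_width = math.sqrt(3.0)
x = [rng.uniform(-half_width, half_width) for _ in range(n)]
x_prime = [rng.uniform(-half_width, half_width) for _ in range(n)]  # independent copy

# The two parts add back up to the full form.
diagonal, off_diagonal = split_quadratic_form(A, x)
full = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# After decoupling, the off-diagonal sum is linear in x once x' is fixed.
decoupled_sum = sum(A[i][j] * x[i] * x_prime[j]
                    for i in range(n) for j in range(n) if i != j)
b = decoupled_coefficients(A, x_prime)
linear_form = sum(b_i * x_i for b_i, x_i in zip(b, x))
print(full, diagonal + off_diagonal, decoupled_sum, linear_form)
```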
Where it gets used
The inequality is a standard tool wherever a squared length or an energy has to be pinned to its mean.
- Length and distance concentration. $\|x\|_2^2$ around $n$ is the case $A = I_n$. More generally $\|Bx\|_2^2 = x^\top B^\top B x$ concentrates around its mean for a fixed map $B$, which is how one shows random features or embeddings preserve magnitudes.
- Random projections. Projecting a fixed vector $u$ with a random matrix $G$ preserves its length up to a factor $1 \pm \varepsilon$. The fluctuation of $\|Gu\|_2^2$ is a quadratic form, and Hanson–Wright supplies the failure probability that feeds the Johnson–Lindenstrauss lemma.
- Covariance estimation. Controlling $u^\top(\hat\Sigma - \Sigma)u$, the error of a sample covariance in a fixed direction $u$, comes down to bounding a quadratic form in the data.
- As a lemma. It is the usual way to show that a quadratic form indexed by a direction $u$ is close to its mean for one fixed $u$; an ε-net argument then makes the statement hold uniformly over the whole sphere of directions.
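The random-projection case can be sketched in a few lines. The example below assumes a Gaussian projection matrix with independent $N(0, 1/m)$ entries, which is one common choice rather than the only one, and the dimensions, seed, and names are illustrative. With that scaling $\mathbb{E}\,\|Gu\|_2^2 = \|u\|_2^2$, and the ratio fluctuates on the order of $\sqrt{2/m}$.

```python
import math
import random

def project(G, u):
    """Return the matrix-vector product G u, with G given as a list of rows."""
    return [sum(g_ij * u_j for g_ij, u_j in zip(row, u)) for row in G]

def squared_norm(v):
    return sum(c * c for c in v)

rng = random.Random(0)
d, m = 50, 400  # ambient and projected dimensions (illustrative)

u = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # the fixed vector
scale = 1.0 / math.sqrt(m)                      # entries N(0, 1/m)
G = [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(m)]

# Ratio of projected to original squared length; concentrates near 1.
ratio = squared_norm(project(G, u)) / squared_norm(u)
print(ratio)
```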
References
- David L. Hanson, Farroll T. Wright. A bound on tail probabilities for quadratic forms in independent random variables. The Annals of Mathematical Statistics, 42(3):1079–1083, 1971. The original.
- Mark Rudelson, Roman Vershynin. Hanson-Wright inequality and sub-gaussian concentration. Electronic Communications in Probability, 18, no. 82, 1–9, 2013. The modern statement with the two norms and a clean proof by decoupling.
- Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018. Chapter 6 states and proves the inequality; the sub-exponential and Bernstein bounds it rests on are developed in Chapter 2.
- Stéphane Boucheron, Gábor Lugosi, Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013. Background on sub-gamma variables and the Bernstein bound behind the two regimes.