A quadratic form is the expression $x^\top A x$ for a fixed matrix $A$ and a random vector $x$ with independent, mean-zero entries. The squared length $\|x\|^2$ is the case $A = I$, a sample variance is a quadratic form, and so is the energy $\|Bx\|^2 = x^\top B^\top B x$ that a fixed linear map $B$ measures. The question is how tightly $x^\top A x$ concentrates around its mean. The Hanson–Wright inequality is the standard answer, and the answer is not a single Gaussian tail but two regimes.
The mean is the trace
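Take the entries $x_i$ to be independent with mean zero and variance $1$. Then $\mathbb{E}[x_i x_j] = \delta_{ij}$, equal to $1$ when $i = j$ and $0$ otherwise, so only the diagonal of $A$ survives the expectation:
$$\mathbb{E}\,[x^\top A x] = \sum_{i,j} A_{ij}\,\mathbb{E}[x_i x_j] = \sum_i A_{ii} = \operatorname{tr} A.$$
So $x^\top A x$ fluctuates around $\operatorname{tr} A$.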
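The trace identity uses only independence, mean zero, and unit variance, so it can be checked with entries that are far from Gaussian. The following is a minimal Monte Carlo sketch; the matrix `A`, the dimension, the seed, and the choice of Rademacher (random sign) entries are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 6, 200_000

# Hypothetical fixed matrix; the mean identity does not need symmetry.
A = rng.normal(size=(n, n))

# Rademacher entries: independent, mean zero, variance one, and not Gaussian.
x = rng.choice([-1.0, 1.0], size=(trials, n))

# One quadratic form x^T A x per row of x.
q = np.einsum("ti,ij,tj->t", x, A, x)

empirical_mean = q.mean()
trace = np.trace(A)
print(f"empirical mean {empirical_mean:.3f}  vs  trace {trace:.3f}")
```

The two printed numbers should agree up to Monte Carlo error, which shrinks like one over the square root of the number of trials.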
The Gaussian case, by rotating to the eigenbasis
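Take $x \sim \mathcal{N}(0, I_n)$ and $A$ symmetric. (Any quadratic form sees only the symmetric part $(A + A^\top)/2$, since $x^\top A x = x^\top A^\top x$, so assuming symmetry loses nothing.) Diagonalize $A = U \Lambda U^\top$ with $U$ orthonormal and eigenvalues $\lambda_1, \dots, \lambda_n$. The rotated vector $g = U^\top x$ is again standard Gaussian, since the standard Gaussian is rotation invariant, and in these coordinates the quadratic form is a plain weighted sum of squares:
$$x^\top A x = g^\top \Lambda g = \sum_{i=1}^n \lambda_i g_i^2 .$$
Subtracting the mean $\operatorname{tr} A = \sum_i \lambda_i$,
$$x^\top A x - \operatorname{tr} A = \sum_{i=1}^n \lambda_i \,(g_i^2 - 1),$$
a sum of independent terms, one per eigenvalue.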
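The rotation step is an exact algebraic identity, so it can be verified on a single draw. This sketch assumes NumPy's `numpy.linalg.eigh` for the eigendecomposition; the matrix `M`, the seed, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

M = rng.normal(size=(n, n))      # arbitrary, not symmetric
A = (M + M.T) / 2                # symmetric part; x^T M x equals x^T A x

lam, U = np.linalg.eigh(A)       # A = U diag(lam) U^T, columns of U orthonormal

x = rng.normal(size=n)           # one standard Gaussian draw
g = U.T @ x                      # rotated coordinates

quad_direct = x @ A @ x
quad_rotated = np.sum(lam * g**2)          # weighted sum of squares
centered = np.sum(lam * (g**2 - 1.0))      # one term per eigenvalue

print(quad_direct, quad_rotated, centered, quad_direct - np.trace(A))
```

The identity holds for any vector `x`; what is special to the Gaussian is that the entries of `g` are again independent standard normals.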
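Each term $g_i^2 - 1$ is a centered chi-square with one degree of freedom. It is sub-exponential, meaning its tail decays like $e^{-ct}$ rather than a Gaussian’s $e^{-ct^2}$, because squaring a Gaussian fattens the tail. Its moment generating function is
$$\mathbb{E}\, e^{s(g_i^2 - 1)} = \frac{e^{-s}}{\sqrt{1 - 2s}}, \qquad s < \tfrac12 .$$
Scaling the $i$-th term by $\lambda_i$ and using independence, the cumulant generating function of the whole sum is at most
$$\log \mathbb{E}\, e^{s(x^\top A x - \operatorname{tr} A)} = \sum_{i=1}^n \Bigl(-s\lambda_i - \tfrac12 \log(1 - 2 s \lambda_i)\Bigr) \le 2 s^2 \sum_{i=1}^n \lambda_i^2 = 2 s^2 \|A\|_F^2, \qquad |s| \le \frac{1}{4\|A\|},$$
where $\|A\|_F^2 = \sum_i \lambda_i^2$ is the squared Frobenius norm and the band comes from needing $|s \lambda_i| \le \tfrac14$ for every $i$, with $\|A\| = \max_i |\lambda_i|$ the operator norm (the largest singular value). The constants $2$ and $\tfrac14$ are one convenient choice.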
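A quick numerical check of the cumulant bound may help. The sketch below assumes the specific constants $2$ and $\tfrac14$, that is, $-s - \tfrac12\log(1-2s) \le 2s^2$ for $|s| \le \tfrac14$, which is one valid choice rather than the only one; the eigenvalues in `lam` are hypothetical.

```python
import numpy as np

def chi2_cgf(s):
    """log E exp(s (g^2 - 1)) for g ~ N(0, 1); finite only for s < 1/2."""
    return -s - 0.5 * np.log1p(-2.0 * s)

# Single term: compare with 2 s^2 on the band |s| <= 1/4.
s = np.linspace(-0.25, 0.25, 2001)
single_ok = np.all(chi2_cgf(s) <= 2.0 * s**2 + 1e-12)

# Whole sum: hypothetical eigenvalues, band |s| <= 1 / (4 ||A||).
lam = np.array([2.0, -1.0, 0.5, 0.25])
op = np.max(np.abs(lam))                 # operator norm of a symmetric A
fro_sq = np.sum(lam**2)                  # squared Frobenius norm
s_band = np.linspace(-1.0 / (4.0 * op), 1.0 / (4.0 * op), 2001)
cgf_sum = chi2_cgf(np.outer(s_band, lam)).sum(axis=1)
sum_ok = np.all(cgf_sum <= 2.0 * s_band**2 * fro_sq + 1e-12)

print(single_ok, sum_ok)
```

Outside the band the comparison eventually fails, because the exact cumulant generating function blows up as $2 s \lambda_i$ approaches $1$ while the quadratic bound stays finite.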
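This is the cumulant bound of a Bernstein-type variable: a Gaussian-like $s^2$ term, but valid only inside a band whose width is set by $\|A\|$. Optimizing the Chernoff bound over $s$ in that band gives the two regimes. For small $t$ the optimal $s$ stays inside the band and the bound is Gaussian; for large $t$ the optimum is pinned at the edge and the bound is exponential. Together, for an absolute constant $c > 0$,
$$\mathbb{P}\bigl(|x^\top A x - \operatorname{tr} A| \ge t\bigr) \le 2 \exp\!\left(-c \min\!\left(\frac{t^2}{\|A\|_F^2},\ \frac{t}{\|A\|}\right)\right).$$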
That is the Hanson–Wright inequality for Gaussian inputs.
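The Chernoff optimization can be written out in a few lines. The derivation below is a sketch under one assumption: a cumulant bound of the form $\log \mathbb{E}\, e^{sZ} \le 2 s^2 \|A\|_F^2$ for $|s| \le 1/(4\|A\|)$, where $Z = x^\top A x - \operatorname{tr} A$. With those particular constants the exponent comes out with $c = \tfrac18$; other valid constants in the cumulant bound give a different $c$.

```latex
\begin{aligned}
\mathbb{P}(Z \ge t)
  &\le \exp\!\bigl(-st + 2 s^2 \|A\|_F^2\bigr),
  && 0 \le s \le \tfrac{1}{4\|A\|}, \\[4pt]
\text{if } t \le \tfrac{\|A\|_F^2}{\|A\|}:\quad
  s^\star = \tfrac{t}{4\|A\|_F^2} \le \tfrac{1}{4\|A\|}
  &\;\Longrightarrow\;
  \mathbb{P}(Z \ge t) \le \exp\!\Bigl(-\tfrac{t^2}{8\|A\|_F^2}\Bigr), \\[4pt]
\text{if } t > \tfrac{\|A\|_F^2}{\|A\|}:\quad
  s = \tfrac{1}{4\|A\|}
  &\;\Longrightarrow\;
  \mathbb{P}(Z \ge t)
  \le \exp\!\Bigl(-\tfrac{t}{4\|A\|} + \tfrac{\|A\|_F^2}{8\|A\|^2}\Bigr)
  \le \exp\!\Bigl(-\tfrac{t}{8\|A\|}\Bigr), \\[4pt]
\text{so in both cases}\quad
  \mathbb{P}(Z \ge t)
  &\le \exp\!\Bigl(-\tfrac18 \min\Bigl(\tfrac{t^2}{\|A\|_F^2},\ \tfrac{t}{\|A\|}\Bigr)\Bigr).
\end{aligned}
```

The last step of the second case uses $\|A\|_F^2/\|A\| < t$. Applying the same argument to $-A$, which has the same two norms, bounds the lower tail, and the union of the two events gives the factor $2$.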
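For $A = I_n$, $x^\top A x = \|x\|^2$, the mean is $n$, and the norms are $\|A\|_F^2 = n$ and $\|A\| = 1$, so
$$\mathbb{P}\bigl(\bigl|\|x\|^2 - n\bigr| \ge t\bigr) \le 2 \exp\!\left(-c \min\!\left(\frac{t^2}{n},\ t\right)\right).$$
Fluctuations of size $t \lesssim n$ are Gaussian; past $t \approx n$ the tail turns exponential. This is the concentration of $\|x\|^2$ behind the thin shell in high-dimensional Gaussians: the squared length sits at $n$ with fluctuations of order $\sqrt{n}$, so the length sits at $\sqrt{n}$ with fluctuations of order $1$.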
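The thin-shell picture is easy to see in simulation. This is a small sketch with assumed dimensions, trial count, and seed; it reports the squared length relative to $n$ and the spread of the length itself.

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 500
results = {}

for n in (100, 1_000, 10_000):
    x = rng.normal(size=(trials, n))     # rows are standard Gaussian vectors
    sq = np.sum(x**2, axis=1)            # squared lengths, mean n
    length = np.sqrt(sq)                 # lengths, near sqrt(n)
    results[n] = {
        "sq_mean_over_n": sq.mean() / n,
        "sq_std_over_sqrt_n": sq.std() / np.sqrt(n),
        "length_over_sqrt_n": length.mean() / np.sqrt(n),
        "length_std": length.std(),
    }
    print(n, results[n])
```

As the dimension grows a hundredfold, the standard deviation of the length should stay roughly constant (near $1/\sqrt{2}$ in theory), while the standard deviation of the squared length grows like $\sqrt{2n}$.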
Two norms, two regimes
The Frobenius norm is the total variance: $\operatorname{Var}(x^\top A x) = 2\|A\|_F^2$ for Gaussian $x$. It sets the Gaussian regime near the mean, where the fluctuation is an average over all eigenvalue-terms and the central-limit effect applies.
The operator norm is the single largest weight. The heaviest-tailed term in $\sum_i \lambda_i (g_i^2 - 1)$ is the one with the biggest $|\lambda_i|$, and it alone carries a sub-exponential tail that no averaging removes. It sets the far tail.
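The variance statement can be checked by sampling the weighted sum of centered squares directly. The spectrum below (one heavy eigenvalue and twenty light ones) is a hypothetical example, as are the seed and trial count.

```python
import numpy as np

rng = np.random.default_rng(4)
trials = 200_000

# Hypothetical spectrum: one heavy direction plus many light ones.
lam = np.concatenate(([3.0], np.full(20, 0.5)))

g = rng.normal(size=(trials, lam.size))
z = (g**2 - 1.0) @ lam                 # centered quadratic form, one per trial

fro_sq = np.sum(lam**2)                # squared Frobenius norm
op = np.max(np.abs(lam))               # operator norm

empirical_var = z.var()
predicted_var = 2.0 * fro_sq           # Var(g^2 - 1) = 2 for each term
print(empirical_var, predicted_var, op)
```

Here the Frobenius norm collects all twenty-one directions, while the operator norm sees only the heavy one.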
The crossover between the two sits at $t = \|A\|_F^2 / \|A\|$. Below it the form looks Gaussian; above it a single dominant direction takes over and the tail is exponential. The rate that sits in the exponent is $\min\bigl(t^2/\|A\|_F^2,\ t/\|A\|\bigr)$: the larger the rate, the smaller the tail. The parabola governs small deviations, the line governs large ones, and they switch at $t = \|A\|_F^2 / \|A\|$, so changing either norm moves the crossover.
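The rate and its crossover are simple enough to compute directly. In this sketch the function names `hw_rate` and `crossover` are hypothetical, and the absolute constant in front of the rate is left out.

```python
import numpy as np

def hw_rate(t, fro, op):
    """Exponent rate min(t^2 / ||A||_F^2, t / ||A||), without the constant c."""
    t = np.asarray(t, dtype=float)
    return np.minimum(t**2 / fro**2, t / op)

def crossover(fro, op):
    """Deviation at which the parabola and the line meet."""
    return fro**2 / op

# Example: A = I_n with n = 100, so ||A||_F = 10 and ||A|| = 1.
fro, op = 10.0, 1.0
t_star = crossover(fro, op)
for t in (10.0, 50.0, t_star, 200.0):
    regime = "Gaussian" if t <= t_star else "exponential"
    print(t, float(hw_rate(t, fro, op)), regime)
```

Raising the Frobenius norm at fixed operator norm moves the crossover out; raising the operator norm at fixed Frobenius norm pulls it in.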
What decides which regime matters in practice is the shape of the spectrum. Normalize the eigenvalues $\lambda_i$ so the largest is $1$, and tilt the spectrum from flat, where every direction counts equally, like $A = I$, to spiky, where one direction dominates. A flat spectrum pushes the crossover far out, so the form is Gaussian over a wide range; a spiky spectrum pulls it in, so the exponential tail takes over early. The ratio $\|A\|_F^2 / \|A\|^2$ that sets the crossover (in units of $\|A\|$) is the stable rank, an effective count of active directions.
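To make the flat-versus-spiky comparison concrete, the sketch below computes the stable rank for a one-parameter family of spectra. The power-law family $\lambda_i = i^{-\alpha}$ and the name `stable_rank` are assumptions chosen for illustration; any family that interpolates between flat and spiky would do.

```python
import numpy as np

def stable_rank(eigs):
    """||A||_F^2 / ||A||^2 for a symmetric matrix with eigenvalues eigs."""
    eigs = np.asarray(eigs, dtype=float)
    return np.sum(eigs**2) / np.max(np.abs(eigs)) ** 2

n = 50
i = np.arange(1, n + 1, dtype=float)

ranks = {}
for alpha in (0.0, 0.5, 1.0, 3.0):
    lam = i ** (-alpha)          # largest eigenvalue is 1 for every alpha
    ranks[alpha] = stable_rank(lam)
    print(alpha, ranks[alpha])
```

With `alpha = 0` the spectrum is flat and the stable rank equals the dimension; as `alpha` grows it falls toward $1$, and the crossover moves in with it.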
The general theorem
For general independent entries the rotation that handled the Gaussian case is not available: among distributions with independent entries only the Gaussian is rotation invariant, so for any other distribution the rotated coordinates $U^\top x$, while still sub-Gaussian, are no longer independent, and the clean weighted sum of squares falls apart. The inequality holds anyway. For a random vector $x$ with independent, mean-zero, sub-Gaussian entries of sub-Gaussian norm at most $K$ (the scale on which the entries’ own tails decay like $e^{-t^2/K^2}$),
$$\mathbb{P}\bigl(|x^\top A x - \mathbb{E}\, x^\top A x| \ge t\bigr) \le 2 \exp\!\left(-c \min\!\left(\frac{t^2}{K^4 \|A\|_F^2},\ \frac{t}{K^2 \|A\|}\right)\right),$$
the same two-regime shape, with $K$ tracking how heavy the entries are.
The general proof splits the form along its diagonal,
$$x^\top A x = \sum_i A_{ii}\, x_i^2 + \sum_{i \ne j} A_{ij}\, x_i x_j .$$
The diagonal part is a sum of independent variables $A_{ii} x_i^2$; it carries the mean and concentrates by Bernstein’s inequality (the same sub-exponential bound used above, now for a sum rather than after a rotation), no harder than the Gaussian case. The off-diagonal part has mean zero and is the real content of the theorem. Its terms share variables, so they are not independent, and the standard handle is decoupling: replace one copy of $x$ by an independent copy $x'$, so that conditionally on $x'$ the off-diagonal sum $\sum_{i \ne j} A_{ij}\, x_i x'_j$ is a linear form in the independent variables $x_i$, which sub-Gaussian tools control directly. Carrying that through reproduces the same two norms.
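The split and the decoupled form can be sketched numerically. This illustrates the algebra only, not the proof: the matrix, the seed, and the choice of uniform entries on $[-\sqrt{3}, \sqrt{3}]$ (mean zero, variance one, bounded and hence sub-Gaussian) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
A = rng.normal(size=(n, n))

# Independent, mean-zero, unit-variance, bounded entries.
half_width = np.sqrt(3.0)
x = rng.uniform(-half_width, half_width, size=n)
x_prime = rng.uniform(-half_width, half_width, size=n)   # independent copy

A_off = A - np.diag(np.diag(A))          # zero out the diagonal

diag_part = np.sum(np.diag(A) * x**2)    # sum_i A_ii x_i^2
off_part = x @ A_off @ x                 # sum_{i != j} A_ij x_i x_j

# Decoupled version: with x_prime held fixed, this is linear in x,
# with coefficient vector A_off @ x_prime.
coeffs = A_off @ x_prime
decoupled = x @ coeffs                   # sum_{i != j} A_ij x_i x'_j

print(diag_part + off_part, x @ A @ x, decoupled)
```

The point of the last two lines is structural: once `x_prime` is frozen, `decoupled` is a weighted sum of the independent entries of `x`, the kind of object a sub-Gaussian tail bound applies to directly.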
Where it gets used
The inequality is a standard tool wherever a squared length or an energy has to be pinned to its mean.
- Length and distance concentration. $\|x\|^2$ around $n$ is the case $A = I$. More generally $\|Bx\|^2$ concentrates around its mean for a fixed map $B$, which is how one shows random features or embeddings preserve magnitudes.
- Random projections. Projecting a fixed vector $u$ with a random matrix $G$ preserves its length up to a factor $1 \pm \varepsilon$. The fluctuation of $\|Gu\|^2$ is a quadratic form in the entries of $G$, and Hanson–Wright supplies the failure probability that feeds the Johnson–Lindenstrauss lemma.
- Covariance estimation. Controlling $u^\top (\hat\Sigma - \Sigma)\, u$, the error of a sample covariance in a fixed direction, is a quadratic form in the data.
- As a lemma. It is the usual way to show that a quantity such as $\|Gu\|^2$ is close to its mean for one direction $u$; it is then combined with an ε-net argument to make the statement hold uniformly over the whole sphere of directions.
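The random-projection use case can be simulated in a few lines. This is a rough sketch, assuming a Gaussian projection with i.i.d. $\mathcal{N}(0, 1/m)$ entries (one common model, not the only one); the dimensions, the tolerance `eps`, and the seed are hypothetical. It estimates how often the squared length of one fixed unit vector leaves the window $1 \pm \varepsilon$.

```python
import numpy as np

rng = np.random.default_rng(5)
d, m, trials, eps = 300, 200, 300, 0.3

# Fixed unit vector u (hypothetical choice of direction).
u = rng.normal(size=d)
u /= np.linalg.norm(u)

sq_lengths = np.empty(trials)
for k in range(trials):
    # Assumed projection model: i.i.d. N(0, 1/m) entries, so E ||G u||^2 = 1.
    G = rng.normal(scale=1.0 / np.sqrt(m), size=(m, d))
    sq_lengths[k] = np.sum((G @ u) ** 2)

failure_rate = np.mean(np.abs(sq_lengths - 1.0) > eps)
print(f"mean {sq_lengths.mean():.3f}, empirical failure rate {failure_rate:.4f}")
```

This covers a single direction only. Making the guarantee uniform over many vectors or a whole sphere needs the union-bound or ε-net step described in the list above.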
References
- David L. Hanson, Farroll T. Wright. A bound on tail probabilities for quadratic forms in independent random variables. The Annals of Mathematical Statistics, 42(3):1079–1083, 1971. The original.
- Mark Rudelson, Roman Vershynin. Hanson-Wright inequality and sub-gaussian concentration. Electronic Communications in Probability, 18, no. 82, 1–9, 2013. The modern statement with the two norms and a clean proof by decoupling.
- Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018. Chapter 6 states and proves the inequality; the sub-exponential and Bernstein bounds it rests on are developed in Chapter 2.
- Stéphane Boucheron, Gábor Lugosi, Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013. Background on sub-gamma variables and the Bernstein bound behind the two regimes.