The bell curve is one of the first things we learn in probability. Gaussian samples concentrate near the mean, density peaks at zero, tails fall off fast. In one dimension this picture is correct. In a hundred dimensions, it is catastrophically wrong.
If you sample from a standard Gaussian in $d$ dimensions, almost none of your samples will be near the origin. Almost all of them will sit on a thin spherical shell of radius about $\sqrt{d}$.
Density vs. mass
The standard $d$-dimensional Gaussian has density
$$p(x) = (2\pi)^{-d/2} \exp\left(-\tfrac{1}{2}\|x\|^2\right),$$
which peaks at $x = 0$. So far, so 1D.
But probability mass is density times volume. And in high dimensions, the volume sitting at large radii dwarfs the volume near the origin. To see why, consider a thin spherical shell at radius $r$ with thickness $dr$. Its volume is surface area $\times$ thickness. In 3D, the surface area of a sphere of radius $r$ scales as $r^2$; in $d$ dimensions it scales as $r^{d-1}$ (one fewer power than the dimension, because the sphere is one dimension lower than the ambient space). So the shell volume scales like $r^{d-1}\,dr$.
The factor $r^{d-1}$ is what dominates. Doubling the radius multiplies the shell’s volume by $2^{d-1}$, which is about $6 \times 10^{29}$ for $d = 100$. Concretely: the fraction of a unit ball lying inside radius $r < 1$ is $r^d$, which for $r = 0.9$ is around $0.35$ at $d = 10$ and effectively $0$ at $d = 1000$. Almost all of a high-dim region’s volume is in its outermost sliver.
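A quick numeric check of these scaling claims, as a minimal sketch. The helper names `inner_volume_fraction` and `shell_growth_factor` are made up for this illustration, and the radius 0.9 and the listed dimensions are arbitrary choices.

```python
def inner_volume_fraction(r: float, d: int) -> float:
    """Fraction of the unit d-ball's volume lying within radius r (0 <= r <= 1)."""
    # Volume of a d-ball scales as radius**d, so the ratio is simply r**d.
    return r ** d


def shell_growth_factor(scale: float, d: int) -> float:
    """Factor by which a thin shell's volume grows when its radius is scaled
    by `scale` while its thickness is held fixed."""
    # Surface area of a sphere in d dimensions scales as radius**(d - 1).
    return scale ** (d - 1)


# Illustrative dimensions; any positive integers would do.
for d in (1, 3, 10, 100, 1000):
    print(
        f"d={d:5d}  "
        f"fraction inside r=0.9: {inner_volume_fraction(0.9, d):.3e}  "
        f"doubling factor: {shell_growth_factor(2.0, d):.3e}"
    )
```

The printed fractions shrink geometrically with the dimension while the doubling factor grows geometrically, which is the "outermost sliver" effect in numbers.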
To find where mass concentrates, multiply density by shell volume. The result is the radial density: the density of the random variable $r = \|x\|$, i.e., the probability mass that lands in the thin shell at radius $r$:
$$p(r) \propto r^{d-1} e^{-r^2/2}.$$
This is the chi distribution with $d$ degrees of freedom (just a name for the law of $\|x\|$ when $x$ is a standard $d$-dim Gaussian). The Gaussian factor $e^{-r^2/2}$ wants $r$ small; the volume factor $r^{d-1}$ wants $r$ large. They balance at $r = \sqrt{d-1} \approx \sqrt{d}$, which is the mode of the radial distribution.
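The balance point can be made explicit with a short calculation. This is a sketch of the standard argument; $r^\ast$ is a symbol introduced here for the maximizer. Take the log of the unnormalized radial density $r^{d-1} e^{-r^2/2}$ and set its derivative to zero:

$$\frac{\mathrm{d}}{\mathrm{d}r}\left[(d-1)\log r - \frac{r^2}{2}\right] = \frac{d-1}{r} - r = 0 \quad\Longrightarrow\quad r^\ast = \sqrt{d-1}.$$

The second derivative, $-(d-1)/r^2 - 1$, is negative for every $r > 0$, so this stationary point is the unique maximum.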
Slide $d$ and watch the cloud collapse onto the sphere. Each blue dot is one sample from $\mathcal{N}(0, I_d)$, plotted at normalized radius $\|x\|/\sqrt{d}$ at a uniformly random angle. With this scaling the predicted shell sits at radius $1$ for every $d$ (the dashed red ring), so the ring stays put and only the cloud moves. At small $d$ the points are scattered all over: some near the origin, some past the ring, almost none on it. As $d$ grows, the relative spread shrinks, and the cloud collapses onto the ring.
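The same collapse can be seen in summary statistics rather than a plot. The sketch below uses only the Python standard library; the function name, sample count, seed, and list of dimensions are illustrative assumptions, not the code behind the figure described above.

```python
import math
import random
import statistics


def normalized_radii(d: int, n_samples: int, rng: random.Random) -> list[float]:
    """Draw n_samples points from N(0, I_d) and return ||x|| / sqrt(d) for each."""
    radii = []
    for _ in range(n_samples):
        # Squared norm of one d-dimensional standard Gaussian sample.
        squared_norm = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
        radii.append(math.sqrt(squared_norm / d))
    return radii


rng = random.Random(0)  # fixed seed so the run is repeatable
for d in (1, 2, 10, 100, 1000):
    radii = normalized_radii(d, 500, rng)
    print(
        f"d={d:5d}  mean={statistics.fmean(radii):.3f}  "
        f"std={statistics.stdev(radii):.3f}  "
        f"min={min(radii):.3f}  max={max(radii):.3f}"
    )
```

As the dimension increases, the mean of the normalized radius should approach 1 and the spread between the minimum and maximum should narrow.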
The numbers
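If $x \sim \mathcal{N}(0, I_d)$, then $\|x\|^2 = \sum_{i=1}^{d} x_i^2$ is a sum of $d$ independent squared standard normals, which is by definition a chi-squared distribution with $d$ degrees of freedom. Its mean and variance are
$$\mathbb{E}\,\|x\|^2 = d, \qquad \mathrm{Var}\,\|x\|^2 = 2d.$$
To go from $\|x\|^2$ to $\|x\|$, take a square root. The delta method (a Taylor-expansion shortcut: for a smooth function $g$, $\mathrm{Var}[g(Y)] \approx g'(\mathbb{E}[Y])^2\,\mathrm{Var}[Y]$) applied to $g(y) = \sqrt{y}$ gives
$$\mathrm{Var}\,\|x\| \approx \left(\frac{1}{2\sqrt{d}}\right)^2 \cdot 2d = \frac{1}{2},$$
and so
$$\mathbb{E}\,\|x\| \approx \sqrt{d}, \qquad \mathrm{SD}\,\|x\| \approx \frac{1}{\sqrt{2}} \approx 0.71$$
for any $d$. The standard deviation is approximately constant in $d$. The mean grows as $\sqrt{d}$. So the relative spread
$$\frac{\mathrm{SD}\,\|x\|}{\mathbb{E}\,\|x\|} \approx \frac{1}{\sqrt{2d}}$$
tends to zero. The shell width stays around $0.7$ as $d$ grows; the shell radius marches outward. In $d = 100$, the shell is at radius $\approx 10$ with thickness $\approx 0.7$. In $d = 10{,}000$, radius $\approx 100$ with thickness still $\approx 0.7$.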
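The delta-method figures can be compared against exact moments. The sketch below assumes the standard closed form for the mean of a chi distribution, $\mathbb{E}\,\|x\| = \sqrt{2}\,\Gamma((d+1)/2)/\Gamma(d/2)$, together with $\mathbb{E}\,\|x\|^2 = d$; the helper names `chi_mean` and `chi_std` are hypothetical.

```python
import math


def chi_mean(d: int) -> float:
    """Exact mean of ||x|| for x ~ N(0, I_d)."""
    # Work with log-gamma so the ratio of gamma functions does not overflow.
    log_ratio = math.lgamma((d + 1) / 2) - math.lgamma(d / 2)
    return math.sqrt(2.0) * math.exp(log_ratio)


def chi_std(d: int) -> float:
    """Exact standard deviation of ||x||, from Var = E[||x||^2] - (E||x||)^2."""
    return math.sqrt(d - chi_mean(d) ** 2)


# Compare exact values with the approximations sqrt(d) and 1/sqrt(2).
for d in (1, 10, 100, 10_000):
    print(
        f"d={d:6d}  mean={chi_mean(d):9.4f}  sqrt(d)={math.sqrt(d):9.4f}  "
        f"std={chi_std(d):.4f}  1/sqrt(2)={1 / math.sqrt(2):.4f}"
    )
```

The subtraction in `chi_std` loses precision when the dimension is very large, so this sketch is only meant for moderate dimensions like the ones listed.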
For sufficiently large $d$, a Gaussian sample is essentially uniformly distributed on the sphere of radius $\sqrt{d}$.
What does this mean?
The mode of the joint density is the origin. The mode of the radial density is $\sqrt{d-1} \approx \sqrt{d}$. Both facts are correct simultaneously. The lesson:
In high dimensions, density is not the right intuition. Mass is.
A point at the origin has the maximum density, but the volume of a small ball around it is tiny. A point on the shell has exponentially smaller density, but the surface area at that radius is exponentially huge. The product wins for the shell.
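To put rough numbers on that trade-off, the sketch below works in log space. The function names and the choice $d = 100$ are assumptions made for illustration. It compares the density at the origin with the density on the shell, then compares the unnormalized radial density $r^{d-1} e^{-r^2/2}$ at the shell with its value at radius 1.

```python
import math


def log_density(r: float, d: int) -> float:
    """Log of the N(0, I_d) density at any point whose norm is r."""
    return -0.5 * d * math.log(2.0 * math.pi) - 0.5 * r * r


def log_radial_density_unnormalized(r: float, d: int) -> float:
    """Log of r**(d-1) * exp(-r**2 / 2): density times shell area, up to a constant."""
    return (d - 1) * math.log(r) - 0.5 * r * r


d = 100  # illustrative dimension
shell_radius = math.sqrt(d - 1)  # mode of the radial density

# How much denser the origin is than a point on the shell (in nats).
density_gap = log_density(0.0, d) - log_density(shell_radius, d)

# How much more mass a thin shell at the mode holds than one at radius 1 (in nats).
mass_gap = (log_radial_density_unnormalized(shell_radius, d)
            - log_radial_density_unnormalized(1.0, d))

print(f"log density advantage of the origin: {density_gap:.1f} nats")
print(f"log mass advantage of the shell over r=1: {mass_gap:.1f} nats")
```

The origin wins on density by $(d-1)/2$ nats, but the volume term more than makes up for it, so the shell wins on mass.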
Implications
A few non-trivial consequences:
- Generative models. Models that map noise to data (GANs, VAEs, diffusion models) learn how to take samples on the input shell at radius $\approx \sqrt{d}$ and place them somewhere meaningful. Decoding from a sample near the origin is asking the network to handle a region it almost never saw during training.
- Sampling and MCMC. A lot of inference algorithms have to find the typical set of the target distribution, which lives on a thin shell. Algorithms that wander far from the shell waste samples on regions with negligible mass.
- Random projections. A uniformly random unit vector in $\mathbb{R}^d$ projected onto a single axis gives a value with std $1/\sqrt{d}$. High-dim spheres look Gaussian in low-dim projections, which connects this story back to the central limit theorem.
- Concentration of measure. The Gaussian shell is one face of a much more general fact: in high dimensions, Lipschitz functions of independent variables concentrate sharply around their median. Lévy’s lemma on the sphere is the same idea in spherical clothing.
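The random-projection item in the list above is easy to check empirically. This is a minimal standard-library sketch: it relies on the fact that normalizing a standard Gaussian vector gives a uniformly distributed point on the unit sphere. The function names, sample count, and seed are illustrative assumptions.

```python
import math
import random
import statistics


def random_unit_vector(d: int, rng: random.Random) -> list[float]:
    """Uniform point on the unit sphere in R^d, via a normalized Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def first_coordinate_std(d: int, n_samples: int, rng: random.Random) -> float:
    """Empirical std of the first coordinate over n_samples random unit vectors."""
    coords = [random_unit_vector(d, rng)[0] for _ in range(n_samples)]
    return statistics.pstdev(coords)


rng = random.Random(0)  # fixed seed so the run is repeatable
for d in (10, 100, 400):
    empirical = first_coordinate_std(d, 1000, rng)
    print(f"d={d:4d}  empirical std={empirical:.4f}  1/sqrt(d)={1 / math.sqrt(d):.4f}")
```

Projecting onto the first axis is enough here because the uniform distribution on the sphere is rotation invariant, so every axis behaves the same way.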
Why this gets missed
The 1D bell curve is so canonical that it overwrites our intuition for higher dimensions. Even researchers in high-dimensional probability slip into "mass near the mean" thinking when working informally. The right picture: in $d$ dimensions, a typical Gaussian sample is at distance $\approx \sqrt{d}$ from the mean. The mean itself is essentially never sampled.
References
- Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018. Chapter 3.
- Avrim Blum, John Hopcroft, Ravi Kannan. Foundations of Data Science. Cambridge University Press, 2020. Chapter 2.
- Michel Talagrand. Concentration of Measure and Isoperimetric Inequalities in Product Spaces. Publications mathématiques de l’IHÉS, 1995.