
High-dimensional Gaussians live on a sphere


The bell curve is one of the first things we learn in probability. Gaussian samples concentrate near the mean, density peaks at zero, tails fall off fast. In one dimension this picture is correct. In a hundred dimensions, it is catastrophically wrong.

If you sample from a standard Gaussian in $d = 100$ dimensions, almost none of your samples will be near the origin. Almost all of them will sit on a thin spherical shell of radius about $\sqrt{d} = 10$.
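This claim is easy to check numerically. The sketch below uses only the Python standard library; the helper name `gaussian_norms`, the sample count, and the seed are illustrative choices, not anything from the original page.

```python
import math
import random
import statistics


def gaussian_norms(d, n_samples, seed=0):
    """Return the Euclidean norms of n_samples draws from N(0, I_d)."""
    rng = random.Random(seed)
    norms = []
    for _ in range(n_samples):
        # Sum of d squared standard normals, then a square root.
        squared = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
        norms.append(math.sqrt(squared))
    return norms


norms = gaussian_norms(d=100, n_samples=2000)
print("smallest norm:", round(min(norms), 3))
print("mean norm:    ", round(statistics.fmean(norms), 3))
print("largest norm: ", round(max(norms), 3))
```

Even the smallest of the 2000 norms should land far from the origin, and the mean should sit close to 10.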

Density vs. mass

The standard $d$-dimensional Gaussian has density

$$f(x) \;=\; (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\|x\|^2\right),$$

which peaks at $x = 0$. So far, so 1D.

But probability mass is density times volume. And in high dimensions, the volume sitting at large radii dwarfs the volume near the origin. To see why, consider a thin spherical shell at radius $r$ with thickness $dr$. Its volume is surface area $\times$ thickness. In 3D, the surface area of a sphere of radius $r$ scales as $r^2$; in $d$ dimensions it scales as $r^{d-1}$ (one fewer power than the dimension, because the sphere is one dimension lower than the ambient space). So the shell volume scales like $r^{d-1}\, dr$.

The factor $r^{d-1}$ is what dominates. Doubling the radius multiplies the shell’s volume by $2^{d-1}$, which is about $6 \times 10^{29}$ for $d = 100$. Concretely: the fraction of a unit ball lying inside radius $0.99$ is $0.99^d$, which is around $37\%$ at $d = 100$ and effectively $0\%$ at $d = 1000$. Almost all of a high-dim region’s volume is in its outermost sliver.
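The arithmetic in this paragraph can be reproduced in a few lines. This is a minimal sketch; the function names `inner_fraction` and `doubling_factor` are made up for illustration.

```python
def inner_fraction(d, radius=0.99):
    """Fraction of the unit d-ball's volume that lies inside the given radius."""
    return radius ** d


def doubling_factor(d):
    """Factor by which a thin shell's volume grows when its radius doubles."""
    return 2.0 ** (d - 1)


for d in (3, 100, 1000):
    print(
        f"d={d:5d}  inner fraction={inner_fraction(d):.3e}  "
        f"doubling factor={doubling_factor(d):.3e}"
    )
```

At $d = 3$ the inner ball still holds about 97% of the volume; the collapse to the outer sliver only shows up as the dimension grows.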

To find where mass concentrates, multiply density by shell volume. The result is the radial density: the density of the random variable $\|X\|$, i.e., the probability mass that lands in the thin shell at radius $r$.

$$p(r) \;\propto\; r^{d-1} \exp\!\left(-\tfrac{1}{2} r^2\right).$$

This is the chi distribution with $d$ degrees of freedom (just a name for the law of $\|X\|$ when $X$ is a standard $d$-dim Gaussian). The Gaussian factor $\exp(-r^2/2)$ wants $r$ small; the volume factor $r^{d-1}$ wants $r$ large. They balance at $r = \sqrt{d-1} \approx \sqrt{d}$, which is the mode of the radial distribution.
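The balance point can be derived in one step by maximizing the log of the radial density. A short worked version, using the same symbols as above:

$$
\begin{aligned}
\log p(r) &= (d-1)\log r - \tfrac{1}{2} r^2 + \text{const}, \\
\frac{d}{dr}\log p(r) &= \frac{d-1}{r} - r = 0 \quad\Longrightarrow\quad r^2 = d-1, \\
\frac{d^2}{dr^2}\log p(r) &= -\frac{d-1}{r^2} - 1 \;<\; 0 .
\end{aligned}
$$

The second derivative is negative everywhere, so $r = \sqrt{d-1}$ is the unique maximum.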

Slide $d$ and watch the cloud collapse onto the sphere. Each blue dot is one sample from $\mathcal{N}(0, I_d)$, plotted at normalized radius $\|X\|/\sqrt{d}$ at a uniformly random angle. With this scaling the predicted shell sits at radius $1$ for every $d$ (the dashed red ring), so the ring stays put and only the cloud moves. At $d = 1$ the points are scattered all over: some near the origin, some past the ring, almost none on it. As $d$ grows, the relative spread $\mathrm{std}(\|X\|)/\mathbb{E}[\|X\|] \approx 1/\sqrt{2d}$ shrinks, and the cloud collapses onto the ring.

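[Interactive figure: samples at normalized radius $\|X\|/\sqrt{d}$, with the predicted shell drawn as a dashed ring. Readout at $d = 3$: $\sqrt{d} = 1.73$, mean $\|X\| = 1.60$, std $\|X\| = 0.67$.]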

The numbers

If $X \sim \mathcal{N}(0, I_d)$, then $\|X\|^2 = X_1^2 + X_2^2 + \cdots + X_d^2$ is a sum of $d$ independent squared standard normals, which is by definition a chi-squared distribution with $d$ degrees of freedom. Its mean and variance are

$$\mathbb{E}[\|X\|^2] = d, \qquad \mathrm{Var}(\|X\|^2) = 2d.$$

To go from $\|X\|^2$ to $\|X\|$, take a square root. The delta method (a Taylor-expansion shortcut: for a smooth function $g$, $\mathrm{Var}(g(Z)) \approx g'(\mathbb{E}[Z])^2 \cdot \mathrm{Var}(Z)$) applied to $g(z) = \sqrt{z}$, whose derivative is $g'(z) = 1/(2\sqrt{z})$, gives

$$\mathrm{Var}(\|X\|) \;\approx\; \frac{\mathrm{Var}(\|X\|^2)}{4\,\mathbb{E}[\|X\|^2]} \;=\; \frac{2d}{4d} \;=\; \frac{1}{2},$$

and so

$$\mathbb{E}[\|X\|] \approx \sqrt{d}, \qquad \mathrm{std}(\|X\|) \approx \frac{1}{\sqrt{2}} \approx 0.707$$

once $d$ is moderately large. The standard deviation is essentially constant in $d$. The mean grows as $\sqrt{d}$. So the relative spread

$$\frac{\mathrm{std}(\|X\|)}{\mathbb{E}[\|X\|]} \;\approx\; \frac{1}{\sqrt{2d}}$$

tends to zero. The shell width stays around $0.7$ as $d$ grows; the shell radius marches outward. In $d = 100$, the shell is at radius $\sim 10$ with thickness $\sim 0.7$. In $d = 10{,}000$, radius $\sim 100$ with thickness still $\sim 0.7$.
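A quick simulation makes the constant-width shell visible. The sketch below is a standard-library Monte Carlo check; `norm_stats`, the sample count, and the seed are illustrative assumptions rather than part of the original analysis.

```python
import math
import random
import statistics


def norm_stats(d, n_samples=1000, seed=0):
    """Sample mean and standard deviation of ||X|| for X ~ N(0, I_d)."""
    rng = random.Random(seed)
    norms = [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))
        for _ in range(n_samples)
    ]
    return statistics.fmean(norms), statistics.stdev(norms)


for d in (1, 10, 100, 1000):
    mean, std = norm_stats(d)
    print(
        f"d={d:5d}  sqrt(d)={math.sqrt(d):7.3f}  mean={mean:7.3f}  "
        f"std={std:.3f}  std/mean={std / mean:.4f}  "
        f"1/sqrt(2d)={1 / math.sqrt(2 * d):.4f}"
    )
```

The delta-method values are approximations: at very small $d$ the sampled standard deviation comes out noticeably below $0.707$, and it approaches that value as $d$ grows.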

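For sufficiently large $d$, a Gaussian sample is essentially uniformly distributed on the sphere of radius $\sqrt{d}$: its direction is exactly uniform by rotational symmetry, and its radius stays within a constant-width band around $\sqrt{d}$.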

What does this mean?

The mode of the joint density is the origin. The mode of the radial density is $\sqrt{d}$. Both facts are correct simultaneously. The lesson:

In high dimensions, density is not the right intuition. Mass is.

A point at the origin has the maximum density, but the volume of a small ball around it is tiny. A point on the $\sqrt{d}$ shell has exponentially smaller density, but the surface area at that radius is exponentially huge. The product wins for the shell.
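To put numbers on "the product wins", one can compare the unnormalized radial density $r^{d-1}\exp(-r^2/2)$ at two radii in log space, which avoids overflow. This is a minimal sketch; the function names and the choice of $r = 1$ as the "near the origin" reference radius are illustrative assumptions.

```python
import math


def log_radial_density(d, r):
    """Log of the unnormalized radial density r^(d-1) * exp(-r^2 / 2), for r > 0."""
    return (d - 1) * math.log(r) - 0.5 * r * r


def log_mass_ratio(d, r_a, r_b):
    """Log of (mass in a thin shell at r_a) / (mass in an equally thin shell at r_b)."""
    return log_radial_density(d, r_a) - log_radial_density(d, r_b)


d = 100
ratio = log_mass_ratio(d, r_a=math.sqrt(d), r_b=1.0)
print(f"log mass ratio (natural log): {ratio:.3f}")
print(f"orders of magnitude:          {ratio / math.log(10):.1f}")
```

For $d = 100$ the difference is about 178 in natural-log units, roughly 77 orders of magnitude in favor of the shell at $\sqrt{d}$, even though the density there is smaller by a factor of $e^{-d/2}$ relative to the origin.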

Implications

A few non-trivial consequences:

  1. Generative models. Models that map noise $\mathcal{N}(0, I_d)$ to data (GANs, VAEs, diffusion models) learn how to take samples on the input shell at $\sqrt{d}$ and place them somewhere meaningful. Decoding from a sample near the origin is asking the network to handle a region it almost never saw during training.
  2. Sampling and MCMC. A lot of inference algorithms have to find the typical set of the target distribution, which lives on a thin shell. Algorithms that wander far from the shell waste samples on regions with negligible mass.
  3. Random projections. A uniformly random unit vector in $\mathbb{R}^d$ projected onto a single axis gives a value with std $\sim 1/\sqrt{d}$. High-dim spheres look Gaussian in low-dim projections, which connects this story back to the central limit theorem.
  4. Concentration of measure. The Gaussian shell is one face of a much more general fact: in high dimensions, Lipschitz functions of independent variables concentrate sharply around their median. Lévy’s lemma on the sphere is the same idea in spherical clothing.
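The random-projection claim in item 3 can be checked directly. The sketch below draws uniformly random unit vectors by normalizing standard Gaussian vectors (a standard construction) and measures the spread of the first coordinate; the helper names, sample count, and seed are illustrative.

```python
import math
import random
import statistics


def random_unit_vector(d, rng):
    """Uniform point on the unit sphere in R^d: a normalized Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def first_coordinate_std(d, n_samples=2000, seed=0):
    """Sample standard deviation of the first coordinate of random unit vectors."""
    rng = random.Random(seed)
    coords = [random_unit_vector(d, rng)[0] for _ in range(n_samples)]
    return statistics.stdev(coords)


for d in (3, 30, 300):
    print(
        f"d={d:4d}  std of first coordinate={first_coordinate_std(d):.4f}  "
        f"1/sqrt(d)={1 / math.sqrt(d):.4f}"
    )
```

The two columns should agree closely: the squared coordinates of a unit vector sum to one, so by symmetry each has mean $1/d$.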

Why this gets missed

The 1D bell curve is so canonical that it overwrites our intuition for higher dimensions. Even researchers in high-dimensional probability slip into "mass near the mean" thinking when working informally. The right picture: in $d$ dimensions, a typical Gaussian sample is at distance $\sqrt{d}$ from the mean. The mean itself is essentially never sampled.
