Skip to content

Covering the sphere with ε-nets

Posted on:

A lot of quantities in high-dimensional probability are a supremum over the unit sphere, a worst case over all directions at once. The standard example is the operator norm of a matrix AA,

A  =  supx=1Ax,\|A\| \;=\; \sup_{\|x\| = 1} \|Ax\|,

the largest factor by which AA can stretch a unit vector, equivalently its largest singular value. When AA is random, there are usually situations where we want to claim A\|A\| is not too large, to satisfy some criteria.

For a single fixed direction xx, this is routine: Ax\|Ax\| is the length of one random vector, and it concentrates. The difficulty is the word every. The sphere holds infinitely many directions, so we cannot union-bound over them one at a time. The ε-net is the standard device for exactly this situation, and it is one of the most reused arguments in the area.

Two facts do all the work. The sphere has a finite net that is not too large, and controlling a quantity on the net controls it everywhere. After that, the operator norm of a random matrix falls out in a few lines.

What an ε-net is

Fix a target accuracy ε>0\varepsilon > 0. An ε-net of the unit sphere Sd1S^{d-1} is a finite set of points NSd1\mathcal{N} \subseteq S^{d-1} such that every point of the sphere is within distance ε\varepsilon of some point of N\mathcal{N}:

for every xSd1,minyNxy    ε.\text{for every } x \in S^{d-1}, \qquad \min_{y \in \mathcal{N}} \|x - y\| \;\le\; \varepsilon.

Said another way, the ε\varepsilon-balls centered at the net points cover the sphere. A smaller ε\varepsilon is a finer net and takes more points.

The unit circle S1S^1 is the one case we can draw. Below, each dot is a net point and the shaded disk around it has radius ε\varepsilon. The disks together cover the circle, so every point on it is within ε\varepsilon of a dot. Slide ε\varepsilon to make the net finer or coarser.

ε = 0.50net points = 13

How many points does the net need?

The net is finite, but its size is the question that decides everything. That size is the covering number N(Sd1,ε)N(S^{d-1}, \varepsilon), and a short volume argument bounds it.

Take N\mathcal{N} to be a maximal ε\varepsilon-separated subset of the sphere: a set of points that are pairwise at least ε\varepsilon apart, to which no further point can be added without breaking that separation. Maximality forces N\mathcal{N} to be an ε\varepsilon-net, because if some point of the sphere were farther than ε\varepsilon from all of N\mathcal{N}, we could add it. Now place a ball of radius ε/2\varepsilon/2 around each net point. These balls are disjoint, since their centers are at least ε\varepsilon apart, and they all sit inside the ball of radius 1+ε/21 + \varepsilon/2 around the origin, since each center has norm 11. A dd-dimensional ball of radius rr has volume proportional to rdr^d, so comparing the total volume of the small balls to the big one,

N(ε2)d    (1+ε2)d,henceN    (1+2ε)d    (3ε)d|\mathcal{N}| \cdot \left(\tfrac{\varepsilon}{2}\right)^{d} \;\le\; \left(1 + \tfrac{\varepsilon}{2}\right)^{d}, \qquad\text{hence}\qquad |\mathcal{N}| \;\le\; \left(1 + \frac{2}{\varepsilon}\right)^{d} \;\le\; \left(\frac{3}{\varepsilon}\right)^{d}

for ε1\varepsilon \le 1.

So the sphere has an ε\varepsilon-net of size at most (3/ε)d(3/\varepsilon)^d. It is finite, which is what we needed, but it grows exponentially in the dimension dd. That exponential is the quantity to watch: whether the net is affordable comes down to whether the eO(d)e^{O(d)} point count can be beaten by the tail bound we have for a single point.

From the net to the whole sphere

A net is only useful if controlling a quantity on it controls the quantity everywhere. For the operator norm, the extension is one short computation, and it costs only a constant factor.

Extension lemma. Let N\mathcal{N} be an ε\varepsilon-net of Sn1S^{n-1}. Then

A    11εmaxxNAx.\|A\| \;\le\; \frac{1}{1 - \varepsilon} \max_{x \in \mathcal{N}} \|Ax\|.

Take any unit vector zz, and a net point xx with zxε\|z - x\| \le \varepsilon. Writing M=maxxNAxM = \max_{x \in \mathcal{N}} \|Ax\|,

Az    Ax+A(zx)    M+Azx    M+εA,\|Az\| \;\le\; \|Ax\| + \|A(z - x)\| \;\le\; M + \|A\|\,\|z - x\| \;\le\; M + \varepsilon \|A\|,

using A(zx)Azx\|A(z-x)\| \le \|A\|\,\|z-x\|. Taking the supremum over unit zz on the left gives AM+εA\|A\| \le M + \varepsilon\|A\|, and rearranging gives the lemma.

With ε=12\varepsilon = \tfrac{1}{2} the factor is 22 and the net has at most 5n5^n points:

A    2maxxNAx,N5n.\|A\| \;\le\; 2 \max_{x \in \mathcal{N}} \|Ax\|, \qquad |\mathcal{N}| \le 5^{n}.

The supremum over the whole sphere that defines A\|A\| is now a maximum over a finite set, paid for with a factor of 22.

The operator norm of a random matrix

Now the application. Let AA be an m×nm \times n matrix with independent standard Gaussian entries, AijN(0,1)A_{ij} \sim \mathcal{N}(0,1). How large is A\|A\|?

One fixed direction. Fix a unit vector xRnx \in \mathbb{R}^n. Each coordinate of AxAx is the inner product of a row of AA with xx, a Gaussian of variance x2=1\|x\|^2 = 1, and the rows are independent, so AxN(0,Im)Ax \sim \mathcal{N}(0, I_m). Then Ax2\|Ax\|^2 is a sum of mm squared standard Gaussians, a chi-squared with mean mm, so by Jensen EAxm\mathbb{E}\|Ax\| \le \sqrt{m}. Moreover Ax\|Ax\| is a 11-Lipschitz function of the entries of AA (changing AA by a small amount in Frobenius norm changes Ax\|Ax\| by at most as much), so Gaussian concentration gives a sub-Gaussian tail:

Pr[Axm+s]    es2/2.\Pr\big[\|Ax\| \ge \sqrt{m} + s\big] \;\le\; e^{-s^2/2}.

Union over the net. This was one fixed xx. Take a 12\tfrac{1}{2}-net N\mathcal{N} of the sphere Sn1S^{n-1}, with N5n|\mathcal{N}| \le 5^{n}, and union-bound the tail over its points:

Pr[maxxNAxm+s]    5nes2/2.\Pr\Big[\max_{x \in \mathcal{N}} \|Ax\| \ge \sqrt{m} + s\Big] \;\le\; 5^{n}\, e^{-s^2/2}.

Extend. The extension lemma turns the net maximum into the full operator norm, A2maxxNAx\|A\| \le 2 \max_{x \in \mathcal{N}} \|Ax\|, so

Pr[A2(m+s)]    5nes2/2.\Pr\big[\|A\| \ge 2(\sqrt{m} + s)\big] \;\le\; 5^{n}\, e^{-s^2/2}.

Balance. The net contributes 5n=enln55^{n} = e^{n \ln 5}, and the tail contributes es2/2e^{-s^2/2}. The tail overtakes the net once ss is a little past 2ln5n1.8n\sqrt{2 \ln 5}\,\sqrt{n} \approx 1.8\sqrt{n}: setting s=2ln5n+ts = \sqrt{2 \ln 5}\,\sqrt{n} + t makes s22nln5+t2s^2 \ge 2n\ln 5 + t^2, so

5nes2/2    et2/2.5^{n}\, e^{-s^2/2} \;\le\; e^{-t^2/2}.

Therefore, with probability at least 1et2/21 - e^{-t^2/2},

A    2m+22ln5n+2t    m+n+t.\|A\| \;\le\; 2\sqrt{m} + 2\sqrt{2\ln 5}\,\sqrt{n} + 2t \;\lesssim\; \sqrt{m} + \sqrt{n} + t.

A random m×nm \times n Gaussian matrix has Am+n\|A\| \lesssim \sqrt{m} + \sqrt{n}, with fluctuations of constant order. The ε\varepsilon-net did the real work: it converted the supremum over the sphere into a 5n5^{n}-fold union bound, and the 5n5^{n} was harmless because the per-direction tail es2/2e^{-s^2/2} is so much stronger. Spending a deviation of order n\sqrt{n} pays off the entire net.

The balance is the whole content of the method. Below, the curve is the exponent nln5s2/2n \ln 5 - s^2/2 of the bound 5nes2/25^{n} e^{-s^2/2}. Where it is positive the bound exceeds 11 and says nothing; once it crosses below zero the bound is exponentially small. Slide nn and watch the crossover s=2nln5s^\star = \sqrt{2n\ln 5} move outward like n\sqrt{n}.

n = 100net size 5n = e161crossover s = √(2n ln5) = 17.9

The constants are loose. The factor 22 is the price of the 12\tfrac12-net, and the 2ln5\sqrt{2\ln 5} is the price of the volume bound on the net size. Sharper arguments remove them: Gordon’s Gaussian comparison inequality gives EAm+n\mathbb{E}\|A\| \le \sqrt{m} + \sqrt{n} with no stray constant, and the Bai–Yin law shows A/(m+n)1\|A\|/(\sqrt{m} + \sqrt{n}) \to 1 as the dimensions grow. The ε\varepsilon-net gets the rate m+n\sqrt{m} + \sqrt{n} right with almost no work, which is why it is usually the first thing to try.

When the net trick works

The pattern reaches well beyond random matrices. Whenever a quantity is a supremum over the sphere, or over any set with a controlled covering number, and each fixed point obeys a sub-Gaussian tail ecs2e^{-c s^2}, the same three steps apply: net the set, union-bound over the net, extend back. The net costs eO(d)e^{O(d)}, and a sub-Gaussian tail pays for it with a deviation of order d\sqrt{d}. This is the argument behind operator-norm bounds for random matrices, the restricted isometry property in compressed sensing, sample covariance estimation, and Johnson–Lindenstrauss sketching.

It also has clear limits. The eO(d)e^{O(d)} net size means the per-point tail must be at least sub-Gaussian for the method to break even; for heavier tails the net is too expensive and one needs finer tools like chaining. And the covering-number bound discards the geometry of the set, so it gives the right rate but not the sharp constant. When the constant is what matters, the net is the wrong instrument.

References



Previous Post
The Hanson–Wright inequality
Next Post
Sudakov minoration, or how big a maximum must be