Covering the sphere with ε-nets

Many quantities in high-dimensional probability are suprema over the unit sphere: worst cases over all directions at once. The standard example is the operator norm of a matrix $A$,

\|A\| \;=\; \sup_{\|x\| = 1} \|Ax\|,

the largest factor by which $A$ can stretch a unit vector, equivalently its largest singular value. When $A$ is random, we frequently need to show $\|A\|$ is not too large.

For a single fixed direction $x$, this is routine: $\|Ax\|$ is the length of one random vector, and it concentrates. The difficulty is the word "every". The sphere holds infinitely many directions, so we cannot union-bound over them one at a time. The ε-net is the standard device for exactly this situation, and it is one of the most reused arguments in the area.

Two facts do all the work. The sphere has a finite net that is not too large, and controlling a quantity on the net controls it everywhere. After that, the operator norm of a random matrix falls out in a few lines.

What an ε-net is

Fix a target accuracy $\varepsilon > 0$. An ε-net of the unit sphere $S^{d-1}$ is a finite set of points $\mathcal{N} \subseteq S^{d-1}$ such that every point of the sphere is within distance $\varepsilon$ of some point of $\mathcal{N}$:

\text{for every } x \in S^{d-1}, \qquad \min_{y \in \mathcal{N}} \|x - y\| \;\le\; \varepsilon.

Said another way, the $\varepsilon$-balls centered at the net points cover the sphere. A smaller $\varepsilon$ gives a finer net, which needs more points.

The unit circle $S^1$ is the one case we can draw. Place the net points around the circle and draw a disk of radius $\varepsilon$ around each one. The disks together cover the circle, so every point on it is within $\varepsilon$ of a net point; shrinking $\varepsilon$ shrinks the disks, and more points are needed to keep the circle covered.
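As a concrete check on the circle, here is a minimal Python sketch (standard library only; `circle_net` and `sampled_covering_radius` are hypothetical helper names). It places $k$ evenly spaced points on $S^1$, choosing the smallest $k$ whose worst-case gap $2\sin(\pi/(2k))$ is at most $\varepsilon$, and then spot-checks the covering property on a grid of sample points.

```python
import math


def circle_net(eps):
    """Return k evenly spaced points on the unit circle S^1 forming an eps-net.

    With k equally spaced points, the worst-placed point of the circle sits in the
    middle of an arc, at chord distance 2*sin(pi/(2k)) from its nearest net point.
    We take the smallest k with 2*sin(pi/(2k)) <= eps (assumes 0 < eps < 2).
    """
    k = math.ceil(math.pi / (2 * math.asin(eps / 2)))
    return [(math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k)) for j in range(k)]


def sampled_covering_radius(net, samples=10_000):
    """Largest distance from a sampled circle point to its nearest net point.

    Sampling can only underestimate the true covering radius, so this is a
    sanity check of the eps-net property, not a proof of it.
    """
    worst = 0.0
    for i in range(samples):
        theta = 2 * math.pi * i / samples
        p = (math.cos(theta), math.sin(theta))
        worst = max(worst, min(math.dist(p, q) for q in net))
    return worst


for eps in (1.0, 0.5, 0.25):
    net = circle_net(eps)
    print(eps, len(net), sampled_covering_radius(net))
```

Because sampling only gives a lower estimate of the covering radius, the check can catch a broken net but cannot certify a correct one; on the circle the exact radius $2\sin(\pi/(2k))$ is available anyway.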

The size of the net

The net is finite, but its size is what decides everything. The smallest possible size of an $\varepsilon$-net is the covering number $N(S^{d-1}, \varepsilon)$, and a short volume argument bounds it.

Take $\mathcal{N}$ to be a maximal $\varepsilon$-separated subset of the sphere: a set of points that are pairwise at least $\varepsilon$ apart, to which no further point can be added without breaking that separation. Maximality forces $\mathcal{N}$ to be an $\varepsilon$-net, because if some point of the sphere were farther than $\varepsilon$ from all of $\mathcal{N}$, we could add it. Now place a ball of radius $\varepsilon/2$ around each net point. These balls have disjoint interiors, since their centers are at least $\varepsilon$ apart, and they all sit inside the ball of radius $1 + \varepsilon/2$ around the origin, since each center has norm $1$. A $d$-dimensional ball of radius $r$ has volume proportional to $r^d$, so comparing the total volume of the small balls to the big one,

|\mathcal{N}| \cdot \left(\tfrac{\varepsilon}{2}\right)^{d} \;\le\; \left(1 + \tfrac{\varepsilon}{2}\right)^{d}, \qquad\text{hence}\qquad |\mathcal{N}| \;\le\; \left(1 + \frac{2}{\varepsilon}\right)^{d} \;\le\; \left(\frac{3}{\varepsilon}\right)^{d}

for $\varepsilon \le 1$, where the last step uses $1 \le 1/\varepsilon$.
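The maximality argument is constructive on a finite sample. The sketch below is an illustration under simplifying assumptions (hypothetical helpers `random_unit_vector` and `greedy_separated_set`, and a finite random candidate set standing in for the whole sphere): it greedily keeps each candidate that is at least $\varepsilon$ from everything kept so far. The result is $\varepsilon$-separated, so the volume bound applies to it exactly, but it is only guaranteed to cover the candidates, not every point of the sphere.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniform random point on S^{d-1}: normalize a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    r = math.sqrt(sum(x * x for x in v))
    return [x / r for x in v]


def greedy_separated_set(candidates, eps):
    """Keep each candidate that is at least eps from every point kept so far.

    The result is eps-separated. It is maximal only within the candidate list:
    every rejected candidate is within eps of some kept point.
    """
    kept = []
    for p in candidates:
        if all(math.dist(p, q) >= eps for q in kept):
            kept.append(p)
    return kept


rng = random.Random(0)
d, eps = 3, 0.5
candidates = [random_unit_vector(d, rng) for _ in range(5000)]
net = greedy_separated_set(candidates, eps)
# Any eps-separated subset of S^{d-1} obeys the volume bound (1 + 2/eps)^d.
print(len(net), (1 + 2 / eps) ** d)
```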

So the sphere has an $\varepsilon$-net of size at most $(3/\varepsilon)^d$. It is finite, which is what we needed, but it grows exponentially in the dimension $d$. That exponential is the quantity to watch: whether the net is affordable comes down to whether the $e^{O(d)}$ point count can be beaten by the tail bound we have for a single point.
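To see the exponential growth numerically, a small sketch (the function names are hypothetical) evaluates both forms of the bound. Because $(1 + 2/\varepsilon)^d$ overflows a double-precision float for large $d$, the loop reports its natural logarithm, which is linear in $d$.

```python
import math


def net_size_bound(eps, d):
    """Volume bound (1 + 2/eps)^d on a maximal eps-separated subset of S^{d-1}."""
    return (1 + 2 / eps) ** d


def simple_net_size_bound(eps, d):
    """The cruder (3/eps)^d form, valid for eps <= 1."""
    return (3 / eps) ** d


# The bound overflows a float for large d (5^1000 > 1e308), so report its natural log,
# d * ln(1 + 2/eps): linear in d, i.e. the net has e^{O(d)} points.
eps = 0.5
for d in (2, 10, 100, 1000):
    print(d, d * math.log(1 + 2 / eps))
```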

From the net to the whole sphere

A net is only useful if controlling a quantity on it controls the quantity everywhere. For the operator norm, the extension is one short computation, and it costs only a constant factor.

Extension lemma. Let $A$ be an $m \times n$ matrix, and let $\mathcal{N}$ be an $\varepsilon$-net of $S^{n-1}$ with $\varepsilon < 1$. Then
$\|A\| \;\le\; \frac{1}{1 - \varepsilon} \max_{x \in \mathcal{N}} \|Ax\|.$

Take any unit vector $z$ and a net point $x$ with $\|z - x\| \le \varepsilon$. Writing $M = \max_{y \in \mathcal{N}} \|Ay\|$,

\|Az\| \;\le\; \|Ax\| + \|A(z - x)\| \;\le\; M + \|A\|\,\|z - x\| \;\le\; M + \varepsilon \|A\|,

using $\|A(z-x)\| \le \|A\|\,\|z-x\|$. Taking the supremum over unit $z$ on the left gives $\|A\| \le M + \varepsilon\|A\|$, and since $\varepsilon < 1$, rearranging to $(1 - \varepsilon)\|A\| \le M$ gives the lemma.
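The lemma can be checked numerically in the one case where both sides are easy to compute exactly: $n = 2$. There $\|A\|$ is the square root of the top eigenvalue of the $2 \times 2$ matrix $A^\top A$, and an evenly spaced set of directions is an $\varepsilon$-net of $S^1$. The sketch below (hypothetical helpers `op_norm_two_columns` and `net_max`, with Gaussian test matrices chosen only for illustration) checks $M \le \|A\| \le M/(1-\varepsilon)$.

```python
import math
import random


def op_norm_two_columns(A):
    """Exact ||A|| for an m x 2 matrix: sqrt of the top eigenvalue of the 2 x 2 matrix A^T A."""
    a = sum(r[0] * r[0] for r in A)
    b = sum(r[0] * r[1] for r in A)
    c = sum(r[1] * r[1] for r in A)
    return math.sqrt((a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b))


def net_max(A, eps):
    """M = max of ||Ax|| over k evenly spaced unit vectors, an eps-net of S^1."""
    k = math.ceil(math.pi / (2 * math.asin(eps / 2)))
    best = 0.0
    for j in range(k):
        x0, x1 = math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k)
        best = max(best, math.sqrt(sum((r[0] * x0 + r[1] * x1) ** 2 for r in A)))
    return best


rng = random.Random(1)
A = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(5)]
for eps in (0.9, 0.5, 0.1):
    M = net_max(A, eps)
    # Extension lemma: M <= ||A|| <= M / (1 - eps)
    print(eps, M, op_norm_two_columns(A), M / (1 - eps))
```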

With $\varepsilon = \tfrac{1}{2}$ the factor is $2$, and the sharper form $(1 + 2/\varepsilon)^n$ of the volume bound says the net has at most $5^n$ points:

\|A\| \;\le\; 2 \max_{x \in \mathcal{N}} \|Ax\|, \qquad |\mathcal{N}| \le 5^{n}.

The supremum over the whole sphere that defines $\|A\|$ is now a maximum over a finite set, paid for with a factor of $2$.

The operator norm of a random matrix

Now the application. Let $A$ be an $m \times n$ matrix with independent standard Gaussian entries, $A_{ij} \sim \mathcal{N}(0,1)$. How large is $\|A\|$?

One fixed direction. Fix a unit vector $x \in \mathbb{R}^n$. Each coordinate of $Ax$ is the inner product of a row of $A$ with $x$, a Gaussian of variance $\|x\|^2 = 1$, and the rows are independent, so $Ax \sim \mathcal{N}(0, I_m)$. Then $\|Ax\|^2$ is a sum of $m$ squared standard Gaussians, a chi-squared variable with mean $m$, so by Jensen’s inequality ($\mathbb{E}\sqrt{Z} \le \sqrt{\mathbb{E}Z}$ for the concave square root) $\mathbb{E}\|Ax\| \le \sqrt{m}$. Moreover, $\|Ax\|$ is a $1$-Lipschitz function of the entries of $A$, since $\big|\|Ax\| - \|Bx\|\big| \le \|(A - B)x\| \le \|A - B\|_F$ for unit $x$, so Gaussian concentration gives $\Pr[\|Ax\| \ge \mathbb{E}\|Ax\| + s] \le e^{-s^2/2}$. Combined with $\mathbb{E}\|Ax\| \le \sqrt{m}$, this is a sub-Gaussian tail around $\sqrt{m}$:

\Pr\big[\|Ax\| \ge \sqrt{m} + s\big] \;\le\; e^{-s^2/2}.
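A quick Monte Carlo sanity check of the single-direction tail, under illustrative assumptions ($m = 30$, $n = 5$, a fixed seed, and the hypothetical helper `sample_norm_Ax`): it draws Gaussian matrices, computes $\|Ax\|$ for one fixed unit $x$, and compares the empirical frequency of $\|Ax\| \ge \sqrt{m} + s$ with $e^{-s^2/2}$. Expect the bound to be loose, since $\|Ax\|$ has standard deviation close to $1/\sqrt{2}$.

```python
import math
import random


def sample_norm_Ax(m, x, rng):
    """Draw an m x len(x) matrix with i.i.d. N(0,1) entries row by row and return ||Ax||."""
    total = 0.0
    for _ in range(m):
        row_dot_x = sum(rng.gauss(0.0, 1.0) * xi for xi in x)  # ~ N(0, ||x||^2) = N(0, 1)
        total += row_dot_x * row_dot_x
    return math.sqrt(total)


rng = random.Random(2)
m, n, trials = 30, 5, 2000
x = [1 / math.sqrt(n)] * n  # one fixed unit vector
samples = [sample_norm_Ax(m, x, rng) for _ in range(trials)]
mean = sum(samples) / trials
print("mean", mean, "sqrt(m)", math.sqrt(m))
for s in (0.5, 1.0, 1.5, 2.0):
    empirical = sum(v >= math.sqrt(m) + s for v in samples) / trials
    print(s, empirical, math.exp(-s * s / 2))  # empirical tail vs. the bound
```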

Union over the net. This was one fixed $x$. Take a $\tfrac{1}{2}$-net $\mathcal{N}$ of the sphere $S^{n-1}$, with $|\mathcal{N}| \le 5^{n}$, and union-bound the tail over its points:

\Pr\Big[\max_{x \in \mathcal{N}} \|Ax\| \ge \sqrt{m} + s\Big] \;\le\; 5^{n}\, e^{-s^2/2}.

Extend. The extension lemma turns the net maximum into the full operator norm, $\|A\| \le 2 \max_{x \in \mathcal{N}} \|Ax\|$, so

\Pr\big[\|A\| \ge 2(\sqrt{m} + s)\big] \;\le\; 5^{n}\, e^{-s^2/2}.
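The union bound and the extension combine into one statement about $\|A\|$. A minimal sketch (the function name `operator_norm_tail` is hypothetical) evaluates it, working with the logarithm $n \ln 5 - s^2/2$ so that $5^n$ never has to be formed, and capping the result at $1$, where the bound stops saying anything.

```python
import math


def operator_norm_tail(m, n, s):
    """Return (threshold, prob_bound) for Pr[||A|| >= 2(sqrt(m) + s)] <= 5^n * exp(-s^2 / 2).

    The bound is computed in log space to avoid overflowing 5^n, and capped at 1.
    """
    threshold = 2 * (math.sqrt(m) + s)
    log_bound = n * math.log(5) - s * s / 2
    return threshold, math.exp(min(0.0, log_bound))


print(operator_norm_tail(100, 50, 5.0))   # s too small: bound capped at 1, says nothing
print(operator_norm_tail(100, 50, 20.0))  # s large enough: bound is tiny
```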

Balance. The net contributes $5^{n} = e^{n \ln 5}$, and the tail contributes $e^{-s^2/2}$. The tail overtakes the net once $s$ is a little past $\sqrt{2 \ln 5}\,\sqrt{n} \approx 1.8\sqrt{n}$: setting $s = \sqrt{2 \ln 5}\,\sqrt{n} + t$ with $t \ge 0$ makes $s^2 \ge 2n\ln 5 + t^2$, so

5^{n}\, e^{-s^2/2} \;\le\; e^{-t^2/2}.

Therefore, for any $t \ge 0$, with probability at least $1 - e^{-t^2/2}$,

\|A\| \;\le\; 2\sqrt{m} + 2\sqrt{2\ln 5}\,\sqrt{n} + 2t \;\lesssim\; \sqrt{m} + \sqrt{n} + t.

A random $m \times n$ Gaussian matrix has $\|A\| \lesssim \sqrt{m} + \sqrt{n}$, with fluctuations of constant order. The $\varepsilon$-net did the real work: it converted the supremum over the sphere into a $5^{n}$-fold union bound, and the $5^{n}$ was harmless because the per-direction tail $e^{-s^2/2}$ is so much stronger. Spending a deviation of order $\sqrt{n}$ pays off the entire net.
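To compare the bound with an actual Gaussian matrix, the sketch below estimates $\|A\|$ by power iteration on $A^\top A$ (a standard method, used here in place of an exact singular value computation; all names are hypothetical, and $m = 40$, $n = 20$, $t = 2$ are arbitrary). Starting from the coordinate vector of the longest column makes the estimate a lower bound on $\|A\|$ that is at least the largest column norm, so it can be compared safely with the high-probability bound $2\sqrt{m} + 2\sqrt{2\ln 5}\,\sqrt{n} + 2t$.

```python
import math
import random


def gaussian_matrix(m, n, rng):
    """m x n matrix with i.i.d. N(0, 1) entries, as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def operator_norm(A, iters=200):
    """Estimate ||A|| by power iteration on A^T A.

    Starting at the coordinate vector of the longest column, the estimate ||Av||
    starts at the largest column norm, never decreases (up to rounding), and never
    exceeds ||A||, because v is always a unit vector.
    """
    n = len(A[0])
    At = [list(col) for col in zip(*A)]  # transpose
    col_norms = [math.sqrt(sum(a * a for a in col)) for col in At]
    v = [0.0] * n
    v[col_norms.index(max(col_norms))] = 1.0
    for _ in range(iters):
        w = matvec(At, matvec(A, v))  # A^T A v
        r = math.sqrt(sum(a * a for a in w))
        v = [a / r for a in w]
    return math.sqrt(sum(a * a for a in matvec(A, v)))


rng = random.Random(3)
m, n, t = 40, 20, 2.0
A = gaussian_matrix(m, n, rng)
est = operator_norm(A)
net_bound = 2 * math.sqrt(m) + 2 * math.sqrt(2 * math.log(5)) * math.sqrt(n) + 2 * t
print(est, math.sqrt(m) + math.sqrt(n), net_bound)
```

On such matrices the estimate typically lands near $\sqrt{m} + \sqrt{n}$, well inside the net bound; the gap is the loose constant the net pays.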

The balance is the whole content of the method. Consider the exponent $n \ln 5 - s^2/2$ of the bound $5^{n} e^{-s^2/2}$. While it is positive, the bound exceeds $1$ and says nothing; once it drops below zero, the bound becomes exponentially small. The crossover sits at $s^\star = \sqrt{2n\ln 5}$, which moves outward like $\sqrt{n}$ as $n$ grows.
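The balance can be checked directly. This sketch (hypothetical names `log_union_bound` and `crossover`) verifies on a small grid of $n$ and $t$ that $s = s^\star + t$ drives the log of $5^{n} e^{-s^2/2}$ below $-t^2/2$, and prints $s^\star/\sqrt{n}$, which equals the constant $\sqrt{2 \ln 5} \approx 1.79$ for every $n$.

```python
import math


def log_union_bound(n, s):
    """Natural log of the bound 5^n * exp(-s^2 / 2)."""
    return n * math.log(5) - s * s / 2


def crossover(n):
    """s* = sqrt(2 n ln 5), where the exponent n ln 5 - s^2/2 changes sign."""
    return math.sqrt(2 * n * math.log(5))


for n in (1, 10, 100, 1000):
    s_star = crossover(n)
    for t in (0.0, 1.0, 3.0):
        # s = s* + t gives s^2 >= 2 n ln 5 + t^2, hence 5^n e^{-s^2/2} <= e^{-t^2/2}
        assert log_union_bound(n, s_star + t) <= -t * t / 2 + 1e-9
    print(n, s_star, s_star / math.sqrt(n))  # the ratio is the constant sqrt(2 ln 5)
```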

The constants are loose. The factor $2$ is the price of the $\tfrac12$-net, and the $\sqrt{2\ln 5}$ is the price of the volume bound on the net size. Sharper arguments remove them: Gordon’s Gaussian comparison inequality gives $\mathbb{E}\|A\| \le \sqrt{m} + \sqrt{n}$ with no stray constant, and the Bai–Yin law shows $\|A\|/(\sqrt{m} + \sqrt{n}) \to 1$ as $m$ and $n$ grow with their ratio converging to a constant. The $\varepsilon$-net gets the rate $\sqrt{m} + \sqrt{n}$ right with almost no work, which is why it is usually the first thing to try.
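For a rough sense of how loose the constants are, one can compare the net bound at $t = 0$, $2\sqrt{m} + 2\sqrt{2\ln 5}\,\sqrt{n}$, with Gordon's $\sqrt{m} + \sqrt{n}$. This mixes a high-probability bound with an expectation bound, so it is only a comparison of constants. The ratio is a weighted average of $2$ and $2\sqrt{2\ln 5} \approx 3.59$ with weights $\sqrt{m}$ and $\sqrt{n}$, so it always lies between them; the sketch (hypothetical name `net_to_gordon_ratio`) evaluates it for a few shapes.

```python
import math

C = math.sqrt(2 * math.log(5))  # the volume-bound constant, about 1.79


def net_to_gordon_ratio(m, n):
    """(2 sqrt(m) + 2 C sqrt(n)) / (sqrt(m) + sqrt(n)): net bound at t = 0 over sqrt(m) + sqrt(n)."""
    return (2 * math.sqrt(m) + 2 * C * math.sqrt(n)) / (math.sqrt(m) + math.sqrt(n))


for m, n in [(10_000, 1), (1000, 100), (100, 100), (100, 1000)]:
    print(m, n, round(net_to_gordon_ratio(m, n), 3))
```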

When the net trick works

The pattern reaches well beyond random matrices. Whenever a quantity is a supremum over the sphere, or over any set with a controlled covering number, and each fixed point obeys a sub-Gaussian tail $e^{-c s^2}$, the same three steps apply: net the set, union-bound over the net, extend back. The net costs $e^{O(d)}$, and a sub-Gaussian tail pays for it with a deviation of order $\sqrt{d}$. This is the argument behind operator-norm bounds for random matrices, the restricted isometry property in compressed sensing, sample covariance estimation, and Johnson–Lindenstrauss sketching.
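The three steps reduce to one piece of arithmetic: with covering number $N$ and per-point tail $e^{-c s^2}$, the union bound is at most $\delta$ once $s = \sqrt{(\ln N + \ln(1/\delta))/c}$. A minimal sketch of that calculation, with the hypothetical function `deviation_for_net` and an assumed failure probability $\delta = 0.01$, applied to the $\tfrac12$-net of $S^{d-1}$:

```python
import math


def deviation_for_net(log_cover, delta, c=0.5):
    """Smallest s with N * exp(-c * s^2) <= delta, given log_cover = ln N.

    Assumes (hypothetically) that each fixed point of the net satisfies
    Pr[deviation >= s] <= exp(-c * s^2); c = 1/2 matches the Gaussian tail above.
    """
    return math.sqrt((log_cover + math.log(1 / delta)) / c)


# Sphere S^{d-1} with a 1/2-net: ln N <= d ln 5, so the required s grows like sqrt(d).
for d in (10, 100, 1000, 10_000):
    s = deviation_for_net(d * math.log(5), delta=0.01)
    print(d, round(s, 2), round(s / math.sqrt(d), 3))
```

The printed ratio $s/\sqrt{d}$ settles near $\sqrt{2\ln 5}$ as $d$ grows, which is the deviation of order $\sqrt{d}$ described above.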

It also has clear limits. The $e^{O(d)}$ net size means the per-point tail must be at least sub-Gaussian for the method to break even; for heavier tails, the net is too expensive, and one needs finer tools such as chaining. And the covering-number bound discards the geometry of the set, so it gives the right rate but not the sharp constant. When the constant is what matters, the net is the wrong instrument.
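To see why heavier tails break the method, compare the deviation needed to pay off a $5^n$-point net under two assumed per-point tails: the sub-Gaussian $e^{-s^2/2}$ and, as an illustrative heavier tail, the exponential $e^{-s}$. The sketch (hypothetical function `deviation_needed`, failure level $e^{-t}$ with $t = 3$) shows the second requirement growing like $n$ rather than $\sqrt{n}$, so it soon exceeds the $2\sqrt{n}$ scale of $\sqrt{m} + \sqrt{n}$ for a square matrix.

```python
import math


def deviation_needed(n, t, tail):
    """Deviation s with 5^n * tail(s) <= exp(-t), for two assumed per-point tails.

    'subgaussian':    tail(s) = exp(-s^2 / 2)  ->  s = sqrt(2 (n ln 5 + t))
    'subexponential': tail(s) = exp(-s)        ->  s = n ln 5 + t
    """
    budget = n * math.log(5) + t
    if tail == "subgaussian":
        return math.sqrt(2 * budget)
    if tail == "subexponential":
        return budget
    raise ValueError(f"unknown tail: {tail}")


for n in (10, 100, 1000):
    sg = deviation_needed(n, 3.0, "subgaussian")
    se = deviation_needed(n, 3.0, "subexponential")
    # For a square n x n matrix, the target scale sqrt(m) + sqrt(n) is 2 sqrt(n).
    print(n, round(sg, 1), round(se, 1), round(2 * math.sqrt(n), 1))
```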

References

Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018. Chapter 4 builds nets and covering numbers and uses them to bound the operator norm of a random matrix.
Martin J. Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press, 2019. Chapter 6 covers random matrices and covariance estimation, including the operator-norm bound; Chapter 5 develops the metric-entropy machinery behind it.
Terence Tao. Topics in Random Matrix Theory. American Mathematical Society, 2012. The $\varepsilon$-net argument for the operator norm appears in the early sections on the operator norm of random matrices.
Yehoram Gordon. Some inequalities for Gaussian processes and applications. Israel Journal of Mathematics, 50(4):265–289, 1985. The Gaussian comparison behind the sharp $\sqrt{m} + \sqrt{n}$ constant.
Z. D. Bai, Y. Q. Yin. Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. The Annals of Probability, 21(3):1275–1294, 1993. The exact limiting values of the extreme singular values that the $\varepsilon$-net bound approximates.