Tag: machine learning
All the articles with the tag "machine learning".
-
Shattering and the VC dimension
Posted on:How expressive is a family of yes/no rules? Shattering measures it on a fixed set of points: the class can cut them apart in every conceivable way. The VC dimension is the largest set it can shatter, a single integer. Past that size the number of labelings stops doubling and is trapped under a polynomial, the Sauer-Shelah jump, and that one integer is what makes an infinite class learnable.
-
The Hanson–Wright inequality
Posted on:A quadratic form in independent random variables concentrates around its mean, the trace of the matrix. Hanson–Wright pins down the two-sided tail: a Gaussian regime near the mean set by the Frobenius norm, crossing over to a heavier exponential tail set by the operator norm. For Gaussian inputs it all follows from rotating to the eigenbasis and applying Bernstein.
-
Covering the sphere with ε-nets
Posted on:Many quantities in high dimensions are a supremum over all unit vectors, and a union bound cannot reach infinitely many of them. An ε-net replaces the sphere by a finite set of directions: control the quantity there, pay a constant factor to extend back, and the exponentially large net is still no match for a sub-Gaussian tail. The worked example is the operator norm of a random matrix.
-
Sudakov minoration, or how big a maximum must be
Posted on:Averages shrink, but maxima grow. Sudakov's minoration inequality is the clean tool for the harder direction: a lower bound on the expected maximum of many Gaussians. As long as no two of them are too alike, that maximum is at least ε times the square root of log N. This is the engine behind a lot of impossibility proofs.
-
Bias-variance is a Pythagorean decomposition
Posted on:The textbook derivation of MSE = bias² + variance reads as a cancellation in algebra. Geometrically, it is the Pythagorean theorem in L². A constant bias and a mean-zero residual are orthogonal, and their squared lengths add. The bias-variance tradeoff lives on a right triangle.
-
Voronoi tessellations and Lloyd's algorithm
Posted on:A set of generators in the plane partitions it into regions, each closer to one generator than to any other. Lloyd's algorithm iterates "move each generator to the centroid of its region" and converges to a centroidal Voronoi tessellation (k-means).
-
Optimal message passing on sparse graphs
Posted on:A condensed walkthrough of our NeurIPS 2023 paper deriving the asymptotically Bayes-optimal classifier for node classification on sparse contextual stochastic block models, and what it implies for the design of graph neural networks.
-
Marchenko-Pastur and the Wigner semicircle
Posted on:The eigenvalues of a large random matrix do not scatter around. They concentrate, as a histogram, on a deterministic shape. For sample covariance matrices the shape is Marchenko-Pastur; for symmetric matrices with i.i.d. entries it is the Wigner semicircle. Both shapes are computable, and they explain precisely why high-dimensional covariance estimation is biased.