Blog
-
Voronoi tessellations and Lloyd's algorithm
A set of generators partitions the plane into regions, each consisting of the points closer to that generator than to any other. Lloyd's algorithm iterates "move each generator to the centroid of its region" and converges to a centroidal Voronoi tessellation. Applied to a finite set of data points, the same iteration is exactly k-means.
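A minimal sketch of what one iteration looks like, approximating each Voronoi region by Monte Carlo samples of the unit square (function and parameter names here are illustrative, not from the post):

```python
import numpy as np

def lloyd_step(gens, n_samples=100_000, seed=None):
    """One Lloyd iteration on [0, 1]^2: approximate each Voronoi region
    by uniform samples, then move each generator to the centroid of the
    samples it owns. On a finite dataset this is exactly a k-means step."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n_samples, 2))
    d2 = ((pts[:, None, :] - gens[None, :, :]) ** 2).sum(-1)
    owner = d2.argmin(axis=1)            # nearest generator = Voronoi cell
    return np.array([pts[owner == k].mean(axis=0) if (owner == k).any() else g
                     for k, g in enumerate(gens)])

gens = np.random.default_rng(42).random((8, 2))
for _ in range(50):                      # iterate toward a centroidal tessellation
    gens = lloyd_step(gens)
```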
-
Pólya's recurrence theorem
Simple random walk on the integer lattice returns to the origin with probability one in 1D and 2D. In 3D and higher, there is a positive probability of never returning. The transition is exact, dimension-dependent, and reduces to convergence of a single harmonic-style series.
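For the curious, the series in question (the local return-probability asymptotic is standard and stated here without proof):

$$
\text{recurrent} \iff \sum_{n \ge 1} p_{2n}(0) = \infty, \qquad p_{2n}(0) \sim \frac{c_d}{n^{d/2}},
$$

so the sum behaves like $\sum_n n^{-d/2}$: divergent for $d = 1, 2$, convergent for $d \ge 3$.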
-
High-dimensional Gaussians live on a sphere
The bell-curve picture says Gaussian samples live near the mean. In high dimensions that picture is catastrophically wrong: almost all the mass lies in a thin spherical shell at radius √d. Density and mass are not the same thing.
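A quick numerical check of the shell picture (dimensions and sample sizes illustrative):

```python
import numpy as np

d, n = 1000, 10_000
rng = np.random.default_rng(0)
norms = np.linalg.norm(rng.standard_normal((n, d)), axis=1)

# E||X||^2 = d, and the norm concentrates hard around sqrt(d):
print(np.sqrt(d))                  # 31.62...
print(norms.mean(), norms.std())   # mean ~ 31.6, std ~ 0.71 (about 1/sqrt(2))
```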
-
Central Limit Theorem - why sums become Gaussian
A geometric look at the central limit theorem. Adding random variables is convolving their densities. Convolution smooths. Watch a Bernoulli, a die roll, or a bimodal distribution become Gaussian as you slide the number of summands.
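The whole mechanism in a few lines, assuming only that summing independent variables convolves their distributions:

```python
import numpy as np

# A deliberately non-Gaussian pmf: two spikes on a grid of 21 points.
p = np.zeros(21)
p[2], p[18] = 0.5, 0.5

s = p.copy()
for _ in range(29):        # distribution of the sum of 30 i.i.d. copies
    s = np.convolve(s, p)  # adding one more summand = one more convolution

# s is now bell-shaped; plotted against a Gaussian with matching mean
# and variance, the two curves are nearly indistinguishable.
```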
-
Why hypercubes look spiky
Counting the corners of an n-dimensional cube by Hamming weight gives a binomial distribution. Plot it as a vertical cross-section and you recover the spiky cube shape that high-dimensional textbooks love to draw.
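The count is one line; a rough terminal sketch of the cross-section (the bar scaling is arbitrary):

```python
from math import comb

n = 10
# C(n, k) corners of the n-cube have Hamming weight k: a binomial
# profile, sharply peaked at k = n/2.
for k in range(n + 1):
    print(f"{k:2d} {comb(n, k):4d} " + "#" * (comb(n, k) // 5))
```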
-
The 100 prisoners problem
100 prisoners must each find their own number among 100 randomly filled boxes, opening at most 50 each. Random guessing succeeds with probability one in a nonillion. A particular cycle-following strategy succeeds about 31% of the time. The reason is the cycle structure of a random permutation.
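A short simulation of the strategy; all 100 prisoners succeed exactly when the random permutation has no cycle longer than 50:

```python
import random

def max_cycle_len(perm):
    """Length of the longest cycle of a permutation given as a list."""
    seen = [False] * len(perm)
    best = 0
    for i in range(len(perm)):
        length, j = 0, i
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        best = max(best, length)
    return best

rng = random.Random(0)
trials, wins = 20_000, 0
for _ in range(trials):
    perm = list(range(100))
    rng.shuffle(perm)
    wins += max_cycle_len(perm) <= 50   # everyone follows their own cycle

print(wins / trials)   # ~ 0.31, matching 1 - (1/51 + 1/52 + ... + 1/100)
```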
-
Optimal message passing on sparse graphs
A condensed walkthrough of our NeurIPS 2023 paper deriving the asymptotically Bayes-optimal classifier for node classification on sparse contextual stochastic block models, and what it implies for the design of graph neural networks.
-
Marchenko-Pastur and the Wigner semicircle
The eigenvalues of a large random matrix do not scatter arbitrarily. They concentrate, as a histogram, on a deterministic shape. For sample covariance matrices the shape is Marchenko-Pastur; for symmetric matrices with i.i.d. entries it is the Wigner semicircle. Both shapes are computable, and they explain precisely why high-dimensional covariance estimation is biased.
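The semicircle half of the claim fits in a few lines (matrix size illustrative):

```python
import numpy as np

n = 2000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2 * n)   # symmetric, entry variance 1/n
eig = np.linalg.eigvalsh(W)

# The histogram of eig hugs the semicircle density
# rho(x) = sqrt(4 - x^2) / (2*pi) supported on [-2, 2].
hist, edges = np.histogram(eig, bins=50, density=True)
```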
-
Stein's paradox
In three or more dimensions, the sample mean is dominated everywhere by a shrinkage estimator. The geometric reason is the Gaussian shell: noise pushes you outward, and pulling back is uniformly better. A precursor of ridge regression and most modern regularization.
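A minimal risk comparison using the positive-part James-Stein estimator, shrinking toward the origin (the true mean below is arbitrary; any choice works):

```python
import numpy as np

d, trials = 10, 100_000
rng = np.random.default_rng(0)
theta = rng.standard_normal(d)                  # any fixed true mean

x = theta + rng.standard_normal((trials, d))    # one noisy observation per trial
r2 = (x ** 2).sum(axis=1, keepdims=True)
js = (1 - (d - 2) / r2).clip(min=0) * x         # positive-part James-Stein

print(((x - theta) ** 2).sum(1).mean())         # risk of the raw estimate: ~ d
print(((js - theta) ** 2).sum(1).mean())        # strictly smaller, for every theta
```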
-
The Newton fractal
Newton's method for finding roots of polynomials is a discrete dynamical system on the complex plane. Each starting point converges (almost always) to one of the roots. The basins of attraction have fractal boundaries, intricate to a degree the algebra of the polynomial gives no hint of.
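The entire dynamical system, here for p(z) = z³ − 1 (grid size and iteration count arbitrary):

```python
import numpy as np

roots = np.exp(2j * np.pi * np.arange(3) / 3)   # cube roots of unity

xs = np.linspace(-1.5, 1.5, 800)
z = xs[None, :] + 1j * xs[:, None]              # grid of starting points
for _ in range(40):
    z = z - (z ** 3 - 1) / (3 * z ** 2)         # Newton step: z - p(z)/p'(z)

basin = np.abs(z[..., None] - roots).argmin(-1) # which root captured each pixel
# Rendering `basin` as an image reveals the fractal basin boundaries.
```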
-
Nearest neighbor breaks in high dimensions
In high dimensions, all pairwise distances become essentially equal. Nearest and farthest neighbor are no longer meaningfully different. A short geometric tour of the curse of dimensionality.
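The collapse is easy to measure: sample points and compare the largest pairwise distance to the smallest (the Gram-matrix trick below is just to keep memory small):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
for d in (2, 10, 100, 1000, 10_000):
    x = rng.random((n, d))                   # uniform points in [0, 1]^d
    g = x @ x.T                              # squared distances via the Gram matrix
    sq = np.diag(g)
    d2 = (sq[:, None] + sq[None, :] - 2 * g)[np.triu_indices(n, k=1)]
    print(d, np.sqrt(d2.max() / d2.min()))   # max/min distance ratio -> 1
```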
-
Effects of graph convolutions in multi-layer networks
A walkthrough of our ICLR 2023 paper on how graph convolutions provably lower the feature-signal threshold for node classification in contextual stochastic block models, and why two convolutions help much more than one.
-
Fast and online palindrome counting
An exploration of an efficient algorithm for online palindrome counting using a palindrome tree data structure. Based on the work of Rubinchik and Shur, this post details the problem, the data structure, and the implementation.
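A compact sketch of the structure, counting distinct palindromic substrings online; this follows the usual eertree construction, not the post's exact code:

```python
class Eertree:
    """Palindromic tree: one node per distinct palindromic substring,
    plus two roots of length -1 and 0. Each appended character adds
    at most one node, which is what makes online counting cheap."""
    def __init__(self):
        self.s = []
        self.nodes = [[-1, 0, {}], [0, 0, {}]]  # [length, suffix link, children]
        self.last = 1                            # longest palindromic suffix so far
        self.distinct = 0

    def _extend(self, v):
        # Walk suffix links until the palindrome at v can be wrapped in c.
        i, c = len(self.s) - 1, self.s[-1]
        while not (i - self.nodes[v][0] - 1 >= 0
                   and self.s[i - self.nodes[v][0] - 1] == c):
            v = self.nodes[v][1]
        return v

    def append(self, c):
        self.s.append(c)
        v = self._extend(self.last)
        if c in self.nodes[v][2]:                # palindrome already known
            self.last = self.nodes[v][2][c]
            return
        self.nodes.append([self.nodes[v][0] + 2, 1, {}])
        new = len(self.nodes) - 1
        if self.nodes[new][0] > 1:               # suffix link via the next-best wrap
            u = self._extend(self.nodes[v][1])
            self.nodes[new][1] = self.nodes[u][2][c]
        self.nodes[v][2][c] = new
        self.last = new
        self.distinct += 1                       # one new palindrome per new node

t = Eertree()
for ch in "eertree":
    t.append(ch)
print(t.distinct)   # 7 distinct palindromic substrings
```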
-
Lunch with Donald Knuth
Reflections on a lunch meeting with Donald Knuth. Covers his thoughts on P vs NP, advice on life and curiosity, and his recent mathematical interests in families of sets.
-
An analogy for the Doppler effect
A simple analogy to explain the Doppler effect using the concept of throwing balls between two people. Designed to make the physics concept intuitive and accessible to a layperson.
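For reference, the standard formula the analogy builds toward (stationary observer, source approaching at speed $v_s$, wave speed $v$):

$$
f_{\text{obs}} = f_{\text{src}} \, \frac{v}{v - v_s}
$$

The thrown balls arrive more often for exactly the same reason: each successive ball has a shorter distance to cover.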
-
St. Petersburg paradox
An analysis of the St. Petersburg paradox, a game whose expected payoff is infinite. Discusses the conflict between mathematical expectation and intuition, and resolves it using practical constraints.
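The paradox in one line, plus one common way to model the practical constraint (a bank that can pay out at most $2^N$):

$$
E = \sum_{k=1}^{\infty} \frac{1}{2^k}\cdot 2^k = \sum_{k=1}^{\infty} 1 = \infty,
\qquad
E_{\text{capped}} = \sum_{k=1}^{N} \frac{1}{2^k}\cdot 2^k + \frac{1}{2^N}\cdot 2^N = N + 1.
$$

Even a bank holding $2^{40}$ dollars (about a trillion) makes the game worth only $41.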
-
Common join algorithms
An overview of common join algorithms used in database systems, including Nested Loop, Hash Join, and Sort-Merge Join. Explains the logic, implementation details, and time complexities of each.
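As a taste, hash join is the easiest of the three to sketch (toy rows and names, not from the post):

```python
def hash_join(left, right, key_l, key_r):
    """Classic in-memory hash join: build a hash table on one relation
    (ideally the smaller), then probe it with each row of the other.
    Expected O(|L| + |R|) versus O(|L| * |R|) for a nested-loop join."""
    table = {}
    for row in left:                          # build phase
        table.setdefault(row[key_l], []).append(row)
    out = []
    for row in right:                         # probe phase
        for match in table.get(row[key_r], []):
            out.append({**match, **row})
    return out

users = [{"id": 1, "name": "ada"}, {"id": 2, "name": "alan"}]
orders = [{"uid": 1, "item": "book"}, {"uid": 1, "item": "pen"}]
print(hash_join(users, orders, "id", "uid"))  # ada joined to her two orders
```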
-
Implementing PEGASOS
A detailed guide on implementing PEGASOS (Primal Estimated sub-GrAdient SOlver for SVM). Explains the mathematical derivation, the stochastic gradient descent approach, and the algorithm's steps.
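The core loop is short; a sketch of the update rule on toy data (the paper's optional projection step is omitted here):

```python
import numpy as np

def pegasos(X, y, lam=0.1, epochs=20, seed=0):
    """Stochastic sub-gradient descent on the SVM objective
    (lam/2)||w||^2 + mean hinge loss, with step size 1/(lam * t)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, t = np.zeros(d), 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w) < 1:                  # margin violated
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                                      # only the regularizer acts
                w = (1 - eta * lam) * w
    return w

rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((100, 2)) + 2,
               rng.standard_normal((100, 2)) - 2])
y = np.array([1] * 100 + [-1] * 100)
w = pegasos(X, y)
print((np.sign(X @ w) == y).mean())                    # training accuracy ~ 1.0
```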
-
Weird but awesome JavaScript
A deep dive into the JavaScript runtime environment. Explains the single-threaded nature, the call stack, the event loop, the callback queue, and how they work together to handle asynchronous operations.
-
Some more #P-complete problems
A discussion of #P-complete problems, specifically #SAT and counting 3-colorings of a graph. Includes proofs of their completeness and relevant reductions.
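Counting by brute force makes the contrast with the decision problem plain (DIMACS-style clause encoding; exponential, as one expects for #P):

```python
from itertools import product

def count_sat(clauses, n_vars):
    """#SAT by enumeration: count satisfying assignments of a CNF.
    A clause is a tuple of nonzero ints; literal k means variable |k|,
    negated when k < 0."""
    count = 0
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            count += 1
    return count

# (x1 or x2) and (not x1 or x3): 4 of the 8 assignments satisfy it.
print(count_sat([(1, 2), (-1, 3)], 3))
```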
-
The complexity class of #P problems
An in-depth look at the complexity class #P, based on Valiant's work. Focuses on the problem of computing the permanent of a binary matrix.
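Ryser's formula is the classic exact algorithm, and it is still exponential, which is the whole point of Valiant's result:

```python
from itertools import combinations

def permanent(A):
    """Ryser's inclusion-exclusion formula:
    per(A) = (-1)^n * sum over nonempty column subsets S of
             (-1)^{|S|} * prod_i (sum_{j in S} A[i][j]),
    costing O(2^n * poly(n)) instead of the naive n! expansion."""
    n = len(A)
    total = 0
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            prod = 1
            for row in A:
                prod *= sum(row[j] for j in S)
            total += (-1) ** r * prod
    return (-1) ** n * total

print(permanent([[1, 1, 1]] * 3))   # all-ones 3x3 matrix: per = 3! = 6
```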
-
Machine level obfuscation
A demonstration of how to obfuscate strings in C using floating-point numbers and machine endianness. Includes a code example that prints "ILOVEYOU" through seemingly random double values.
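The post works in C; the same trick is two lines of Python with struct, reinterpreting the 8 bytes of the string as an IEEE 754 double:

```python
import struct

# "ILOVEYOU" is exactly 8 bytes, i.e. one double (little-endian here,
# which is the endianness dependence the post exploits in C).
secret = struct.unpack("<d", b"ILOVEYOU")[0]
print(secret)                              # an innocuous-looking number

print(struct.pack("<d", secret).decode())  # round-trip the bytes: ILOVEYOU
```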
-
First interview experience
A personal account of my first face-to-face interview experience at Microsoft. Covers the written rounds, technical questions on linked lists and binary trees, and the final interview rounds.