@dmarx
Last active March 10, 2026 02:12
This feedback was motivated by your lecture: "On the volumes of higher-dimensional spheres"

First, I just want to say I love that you made this video. I think you brush off the Gaussian discussion a bit too quickly, though. In particular, I think an aside on the Gaussian perspective gives a stronger intuition for how the concentration of measure works. In the video, you proposed that the measure concentrates at the "equator". Personally, I find it pretty hard to reason about what the "equator" of a high-dimensional sphere even is.

Instead, I think a much easier way to think about this is that the volume concentrates along the circumference, i.e. near the surface. Concretely, rather than the equator of a sphere, the visual you should be motivating is a thin shell at the surface of the sphere. In high dimensions, that "equator" is really just any great circle about the origin, and sweeping it around traces out the shell.

I've heard this referred to as "the gaussian annulus theorem". Not sure who to attribute it to.

Taking this back to the ML perspective: rather than worrying about volume, think about vectors. Given a vector sampled from a multivariate standard normal, what is its expected magnitude? Sweeping this vector about the origin shows us where the measure concentrates. To make this more concrete, we can put a credible interval around that magnitude.
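As a quick sketch of that expected magnitude (the function name here is mine, chosen for illustration), a Monte Carlo estimate tracks √d closely:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_magnitude(d, n_samples=100_000):
    """Empirical E[||x||] for x ~ N(0, I_d)."""
    x = rng.standard_normal((n_samples, d))
    return np.linalg.norm(x, axis=1).mean()

for d in (1, 2, 10, 100):
    # empirical expected magnitude vs the sqrt(d) approximation
    print(d, mean_magnitude(d), np.sqrt(d))
```

Even though each individual coordinate is centered at zero, the magnitude of the whole vector sits near √d.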

Plot the coordinates of these vectors in one and two dimensions and you can see that the center of mass is at the origin, just as we'd expect for a normal distribution. But take the absolute value, and notice how the mass shifts (the magnitude is essentially a fancy absolute value: the L2 norm generalizes the 1D absolute value to vectors). Even in two dimensions, the magnitude is already pushed away from the origin. Which makes sense: it would be sort of weird if the expected magnitude of a random vector were zero.

As we increase dimension, not only does the center of mass of the magnitude shift further right (growing like √d), but more importantly, the credible interval gets tighter and tighter relative to it. Depending on how you set it, by 4D it probably already doesn't include the origin.
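To check that claim numerically, here's a rough sketch (the helper name and the 95% level are my choices) that estimates a central credible interval for the magnitude by sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

def credible_interval(d, level=0.95, n_samples=200_000):
    """Central credible interval for ||x||, x ~ N(0, I_d), estimated by sampling."""
    mags = np.linalg.norm(rng.standard_normal((n_samples, d)), axis=1)
    lo, hi = np.quantile(mags, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

for d in (1, 2, 4, 100):
    lo, hi = credible_interval(d)
    # width relative to sqrt(d) shrinks as dimension grows
    print(f"d={d}: [{lo:.2f}, {hi:.2f}], relative width {(hi - lo) / np.sqrt(d):.2f}")
```

At d=4 the lower end of the 95% interval is already well away from zero, and the interval keeps tightening (relative to √d) as d grows.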

THE CENTER OF MASS IS OUT OF DISTRIBUTION. We'll get back to this.

So as we increase dimension, we sweep this vector. It's hard to visualize high-dimensional spheres (especially, imho, when we treat them as stacked domains of lower-dimensional spheres), but a high-dimensional vector is still just a vector. The high dimensionality is really just a statement about the degrees of freedom with which it can rotate. Regardless of which degree of freedom we sweep it through, it is still an arrow fixed at the origin whose magnitude is given by the random variable we're playing with. Above d=10 or so, as you illustrated in your volume video, the measure concentrates on an infinitesimally thin region, corresponding to the expected value of our vector's magnitude with trivially tight error bounds.

Now, let's revisit the significance of this mass living on a shell.

The parable of the "average pilot"

Most of the components of any sampled vector will take values near zero. That's (basically) what it means for each component to be a 1D standard normal with its median at zero. As the number of dimensions goes up, the vector accumulates more and more of these mostly-small components. Consequently, we expect normally distributed vectors in high dimensions to be approximately sparse: most of the mass is concentrated in a minority of components, and the majority of components carry basically no mass.
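A quick way to see this approximate sparsity (the function name and the 20% cutoff are my choices for illustration) is to ask how much of the squared magnitude the largest components carry:

```python
import numpy as np

rng = np.random.default_rng(0)

def mass_share_of_top(d=1000, frac=0.2, n_samples=1000):
    """Average share of the squared norm carried by the largest `frac` of components."""
    x = rng.standard_normal((n_samples, d))
    sq = np.sort(x**2, axis=1)[:, ::-1]  # squared components, descending
    k = int(frac * d)
    return (sq[:, :k].sum(axis=1) / sq.sum(axis=1)).mean()

print(mass_share_of_top())  # roughly 0.65: top 20% of components carry ~65% of the mass
```

The remaining 80% of components split the rest, so most coordinates contribute very little individually.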

Philosophically, this is equivalent to asserting that everyone is weird. The more attributes we consider, the less likely we are to find someone who is "average" in all of them. The US Air Force learned this the hard way when they tried to design an airplane cockpit for "the average pilot", only to discover that this person didn't exist and they needed to make the seats adjustable. https://www.thestar.com/news/insight/when-u-s-air-force-discovered-the-flaw-of-averages/article_e3231734-e5da-5bf5-9496-a34e52d60bd9.html

A consequence of this is that high-dimensional interpolations need to be spherical. Linear interpolation takes you along a chord that cuts inside the shell: it goes out of distribution quickly and only returns to in-distribution shortly before arriving at its destination. A great way to visualize this is with an AI image generator: interpolate between two (text) conditioning vectors and compare the linear and spherical paths.
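Here's a minimal sketch of the difference using the standard slerp formula (the helper name is mine): between two random Gaussian vectors, the linear midpoint falls well inside the shell, while the spherical midpoint stays near it.

```python
import numpy as np

rng = np.random.default_rng(0)

def slerp(a, b, t):
    """Spherical linear interpolation between vectors a and b."""
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

d = 1000
a, b = rng.standard_normal(d), rng.standard_normal(d)

# Midpoint magnitudes: lerp collapses toward the origin, slerp stays on the shell.
lerp_mid = np.linalg.norm((a + b) / 2)
slerp_mid = np.linalg.norm(slerp(a, b, 0.5))
print(lerp_mid, slerp_mid, np.sqrt(d))
```

Since two random high-dimensional Gaussians are nearly orthogonal, the linear midpoint's magnitude drops to roughly √(d/2): out of distribution, exactly the "average pilot" problem again.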
