Riemannian Geometry and Machine Learning - An Introduction

A summary of the notes I have taken while studying the intersection between these two topics

NOTE: This is still a work in progress, but please follow it if you are interested in the topic

NicolaBernini commented Aug 2, 2020

Connection between KL Divergence and Fisher Information Matrix

This section is intended as a guide for reading this more effectively

  • The connection between the KL Divergence / Relative Entropy of a pair of PDFs and the Fisher Information Matrix of a PDF becomes clear when we focus on the limit in which the 2 PDFs converge to each other
  • Let's take a pair of PDFs $p(\cdot), q(\cdot)$ and assume they belong to the same parametric family $f(\cdot, \theta)$, so they differ only in their parameters $\theta_{0}, \theta_{1}$
  • Let's consider the case where the 2 PDFs are very similar, which we can express formally as $\theta_{0} - \theta_{1} \rightarrow 0$
  • In this case, let's take $\theta_{0}$ as the reference, so $\theta_{1} \rightarrow \theta_{0}$, and let's adjust the notation a little bit by writing the divergence as a function $D_{KL, \theta_{0}}(\theta)$ of the second parameter $\theta$
    • NOTE: we can't just express this as a function of $\Delta \theta = |\theta_{0} - \theta_{1}|$ since the KL Divergence is not symmetric
  • So $D_{KL, \theta_{0}}(\theta)$ is in general a nonlinear function of $\theta$, but as we are interested in the limit $\theta \rightarrow \theta_{0}$ we can approximate it with a Taylor series expansion around $\theta_{0}$
    • NOTE: In this limit the KL Divergence is expected to become more and more symmetric, which makes the arbitrary choice of the reference point less and less relevant
  • It can be shown that, in this limit, the 0th-order term of the expansion vanishes (since $D_{KL, \theta_{0}}(\theta_{0}) = 0$) and so does the 1st-order term (since the divergence is minimized at $\theta_{0}$), so the leading contribution is the 2nd-order term, whose Hessian is the Fisher Information Matrix of the $f(\cdot, \theta)$ PDF evaluated at $\theta_{0}$
  • As a result, we can interpret the Fisher Information Matrix as the Hessian or Curvature of the KL Divergence / Relative Entropy of the 2 PDFs in the limit where they are very close to each other (see the numerical sketch below)
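To make the last two points concrete, here is a minimal numerical sketch (not part of the original notes) of the quadratic approximation

$$D_{KL, \theta_{0}}(\theta) \approx \frac{1}{2} (\theta - \theta_{0})^{T} F(\theta_{0}) (\theta - \theta_{0})$$

It uses the Exponential family with rate parameter $\lambda$, chosen only because both the KL Divergence and the Fisher Information have simple closed forms; the distribution choice and the function names below are illustrative assumptions.

```python
import numpy as np

# Check D_KL(theta_0 || theta) ~ 1/2 * dtheta^T F(theta_0) dtheta
# on the Exponential(rate=lam) family, where both sides are closed-form.

def kl_exponential(lam0, lam1):
    # KL( Exp(lam0) || Exp(lam1) ) = log(lam0/lam1) + lam1/lam0 - 1
    return np.log(lam0 / lam1) + lam1 / lam0 - 1.0

def fisher_exponential(lam):
    # Fisher information of Exp(lam) w.r.t. its rate: I(lam) = 1 / lam^2
    return 1.0 / lam**2

lam0 = 2.0
for dlam in [1.0, 0.1, 0.01]:
    kl_fwd = kl_exponential(lam0, lam0 + dlam)       # D_KL(theta_0 || theta)
    kl_rev = kl_exponential(lam0 + dlam, lam0)       # D_KL(theta || theta_0)
    quad = 0.5 * fisher_exponential(lam0) * dlam**2  # quadratic approximation
    print(f"dlam={dlam:5.2f}  KL_fwd={kl_fwd:.6e}  KL_rev={kl_rev:.6e}  0.5*F*dlam^2={quad:.6e}")
```

As $\Delta \lambda$ shrinks, both directions of the divergence approach the same quadratic value, illustrating both the Fisher Information as the curvature of the divergence and the asymptotic symmetry mentioned in the note above.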
