Seven Statistical Mechanics / Bayesian Equations That You Need to Know


Essential Statistical Mechanics for Deep Learning

 

If you’re self-studying machine learning and feel that statistical mechanics is suddenly showing up more than it used to, you’re not alone. Within the past couple of years, statistical mechanics (statistical thermodynamics) has become a more integral topic, along with the Kullback-Leibler divergence measure and several inference methods for machine learning, including the expectation maximization (EM) algorithm and variational Bayes.

 


 

Statistical mechanics has always played a strong role in machine learning

Statistical mechanics, which underlies the expectation maximization (EM) and variational Bayes methods for inference, is becoming a more important part of our machine learning foundation. Image (“Dinosaurier: 4974”) courtesy Unik Dekor, https://www.unikdekor.se/outlet/outlet-rea/dinosaurier/.

Imagine, though, that you’re in a strange, Jurassic Park-like landscape, where the world continually changes. Statistical mechanics was important in this early landscape, as it underlay the first neural network innovations: the Hopfield neural network and the Boltzmann machine.

Over time, though, these older neural networks were surrounded by mists and volcanic gas, and became lost to view. The importance of statistical mechanics waned as we focused on more near-term and straightforward goals – building deep structures (using the tried-and-true backpropagation rule), building convolutional neural networks, and the like.

But as in any rapidly evolving landscape, things continue to change. While expectation maximization (EM) and variational Bayes methods have been around for a while, they’re like a volcano that keeps growing and erupting. These methods now dominate our landscape more than ever before. Our world still contains the older and comparatively simple methods of statistical mechanics, such as the Ising model, but the whole notion of inference now dominates our thinking. Thus, variational Bayes is the “Mount Everest” that many machine learning specialists are seeking to climb.

Inference has become so important that it is driving much of practical AI these days, motivating special-purpose GPUs that can make inference much faster. As Jensen Huang said during his keynote address at the May 2017 NVIDIA GTC, announcing his latest product release: “Volta is groundbreaking work. It’s incredibly good at training and incredibly good at inferencing,” adding that “Volta and TensorRT are ideal for inferencing.” When inference drives major corporate product releases, we know that it’s important.

That’s why the mathematics underlying inference is suddenly becoming much more visible in our machine learning landscape.

 


 

Key Equations in Statistical Mechanics, Bayesian Probability, and Inference

Given the plethora of concepts, papers, tutorials, and other information, it might help us to back up and create a mental model or map of the inference-based landscape in machine learning. The following figure illustrates the key equations.

Seven equations from statistical mechanics and Bayesian probability theory that you need to know, including the Kullback-Leibler divergence and variational Bayes.
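The figure itself is not reproduced here. As a rough guide, and as my own assumption about which standard forms the figure most likely uses (the original’s exact notation may differ), the seven equations are probably along these lines:

\begin{align}
Z &= \sum_i e^{-\beta E_i} && \text{(1) partition function} \\
p_i &= \frac{e^{-\beta E_i}}{Z} && \text{(2) Boltzmann probability} \\
F &= -\frac{1}{\beta}\ln Z && \text{(3) Helmholtz free energy} \\
S &= -k_B \sum_i p_i \ln p_i && \text{(4) entropy} \\
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)} && \text{(5) Bayes’ rule} \\
D_{\mathrm{KL}}(q \,\|\, p) &= \sum_i q_i \ln \frac{q_i}{p_i} && \text{(6) Kullback-Leibler divergence} \\
\ln p(x) &= \mathcal{L}(q) + D_{\mathrm{KL}}\bigl(q(z) \,\|\, p(z \mid x)\bigr) && \text{(7) variational Bayes (ELBO decomposition)}
\end{align}

The numbering of (5)-(7) follows the discussion below; which four statistical-mechanics basics occupy slots (1)-(4) is my guess.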

 


 

Why Learning Statistical Mechanics Is Like Traversing a Mountain Range

Learning statistical mechanics is like traversing a mountain range … the bigger topics are harder climbs. Image courtesy of Lech Magnuszewski (aka “Citizen Fresh”); http://citizenfresh.deviantart.com/.

The seven equations identified in the previous figure are not equally difficult. The ones in the foreground are our older equations: the basics of statistical mechanics. They correspond to Eqns. (1)-(4) in the previous figure. The fundamentals of Bayesian probability (Eqn. (5)) have also been with us for a couple of hundred years.

If we were new to the field, we could master the early rudiments of statistical mechanics and of Bayesian probability – given a good text or tutorial – in about a weekend each.

It would take us more time to understand the Kullback-Leibler divergence (Eqn. (6)). And it could take us several months, or even a year or two, to fully understand the expectation maximization (EM) method and its evolution into variational Bayes (Eqn. (7)).
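To make Eqns. (6) and (7) concrete, here is a minimal sketch in Python, written for this discussion rather than taken from any of the referenced papers. It computes the Kullback-Leibler divergence between two discrete distributions and numerically checks the variational-Bayes identity log p(x) = ELBO(q) + KL(q ‖ p(z|x)) on a toy two-state latent-variable model; all names (p_joint, q, and so on) are illustrative.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p_i = 0 contribute zero
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy model: one observed x, latent z in {0, 1}.
p_joint = np.array([0.12, 0.28])      # p(x, z=0), p(x, z=1)
p_x = p_joint.sum()                   # evidence p(x)
p_posterior = p_joint / p_x           # exact posterior p(z | x)

# An arbitrary approximating distribution q(z).
q = np.array([0.5, 0.5])

# ELBO(q) = E_q[ log p(x, z) - log q(z) ]
elbo = float(np.sum(q * (np.log(p_joint) - np.log(q))))

# Eqn. (7): log p(x) = ELBO(q) + KL(q || p(z | x))
print("log p(x)            :", np.log(p_x))
print("ELBO + KL(q || post):", elbo + kl_divergence(q, p_posterior))
```

Running the sketch prints the same value on both lines (about -0.916 for this toy choice), which is the whole point: because log p(x) is fixed, maximizing the ELBO with respect to q is equivalent to minimizing the KL divergence between q and the true posterior.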

Thus, we might pace ourselves – and set our expectations of what we can accomplish – realistically.

 


 

Where to Start; What to Read

The References collected below contain some historical papers, and also the best that I could find in the way of current tutorials.

If I were starting from here, right now, not knowing any of this, the following Table would probably get me started in a useful way.

Reading List

Reading list for statistical mechanics, Bayesian probability, Kullback-Leibler, and variational Bayes (inference for machine learning).

You can find links to all items in the Table, together with some extra (usually key historical) resources, in the References list at the end.

Here’s to your success, as you become your own Master of the Universe!

All my best – AJM

 


 

References for Variational Bayes

  • Analytics Vidhya (2016), Bayesian Statistics explained to Beginners in Simple English (June 20, 2016). blogpost. (AJM’s Note: If you don’t know your Bayesian probabilities all that well, this is a fairly decent intro.)
  • Beal, M. (2003), Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London. pdf.
  • Blei, D.M., Kucukelbir, A., McAuliffe, J.D. (2016), Variational inference: a review for statisticians, arXiv:1601.00670v5 [stat.CO]. pdf.
  • Eisner, J. (2011), High-level explanation of variational inference, blogpost.
  • Feynman, R.P. (1972, 1998). Statistical Mechanics: A Set of Lectures. Reading, MA: Addison-Wesley; Amazon book listing.
  • Kurt, W. (2017), The Kullback-Leibler divergence explained (May 10, 2017). blogpost. (AJM’s Note: Very nice. I used this when teaching myself.)
  • Maren, A.J. (2017), Derivation of variational Bayes equations: Friston and Beal, Themasis Technical Note TN-2017-01. pdf
  • Maren, A.J. (2017), blogpost: How to Read Karl Friston (in the Original Greek). (Use this if there’s ever a link-break in the above; I’ll do my best to keep the link within the primary blogpost updated.)
  • Maren, A.J. (Dec., 2013) Statistical Thermodynamics: Basic Theory and Equations. THM TR2013-001(ajm).
  • Martin, C. (2016), Foundations: the partition function. blogpost.
  • Neal, R.M., & Hinton, G.E. (1998), A view of the EM algorithm that justifies incremental, sparse, and other variants, in Learning in Graphical Models (ed. M.I. Jordan) (Dordrecht: Springer Netherlands), 355-368. doi=10.1007/978-94-011-5014-9_12. pdf.
  • Tzikas, D., Likas, A., & Galatsanos, N. (2008), Life after the EM algorithm: the variational approximation for Bayesian inference, IEEE Signal Processing Magazine, 131 (November, 2008), doi:10.1109/MSP.2008.929620. pdf. (AJM’s Note: A particularly nice tutorial, and a good place in which to start for an overview.)
  • Wainwright, M.J., & Jordan, M.I. (2008), Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning, 1 (1-2), 1-305, doi:10.1561/2200000001. pdf.
  • Wikipedia: Variational Bayesian methods. (AJM’s note: I try to steer away from the Wikis as references, but this has some very nice reference-links.)
  • Some additional references:

    Information Theory – Seminal Papers

    Information Theory – Tutorials

    • Burnham, K.P., & Anderson, D.R. (2001), Kullback-Leibler information as a basis for strong inference in ecological studies, Wildlife Research, 28, 111-119. pdf. (Reviews concepts and methods – including K-L – in the context of practical applications to experimental data, rather than a deeply mathematical review; good for understanding basics.)
    • Strimmer, K. (2013). Statistical Thinking (Draft), Chapter 4: What Is Information?. pdf (Very nice, clear intro – v. good for understanding basics)
    • White, G. (date unknown). Information Theory and Log-Likelihood Models: A Basis for Model Selection and Inference. Course Notes for FW663, Lecture 5. pdf (Another very nice, clear intro – v. good for understanding basics)
    • Yu, B. (2008). Tutorial: Information Theory and Statistics – ICMLA (ICMLA, 2008, San Diego). pdf (Very rich and detailed 250-pg PPT deck, excellent for expanding understanding once you have the basics.)

     
