Making Machine Learning As Simple As Possible
Albert Einstein is credited with saying, "Everything should be made as simple as possible, but not simpler."
Machine learning is not simple. In fact, once you get beyond the simple “building blocks” approach of stacking things higher and deeper (sometimes made all too easy with advanced deep learning packages), you are in the midst of some complex stuff.
However, it does not need to be any more complex than necessary.
Does this equation make perfectly good sense to you?
Nope? That’s ok. I had some seriously WTF moments when I started studying this, myself.
What set me back, though, as in – Oh. My. God. – was when I was presenting the basic slide – which contained this equation – in one of my classes a few weeks ago.
One of my students – a seriously bright guy, with a strong background – asked, "What do the double vertical lines between the q and the p stand for?"
Rocked me back a bit.
Not that it was a dumb question. Far from it.
I had simply gotten so used to staring at these things that I'd internalized them a little bit.
And once a person has internalized, they get a blind spot. Then, it’s real hard to remember that someone else – someone who has never seen this before – will have some real problems just reading the equation in plain English. (Or Urdu or Chinese, language of choice, you know what I mean.)
Made me realize two things:
- Before we get started on our great journey through the mountains, we need a map; we need to know what’s there, and
- It really helps if we can at least name the mountains.
Naming the Demons
In classical mythology, if you can name the demon, you can control it.
We’re not exactly calling these equations “demons.” But our first task is to name them. That means, being able to read through the equation, left-to-right, and understand what it means.
Not saying that we know how to use the equation, just yet. Not saying that we know its derivation, or how it links with others.
Just reading through the equation, without getting hung up on a little piece of notation (like the infamous “double vertical lines”), is our first step.
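And for the record: the "double vertical lines" are just notation. D_KL(q ‖ p) is the Kullback-Leibler divergence of q from p – roughly, how surprised you'd be using p when the data really follow q. Here's a minimal sketch (my own toy example, not code from the Précis) for two small discrete distributions:

```python
import math

def kl_divergence(q, p):
    """D_KL(q || p) = sum over x of q(x) * log(q(x) / p(x)).

    The double bars just separate the two distributions. Note that
    KL is NOT symmetric: q || p and p || q generally differ."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

q = [0.5, 0.3, 0.2]
p = [0.4, 0.4, 0.2]

print(kl_divergence(q, p))  # small but positive: q and p are close
print(kl_divergence(p, q))  # a different (also positive) number
```

Identical distributions give a divergence of exactly zero, and swapping the arguments changes the answer – which is why the notation needs that asymmetric-looking double bar rather than a comma.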
The Seven Key Equations (Once Again)
You saw this diagram a couple of weeks ago, in Seven Statistical Mechanics and Bayesian Equations That You Need to Know.
(Diagram: seven equations from statistical mechanics and Bayesian probability theory that you need to know, including the Kullback-Leibler divergence and variational Bayes.)
These are the equations that you’ll learn about in the Cribsheet (Précis).
A Cribsheet for the Seven Essential Machine Learning Equations
In response to that very simple, straightforward question from my student (“what do the double lines mean?”), I’ve created a Cribsheet.
OK, that was the original plan: a quick little 2-3 page "legend" for the Seven Key Equations "map."
The two-to-three page Cribsheet turned into a 24-page Précis for the book-in-progress, Statistical Mechanics for Neural Networks and Machine Learning.
What it has:
- The Seven Essential Machine Learning Equations,
- How to read these equations, in plain English, and
- Figures and examples.
All this does is help you READ THE EQUATION. That’s all. No derivations.
But after reading through this, your comfort, familiarity and overall confidence will increase.
You’ll be able to name the demons.
You can get the Précis – plus a bonus slidedeck on microstates – by doing (yes, I know) another Opt-In process.
Opt-In to get your Précis and Bonus Slidedeck – Right HERE:
Bonus Slidedeck – Microstates – Detailed Examples
You may recall that I confessed to a serious egg-on-my-face moment last week, in a postscript to A Tale of Two Probabilities. While I was putting together the Précis – in fact, on final edits – I realized that I had something disastrously wrong.
I had forgotten what the summation index j meant in the partition function equation. (It’s always the little things that get you, right?)
It’s not summing over all the units in the system, nor is it summing over energy levels.
It’s summing over microstates.
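To make that distinction concrete, here's a small sketch (my own toy example, not one of the two worked examples in the slidedeck): a pair of coupled spins, where one energy level is degenerate. Summing over microstates and summing over energy levels give different answers unless you weight each level by its degeneracy – which is exactly why the index j matters.

```python
import math
from itertools import product

# Toy system: two coupled spins, each up (+1) or down (-1).
# Energy of a microstate: E = -J * s1 * s2 (a simple Ising pair).
J, beta = 1.0, 1.0

# Enumerate ALL microstates -- every configuration, not every energy level.
microstates = list(product([+1, -1], repeat=2))        # 4 microstates
energies = [-J * s1 * s2 for (s1, s2) in microstates]  # [-1, +1, +1, -1]

# Partition function: Z = sum over microstates j of exp(-beta * E_j).
Z = sum(math.exp(-beta * E) for E in energies)

# There are only TWO distinct energy levels (-1 and +1), but each is
# doubly degenerate. Summing over levels alone undercounts by half:
Z_levels_wrong = math.exp(beta * J) + math.exp(-beta * J)

print(Z, Z_levels_wrong)  # Z is twice the (wrong) sum over levels

# And once you have Z, free energy follows: F = -(1/beta) * ln(Z).
F = -math.log(Z) / beta
```

Four microstates, two energy levels: the sum over microstates is exactly twice the naive sum over levels here. Get the counting wrong and Z is wrong, and everything downstream – free energy included – is wrong with it.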
Several days later (almost a week), after having read a whole lotta stuff, and done a whole lotta backpedaling and rewriting, the Précis is now done.
In the process, I also produced a Microstates Slidedeck. Two solid, detailed, carefully-worked examples.
In the near future, we’ll have a lot on microstates. They’re the key to unlocking the partition function, which gives us free energy, which gives us energy minimization methods – which are becoming a dominant theme within machine learning.
And if you don’t get it about microstates, then your whole foundation is weak and shaky.
But now, you’ll have the Microstates Slidedeck along with the Précis – and it’s really an easy read. You’ll have fun with it. And you’ll feel better about the equations. I promise.
Live free or die, my friend –
Live free or die: Death is not the worst of evils.
Attr. to Gen. John Stark, American Revolutionary War
Previous Related Posts
- A Tale of Two Probabilities
- Seven Statistical Mechanics and Bayesian Equations That You Need to Know
- Approximate Bayesian Inference
- The Single Most Important Equation for Brain-Computer Information Interfaces (Kullback-Leibler)