How Getting to a Free Energy Bottom Helps Us Get to the Top

Free Energy Minimization Gives an AI Engine Something Useful to Do:

 

Cutting to the chase: we need free energy minimization in a computational engine, or AI system, because it gives the system something to do besides being a sausage-making machine, as I described in yesterday’s blog on What’s Next for AI.

We’re not yet near the intelligence of a four-year-old child, says John Giannandrea, head of Google’s machine learning division. Image from IB Times UK article.

Right now, deep learning systems are constrained to be simple input/output devices. We force-feed them with stimulus at one end, and they poop out (excuse me, “pop out”) results at the other.

This doesn’t mean that there aren’t all sorts of complexities within the layers. It just means that if we’re not putting something in, then we’re not getting something out.

This, my friend, is not general AI. Furthermore, it doesn’t have a snowball’s chance in hell of becoming the foundational platform for AI.

What we know now as deep learning is simply not going to get us there.

 
 

Ruminating: An Essential Component of Intelligence

 

In contrast to the input/output sausage-machine operation of deep learning systems, our minds do much more.

Specifically, our minds have the ability to do all sorts of things when stimulus is turned off, or when it’s turned down to a dull roar. Why else would we want to just go to the beach and stare at the waves coming in? Why do we enjoy a long hike or drive through nature? Why do we take our morning coffee out to the back porch and just sit and let our minds float for a while?

We humans like to ruminate and reflect. We like to ponder.

Pondering gives us a chance to meander around in our mental space and make connections; to make associations that we haven’t made before.

We can’t do this when we’re hyper-stimulated or stressed. We need these complementary modes; we need some yin to go with our yang.

This is also part of why sleep – and dream-state – is so essential to our mental health.

If we (and all forms of natural intelligence) seem to need sleep, dreams, and quiet time, it makes sense that we’d expect the same from any possible form of general artificial intelligence, or GAI.

 
 

What a System Would Absolutely Have to Have to Be a General AI

 

In order to have the basis for general AI, we need to give the system an opportunity to reflect and ponder; to do something when it is NOT actively engaged in some task.

At the same time, letting a system just wander around inside its own space – without any kind of process – is just nuts.

Not all possible spaces are created equal.

We typically spend more time in the valleys than on the steep slopes or summits. Wikipedia image.

Suppose that you are going to take a walk in nature. Most likely, you’ll stay more to the low areas, right? Yes, you may occasionally hike to a summit. But if you were to spend a week or so hiking, and then did a grid-map of where you spent most of your time, you would most likely be more in the valley than on the upper slopes, right?

Water flows downhill. Humans spend more time in valleys than on steep slopes and summits. And neural networks follow gradient descent algorithms.

So if a system is not slavishly being an input/output machine, and is having some time to ruminate about inside its own space, it makes sense that it would follow some sort of process, doesn’t it?

It further makes sense that this process would guide the system to spend most of its time (not all, but most) among the foothills or in the valleys.

The natural process that guides simple systems – systems made up of units that can be in on/off states – is free energy minimization.

Thus, it makes sense that if an AI system can decouple itself from constantly processing an input/output stream and engage in some sort of process, the fundamental core of this process would be free energy minimization.

This doesn’t mean that it would stay in one minimum state, all the time. It would receive low-level stimulus (perhaps consistently, even if it is not receiving a strong, “must-process-me-now” kind of stimulus). This stimulus would get it up and out of its current (free energy-minimized) state, percolate about a bit, and then resettle – into either its original or a different, but still free energy-minimized, state.
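
To make the settle, nudge, resettle cycle concrete, here is a minimal Python sketch. It is a toy under loud assumptions: the units sit on a ring, each unit is +1 or -1, neighbors prefer to agree, and the system relaxes at zero temperature, where minimizing free energy reduces to minimizing energy alone (the entropy term drops out). The function names (energy, settle, perturb) and the ring layout are made up for illustration; they do not come from any library or from a specific published model.

```python
import random

def energy(state):
    """Ising-style energy on a ring of +1/-1 units: aligned neighbors lower it."""
    n = len(state)
    return -sum(state[i] * state[(i + 1) % n] for i in range(n))

def settle(state, rng, sweeps=50):
    """Relax by single-unit flips, accepting only flips that do not raise the energy."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps * n):
        i = rng.randrange(n)
        # Flipping unit i changes the energy by 2 * s_i * (left + right neighbor).
        delta = 2 * state[i] * (state[i - 1] + state[(i + 1) % n])
        if delta <= 0:
            state[i] = -state[i]
    return state

def perturb(state, rng, flips=3):
    """Low-level stimulus (assumed form): flip a few randomly chosen units."""
    state = list(state)
    for i in rng.sample(range(len(state)), flips):
        state[i] = -state[i]
    return state

rng = random.Random(0)
start = [rng.choice([-1, 1]) for _ in range(20)]
rest = settle(start, rng)          # come to rest in a low-energy state
kicked = perturb(rest, rng)        # gentle nudge out of that state
resettled = settle(kicked, rng)    # settle again, into the same or a different state
print(energy(start), energy(rest), energy(kicked), energy(resettled))
```

Because settle never accepts an uphill flip, the energy after resettling is never higher than the energy right after the nudge. A finite-temperature version would accept some uphill flips as well, which is where the entropy part of the free energy starts to matter.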

The notion of free energy minimization gives a system something to do, even when it’s not particularly doing anything.

So what we’re going to do – in this post and in future ones – is play with the notion that a foundation for general artificial intelligence (GAI) is a system that undergoes free energy minimization even when it is not actively learning or responding to stimulus.

This means that the free energy minimization – which is already part-and-parcel of weight training for (restricted) Boltzmann machines – is going to do more. It’s no longer going to be just a means for getting the connection weights right. Instead, it’s going to be something that happens within the hidden layer(s), even when just a low level of stimulus is present. (I’m not saying no stimulus; I’m saying that in the absence of strong external inputs, it will experience some gentle perturbation or random noise inputs. Just to keep it from being locked in stasis.)
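
For readers who have not met it, here is a small sketch of the free energy that shows up in restricted Boltzmann machine training. It uses the standard textbook form for binary hidden units, F(v) = -sum_i a_i v_i - sum_j log(1 + exp(b_j + sum_i v_i W_ij)). The function and argument names are my own choices for illustration, and the plain nested lists stand in for whatever array type a real implementation would use.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def rbm_free_energy(visible, weights, visible_bias, hidden_bias):
    """Free energy of one visible vector for an RBM with binary hidden units.

    weights[i][j] connects visible unit i to hidden unit j.
    """
    linear = sum(a * v for a, v in zip(visible_bias, visible))
    hidden_term = 0.0
    for j, b in enumerate(hidden_bias):
        pre_activation = b + sum(v * weights[i][j] for i, v in enumerate(visible))
        hidden_term += softplus(pre_activation)
    return -linear - hidden_term

# Two visible units, three hidden units, everything zeroed out.
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(rbm_free_energy([1, 0], weights, [0.0, 0.0], [0.0, 0.0, 0.0]))
```

Training adjusts the weights and biases so that this quantity goes down for the training data relative to what the model generates on its own. That is the sense in which free energy is already "just a means for getting the connection weights right."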

Now fortunately, I’m not a lone voice crying in the wilderness. The notion of free energy minimization as part of GAI has been advanced by at least one other researcher for the last couple of decades.

 
 

Who Else Is Doing It

 

Let’s be real here, ok?

None of us wants to devote a whole lot of time and energy to chasing down a vague, mythical what-if. We simply don’t have that much wiggle room in our lives.

Particularly, if I’m talking to you about statistical physics (and you know that’s going to mean a lot of equations – I’m soft-pedaling right now, but you know that something’s coming up), you’ve got to have a pretty good reason to go there, right?

So, one researcher who’s been proposing this line of thinking is Karl Friston, who’s written numerous works on this topic. (See a list of his articles in one of my previous blogposts, How to Read Karl Friston (in the Original Greek). Overview and links, and a draft of my paper on interpreting Friston’s core equations.)

Here’s the essential notion that Friston is advancing.

Karl Friston’s notion of a modeling system.

In Friston’s theology, there is a system (operating in an internal world) that is separated from the external world by a Markov blanket. The internal world (like a brain, creating its own thoughts, or an AI system) is denoted r, where r stands for “representational states.” (I like to shorthand that to “internal reality.”) The external world is denoted psi, which is the Greek letter that looks something like a pitchfork. The Markov blanket separating the internal world (the brain’s or AI’s mental constructs) from the external world is a boundary layer which contains stimulus (from the external world into the system) and actions (by the system out onto the external world). The system (or internal world) has actual probabilistic states p(s,a,r), where s is the stimulus component of the Markov blanket, a is the action component, and r is the internal reality (representational states) or system-world.

The external world takes on probabilistic states denoted p(psi). The internal world has its own probabilities p(s,a,r), and it tries to bring these into line with its model of the external world, denoted q(r, theta), where theta is the set of parameters that the system can adjust to make the model match the external world better. (Note that the model is q(r, theta), and not q(psi, theta). The model doesn’t really know what the external world is; it can only surmise, by receiving stimulus inputs. It has to work within the internal reality.)

This all seems a bit strange at first, but it makes more sense if you can immerse yourself deeply into the world of Friston-equations for a while. (Which is, indeed, an altered-states experience. Trust me.)

Now, I’m not going to go too deeply into a Friston-esque world right now; it is very arcane, subtle, and abstract – and if you’ll follow along with me in the future, we’ll have plenty of arcane, subtle, and abstract stuff to deal with, without adding more.

It’s just good to know that someone else (scarily brilliant, albeit hard to follow) has already blazed a first-pass mental trail that we can follow.

So where does this leave us?

 
 

The Foundation: A Free Energy-Minimizing Hidden Layer

 

Here’s a general notion of something that is Friston-like, and also deep learning-like, in nature.

Notional form of a computational (AI) engine that would allow for independent free energy minimization in the hidden layer.

The above figure illustrates a foundational mechanism for a general AI system. Note that this is not, by any means, all of the logic, the internal connections and details, and the multiple processes that would need to be involved.

Instead, this figure simply shows a fairly typical neural network-like system, with input, hidden, and output layers.

The difference is that the hidden layer is much more complex than usual.

I’m illustrating this with a 2-D hidden layer. The input and output layers can be 1-D (usual vector form), 2-D (e.g., image pixels), or whatever is needed – that’s not the point right now. Also, we do not have to use a 2-D hidden layer; I’ll make some arguments for this in subsequent posts, but that’s a side-issue right now.

What is important is that we have a hidden layer that can – independent of input stimulus – come to a free energy minimum.

This means that the hidden layer must do double-duty:

  1. Perform its traditional role of creating a feature-level representation of the inputs, and have (learned) connection weights that connect (as usual) inputs to hidden layer nodes, and hidden layer nodes to output nodes, and also
  2. When NOT doing its usual role of guiding classification (or whatever the input-hidden-output task may be), perambulate around a bit (following some low-level stimulus AND/OR memory of previous states), under the general overall guidance of free energy minimization.
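
The two roles above can be sketched as a toy Python class. Everything here is hypothetical and for illustration only: the class name DoubleDutyLayer, the random (untrained) weights, the 1-D ring of hidden units, and the "neighbors prefer to agree" lateral energy are all my assumptions, and the relaxation is a zero-temperature stand-in for real free energy minimization.

```python
import math
import random

class DoubleDutyLayer:
    """Toy hidden layer with on/off units that can also relax on its own."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        self.rng = random.Random(seed)
        # Random weights stand in for learned connection weights.
        self.w_in = [[self.rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w_out = [[self.rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.hidden = [0] * n_hidden  # hidden states are remembered between calls

    def respond(self, inputs):
        """Role 1: the usual input -> hidden -> output pass."""
        self.hidden = [1 if sum(w * x for w, x in zip(row, inputs)) > 0 else 0
                       for row in self.w_in]
        return [1 / (1 + math.exp(-sum(w * h for w, h in zip(row, self.hidden))))
                for row in self.w_out]

    def hidden_energy(self):
        """Assumed lateral energy: count of neighboring hidden units that disagree."""
        h = self.hidden
        return sum(h[i] != h[(i + 1) % len(h)] for i in range(len(h)))

    def perturb(self, flips=2):
        """Low-level stimulus: flip a few hidden units at random."""
        for i in self.rng.sample(range(len(self.hidden)), flips):
            self.hidden[i] = 1 - self.hidden[i]

    def relax(self, steps=200):
        """Accept single-unit flips that do not raise the lateral energy."""
        h = self.hidden
        for _ in range(steps):
            i = self.rng.randrange(len(h))
            disagreements = (h[i] != h[i - 1]) + (h[i] != h[(i + 1) % len(h)])
            if disagreements >= 1:  # a flip changes the energy by 2 - 2 * disagreements
                h[i] = 1 - h[i]

    def ruminate(self):
        """Role 2: with no strong input, get nudged and then resettle."""
        self.perturb()
        self.relax()
        return self.hidden_energy()

layer = DoubleDutyLayer(n_in=3, n_hidden=8, n_out=2)
print(layer.respond([1.0, 0.0, -1.0]))  # role 1: driven by an input
print(layer.ruminate())                 # role 2: free time
```

The design point is simply that the hidden state persists after respond returns, so ruminate has something to work on when no input is arriving.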

Just the prospect of this computational engine doing something in its free time – something that isn’t the old-fashioned input-hidden-output process – gives us something new.

It’s way premature to assess what this “something new” can be; it’s just that now, we have some foundational mechanism on which to base this new process.

 
 

In Order to Do This, We Really Need a …

 

The above idea and rationale may seem plausible, and even attractive. But to take the next step, we need a free energy equation that can usefully be minimized.

The classic Ising equation just won’t do it. It’s just too limiting.

(For a historical reference, the Hopfield neural network – which WAS a brilliant innovation, back in 1982 – really was a single-layer network that did free energy minimization to learn patterns, and it had some pretty horrific memory-storage problems. That’s why people quickly moved to multi-layer neural networks, separating out input, hidden (to learn the features), and output layers. However, the Hopfield network DID INDEED use the classic Ising model.)

So … if a simple Ising-model is just … too simple, what are we going to do?

The answer is: a slightly more complex Ising model. We’re going to keep the enthalpy term more-or-less, but work with a more complex entropy.
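
As a point of reference, here is the kind of simple free energy that the "classic" picture starts from, written for N on/off units with a fraction x in the "on" state. The entropy line is the standard two-state mixing entropy. The enthalpy line is only one common mean-field choice, and the symbols epsilon (energy per "on" unit) and c (pairwise interaction strength) are placeholders of mine, not notation from the posts to come.

```latex
\begin{align}
F &= H - TS \\
\frac{S}{N k_B} &= -\left[\, x \ln x + (1 - x) \ln (1 - x) \,\right] \\
\frac{H}{N} &= \varepsilon\, x + c\, x^{2}
\end{align}
```

In this simple form the entropy depends only on how many units are on, not on how they are arranged relative to one another.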

There are all sorts of reasons for this, which will be the subject of the next post – which will (hopefully) be on schedule, Thursday of next week.

Until then, have a lovely between Christmas-and-New-Year vacation!

 

 

Live free or die, my friend –

AJ Maren

Live free or die: Death is not the worst of evils.
Attr. to Gen. John Stark, American Revolutionary War

 
 

Some Useful Background Reading on Statistical Mechanics

 

  • Hermann, C. Statistical Physics – Including Applications to Condensed Matter, in Course Materials for Chemistry 480B – Physical Chemistry (New York: Springer Science+Business Media), 2005. pdf. Very well-written, however, for someone who is NOT a physicist or physical chemist, the approach may be too obscure.
  • Maren, A.J. Statistical Thermodynamics: Basic Theory and Equations, THM TR2013-001(ajm) (Dec., 2013).
  • Salzman, R. Notes on Statistical Thermodynamics – Partition Functions, in Course Materials for Chemistry 480B – Physical Chemistry, 2004. Statistical Mechanics (chapter). Online book chapter. This is one of the best online resources for statistical mechanics; I’ve found it to be very useful and lucid.
  • Tong, D. Chapter 1: Fundamentals of Statistical Mechanics, in Lectures on Statistical Physics (University of Cambridge Part II Mathematical Tripos), Preprint (2011). pdf.

 