What We Need to Create General Artificial Intelligence (GAI):
A brief recap: We know that we want to have neural networks (including deep learning) do something besides being sausage factories. We also know that the key missing step – a first principles step – to making this happen is to give the network something to do when it is not responding to inputs. Also, we’ve introduced something that the neural network CAN do; it can do free energy minimization with a (slightly-more-complex-than-usual) free energy equation. (That means, it can come to equilibrium after various perturbations.)
Also, we got our first look at how this new free energy equation – for a 2-D Cluster Variation Method (CVM) network – will behave. (We’ll have lots more examples in the future.) We saw, last week, that using the 2-D CVM, we could characterize one system as being at equilibrium, and another as not-at-equilibrium.
In short, we’ve laid a bit of a foundation.
At least, we’ve dug the appropriately-sized hole in the ground, laid out a concrete-block perimeter, and are busy pouring the foundation.
The question is: what will the building look like once it’s done?
That is, how will all of this (intellectual heavy-weight-lifting) work benefit us; what will we get from our efforts?
In this post, I’m going to sketch out a brief blueprint.
More than that, I’m going to suggest that we’re in the very initial stages of a quest – and that what we’re questing for does not (as yet) exist. Our very act of questing brings it into reality.
That’s another way of saying: we have this sense that something that we’re calling “general artificial intelligence” is the Next Big Thing. We can’t exactly define it just yet (Turing machines aside). We just know that – no matter how fabulous deep learning is right now – there’s something that is a lot more than piling it higher and deeper with network layers.
So, we’re going to sketch it out today.
But Enough About Me; Let’s Talk About You
You’ve already told me a lot about yourselves. It’s come through the introductions you’ve written (in the online classes that I teach with Northwestern University’s Master’s in Data Science program), as well as your LinkedIn overviews and posts, and your comments and emails.
In particular, you’re riding a fine line: finding out how far dedication, commitment, and sheer willingness to hit the “overdrive” button (again and again) can get you, in the face of chronic stress and near-exhaustion.
For most of you (about 80% ++): You’ve got a full-time day job, and it involves a lot of responsibility. Often overtime. Often travel.
You’ve got a family, and sometimes family-crunch takes a bite out of you. You typically have a couple of kids, maybe a baby or a baby-on-the-way. Some of you are caring for family members who have a chronic health problem. You’ve got a bit of extra responsibility also; something to help out the kids (weekend soccer practice), plus a little something connected with your faith (whichever religious system you belong to).
And, to top it all off, you’re self-educating in neural networks / deep learning / artificial intelligence just as fast as you can. Maybe you’re taking one or two online courses. (Our parents’ generation called this “night school.” Doesn’t matter that you can now do it at all hours of the day; it’s the hard, lonely way to get to the next career rung.) If you’re not enrolled in a formal program, then you’re immersed in a mix of Coursera, Udemy, online videos, bootcamps, and whatever else you can find.
But do you know what you REALLY are, beneath all the corporate and family and social roles?
You are a Hero-Warrior.
Male or female, early in your career or at some senior level, living on your own or with a whole lotta family/social obligations, in your absolute core, you already are a Hero-Warrior. That is what propels you to take this arduous, difficult, and – in some ways, dangerous – journey into the heart and core of AI.
The bottom line is: even Hero-Warriors need to learn.
What’s Just Happened Here
What’s just happened is that you’re Luke Skywalker, and you’ve already destroyed the Death Star.
You’ve got a lightsaber, and you already know how to use the Force.
It’s that next stage of training that you’re confronting now.
You’ve downloaded TensorFlow and Keras, and played with CNNs (convolutional neural networks) and LSTM (Long Short-Term Memory) networks?
That’s your first pass of figuring out how to use your lightsaber.
You’ve figured out how to configure a deep learning network?
That means you’ve got some experience of using the Force.
But do you really know the physics – the statistical mechanics – underlying the more advanced energy-based models?
As Yoda has said,
Patience you must have, my young Padawan.
You’re going beyond the basics; you’ve gone to Dagobah, and you’re learning the much more subtle and arcane secrets. (That would be you, right here, right now, reading this blog, and trying to figure out the new 2-D CVM free energy equation.)
And really, truly, seriously, this training is a royal pain-in-the-a**. (For gentle amusement and encouragement, reconnect with Luke training with Yoda at Dagobah.)
Because, as we all know, the Death Star is not the end of the game.
So, here we are – learning statistical mechanics, and not even everyday, ordinary stat mech, but this advanced, super-arcane form that has a whopping crazy entropy formulation.
The question is: what are we going to get from our labors?
What super-human powers will we possess if we can master this new thing?
Free Energy Minimization is the Force that Rules the Universe
Not to be too kitschy or cute with you, but there really is the Force. It’s free energy minimization.
This is what natural systems do, left to their own devices.
This is what happens in the brain, according to some of the best brain-theorists that we have.
It’s just that – so far – free energy minimization has played a very limited role in neural networks and AI. The minimization has been used exclusively to train connection weights, resulting in getting better classifications (or whatever task the neural net is designed to do).
We want to go beyond that now.
Luke Skywalker needed to go beyond the basics that Obi-Wan Kenobi taught him.
He needed a six-month immersion study with Yoda.
And frankly, six months is about what it will take us. We’ll break things down, and at the end, you will be so intimately familiar with how these equations work, it will be like they whisper to you in the darkness.
What Happens Next
We’ve introduced the basic 2-D CVM free energy equation. We’ve had a look at how it performs. Not in-depth yet, but the first glimpse.
We’ve had our first picture of what a free energy-minimized system might look like (even though strange), and another image of something that seems as though it should be at equilibrium, but is not (follow that same link).
We know that we’re going to put a 2-D CVM layer inside a neural system. (At this stage, with the system doing more than just the usual neural network connection steps, we may as well call it something else – I’ve been calling it a computational engine.)
We can do the same kinds of training that we have done in the past, although there will be a couple of additional steps.
Then, once the network is trained, we can give it a pattern (or a series of patterns), and then stop presenting the stimulus … and see what happens.
“You must unlearn what you have learned.” – Master Yoda
What I’ve just described is a new kind of computational engine; something more than a neural network (because it has the independent free energy minimization within its hidden layer). I’m calling this the CORTECON – for COntent-Retentive TEmporally-CONnected neural network.
It’s a bit more complex than the typical neural network – even a deep learning one. The complexity is due to its internal processes; the trade-offs between training certain nodes to be “on” in response to input stimulus, and then the within-layer free energy minimization, which should yield (ultimately) some stability for the larger clusters of “on” nodes.
The free energy minimization gives it a first-principles foundation. However, there WILL be lateral connections between nodes in the hidden layer. These are not there to assist with learning input patterns and providing the desired outputs. Rather, they are there so that when the stimulus from the input layer is removed, there is something to help regain known patterns – even if there are substantial perturbations.
Here’s the overall plan of attack.
So what is it that we have to unlearn, as Master Yoda says?
Three things, for the first pass:
- The habit of input/output thinking – the first step in creating GAI is that we want something that can do something besides strictly respond to input stimuli; we want some ability to ponder, reflect, ruminate … to re-organize what it has learned, to create new concept hierarchies … and to let one thought lead to another,
- The habit of thinking that physical layout of nodes doesn’t matter – with this new approach, topography matters a great deal, and we’ll discuss this substantially in coming weeks, and
- The habit of thinking that things happen in isolation – even though some neural architectures (e.g., recurrent NN, LSTM – or Long Short-Term Memory – NN) allow for recent stimuli to influence current processing, we still haven’t fully tapped how a system can bridge both spatial and temporal experiences; this opens up new possibilities.
In the figure above, I indicate that lateral connections play a role. This is taking the architecture one step beyond a pure first-principles approach. Our first principles basis is that we’re using free energy minimization as our fundamental process. However, brains use lateral connectivity – lots of it. We’ll do the same.
We can be smart about it, just as our brains have smart strategies.
I’ll embellish that point later.
My post from two weeks ago, The Big, Bad, Scary Free Energy Equation (and New Experimental Results), gave an example of the first computational results from the 2-D CVM. We have plenty more to create and assess.
Immediate Next Steps – and What We Know Already
All of the interesting work involving the 2-D CVM will involve bringing the system back to a free energy minimum, given a huge number of possible starting conditions. This is going to be a computational process, each time. It will involve selecting a pair of nodes (one “on,” and another “off”) and flipping them both, as we’ll most often want to keep our x1 and x2 ratios the same. We’ll flip, then check whether we’ve reduced the free energy. This will be tedious and hugely computationally expensive.
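Here is a minimal sketch of that flip-and-check loop, just to make the mechanics concrete. Everything named here is my own assumption for illustration: `pair_flip_descent` and `stand_in_energy` are hypothetical names, the grid is a plain square grid with wrap-around (not the zigzag-chain layout), and the objective is a simple count of unlike nearest-neighbor pairs, NOT the 2-D CVM free energy. The accept-only-if-lower rule is also just one simple choice. What the sketch does show is how swapping one “on” node with one “off” node keeps x1 and x2 fixed while the objective goes down.

```python
import random


def stand_in_energy(grid):
    """Placeholder objective: count unlike nearest-neighbor pairs on a
    wrap-around square grid. This is NOT the 2-D CVM free energy; it only
    stands in for it so the loop below has something to minimize."""
    n_rows, n_cols = len(grid), len(grid[0])
    unlike = 0
    for r in range(n_rows):
        for c in range(n_cols):
            unlike += grid[r][c] != grid[r][(c + 1) % n_cols]  # right neighbor
            unlike += grid[r][c] != grid[(r + 1) % n_rows][c]  # lower neighbor
    return unlike


def pair_flip_descent(grid, energy, n_trials=2000, seed=0):
    """Greedy descent: swap one 'on' (1) node with one 'off' (0) node, and
    keep the swap only if the objective decreases. Swapping (rather than
    flipping a single node) leaves the fractions x1 and x2 unchanged."""
    rng = random.Random(seed)
    grid = [row[:] for row in grid]  # work on a copy
    cells = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    current = energy(grid)
    for _ in range(n_trials):
        (r1, c1), (r2, c2) = rng.sample(cells, 2)
        if grid[r1][c1] == grid[r2][c2]:
            continue  # need one "on" node and one "off" node
        grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
        trial = energy(grid)
        if trial < current:
            current = trial  # keep the swap
        else:
            grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]  # undo
    return grid, current


# Usage: start from a checkerboard (x1 = x2 = 0.5) and let the swaps run.
checkerboard = [[(r + c) % 2 for c in range(8)] for r in range(8)]
final_grid, final_energy = pair_flip_descent(checkerboard, stand_in_energy)
print(stand_in_energy(checkerboard), "->", final_energy)
```

Note that the objective is recomputed over the whole grid after every swap. That is the “tedious and hugely computationally expensive” part; only the neighborhoods of the two swapped nodes actually change, which is one obvious place to look for savings.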
One of the things that we’ll need to develop, as soon as possible, will be heuristic strategies for moving towards free energy minimized (equilibrium) states, from different starting configurations. (Doctoral dissertations, anyone?)
The very, very FIRST step is that we need to identify exactly WHAT the target free energy (minimized) values will be, given any pre-selected and desired values for x1 and for the interaction enthalpy parameter, h.
We do not (at this moment) have this set of minimum free energy values as a function of the x1 and h-values. However, once we determine them computationally, they will be known, solid, and stable.
Right now, even without the full set of free energy minimum values, we can easily identify two kinds of conditions where we can determine the free energy minimums analytically.
- For a range of h-values (interaction enthalpy parameter values) in the specific case where x1 = x2 = 0.5 (equal numbers of nodes in the “on” and “off” states): we keep x1 constant at 0.5, vary the h-value, and analytically compute the remaining configuration variables, namely all of the nearest-neighbors (y(i)), the next-nearest-neighbors (w(i)), and the triplets (z(i)). This is what let us know that the left-hand figure in last week’s post on Figuring Out the Puzzle (in a 2-D CVM Grid) was not at equilibrium, whereas the one on the right was; and
- For different given x1 values, while h is held constant at h = 1 (or the interaction enthalpy epsilon1 = 0), we can compute the configuration variables (e.g., the z(i) triplets) on a strict probabilistic basis.
In both of these cases, if we know the (at-equilibrium) set of configuration variables, then we can compute the thermodynamic values, and the two conditions just described each give us a set of at-equilibrium configuration variables. In the first condition, we know what the configuration variable values are (for a given h-value), because I’ve solved the equations for the equiprobable case. In the second condition, we know the configuration variables because it’s very easy to (probabilistically) compute them if there is no interaction enthalpy. (The details for the analytic solution in the first condition are given in The Cluster Variation Method II: 2-D Grid of Zigzag Chains: Basic Theory, Analytic Solution and Free Energy Variable Distributions at Midpoint (x1 = x2 = 0.5). This required solving three dependent, nonlinear equations; it is horribly complex, but it works.)
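To make the second condition concrete, here is a small sketch of the probabilistic calculation when there is no interaction enthalpy (h = 1). With no interaction, each node is “on” with probability x1 independently of its neighbors, so each configuration variable is just a product of x1 and x2 factors. The function name, the dictionary layout, and the ordering of the variables are my assumptions for illustration: I take y1..y3 (and w1..w3) as A-A, A-B, B-B, and z1..z6 as A-A-A, A-A-B, A-B-A, B-A-B, A-B-B, B-B-B, with the mirror-image patterns counted through a degeneracy factor of 2.

```python
# Assumed degeneracy factors: A-B / B-A and A-A-B / B-A-A (etc.) are counted once
# as a variable, then weighted by 2 when summing probabilities.
Y_DEGENERACY = [1, 2, 1]
Z_DEGENERACY = [1, 2, 1, 1, 2, 1]


def noninteracting_config_variables(x1):
    """Configuration variables for independent nodes (h = 1, no interaction
    enthalpy). x1 is the fraction of 'on' (A) nodes; x2 = 1 - x1."""
    if not 0.0 <= x1 <= 1.0:
        raise ValueError("x1 must lie between 0 and 1")
    x2 = 1.0 - x1
    pairs = [x1 * x1, x1 * x2, x2 * x2]  # A-A, A-B, B-B
    triplets = [
        x1 ** 3,        # z1: A-A-A
        x1 ** 2 * x2,   # z2: A-A-B (and B-A-A)
        x1 ** 2 * x2,   # z3: A-B-A
        x1 * x2 ** 2,   # z4: B-A-B
        x1 * x2 ** 2,   # z5: A-B-B (and B-B-A)
        x2 ** 3,        # z6: B-B-B
    ]
    # Without interactions, nearest and next-nearest neighbor pairs have the
    # same probabilities, so y and w coincide.
    return {"x": [x1, x2], "y": pairs, "w": list(pairs), "z": triplets}


def weighted_sum(values, degeneracy):
    """Sum of the variables weighted by degeneracy; should come out to 1."""
    return sum(g * v for g, v in zip(degeneracy, values))


# Usage: the equiprobable case, plus a normalization check at x1 = 0.3.
midpoint = noninteracting_config_variables(0.5)
print(midpoint["z"])
skewed = noninteracting_config_variables(0.3)
print(weighted_sum(skewed["y"], Y_DEGENERACY), weighted_sum(skewed["z"], Z_DEGENERACY))
```

The normalization check is just the binomial expansion of (x1 + x2) squared or cubed, which is a handy sanity test for any grid-counting code: measured values from a randomly filled grid should land close to these.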
These two different conditions provide a reference frame for gauging the success of the remaining computational free energy minimum values. We can judge whether the computational results make sense, based on their smoothness and consistency relative to their nearest analytic solutions.
Once we have an overarching set of at-equilibrium, computationally-based configuration and thermodynamic values as reference points, we’ll know – as we permute a system towards a free energy minimum – how close we are to an expected equilibrium state. That will tell us how much more work is needed to guide the system towards equilibrium.
This is all very near-term work. It’s filling in the form for the tasks identified in the orange and yellow-shaded boxes shown in the figure above.
Once we get this done, we can start playing with patterns; we can start building lateral connections, teaching the grid to learn certain patterns, and find out how it changes when we introduce some gentle (or even harsh) perturbations.
From there, we start getting the network to respond to input stimulus and learn and store patterns.
Then we can move on to the really interesting things … pattern association, development of concept hierarchies through “uber-patterns,” and more.
The GAI world awaits us.
Live free or die, my friend –
Live free or die: Death is not the worst of evils.
Attr. to Gen. John Stark, American Revolutionary War
P.S. – How to Get Your Hands on This Stuff
May I assure you that doing the derivations and writing the code is a huge, painstaking job?
That said, I’m sure you’d rather start with working code than to try reverse engineering things on your own.
I do plan to make my code available. I haven’t decided exactly how, or how long the time-lag will be between having what I think is working code and having code that I absolutely know works correctly.
My general plan is this:
- First releases – Code documentation and V&V: You saw that the first V&V document was released recently on arXiv; see the arXiv abstract. Code documentation (slidedecks) will go into my public 2D-Cluster-Variation-Method GitHub repository; there’s nothing exciting there yet, but you can get the code documentation there as soon as I have it complete.
- Second releases – Computational results, with accompanying theory and equations: As soon as I get a reasonably solid set of computational results, I’ll assemble them into a Technical Report and put them up on arXiv; you’ll be able to use this as guidance for what to expect from the code, and finally
- Third releases – Actual code: Python. Very readable, structured code. Not too sure what constraints I’ll put on code release, and whether or not I’ll package it with the book that I’m writing, but sooner or later, you’ll be able to get your hands on this. No, you WILL NOT have to wait until I get the book published to get your hands on the initial code. That would take way too long. But I may want to link code with a couple of early-release book chapters that provide the necessary theory in succinct form. There WILL be a time delay between my getting results published and your getting the code, but at some point, you WILL be able to access it.
So – your heroism.
It’s simply this.
It’s one thing for me to have the basic insights, and to put together a prototype system, and even conduct initial experiments.
But if we’re really going to do GAI, this is much more than a one-person effort. It takes your heroism, with your unique insights and strengths, to move this forward.
Right now, it’s enough to be learning the fundamentals.
But even Yoda didn’t try to take on the Galactic Empire by himself. Ultimately, he concentrated on teaching.
YOU are the likely one to bring change to the established order.
YOU are the Padawan who will become a Jedi Master.
The Essential References
If you’re going to follow along, there are now three valuable papers: the older, more tutorial paper on the 2-D Cluster Variation Method; the Code V&V (just published this January, 2018); and the middle (2014) paper, which gives the derivation for the z(i) values in terms of h (where the solution is at the free energy minimum, or equilibrium). I’ll tweak that last one a bit and put it up on arXiv soon; you can see the pdf below for now:
- Maren, A.J. (2018) Free Energy Minimization Using the 2-D Cluster Variation Method: Initial Code Verification and Validation. THM TR2018-001(ajm). arXiv:1801.08113 [cs.NE]
- Maren, A.J. (July, 2014) The Cluster Variation Method II: 2-D Grid of Zigzag Chains: Basic Theory, Analytic Solution and Free Energy Variable Distributions at Midpoint (x1 = x2 = 0.5). THM TR2014-003(ajm). DOI: 10.13140/2.1.4112.5446
- Maren, A.J. (2016) The Cluster Variation Method: A Primer for Neuroscientists. Brain Sciences, 6(4), 44. DOI: 10.3390/brainsci6040044
Previous Related Posts
- Figuring Out the Puzzle (in a 2-D CVM Grid)
- 2-D Cluster Variation Method: Code Verification and Validation
- The Big, Bad, Scary Free Energy Equation (and New Experimental Results)
- A “First Principles” Approach to General AI
- A Hidden Layer Guiding Principle: What We Minimally Need
- How Getting to a Free Energy Bottom Helps Us Get to the Top
- What’s Next for AI: Beyond Deep Learning