The Big, Bad, Scary Free Energy Equation (and New Experimental Results)

# The 2-D Cluster Variation Method Free Energy Equation – in All Its Scary Glory:

You know, my dear, that we’ve been leading up to this moment for a while now.

I’ve hinted. I’ve teased and been coy.

But now, it’s time to be full frontal.

We’re going to look at a new form of free energy equation: a cluster variation method (CVM) equation. It deals not only with how many units are in state A or state B, but also with the overall nature of local patterns. This means that how many B nodes are next to A nodes, and other such configurations, are all important.

So, without further ado, here it is:

$\bar{G}_{2D} = G_{2D}/NkT = \varepsilon_0 x_1 + \varepsilon_1(z_4 + z_3 - z_1 - z_6) - \bar{S}_{2D} + LG_1 + LG_2$

where

$\bar{S}_{2D} = 2 \sum\limits_{i=1}^3 \beta_i Lf(y_i) + \sum\limits_{i=1}^3 \beta_i Lf(w_i) - \sum\limits_{i=1}^2 Lf(x_i) - 2 \sum\limits_{i=1}^6 \gamma_i Lf(z_i),$

$LG_1 = \mu (1-\sum\limits_{i=1}^6 \gamma_i z_i) ,$

and

$LG_2 = 4 \lambda (z_3+z_5-z_2-z_4),$

and where

$Lf(v) = v \ln (v) - v.$

The first thing to note about this equation is that the key terms $\bar{G}_{2D}$ and $\bar{S}_{2D}$ have that crucial “bar” over the G and the S; that means that they are reduced terms.

This reduction involves incorporating three distinct variables:

1. N, which is the total number of units in the system; by dividing through by N, we can express all our counts of terms in states A and B in terms of fractions, rather than explicit counts,
2. k, which is Boltzmann’s constant, and which is terribly important to physicists and physical chemists, but doesn’t have much meaning if we’re doing neural networks and machine learning, and
3. T, which is temperature, and which would also be very important if we were modeling a physical system; by dividing through by T we implicitly absorb it (along with N and k) into our enthalpy coefficients, which makes them dimensionless, and we also remove it from its usual position of multiplying the entropy S.

(Note: the free energy term and the enthalpy terms are divided through by NkT; the entropy is only divided through by N; the kT is normally included as a multiplier for the entropy in the free energy equation, so it disappears when we divide through the entire equation by NkT.)
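Written out, and assuming the convention used in this post (the entropy S carries no factor of k), the reduction looks like the following. The symbols $H$ and $\bar{H}$ for the total and reduced enthalpy are shorthand introduced here just for this illustration:

$G = H - kT\,S \quad \Rightarrow \quad \bar{G} = \frac{G}{NkT} = \frac{H}{NkT} - \frac{S}{N} = \bar{H} - \bar{S}.$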

Now, going into the equation itself, the first two terms on the right in the leading equation are two enthalpy terms. One has the coefficient $\varepsilon_0$ multiplying x1, where x1 is the total fraction of nodes in state A (the “on” state).

The second enthalpy term is

$\varepsilon_1(2 y_2 - y_1 - y_3) = \varepsilon_1(z_4 + z_3 - z_1 - z_6).$

This second enthalpy term addresses the energy (enthalpy) introduced into the system via pairwise interactions of units. We’ll go into how the various y’s and z’s correspond with each other shortly; for the moment, it suffices to say that y(2) ($y_2$) corresponds to both AB and BA interactions, that y(1) ($y_1$) corresponds to AA interactions, and that y(3) ($y_3$) corresponds to BB interactions. We can express the y’s in terms of the z’s. We’ll expand on how we form the enthalpy term, what it means, and how we express the y’s in terms of the z’s in a subsequent post. (It will take us a while to get through all of this.)
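As a quick check of that identity, here is a sketch that assumes the pair-to-triplet relations $y_1 = z_1 + z_2$, $y_2 = z_2 + z_4 = z_3 + z_5$, and $y_3 = z_5 + z_6$. These relations are not spelled out in this post, so treat them as an assumption; the two forms of $y_2$ are at least consistent with the constraint carried by $LG_2$. Writing $2 y_2$ as the sum of its two forms:

$2 y_2 - y_1 - y_3 = (z_2 + z_4) + (z_3 + z_5) - (z_1 + z_2) - (z_5 + z_6) = z_4 + z_3 - z_1 - z_6.$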

The last two terms, $LG_1$ and $LG_2$, are Lagrange factors and are introduced in order to help solve for a particular solution; they are not part of the normal free energy equation expression itself. As we will see (when we get to doing derivatives), we can simplify these a great deal. For now, we’ll simply carry them forward.

It is the entropy term that strikes us as being most unusual; this is the one that reads

$\bar{S}_{2D} = 2 \sum\limits_{i=1}^3 \beta_i Lf(y_i) + \sum\limits_{i=1}^3 \beta_i Lf(w_i) - \sum\limits_{i=1}^2 Lf(x_i) - 2 \sum\limits_{i=1}^6 \gamma_i Lf(z_i).$

Those of us who have played with entropy before, especially for two-state systems, are used to an entropy that looks like

$\bar{S} = x \ln x + (1-x) \ln (1-x).$

As an aside, we’ll note that this more familiar version would have started as

$\bar{S} = ( x \ln x - x ) + ((1-x) \ln (1-x) - (1-x) ) ,$

or

$\bar{S} = x \ln x + (1-x) \ln (1-x) - 1.$

That last term of 1 is a constant, so we can drop it from our overall computations with no loss.
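To make the aside concrete, here is a minimal Python sketch that checks numerically that the two forms differ by exactly the constant 1. The function names are made up for this illustration, and treating Lf(0) as 0 (its limiting value) is an assumption of the sketch.

```python
import math


def lf(v: float) -> float:
    """Lf(v) = v ln(v) - v; Lf(0) is taken to be 0, its limiting value."""
    return v * math.log(v) - v if v > 0.0 else 0.0


def two_state_sum(x: float) -> float:
    """Sum of Lf over the two state fractions, x and 1 - x."""
    return lf(x) + lf(1.0 - x)


def familiar_form(x: float) -> float:
    """x ln x + (1 - x) ln(1 - x), with 0 ln 0 taken to be 0."""
    return sum(p * math.log(p) for p in (x, 1.0 - x) if p > 0.0)


# The difference should be the same constant for every x.
for x in (0.1, 0.35, 0.5, 0.9):
    diff = two_state_sum(x) - familiar_form(x)
    print(f"x = {x:.2f}   Lf-sum minus familiar form = {diff:+.6f}")
```

Since the offset doesn’t depend on x, it can’t move the location of a free energy minimum, which is why it is safe to drop.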

So, returning now to the bigger entropy equation, we see that it is formally similar; just much more complex. What it is doing is summing up the entropy contributions from three additional sources:

1. y(i) – the nearest-neighbor contributions,
2. w(i) – the next-nearest-neighbor contributions, and
3. z(i) – triplets.

Together, all the x(i), y(i), w(i), and z(i) are referred to as the configuration variables. Once again, we’ll defer a detailed look at exactly what these mean for just a little bit.
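For readers who like to see an equation as code, here is a rough Python sketch of the entropy expression above. Several things in it are assumptions, not something this post states: the weights $\beta = (1, 2, 1)$ and $\gamma = (1, 2, 1, 1, 2, 1)$, the triplet labeling in the comments, and the "uncorrelated" test configuration in which every pair or triplet fraction is just a product of the x's. It is not the program that produced the results below; it is only a sanity check on the formula.

```python
import math

# ASSUMED degeneracy weights (not given in this post).
BETA = (1, 2, 1)
GAMMA = (1, 2, 1, 1, 2, 1)


def lf(v: float) -> float:
    """Lf(v) = v ln(v) - v; Lf(0) is taken to be 0, its limiting value."""
    return v * math.log(v) - v if v > 0.0 else 0.0


def reduced_entropy_2d(x, y, w, z, beta=BETA, gamma=GAMMA) -> float:
    """Evaluate the reduced 2-D CVM entropy expression term by term."""
    y_term = sum(b * lf(v) for b, v in zip(beta, y))
    w_term = sum(b * lf(v) for b, v in zip(beta, w))
    x_term = sum(lf(v) for v in x)
    z_term = sum(g * lf(v) for g, v in zip(gamma, z))
    return 2.0 * y_term + w_term - x_term - 2.0 * z_term


def uncorrelated_config(x1: float):
    """ASSUMPTION: no correlations, so cluster fractions are products of x's."""
    x2 = 1.0 - x1
    x = (x1, x2)
    y = (x1 * x1, x1 * x2, x2 * x2)  # assumed order: AA, AB, BB
    w = y                            # same products for next-nearest pairs
    # assumed order: AAA, AAB, ABA, BAB, BBA, BBB
    z = (x1**3, x1**2 * x2, x1**2 * x2, x1 * x2**2, x1 * x2**2, x2**3)
    return x, y, w, z


x1 = 0.35
s_2d = reduced_entropy_2d(*uncorrelated_config(x1))
two_state = -(x1 * math.log(x1) + (1 - x1) * math.log(1 - x1))
print(f"2-D CVM expression (uncorrelated): {s_2d:.6f}")
print(f"-(x ln x + (1-x) ln(1-x)):         {two_state:.6f}")
```

Under these assumptions the big expression collapses to the two-state form (up to overall sign), which is a reassuring sign that the extra y, w, and z terms only contribute something new once the local patterns are actually correlated.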

Instead of a detailed description, we’ll jump immediately to recent experimental results of a computer program that probabilistically calculates the equilibrium configuration and thermodynamic variables when we specify a given value of x1 (fraction of units in state A). This is shown in the following set of graphs, for the case where x1 = 0.35:

Experimental results for configuration variable y2, the delta of configuration variables used for the enthalpy, and the enthalpy, entropy, and resulting free energy, computed for h-values ranging from 0.8 to 1.8, while x1 = 0.35.

The important thing about these results is that they are achieved by running a probabilistic simulation of a 256-node (16 x 16) 2D CVM system coming to equilibrium for different h-values, with the constraint that x1 is approximately 0.35 for each run.

As a V&V (verification and validation) step, the y2 value when h = 1.0 is y2 = 0.2278 (for x1 = 0.3498, when the target was x1 = 0.35). This is the case where there is no interaction energy between nearest neighbors. (That’s what is implied when h = 1.0; the h-value is a function of the interaction enthalpy, and when the interaction enthalpy is zero, the h-value is one.) In this case, we would expect (taking x1 at its target value of 0.35) that y2 = 0.35*0.65 = 0.2275, which is very close to the experimentally-achieved result of y2 = 0.2278. We obtain our expected result for y2 by multiplying the fraction of nodes in state A by the fraction of nodes in state B, which gives the expected fraction of A-B nearest-neighbor pairs. (When x1 = 0.35, x2 = 0.65, and x1*x2 = 0.2275.)
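The arithmetic behind this check is easy to reproduce. The sketch below is not the simulation program described in this post; it simply computes x1*x2 for the target and achieved x1 values, and then, as a rough illustration, draws independent random pairs to show that uncorrelated units give an A-then-B pair fraction near x1*x2. The function names, the sample size, and the seed are all choices made for this example.

```python
import random


def expected_y2(x1: float) -> float:
    """Expected A-B pair fraction with no interaction energy: x1 * x2."""
    return x1 * (1.0 - x1)


def sampled_y2(x1: float, n_pairs: int = 200_000, seed: int = 0) -> float:
    """Fraction of independently drawn (left, right) pairs that are (A, B)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        left_is_a = rng.random() < x1
        right_is_a = rng.random() < x1
        if left_is_a and not right_is_a:
            hits += 1
    return hits / n_pairs


reported_y2 = 0.2278  # value quoted in the text above

for x1 in (0.35, 0.3498):
    y2 = expected_y2(x1)
    print(f"x1 = {x1:.4f}  expected y2 = {y2:.4f}  "
          f"reported minus expected = {reported_y2 - y2:+.4f}")

print(f"sampled y2 for x1 = 0.35: {sampled_y2(0.35):.4f}")
```

Either way of computing the expectation lands within a few parts in ten thousand of the reported 0.2278.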

This is the first release of experimental results for computing the equilibrium (free energy minimum) results for a 2D Cluster Variation Method (CVM) system when we are not imposing the constraint that x1 = x2 = 0.5. When we impose that equiprobability (of x1 = x2) constraint, there is an analytic solution. However, that is far too limiting; we want to characterize how the system behaves for cases where x1 is NOT equal to x2.

This work is a necessary precursor towards using the 2D CVM in a system model, as Friston has suggested in numerous articles, and to using it as a hidden layer in a computational engine; the next step beyond a restricted Boltzmann machine.

There’s a lot here to chew on; both theoretically and experimentally.

I’ll close for now; we’ll pick up with this theme next week.

As a final note: I plan to release the full code, along with some slidedecks providing code documentation and summarizing the V&V steps, as well as summaries of experimental results. I’ll make the code available on GitHub after I’ve done an initial write-up.

Thanks for joining in.

This is a new step forward, and the discussion can now be much more interesting, as we’ll have actual graphs of the system performance.

Live free or die, my friend –

AJ Maren

Live free or die: Death is not the worst of evils.
Attr. to Gen. John Stark, American Revolutionary War

## Some Useful Background Reading on Statistical Mechanics

In previous posts, I’ve referenced a classic text along with some good online tutorials about statistical mechanics, together with my introductory tutorial. This week, I’m suggesting that the best place to read about the 2D CVM is the paper that I published in late 2016. I’ve changed the enthalpy expression in this version; it influences the results, of course, but the idea and theme are the same.

There is an analytic solution for the enthalpy equation that I’m using here as well (there’s also an analytic solution published in the paper); these are both constrained to the equiprobable case – that’s why we’ve needed the computer program to generate probabilistic results.

That program, and its supporting documentation, are NOT yet available – I will release them, and I’ll note that in a forthcoming blog as well as coming back and updating a link here.

• Maren, A.J. (2016) The Cluster Variation Method: A Primer for Neuroscientists. Brain Sciences, 6(4), 44. doi:10.3390/brainsci6040044