Expressing Total Microstates (Omega) for the 1-D Cluster Variation Method – Part 1

Expressing Total Microstates (Omega) for the 1-D Cluster Variation Method – Part 1

The 1-D CVM – A Single Zigzag Chain – Part 1:

 

The cluster variation method (CVM) lets us characterize a system in terms of local patterns, and not just the numbers of units in on/off states. This is likely to be useful for machine learning and AI applications.

Up until now, we’ve not told the story of how we actually compute the CVM entropy from the microstates. We’ll do that starting with this post; it will be a handy reference. (Also, if I’m asked this question in a seminar, it’s so much easier to invite someone to go to this and the subsequent posts, rather than to start deriving and explaining complex things on the fly.) This post and the one immediately after will focus on the 1-D CVM, and a subsequent ones on the 2-D case.

 
Italian-renaissance-border-2-thin

 

Memory Jogging – Entropy and Microstate Basics

 

Before we jump into the (inherently) more complex microstates and entropy expression for the cluster variation method, let’s get back to basics with a simple on/off system. For such a system, the entropy is given as

\bar{S} =  - \sum\limits_{i=1}^I Lf(x_i),

where I is the total number of possible energy states in the system. For a simple bistate system (e.g., where units or “nodes” can be either on or off), I = 2. Then, x1 refers to the fraction of units in the on (A) state, and x2 refers to the fraction in the off (B) state. Also, because we’re dealing with just two states, x1 + x2 = 1.0.

The term Lf(x) is a general expression that captures the terms for each different state, and we’ll use it more extensively as we start working with more variables in the CVM.

Lf(v) = v \ln (v) - v.

For the simple bistate system, where we’re only thinking (so far) about the units in either the on or off states, then v = {x1, x2}, meaning that the units that we’d use to replace v in the equation above are the set of x1 and x2 units, respectively.

(For the most relevant background post, see Wrapping Our Heads around Entropy.)

The entropy equation that we introduced as our first equation in this post is actually not the foundational equation. The foundational equation, for which Ludwig van Boltzmann is justly famous, is given as

S =  k_\beta \ln \Omega,

where k_\beta is Boltzmann’s constant, and Omega (\Omega ) is the total number of available microstates, which is given as

\Omega = \frac{N!}{(N_{x_1})! (N_{x_2})!}

where N is the total number of units in the system, and N(x1) (or N_{x_1}  }) and N(x2) (or N_{x_2}  }) refer to the total number of units in states 1 and 2 (or A and B, or on and off) respectively.

Of course,

N_{x_1} + N_{x_1} = N.

 

Italian-renaissance-border-2-thin
 

A Quick Word on the Plus / Minus Signs for Entropy

 

This section is for those of you whose recollections of statistical mechanics are a bit hazy – or for those who’ve never studied the subject before. You saw a negative sign in front of the entropy term for the first equation. There’s no negative sign in the second version of the entropy term. If you write a an equation for the free energy of a system (enthalpy minus temperature*entropy), there is that negative sign in front of the entropy term … so what’s going on?

First things first. The entropy is always positive. If we graph entropy (for a simple bistate system) as a function of one of the two variables (say x1; the fraction of terms in the “active” or on state), we see a concave bowl-shaped curve, as shown in the following Figure 1. (And just to make things simple – or more complex – in this figure we use x for x1, and 1-x for x2, since we have the constraint that x1 + x2 = 1.0.)

The entropy for a bistate system is at a maximum when x1 = x2 = 0.5.
Figure 1. The entropy for a bistate system is at a maximum when x1 = x2 = 0.5.

As I mentioned in the previous section, the foundational entropy equation is the one involving Omega. It doesn’t have a negative sign in front. Omega is created from a series of multiplications of whole, positive numbers; it is always positive.

When we do a suitable amount of mathematics, and divide through by some constants, then we get what we call a “reduced expression” for entropy; the entropy term with a bar over the top. We like this form because it is dimensionless (the k-beta term carries dimensions that are useful to physicists and chemists, but not useful to those of us in AI or machine learning). Also, this “reduced expression” is independent of the actual size of the system;

\bar{S} =  S / {k_\beta N},

where N is the total number of units in the system.

The actual entropy (and free energy and enthalpy terms, for a different discussion, a different day) are all extensive. That means that they depend on the size of the system. If we add more units to the system, then we increase the entropy, etc.

For our purposes, we want something cleaner – so we divide through by the total number of units and work just with fractions of things.

We get the negative sign that we saw in the first equation because, when we do some mathematical churning, we go from the Omega-equation to the ln(x) equation. (Meaning, we go to the Lf(x) equation, which involves ln(x).) Recall that x (meaning x1 and x2 here) must be less than zero; we’re talking about the fractions of units in one state or another. If we take the logarithm of a fraction, we get a negative number. Therefore, we need a negative sign in front of the whole equation to ensure that the end result is a positive number. This negative sign doesn’t appear by magic, it comes about naturally when we do the math to go from the Omega-form to the Lf(x)-form.

In real life, and in any good model, a system will tend to maximize its entropy. (This is often referred to as the “Second Law of Thermodynamics,” and has been the subject of physicists attempting to be humorous.)

The reason that we’re not all just a pile of entropically-driven sludge is that in real life, entropy is not the only factor. The real driving process is that systems tend towards a free energy minimum, and not just an entropy maximum. Free energy is a combination of two things; enthalpy (or the energy associated with having units in different states, and with units interacting with each other) and entropy. In a free energy equation, we put a negative sign in front of the entropy term, meaning that we’re flipping the entropy-bowl upside down, and looking for the minimum. And of course, the location of this minimum interacts with the enthalpy, so the free energy minimum is not always the negative-entropy minimum. Subject for a different discussion, a different day.

 

Italian-renaissance-border-2-thin
 

If You Need a Little Microstates Refresher

 

If, by chance, you are not already familiar with microstates – this is the notion that underlies all of statistical mechanics. The computations that we’ve been looking at here all deal with microstates, and Omega represents the total number of microstates available for a a given distribution of units into different energy states. (In the CVM world, it represents a given distribution of available nodes into specific configuration variables.)

If you’re needing a little refresher on the whole microstates thing – Opt-In using the form below. Do the usual confirm, etc., once you’ve opted in. You will then receive an eight-day tutorial sequence on how to compute microstates. It has links to some slidedecks, and lots and LOTS of figures. Sort of a visual approach to learning microstates, built up from lots of examples. It’s the easiest and most pain-free introduction that you’ll ever find.

Opt-In to get your Precis and Bonus Slidedeck – Right HERE:






Opt-in to access Seven Essential Machine Learning Equations

We respect your email privacy

 
Italian-renaissance-border-2-thin
 

Microstates for the 1-D CVM

 

As a quick refresher, the 1-D CVM is constructed as a single zigzag chain. Here’s an illustration of the simplest 1-D CVM configuration expressed with two base pattern repeats.

A 1-D CVM grid with two instances of a base pattern.
Figure 2: A 1-D CVM grid with two instances of a base pattern.

(For more explanatory figures on the 1-D CVM, see Transition to Object-Oriented Python for the Cluster Variation Method. For a slidedeck including this and other figures, see the slidedeck referenced in the GitHub repository at the end of this post, entitled “1D-CVM _Object-oriented-code_V-and-V_Two-base-patterns_2018-10-10-18.pptx”.)

Figure 3 shows the full set of configuration variables. This figure is modeled on the original figure (Table II) shown in Kikuchi & Brush (1967). The original Kikuchi & Brush (1967) paper is available from AIP (American Institute of Physics), at a cost of $30.00. (See link at the end of this post.) This post (and subsequent ones on the same theme) attempts to capture the relevant points of that paper, saving the readers the annoying process of finding their credit card.

The configuration variables and their degeneracies (the number of times they can be counted different ways) for the Cluster Variation Method (CVM) using an angle as the basic computational unit. Figure based on Table II of Kikuchi & Brush (1967).
Figure 3: The configuration variables and their degeneracies (the number of times they can be counted different ways) for the Cluster Variation Method (CVM) using an angle as the basic computational unit. Figure based on Table II of Kikuchi & Brush (1967).

The following discussion replicates arguments originally advanced in Kikuchi & Brush (1967), advancing earlier work done by Kikuchi (1951). Extra commentaries and examples are introduced, to make this a more tutorial approach.

The Omega (total number of available microstates) for a single zigzag chain is based on viewing the chain as a construct of two rows. An illustration is given in Figure 4, which is based on Figure 2 of Kikuchi & Brush (1967).

A horizontal single row (a) and a double row (b), creating a single zigzag chain, as initially illustrated in Fig. 2 of Kikuchi and Brush.
Figure 4: A horizontal single row (a) and a double row (b), creating a single zigzag chain, as initially illustrated in Fig. 2 of Kikuchi and Brush. (The original Kikuchi & Brush figure showed a truncated form of the second (bottom) row; the rows should be of equal length.)

We can express the number of ways in which the single zigzag chain can be constructed as

\Omega_{double} = \frac { \big\{ pair \big\}_{2N_2} } {\big\{ angle \big\}_{2N_2}},

where 2N_2 refers to the total number of units in the system, as
N_2 is the total number of units in a row, and the rows are of equal length. The notation pair refers to the total number of distinct different pairs that can be counted in a given configuration, and the notation angle refers to the total number of different kinds of triplets. This equation is identical with Eq. I.9 of Kikuchi & Brush (1967), which also references the approach taken by Kikuchi (1951).

We define pair and angle as

\big\{ pair \big\}_M = \prod_{i=1}^{3} {( M y_i)!}^{\beta_i}

and

\big\{ angle \big\}_M = \prod_{i=1}^{6} {( M z_i)!}^{\gamma_i}.

This pair of equations is identical with Eqn. I.8 in Kikuchi and Brush (1967), and draws from notation in Kikuchi (1951).

 
Italian-renaissance-border-2-thin
 

An Important Note about Nomenclature

 

If you’re attempting to read either or both of the Kikuchi and/or Kikuchi-and-Brush papers in the original, you’ll run into the inevitable nomenclature snags.

Here’s some key points, and the next post will work through these in more detail:

  • M means the number of units in a single row, as used by Kikuchi (1951), and is identical with N2 as used by Kikuchi and Brush (1967).
  • L means the number of rows, as used by Kikuchi (1951), and is identical with N1 as used by Kikuchi and Brush (1967).
  • N means the total numbers of units in both papers. (Thank God!)
  • The y(i) units in the original Kikuchi (1951) paper refer to nearest-neighbors on the same row, which really are redefined to be the next-nearest-neighbors when we start thinking about a 2-D grid composed of rows that are slightly offset with regard to each other in Kikuchi and Brush (1967). That means that the w(i) in the Kikuchi and Brush paper mean the same thing as the y(i) in the original Kikuchi paper.
  • A final maddening note: In Kikuchi and Brush (1967), they refer to the w(i) connections (the next-nearest-neighbor ones) as being the “diagonal” connections. If we look at the grid, we’d naturally think of the “diagonal” connections as being the y(i) ones. The reason (most likely) that they use this term is that in Kikuchi and Brush, they’re transitioning from thinking about a square lattice (where the nodes are perpendicularly arranged with regard to each other) to a new formulation where they are offset, as shown in the figures in this post. This means that what was a “diagonal” connection in the Kikuchi and Brush paper is now in the same row as they re-interpret it in the same paper. That causes them to persist in calling these same-row connections as being “diagonal.” This is not a problem so long as you stay with current presentations (e.g., mine) of this approach. It just becomes problematic if you go back to the source documents and don’t exercise great care.

Now, let’s get back to our equations …

 
Italian-renaissance-border-2-thin
 

Back to the Expressions for Pairs and Angles

 

The use of M is in reference to Kikuchi’s first use of M to mean the total number of units in a row; N = M*L, where M was the length of each row, and L was the total number of rows. In Kikuchi and Brush, the total number of nodes in the system is also defined as N = N1 * N2, where N1 was the number of rows, and N2 was the number of nodes in each row.

Previously, we (that is, I, in editorial mode) gave the Omega equation for a single zigzag chain as

\Omega_{double} = \frac { \big\{ pair \big\}_{2N_2} } {\big\{ angle \big\}_{2N_2}}.

Notice that the subscripts for pair (in the denominator) and angle (in the numerator) are 2*N2. This is because there are a total of 2*N2 nodes in the system; we have two rows of length N2 each.

We can transform this equation using Stirling’s approximation. (I will devote another blogpost at some later date to this; if you’re not already familiar with it, please accept this on faith for now.)

The result is

\Omega_{double} = \Big[ \frac { \big\{ pair \big\}_{N_2} } { \big\{ angle \big\}_{N_2} }  \Big]^2.

 
Italian-renaissance-border-2-thin

 

Rationale for Constructing Omega for the 1-D CVM

 

At this point, it would be logical and appropriate to explain exactly how the Omega equation is obtained.

That would be lovely, and is more work than I can do this week.

There is a detailed explanation in the original Kikuchi (1951) paper, and a skimming reference to that in the Kikuchi and Brush (1967) paper. I will attempt to interpret these explanations in the next post.

Until then, we’ll take this equation on faith and proceed with a couple of examples.

 
Italian-renaissance-border-2-thin

 

Example 1: The Omega Value for a Simple 1-D CVM Grid

 

In the previous equations, the pair and angle values refer to actual counts of the numbers of pair (nearest-neighbors) and angle (triplets) in the grid, which in our case is a single zigzag chain. More specifically, they refer to the total numbers of pairs and triplets, since we are using 2*N2 as the subscript (initially, until we reduce the equation using Stirling’s formula), and the correspondence is with M in the previous equations.

If we were to follow the equation that is post-application of Stirling’s approximation, we’d be counting the types of pairs and angles in just one row, and then squaring these results.

However, that would just not make sense for a simple example, because (1) there are different kinds of pairs and angles that originate in each row, and (2) both pairs and angles require two rows each; they both cross the rows. So we can’t simply count the pairs and angles from Row 1 and then square those factorials.

Instead, we’ll use the predecessor equation, and count the total numbers of different kinds of pairs and angles in the entire system. To do this, we’ll use the equations given earlier in this post; those that give the values for pairs and angles in terms of the configuration variables y(i) and z(i), respectively.

In these equations, the y(i) and z(i) values refer to the fractional values of different nearest-neighbor and triplets in the system. The actual counts of the nearest-neighbors and triplets is Y(i) = N*y(i) and Z(i) = N*z(i), where N is the total number of units. As a simple example, referring to Fig. 1, the total number of nodes N = 16.

Because the grid is designed to give equal distributions across the configuration variables, we have Y(AA) = Y(AB) = Y(BA) = Y(BB) = 4; there are equal numbers of each nearest-neighbor type. For the fractional values, we get y(1) = y(AA), y(2) = y(AB) = y(BA), and y(3) = y(BB). Thus we have y(1) = y(2) = y(3) = 0.250, and of course y(1) +2*y(2) + y(3) = 1.0. Because there are two different ways of constructing a y(2) nearest-neighbor, the degeneracy factor beta for y(2) is 2; elsewhere it is 1.

Naturally, we have N*y(1) = 16*0.250 = 4, and similar results for the other y nearest-neighbors.

So, for this specific example, where N = 16, we have

\big\{ pair \big\}_{16} = (4!){(4!)}^2 (4!)

We can similarly identify the terms for the angle;

\big\{ angle \big\}_M = \prod_{i=1}^{6} {( M z_i)!}^{\gamma_i}.

(Yes, I know that I’m shifting between M and N notations; it’s a mix-up with the notations in the original papers and I’ll probably have to come back and rewrite this section.)

If we refer back to Figure 2, we see that the fraction of units in each of the triplet configurations is 0.125.

We have the constraint that the sum of the triplet fractions equals one, or

\sum_{i=1}^{6} {\gamma_i} z_i= 1,

where gamma(2) = gamma(5) = 2, as there are two ways to count the triplets z(2) and z(5), as shown in Figure 2. The degeneracy parameter gamma = 1 for all other triplets.

For the 1-D system, the total number of triplets equals the total number of nearest-neighbors equals the total number of next-nearest-neighbors equals the total number of units in the system. That is,

\sum_{i=1}^{6} {\gamma_i} Z_i =  \sum_{i=1}^{3} {\beta_i} Y_i =  \sum_{i=1}^{3} {\beta_i} W_i  =  \sum_{i=1}^{2} X_i = M,

where

Z_i = M z_i,

etc.

Thus, just as we had 16 Y(i) nearest-neighbor connections for this (wrapped-around) 16-node 1-D CVM grid, we also have 16 Z(i) triplets. The previous blogpost, Transition to Object-Oriented Python for the Cluster Variation Method, had a good illustration of the Z(i) triplets for a 1-D CVM, illustrating the same base pattern as used here.

With 16 triplets, the allocation in this pattern (a “double-base”) gives two triplets to each of the Z(i) configurations, recalling that there is double degeneracy on z(2) and z(5), as those patterns can each be counted two ways. (That is, z(2) can be counted as both z(AAB) and z(BAA), and z(5) can be counted as both z(ABB) and z(BBA).)

Thus, for this particular example, we have z(i) = 0.125, and Z(i) = 16*0.125 = 2, so that

\big\{ angle \big\}_{16} = (2!){(2!)}^2 (2!)(2!){(2!)}^2 (2!).

 
Italian-renaissance-border-2-thin

 

Putting Together the Omega for the 1-D CVM with Two Instances of the Base Pattern

 

In this example, what we obtain for Omega is

\Omega_{double} = \frac { (4!){(4!)}^2 (4!) } {(2!){(2!)}^2 (2!)(2!){(2!)}^2 (2!)},

or
\Omega_{double} = \frac { (3){(4!)}^2 (3) } {{(2!)}^2 {(2!)}^2 },

or
\Omega_{double} = \frac { (9)(4!)(4!) } {(2!)(2!) (2!)(2!) },

or
\Omega_{double} =  (9)(3*2)(3*2) = 324.

 
Italian-renaissance-border-2-thin

 

What Happens When We Move Away from Equilibrium

 

Just for fun, let’s interchange two nodes and see what happens to our entropy. We’re moving away from the equilibrium position just a bit; that where we have maximal distribution over all possible states. In this particular example, just by swapping two nodes (and still keeping the total numbers of A and B nodes the same), we decrease the Z(1) and Z(6) triplets, and increase the other triplets.

1D CVM grid with two nodes interchanged giving one base pattern and one altered pattern, shifting the distribution away from the equilibrium configuration.
Figure 5: 1D CVM grid with two nodes interchanged giving one base pattern and one altered pattern, shifting the distribution away from the equilibrium configuration.

In this new example with two interchanged nodes, what we obtain for Omega is

\Omega_{double} = \frac { (3!){(5!)}^2 (3!) } {(1!){(3!)}^2 (2!)(2!){(3!)}^2 (1!)},

or

\Omega_{double} = \frac { (5!) (5!) } {(3!)(2!)(2!)(3!) },

or

\Omega_{double} = \frac { (5*4) (5*4) } {2*2 } = 100.

This value is considerably less than the Omega = 324 value that we obtained for the previous example, in which the nodes were arranged so there was a maximal distribution across all the possible states for nearest-neighbors and triplets.

This example is trivial, yet it serves to illustrate our point. We have maximal entropy (also maximal Omega) when we maximize the distribution across all possible states. When we distort this, creating more of one kind of state (triplet and/or nearest-neighbor) than others, we lower the entropy.

We’ll continue this saga in a subsequent post, when we look at configuring the Omega for a 2-D CVM grid.

 
Italian-renaissance-border-2-thin

 

Live free or die, my friend –

AJ Maren

Live free or die: Death is not the worst of evils.
Attr. to Gen. John Stark, American Revolutionary War

 
Italian-renaissance-border-2-thin
 

Key References

 

Cluster Variation Method – Essential Papers

  • Kikuchi, R. (1951). A theory of cooperative phenomena. Phys. Rev. 81, 988-1003, pdf, accessed 2018/09/17.
  • Kikuchi, R., & Brush, S.G. (1967), “Improvement of the Cluster‐Variation Method,” J. Chem. Phys. 47, 195; online as: online – for purchase through American Inst. Physics. Costs $30.00 for non-members.
  • Maren, A.J. (2016). The Cluster Variation Method: A Primer for Neuroscientists. Brain Sci. 6(4), 44, https://doi.org/10.3390/brainsci6040044; online access, pdf; accessed 2018/09/19.
  • Pelizzola, A. (2005), “Cluster variation method in statistical physics and probabilistic graphical models,” J. Physics A: Mathematical & General, 38 (33) R308; online as: Pellizola CVM PDF; available for purchase through IOP Science, see also Pellizola-abstract in Cluster Variation Methods page.
  • Yedidia, J.S.; Freeman, W.T.; Weiss, Y., “Understanding Belief Propagation and Its Generalizations”, Exploring Artificial Intelligence in the New Millennium, ISBN 1558608117, Chap. 8, pp. 239-236, January 2003 (Science & Technology Books). Also available online as Mitsubishi Electronic Research Laboratories Technical Report MERL TR-2001-22, online as: http://www.merl.com/reports/docs/TR2001-22.pdf

 

The 1D CVM – Code, Documentation, and V&V Documents (Including Slidedecks)

  • GitHub for 1-D Object-Oriented Python Snippets
    • Code: 1D-CVM_OO_basic-config-vars_V-and-V_1-1_2018-09-23.py (Python 3.6; self-contained)
    • Documentation / V&V: 1D-CVM _Object-oriented-code_V-and-V_Two-base-patterns_2018-10-10-18.pptx

 

Previous Related Posts

 
Italian-renaissance-border-2-thin
 

2 thoughts on “Expressing Total Microstates (Omega) for the 1-D Cluster Variation Method – Part 1

Leave a Reply

Your email address will not be published. Required fields are marked *