Book Chapter: Draft Chapter 7 – The Boltzmann Machine

Chapter 7: Energy-Based Neural Networks

This is the full chapter draft from the book-in-progress, Statistical Mechanics, Neural Networks, and Artificial Intelligence.

This chapter draft covers not only the Hopfield neural network (released as an excerpt last week), but also the Boltzmann machine, in both general and restricted forms. It develops the form-equals-function connection between each network's structure and its energy equation. (The full-fledged learning method is postponed to a later chapter.)

Get the PDF via the link in the citation below:

  • Maren, A.J. (In progress). Chapter 7: Introduction to Energy-Based Neural Networks: The Hopfield Network and the (Restricted) Boltzmann Machine (DRAFT). Statistical Mechanics, Neural Networks, and Artificial Intelligence. (PDF)

The Boltzmann machine is a natural outgrowth of the Hopfield neural network. The restricted Boltzmann machine (not shown here) is the Boltzmann machine with the intra-layer connections removed.
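
To make that energy connection concrete: both networks share the same quadratic energy function, E(s) = -1/2 Σ_ij w_ij s_i s_j - Σ_i b_i s_i. The Hopfield network moves downhill in E deterministically; the Boltzmann machine flips units stochastically at a temperature T. A minimal illustrative Python sketch of the two update rules (the variable names are mine, not the chapter's):

    import numpy as np

    def energy(s, W, b):
        """Shared quadratic energy: E(s) = -0.5 * s^T W s - b^T s,
        for +/-1 states s and a symmetric, zero-diagonal W."""
        return -0.5 * s @ W @ s - b @ s

    def hopfield_update(s, W, b, i):
        """Hopfield rule: set unit i deterministically downhill in energy."""
        s = s.copy()
        s[i] = 1.0 if W[i] @ s + b[i] > 0 else -1.0
        return s

    def boltzmann_update(s, W, b, i, T=1.0, rng=np.random.default_rng()):
        """Boltzmann rule: set unit i on with logistic probability of its
        local field, scaled by temperature T (Glauber dynamics)."""
        s = s.copy()
        p_on = 1.0 / (1.0 + np.exp(-2.0 * (W[i] @ s + b[i]) / T))
        s[i] = 1.0 if rng.random() < p_on else -1.0
        return s

As T → 0 the stochastic rule collapses into the deterministic one, which is one way to see the Boltzmann machine as the Hopfield network's natural outgrowth.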


Your Inputs, Please and Thank You!

This chapter's starting subject, the Hopfield neural network, is the lead-in for the more interesting restricted Boltzmann machine (RBM), which is a foundation of deep learning.

Please give me your early feedback by commenting:

  • Overall approach: Is the approach of starting with quick extracts from the original papers and building the connection between equations and network structure useful for you?
  • Figures & diagrams: Are these working for you? Too much, too little? Want other types?
  • Equations: How’s the explanation for the equations? Too much, too little, just right?

Anything else that you need to see?

Your help right now will be very much appreciated!

Right now is the stage at which I can most easily incorporate your early feedback and make sure that this book will be as useful to you as possible. Thank you! – AJM


Live free or die, my friend –

AJ Maren

Live free or die: Death is not the worst of evils.
Attr. to Gen. John Stark, American Revolutionary War


Resources

Link to The Book Page

  • Book: Statistical Mechanics, Neural Networks, and Artificial Intelligence


3 thoughts on “Book Chapter: Draft Chapter 7 – The Boltzmann Machine”

  1. Dear Dr. A.J.,
    Let me briefly introduce myself. I just finished my Bachelor's in CS at ETH Zurich with a strong focus on ML and am currently doing an ML internship. Let me first tell you that I really like your website in general. I had one course at uni (Statistical Learning Theory) that was quite advanced and used a lot of energy-based methods. At the time I really, really liked this field but had a hard time putting it into context, as to how it compares to classical ML methods and to deep learning. Your website provided great posts and recommendations for getting that context. Obviously I am eagerly awaiting your book, so without further ado, let's get to my comments on it.

    The Positive:
    1. I really like the writing style, which is very simple and easy to follow. You are not trying to confuse us with overly technical lingo and complicated formulations just to show that you know what you are talking about (which you obviously do 😉). It makes the reader feel like we've got this, like it's not something that requires a bunch of prior knowledge (which sometimes happens when you read a paper).

    2. I also like the fact that you are very structured in how you introduce topics and go from one topic to the other. You show a clear path of what we are going to do and why we are going to do it. Additionally, I absolutely love the fact that you compare different methods to each other. This is usually exactly the part that is not really done in papers. There you have to figure out for yourself how one paper relates to another and how the concepts are related. And this is usually the crucial part that shows whether you understood something or not. I like the fact that you provide guidance in this area.

    3. I think it's a great idea to introduce chapters using snippets from real, original papers. Firstly, this makes the whole book very close to the research, not like a “blog post” that talks about ideas at a high level but does not go into the math. Secondly, it gives the reader a motivation boost, because you know that you are actually looking at the roots of a topic and will be able to understand them.

    The stuff that one could “optimize”:
    For the following points I will assume the following: the idea I got from your website, blog posts, and book description is that you want to write for people who don't know a lot about statistical mechanics, energy-based methods, etc. You do assume some prior knowledge about general deep learning methods, but not really about the material that you explain in your book. If this assumption is not the case, feel free to ignore some of my feedback. Given this assumption, here are a few remarks.

    1. You introduce the Hopfield network in a very nice way and also explain it very nicely. You say that the training part will be covered in another chapter, as it is not really relevant to what you are trying to explain in this chapter… I agree… BUT. As I was reading this, I was really wondering: well, what does it mean to minimize an energy function for a Hopfield network? How can I possibly define the goal of what I want to achieve… a loss function? Or what else? That is, how do I somehow insert my data into this network and then find out how well it performed? And then I remembered that I once had a lesson about Hopfield networks and how they are used to “store” information. Because I had some previous knowledge, I knew what the goal of “training” is; otherwise I would have had no idea. You have to consider that most people will be familiar with deep neural networks, where you give the network an input, which is then multiplied with the weights, etc. But a Hopfield network is totally different: here the actual neurons ARE THE INPUT. That is, you superimpose the input onto the neurons. For somebody who has never heard of this idea, this would not be obvious at all. So I agree that going into the Hebbian learning rule would be too much for this chapter. But giving some intuitive feeling, at a high level, of how such a network is trained is important. You could, for example, illustrate it with the example of “storing” a training sample and recovering a noisy sample.

    2. The second point actually follows up on the first one. In 7.5.3 you try to explain what clamping means. I have to be honest: I did not understand the concept from your explanation. The reason is that here you suddenly assume the reader knows what “superimposing” an input onto neurons means… but you did not explain it before for Hopfield networks, so now it makes even less sense. The point I'm trying to make is that for somebody who has never seen these kinds of networks, the way they are trained is “completely” different from NNs with backprop, and thus should be at least partly explained.

    3. The Gedanken experiment you described was very interesting. I don't know, maybe it was just me, but I did not fully get the context of why we are doing this Gedanken experiment. Is it just to get a feel for how RBMs work? Or was it to show that sparse inputs induce sparse activations? A little bit more context on why we are doing it would be helpful. But as said, maybe that was just me.

    4. The comparison of the BM and RBM lacked one important part that I had to google. On Wikipedia one reads: “As their name implies, RBMs are a variant of Boltzmann machines, with the restriction that their neurons must form a bipartite graph.” So the hidden neurons don't have connections between them. This is also apparent in your images, but you did not mention it. To me that is kind of important. You introduced the RBM by comparing it to an MLP, but not by really saying what is different from a normal BM. In case I did not read correctly, let me know… then it's my mistake.

    5. Lastly, I would wish for one more thing in such a book. As mentioned before, you are doing a tremendous job of comparing different things to each other, explaining what makes them different, and really putting them into context. But one thing that is very important and interesting to me is how these things actually affect applications. For example, the RBM has hidden and visible units and the Hopfield network only visible ones… got it. What would now interest me is: what advantages or disadvantages do you get with this? What can you do with RBMs that you can't with Hopfield networks? Say you use both of them as “autoencoders”… which one would be better, and why? Or what is better for a certain task, an MLP or an RBM? This would give the reader much more intuition into what is actually used and why.

    I hope this feedback was helpful… if you have questions, feel free to contact me. Keep up the good work; I am excited for when the book comes out.

  2. Hi, Jeremy, and a great big THANK YOU! I am so appreciative of your comments!
    Regarding your points:
    (1) Yes, that's good … introducing a brief overview of how the Hopfield network and the (general and restricted) Boltzmann machines are trained would be good in Chapt. 7, just to give a little context and completeness, with full development in later chapters. I'll shoot for that, but may have to come back to it after I've drafted the longer explanation in a later chapter first. (Good idea, though!)
    (2) Thanks for the word on “clamping.” I'll elaborate in my next pass on Chapt. 7. I think this is a term that mystifies a lot of people trying to read the original / source literature on Boltzmann machines. Hinton created / introduced this term, but unless someone goes back to his very early works, he tends to use it without explanation … so yes, I'll fill that in. (A first sketch of what clamping looks like in code appears just after this list.)
    (3) Yup. I’ll work on that. What I think we really need is an actual example, with accompanying code.
    (4) You're correct about the RBM and its structure, and yes, I WILL clarify … and I've been mulling over this whole implication of “directed” and “undirected” graphs, which will be the subject of my next blog post.
    (5) What you’re suggesting as an experiment / illustration would be useful … what I’d really want to do would be to have very comparable code for the MLP & the RBM (and maybe, also, some examples comparing RBM & Hopfield).
    Then an explanation of what could be used where would be more helpful.
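    To give a rough preview of the clamping idea from point (2): “clamping” means holding the visible units fixed at a data vector while the hidden units are repeatedly resampled, rather than letting the whole network run free. A minimal illustrative sketch for an RBM (my own naming, and a simplification of what the later chapters will develop):

        import numpy as np

        rng = np.random.default_rng(0)

        def sample_hidden(v, W, b_h):
            """Sample the hidden layer given the visibles: p(h_j = 1 | v)."""
            p = 1.0 / (1.0 + np.exp(-(v @ W + b_h)))
            return (rng.random(p.shape) < p).astype(float)

        def sample_visible(h, W, b_v):
            """Sample the visible layer given the hiddens: p(v_i = 1 | h)."""
            p = 1.0 / (1.0 + np.exp(-(h @ W.T + b_v)))
            return (rng.random(p.shape) < p).astype(float)

        def gibbs(v0, W, b_v, b_h, n_steps, clamp=True):
            """Alternating Gibbs sampling. With clamp=True the visible units
            stay fixed ("clamped") to the data vector v0 on every step, so
            only the hidden units fluctuate (the "positive" phase). With
            clamp=False the visibles are resampled too (the free-running,
            "negative" phase)."""
            v = v0.copy()
            for _ in range(n_steps):
                h = sample_hidden(v, W, b_h)
                if not clamp:
                    v = sample_visible(h, W, b_v)
            return v, h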
    So, summing up: I introduced the Hopfield network so that we can understand the RBM. I've got a very strong gut feeling that a large proportion of people trying to work in advanced AI / DL / ML methods these days don't really “get it” about what an RBM really is and how it works – and this limits their ability to use it.
    I’m going to stay with my plan for putting up PDFs of the draft chapters as I evolve them. I’m also going to add code as well.
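    As a first taste of that code, here is an illustrative sketch of Jeremy's point (1): “storing” a pattern in a Hopfield network with the Hebbian rule and then recovering a noisy copy of it. (A hypothetical minimal example; the naming is mine, and the chapter code will be fuller.)

        import numpy as np

        rng = np.random.default_rng(1)

        def store(patterns):
            """Hebbian rule: W = (1/N) * sum_p x_p x_p^T, zero diagonal.
            Each stored pattern becomes a low-energy attractor of the net."""
            N = patterns.shape[1]
            W = (patterns.T @ patterns) / N
            np.fill_diagonal(W, 0.0)
            return W

        def recall(W, x, n_sweeps=10):
            """Asynchronous updates: flip units downhill in energy until
            the noisy input settles into the nearest stored pattern."""
            x = x.copy()
            for _ in range(n_sweeps):
                for i in rng.permutation(len(x)):
                    x[i] = 1.0 if W[i] @ x >= 0 else -1.0
            return x

        # Store one +/-1 pattern, corrupt 10% of its bits, then recall it.
        pattern = rng.choice([-1.0, 1.0], size=100)
        W = store(pattern[None, :])
        noisy = pattern.copy()
        noisy[rng.choice(100, size=10, replace=False)] *= -1.0
        print(np.array_equal(recall(W, noisy), pattern))  # usually True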
    Allow me, please, to go off on a tangent:
    When I was a chemistry student (way back when), chemistry classes had two components: lecture and lab. They were very different. A well-designed lab had a laboratory book, or at least a good set of notes.
    The AI / DL / ML books out right now fall into one of two camps: they're either good theory books (e.g., “Deep Learning” by Goodfellow et al.) or good lab manuals (e.g., anything by Jason Brownlee, and I'd throw “Deep Learning with Python” by Chollet into that category as well).
    Both kinds of books are good, and both are important. It’s just really hard to cover both kinds of topics in one book, so we see the books right now swinging to one side or the other.
    So yes, if I can put together a reasonable code set, I’ll also have to write the “lab manual” that goes with it. So, lots of work ahead …
    But thank you. Very much, thank you!
    I’ll take what you’ve suggested under advisement as I work revisions.
    And will figure out a way to help people work through the “classifier” vs. “autoencoder” differences in a way that makes sense.
    Much appreciated! – AJM
