The Yin and Yang of Learning Deep Learning


Sometimes Leaning Into It Is Not Enough

You folks tend to be hyper-focused, hugely on-your-game types. (My TA, and one of my favorite people, described himself as “alpha-squared.” So true for a lot of you!)

So given your alpha-ness (or your alpha-squared-ness), your dominant approach to mastering a new topic is to work like crazy. Read a whole lot of stuff, from original papers down to tech blogs and forums. You install code environments, teach yourselves all the latest and greatest, and develop and run and tinker with code. You watch technical YouTubes until you fall asleep at night.

All of this is well and good.

The necessary (and often overlooked) counterpoint to all of this?

You need some yin to balance out the yang.

You need a bit of time where you’re not leaning into it. Instead, you’re leaning back a bit, being a bit more (shall we be risky here?) passive, rather than active.

The reason that we need to do this is that artificial intelligence and machine learning are evolving from the confluence of different worlds. The two most dominant worlds right now are statistical mechanics (a realm of theoretical physics) and Bayesian probability theory. These combine with neural network studies in a very unusual (and mind-boggling) manner.

Machine learning incorporates statistical mechanics, Bayesian probability theory and information theory, and neural networks.

If we’re going to have anything more than superficial understanding, then we need to get into each of these contributing worlds – at least a little bit – and then integrate our understanding.

That integration effort requires being in a different mental space.

 

The Zen of Deep Learning

 

My mother, who was not only Catholic, but a very educated Catholic, once described the distinction (in the Western religious experience) between meditation and contemplation. In the Catholic tradition, meditation was a more active mental experience (more yang), and contemplation more passive (more yin).

In our studies, we often have meditation-like experiences. That is, we mull things over and ponder. We run Gedanken-experiments, or thought-experiments, where we set up equations or networks and run them in our heads, before we actually put the algorithm to the data.

Equally important, we need to make time for contemplation.

The closest analogy that I can come up with is that it’s like holding a Zen koan.

What is the sound of one hand clapping?


In Zen practice, a master gives a student a koan, which is a question for which there is no logical answer. An example of a koan might be: “What is the sound of one hand clapping?” One of the key aspects of koans is that they embody the identity of opposites.

This notion of identity of opposites underlies a lot of the major advances in artificial intelligence and machine learning.

 

Holding the Equation

 

In our world, certain equations are like Zen koans. They contain an improbable meeting of opposites.

The subtle-yet-scary aspect of these equations is that we can work through the math, and think that we understand – because we’ve worked the math. We’ve implemented the code.

But if we haven’t brought the math and the code down to an intuitive, gut-level, felt-realization, then we’re still missing the essence.

We’ll have all the cognitive knowledge in the world. (It’s like reading all the sutras and teachings.) We’ll have run a gazillion code variants. (It’s like spending hundreds of hours in meditation.)

But if we haven’t achieved that a-ha! moment, then we’re still floating on the surface, and not penetrating into the heart of the matter.

 

How I Do It

 

My personal approach is to go for walks. Julia Cameron, author of The Artist’s Way, talks about the need for a daily Artist’s Walk. (She’s still advocating walks; see this short and more recent blogpost from Julia.)

There are some other things that work as well.

Driving, in an open country environment, with little need to stop and attend, can work also. So can housecleaning, especially with a repetitive chore.

Basically, anything that puts my body into a state of rhythmic flow, and lets my conscious mind disengage for a bit, will work.

Some people use noodling or doodling, or journaling. (I do journaling, a lot, but rarely about sci-tech matters. My preferred mode is to walk, and to “hold the equation,” or have a mental discussion with myself, on a subject – and then come back and pour it into a blog post, book chapter, or content page for my students. To each their own.)

The thing that I recommend the most is to find a way to factor this into your life practice. Set aside a regular time slot that you connect with this “holding the equation” experience, and link it to other activities, so that the predecessor and current activities cue the “holding the equation” inner state. James Clear, author of Atomic Habits, has some good ideas about how to do this.

One of the things that we need to realize – and cultivate – is some form of a sustainable, long-term practice. Something that we do, maybe not on a daily basis, but at least weekly. (Monthly is too infrequent. Hard to pick up from where we left off the last time.)

Bottom line: the AI world continues to advance at a breakneck speed. Even if we “catch up,” for a moment, that moment won’t last long. And the major advances in AI each come from fusing together two (or more) disparate disciplines.

Here’s a figure that I published back in November, 2017, in Third Stage Boost: Statistical Mechanics and Neuromorphic Computing.

Brief history of AI in log-time scale

This figure shows how the timescale for major AI advances is contracting geometrically, with each era lasting about half as long as the one before: the whole “symbolic” era of AI took about 32 years, and the next era of simple neural networks (dominated by stochastic gradient descent, typically using backpropagation) took about 16 years.

Well, that figure was published in 2017, and it’s now 2019. So, if the time-compression scale is holding true, then we have at least two major “waves” of advances that have occurred in the past two years.
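That halving pattern is easy to sketch as a back-of-the-envelope calculation. In the snippet below, the 1950 start year and the strict era-halving are my simplifying assumptions for illustration, not exact dates from the chart:

```python
def era_boundaries(start_year=1950, first_era=32, n_eras=7):
    """Return (era_start_year, era_length) pairs, assuming each
    major AI era lasts about half as long as the one before it."""
    boundaries = []
    year, length = start_year, first_era
    for _ in range(n_eras):
        boundaries.append((year, length))
        year += length
        length = max(1, length // 2)  # floor at one year per era
    return boundaries

for start, length in era_boundaries():
    print(f"era starting {start}, lasting ~{length} years")
```

Under these assumptions, by the mid-2010s each “era” lasts only a year or two, which is consistent with expecting a couple of major waves between 2017 and 2019.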

The 2018 wave was very likely reinforcement learning, which has had a major comeback. It dominated the 2018 landscape, largely on the strength of a reinforcement learning-based system defeating the world’s leading players of Go.

Even more recently (as in, early 2019), people who keep an eye on the tech horizons suggest that Karl Friston’s work using free energy as a basis for adaptive variational Bayes might be the next big thing. (See my post on Karl from about two years ago; the article that I link to there is almost complete, and I’ll update the link as soon as it is finished; it’s a Rosetta stone for understanding Karl’s equations.)

Every time a new discipline comes into the mix, we need to back up. (It’s that long runway thing again.)

This means that we’ll always be studying, we’ll always have to be learning a fundamentally and radically new (to each of us) discipline. And – most challenging yet – we’ll each have to integrate what we’re learning.

That’s why we each need our yin time, and we’re each going to need this on a regular and ongoing basis. Because this whole learning-and-integration process really is going to go on for the rest of our lives.

 
 

Some of My Favorite A-Ha! Moments

 

It’s by taking this yin time that I’ve been able to bring to light what I regard as some of the most mind-boggling conundrums of the AI and machine learning world. Here are my top three faves, culled from the last year or two of writing:

  • What We Really Need to Know about Entropy – useful because the notion of entropy pops up so often out of context from free energy, where it is an integral part of the equation – and the other aspect (enthalpy) pops up as the “energy” in a lot of writing about energy-based neural network systems (without mention of entropy), and I really wondered about this sort of dissociative dissonance. It took some long walks and several a-ha’s to get this one,
  • Neg Log Sum Exponent Neg Energy – That’s the Easy Part! – more on this mysterious missing entropy term, and
  • A Tale of Two Probabilities – the totally alchemical, mystical union of statistical mechanics-based and Bayesian probability-based thinking in the same machine learning universe.
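For the curious, the title of that second bullet is a literal reading of the standard statistical-mechanics free energy (textbook notation here, not drawn from the linked posts themselves): the free energy is the negative log of a sum of exponentials of negative energies – and, equivalently, energy minus temperature times entropy, which is where the “missing” entropy hides.

```latex
% "Neg log sum exponent neg energy" -- the Helmholtz free energy:
F \;=\; -\,k_B T \,\ln Z,
\qquad
Z \;=\; \sum_i e^{-E_i / k_B T}

% Equivalently, the energy-minus-entropy form:
F \;=\; \langle E \rangle \;-\; T\,S
```

Holding these two forms of the same quantity side by side is itself a small koan: one expression, both “energy” and entropy at once.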

What are your favorite a-ha! moments? Share in the Comments section below, and I’ll respond!

 

 

Live free or die, my friend –

AJ Maren

Live free or die: Death is not the worst of evils.
Attr. to Gen. John Stark, American Revolutionary War

 
 


 
 

6 thoughts on “The Yin and Yang of Learning Deep Learning”

  1. This is a fantastic article Dr. A.J. The “yin and yang of learning” as you put it can be applied to any discipline, domain, or business. Self-acknowledgement of “the learning-integration process” can reduce stress and increase productivity with the realization that this is a natural evolution of events, rather than being overcome by events!

  2. Hello Prof. AJ – this is Randy your student from Montana. Thank you for posting about this topic. I will admit that I often rush through reading the article or getting the code to work, and then once that has cleared out of my short-term memory, I have to learn it again, not having understood it at a “gut-level”. I have occasionally found some insight while trying to meditate (although wrongly – I let my mind wander and start working on understanding an issue – instead of focusing on not having thoughts). I am wondering if you could describe more in depth what you do during your walks, and dive more deeply into what you mean by “holding the equation”? Does “holding the equation” mean holding that equation or topic in your mind in a sustained manner and then working through it? Or does “holding the equation” mean putting it aside, and not focusing on that for a while but instead something else? I am assuming it is the former.

    1. Hi and thank you, Randy – great to hear from you!
      Honestly, I don’t have a simple, in-my-hip-pocket answer right now. I read your comment a few days ago (just now getting to it), and am going to have to mull on this for a bit.
      Probably will be the subject of a near-term blogpost!
      Thanks again! – AJM

  3. Randy – I’m looking forward to Dr. A.J.’s answer to your question! But I’d also like to offer a suggested reading on the implementation and expectations of meditation for work which could be applied to academics – the book “10% Happier” by Dan Harris, an ABC news correspondent. A great read walking you through his skeptical quest around the world for the secret to effective & efficient meditation for ambitious people in a modern world to reduce stress and increase the thought process. It’s entertaining and helpful.

    1. Makes sense, and thank you, Jessica!
      Randy – I need to go and re-create some of the optimal versions of the state that I’ve described, and take mental notes, and transcribe them. That’s the hold-up in my response to you. Otherwise, I’m answering without really connecting with the process.
      Your question is on my docket for a full-length blog response; near future, thanks again! – AJM
