The Single Most Important Equation for Brain-Computer Information Interfaces


The Kullback-Leibler equation is arguably the best starting point for thinking about information theory as applied to Brain-Computer Interfaces (BCIs), or Brain-Computer Information Interfaces (BCIIs).

The Kullback-Leibler equation is given as:

I(f, g) = ∫ f(x) ln[ f(x) / g(x | θ) ] dx

We seek to express how well our model of reality matches the real system. Or, just as usefully, we seek to express the information-difference when we have two different models for the same underlying real phenomena or data.

The K-L information I(f, g) is a measure, or a heuristic distance, between an approximating model g and f, where f, for our purposes, can be either another model of the same data or the actual data distribution itself.

Because the K-L measure is not symmetric, that is, I(f, g) ≠ I(g, f), it is not appropriate to call it a distance. Instead, we refer to this quantity as the K-L divergence.
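The asymmetry is easy to see numerically. The following minimal sketch (distributions chosen arbitrarily for illustration) computes the discrete K-L divergence in both directions:

```python
import math

# Two discrete probability distributions over the same three outcomes
# (illustrative numbers, not real data).
f = [0.5, 0.4, 0.1]
g = [0.3, 0.3, 0.4]

def kl(p, q):
    """Discrete K-L divergence I(p, q) = sum_i p_i * ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(kl(f, g))  # I(f, g) ≈ 0.232
print(kl(g, f))  # I(g, f) ≈ 0.315 -- a different number: K-L is not symmetric
```

Both directions are non-negative, but they generally disagree, which is why "divergence" is the right word rather than "distance."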

Kullback-Leibler Divergence Notation

  • I(f, g), the K-L information (or divergence), is the “information” lost when model g is used to approximate f,
  • f itself can be either the real data distribution, or a different model of the data – so that we are either comparing a model against data, or a model against another model,
  • f and g are n-dimensional probability distributions over the n-dimensional domain x, and
  • The range of parameters underlying observed and modeled states is denoted theta.

Kullback-Leibler Divergence: Continuous and Discrete Formalisms

We can use the K-L divergence in dealing with either continuous data (or continuous models), or in the discrete case. For this, we’ll drop the notation for the parameter set theta, noting that either f or g can be functions of parameters as well as the data space.





In the continuous case,

I(f, g) = ∫ f(x) ln[ f(x) / g(x) ] dx

while in the discrete case,

I(f, g) = Σ_i f(d(i)) ln[ f(d(i)) / g(d(i)) ]

Here, the sum is taken over the dataset {d(i)}.
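For the continuous case, a quick sanity check is to compare a brute-force numerical integration against a known closed form. The sketch below (parameters chosen arbitrarily) does this for two 1-D Gaussians, for which the K-L divergence has the closed form ln(σ_g/σ_f) + (σ_f² + (μ_f − μ_g)²)/(2σ_g²) − 1/2:

```python
import math

# Continuous case: I(f, g) for two 1-D Gaussians (illustrative parameters).
mu_f, s_f = 0.0, 1.0
mu_g, s_g = 1.0, 2.0

def pdf(x, mu, s):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))

# Riemann-sum approximation of the integral of f(x) ln[f(x)/g(x)] dx.
dx = 0.001
xs = [-10 + i * dx for i in range(int(20 / dx))]
kl_numeric = sum(
    pdf(x, mu_f, s_f) * math.log(pdf(x, mu_f, s_f) / pdf(x, mu_g, s_g))
    for x in xs
) * dx

# Closed form for Gaussians.
kl_exact = math.log(s_g / s_f) + (s_f**2 + (mu_f - mu_g)**2) / (2 * s_g**2) - 0.5

print(kl_numeric, kl_exact)  # both ≈ 0.443
```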

Goal in Using the Kullback-Leibler Divergence

The goal is to minimize the information lost, or the divergence.

We note that if f(x) = g(x), then ln(f(x)/g(x)) = ln(1) = 0, and thus I(f,g) = 0, which means that no information is lost when the “real” situation is used to model itself.

However, this “real” situation described by f(x) may be very complex; we are looking for an approximating model g(x | θ). Here, g is not only a function of the (n-dimensional) space x which is the domain of f, but also of a set of parameters theta, which allows us to tune the approximating model.
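Tuning theta means choosing the member of the approximating family g(x | θ) that loses the least information. As a minimal sketch (the “true” distribution and the binomial family here are hypothetical choices for illustration), we can grid-search θ to minimize I(f, g_θ):

```python
import math
from math import comb

# Hypothetical "true" distribution f over outcomes 0..3 (illustrative numbers).
f = [0.1, 0.2, 0.4, 0.3]

def g(theta):
    """A one-parameter approximating family: Binomial(n=3, p=theta)."""
    return [comb(3, k) * theta**k * (1 - theta)**(3 - k) for k in range(4)]

def kl(p, q):
    """Discrete K-L divergence I(p, q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Tune theta by grid search to minimize the information lost, I(f, g_theta).
thetas = [i / 100 for i in range(1, 100)]
best = min(thetas, key=lambda t: kl(f, g(t)))
print(best, kl(f, g(best)))
```

For this exponential-family example the minimizer matches the mean of f (1.9 successes out of 3 trials, so θ ≈ 0.63), illustrating the close connection between minimizing K-L divergence and maximum-likelihood fitting.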

Note: This description of the Kullback-Leibler formalism is taken from Strimmer (2013), Burnham & Anderson (2001), and White (date unknown); see references below.


All Models Are Wrong

George E.P. Box, 2011.

George E.P. Box, a renowned statistician (1919-2013), is famously credited with having said:

All models are wrong, but some are more useful than others.

Perhaps more germane to our point: we are now seeing some success with a range of BCIIs. Having proved that we can do this, our focus turns to questions such as how do we:

  • Achieve consistent and accurate performance?
  • Expand the range of capabilities?
  • (Perhaps most important) Measure our results?

The realm of Brain-Computer Information Interfaces (BCIIs) addresses these issues.

To achieve our goals in creating effective Brain-Computer Information Interfaces, we need statistics.

As Bell said,

We have a large reservoir of engineers (and scientists) with a vast background of engineering know how. They need to learn statistical methods that can tap into the knowledge. Statistics used as a catalyst to engineering creation will, I believe, always result in the fastest and most economical progress… (Statement of 1992, quoted in Introduction to Statistical Experimental Design — What is it? Why and Where is it Useful? (2002) Johan Trygg & Svante Wold)


The Next Post on the Kullback-Leibler Divergence

This series of blog posts on the K-L divergence will continue. I will link future posts back to this one, and add links from here to future posts as they appear.

The direction that this will take – after a bit more of a theoretic overview – will focus on the application of the K-L divergence to Brain-Computer Information Interfaces (BCIIs):

  1. How is the K-L divergence being used in current BCI/BCII work today?
  2. How can we envision it being applied to future BCI/BCII developments? and (perhaps most importantly)
  3. What are the limitations with this method (and with the family of related methods), and what else can we use?




Information Theory – Tutorials

  • Burnham, K.P., and Anderson, D.R., Kullback-Leibler information as a basis for strong inference in ecological studies, Wildlife Research (2001), 28, 111-119. pdf (Reviews concepts and methods – including K-L – in the context of practical applications to experimental data, rather than a deeply mathematical review – good for understanding basics.)
  • Strimmer, K. (2013). Statistical Thinking (Draft), Chapter 4: What Is Information?. pdf (Very nice, clear intro – v. good for understanding basics)
  • White, G. (date unknown). Information Theory and Log-Likelihood Models: A Basis for Model Selection and Inference. Course Notes for FW663, Lecture 5. pdf (Another very nice, clear intro – v. good for understanding basics)
  • Yu, B. (2008). Tutorial: Information Theory and Statistics – ICMLA (ICMLA, 2008, San Diego). pdf (Very rich and detailed 250-pg PPT deck, excellent for expanding understanding once you have the basics.)
