# The Single Most Important Equation for Brain-Computer Information Interfaces


The Kullback-Leibler equation is arguably the best starting point for thinking about information theory as applied to Brain-Computer Interfaces (BCIs), or Brain-Computer Information Interfaces (BCIIs).

The Kullback-Leibler equation is given as:

*I(f, g) = ∫ f(x) ln[ f(x) / g(x | θ) ] dx*

We seek to express how well our *model* of reality matches the *real system*. Or, just as usefully, we seek to express the information-difference when we have two different models for the same underlying real phenomena or data.

The K-L information is a measure, or a *heuristic distance*, between an approximating model *g* and a reference *f*, which for our purposes can be either another model of the same data, or the actual data distribution itself.

Because the K-L measure is not symmetric, it is not appropriate to call it a *distance*. Instead, we refer to this quantity as the *K-L divergence*.

### Kullback-Leibler Divergence Notation

- *I(f, g)*, the K-L information (or divergence), is the “information” lost when model *g* is used to approximate *f*,
- *f* itself can be either the real data distribution, or a different model of the data – so that we are either comparing a model against data, or a model against another model,
- *f* and *g* are n-dimensional probability distributions over the n-dimensional domain *x*, and
- The range of parameters underlying observed and modeled states is denoted *θ*.

### Kullback-Leibler Divergence: Continuous and Discrete Formalisms

We can use the K-L divergence in dealing with either continuous data (or continuous models) or in the discrete case. For this, we’ll drop the notation for the parameter set *θ*, noting that either *f* or *g* can be a function of parameters as well as of the data space.

#### Continuous

*I(f, g) = ∫ f(x) ln[ f(x) / g(x) ] dx*

#### Discrete

*I(f, g) = Σᵢ f(d(i)) ln[ f(d(i)) / g(d(i)) ]*

Here, the sum is taken over the dataset *{d(i)}*.
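As a concrete illustration, here is a minimal Python sketch of the discrete form. The dataset, outcomes, and probabilities are all invented for illustration, and `kl_divergence` is a name of our own choosing, not a standard API:

```python
import math
from collections import Counter

def kl_divergence(f, g):
    """Discrete K-L divergence: sum over d(i) of f(d(i)) * ln(f(d(i)) / g(d(i))).

    f and g map each outcome d(i) to its probability; terms where
    f(d(i)) == 0 contribute nothing, by the convention 0 * ln(0) = 0.
    """
    return sum(p * math.log(p / g[d]) for d, p in f.items() if p > 0)

# f: an empirical distribution built from a (hypothetical) dataset {d(i)}
data = [0, 1, 1, 2, 0, 1, 0, 0, 1, 2]
f = {d: c / len(data) for d, c in Counter(data).items()}  # {0: 0.4, 1: 0.4, 2: 0.2}

# g: a candidate model assigning its own probabilities to the same outcomes
g = {0: 0.5, 1: 0.3, 2: 0.2}

print(kl_divergence(f, g))  # information lost (in nats) when g approximates f
```

Note the design choice of skipping outcomes with *f(d(i)) = 0*: those terms vanish in the limit, and skipping them avoids evaluating *ln(0)*.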

### Goal in Using the Kullback-Leibler Divergence

The goal is to minimize the information lost, or the divergence.

We note that if *f(x) = g(x)*, then *ln(f(x)/g(x)) = ln(1) = 0*, and thus *I(f,g) = 0*, which means that no information is lost when the “real” situation is used to model itself.
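This behaviour, together with the asymmetry noted earlier, is easy to verify numerically. A quick sketch, with two arbitrary illustrative distributions over three outcomes:

```python
import math

def kl(p, q):
    # Discrete K-L divergence over outcomes with nonzero probability in p
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

f = [0.5, 0.4, 0.1]
g = [0.3, 0.3, 0.4]

print(kl(f, f))            # 0.0 -- no information lost using f to model itself
print(kl(f, g), kl(g, f))  # two different values: I(f, g) != I(g, f)
```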

However, this “real” situation described by *f* may be very complex; we are looking for an approximating model *g*. Here, *g* is not only a function of the (n-dimensional) space *x* which is the domain of *f*, but also of a set of parameters *θ*, which allows us to tune the approximating model.
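To make the tuning concrete, here is a small sketch that grid-searches a single parameter *θ* of a hypothetical approximating family (a binomial with n = 3; both *f* and the model family are invented for illustration) to minimize the discrete K-L divergence:

```python
import math

# f: the "real" distribution over outcomes 0..3 (invented for illustration)
f = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.3}

def g(d, theta):
    # Approximating model: binomial(n=3, theta) -- one tunable parameter
    return math.comb(3, d) * theta**d * (1 - theta)**(3 - d)

def kl(f, theta):
    # I(f, g(.|theta)) in the discrete form, summed over the outcomes of f
    return sum(p * math.log(p / g(d, theta)) for d, p in f.items() if p > 0)

# Grid search: pick the theta that loses the least information
thetas = [i / 1000 for i in range(1, 1000)]
best = min(thetas, key=lambda t: kl(f, t))
print(best, kl(f, best))
```

For this particular family the minimizer matches the mean of *f* (3θ ≈ 1.9, so θ ≈ 0.633) — the same mechanism that connects minimizing the K-L divergence to maximum-likelihood fitting.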

This description of the Kullback-Leibler formalism is taken from Strimmer (2013), Burnham & Anderson (2001), and White (date unknown); see references below.

## All Models Are Wrong

George E.P. Box, a renowned statistician (1919-2013), is famously credited with having said:

> All models are wrong, but some are more useful than others.

Perhaps more germane to our point: we are now seeing some success with a range of BCIIs. Now that we’ve proved that we can do this, our focus shifts to questions such as how we:

- Achieve consistent and accurate performance?
- Expand the range of capabilities?
- (Perhaps most important) Measure our results?

The realm of *Brain-Computer Information Interfaces (BCIIs)* addresses these issues.

*To achieve our goals in creating effective Brain-Computer Information Interfaces, we need statistics.*

As Box said,

> We have a large reservoir of engineers (and scientists) with a vast background of engineering know how. They need to learn statistical methods that can tap into the knowledge. Statistics used as a catalyst to engineering creation will, I believe, always result in the fastest and most economical progress… (Statement of 1992, quoted in *Introduction to Statistical Experimental Design – What is it? Why and Where is it Useful?* (2002), Johan Trygg & Svante Wold)

## The Next Post on the Kullback-Leibler Divergence

This series of blog posts on the K-L divergence will continue. I will link future posts back to this one, and add links from here to those posts as they appear.

The direction that this will take – after a bit more of a theoretic overview – will focus on the application of the K-L divergence to Brain-Computer Information Interfaces (BCIIs):

- How is the K-L divergence being used in current BCI/BCII work today?
- How can we envision it being applied to future BCI/BCII developments? and (perhaps most importantly)
- What are the limitations with this method (and with the family of related methods), and what else can we use?

## References

### Information Theory – Seminal Papers

- Kullback, S., & Leibler, R.A. (1951). On Information and Sufficiency. *Ann. Math. Statist.* **22**(1), 79-86.
- Shannon, C.E. (1948). A Mathematical Theory of Communication. *The Bell System Technical Journal* **27**(3), 379-423.

### Information Theory – Tutorials

- Burnham, K.P., & Anderson, D.R. (2001). Kullback-Leibler information as a basis for strong inference in ecological studies. *Wildlife Research* **28**, 111-119. (Reviews concepts and methods – including K-L – in the context of practical applications to experimental data, rather than a deeply mathematical review; good for understanding the basics.)
- Strimmer, K. (2013). *Statistical Thinking (Draft)*, Chapter 4: What Is Information? (Very nice, clear intro; v. good for understanding the basics.)
- White, G. (date unknown). *Information Theory and Log-Likelihood Models: A Basis for Model Selection and Inference.* Course Notes for FW663, Lecture 5. (Another very nice, clear intro; v. good for understanding the basics.)
- Yu, B. (2008). *Tutorial: Information Theory and Statistics* (ICMLA 2008, San Diego). (Very rich and detailed 250-page slide deck; excellent for expanding understanding once you have the basics.)
