Why has Google open sourced TensorFlow?

I was sitting in a sun-warmed pizza restaurant in London last week talking about deep learning libraries. Everyone had their favourites. I was betting on TensorFlow, the new kid in town released by Google in late 2015. In response, a Torch fan pointed out that Google may invest in building up TensorFlow internally, but there’s no reason for them to invest in the shared, external version.

This got me thinking – why has Google open sourced TensorFlow?

Naively, I usually assume that companies keep their crown jewels proprietary while open sourcing the periphery. In other words, keep your secret sauce close to your chest – but share the stuff that’s more generic, since it builds brand and goodwill, others may contribute helpfully, and you’re not straightforwardly giving a leg-up to your direct competitors.

Google’s approach to open source has been a little more strategic than this. Look at a handful of their major open source projects – Android, Chromium, Angular, Go, Dart, V8, Wave, WebM. The motivations behind them are various:

  • Android, Angular, Chromium, V8, Wave, WebM – creating a new version of an existing technology (free, better engineered, or faster) to disrupt an incumbent, or increase usage and thus drive revenue for Google’s core businesses.
  • Go, Dart and the long tail of minor projects are peripheral to their goals and serve less direct strategic interest.

For TensorFlow to make sense and be worthy of long-term support from Google, it needs to fall in the former category.

It is indeed a new version of an existing technology – it’s free, it’s better engineered, though not yet faster.

So, is it intended to either disrupt an incumbent, or to increase usage and thus drive revenue for core Google businesses? I can only think of two possibilities:

  1. TensorFlow is intended to be a major strategic benefit for Android. Machine learning is going to power a wave of new mobile applications, and many of them need to run locally rather than as a client-server app, whether for efficiency, responsiveness or bandwidth reasons. If TensorFlow makes it easier to develop cross-platform, efficient mobile machine learning solutions for Android but not for iOS, that could give the Android app market a major boost.
  2. TensorFlow is intended to be a major strategic benefit for Google’s platform/hosting, and to disrupt AWS. Right now, it’s pretty difficult and expensive to set up a cloud GPU instance. TensorFlow opens up the possibility of a granularly-scalable approach to machine learning that allows us to finally ignore the nitty-gritty of CUDA installations, Python dependencies, and multiple GPUs. Just specify the size of network you want, and TensorFlow allocates and spreads it across hardware as needed. This is why TensorBoard was part of the original implementation, and why AWS support was an afterthought. “Pay by the parameter”. If I had to guess, I’d say this is the major reason for open sourcing TensorFlow.

I want something like the above to be true, because I want there to be a strategic reason for Google to invest in TensorFlow, and I want it to get easier and easier to develop interesting and complex deep learning apps.

Sanity checks as data sidekicks

Abe Gong asked for good examples of ‘data sidekicks’.

I still haven’t got the hang of distilling complex thoughts into 140 characters, and so I was worried my reply might have been compressed into cryptic nonsense.

Here’s what I was trying to say:

Let’s say you’re trying to do a difficult classification on a dataset that has had a lot of preprocessing/transformation, like fMRI brain data. There are a million reasons why things could be going wrong.

(sorry, Tolstoy)

Things could be failing for meaningful reasons, e.g.:

  • the brain doesn’t work the way you think, so you’re analysing the wrong brain regions or representing things in a different way
  • there’s signal there but it’s represented at a finer-grained resolution than you can measure.

But the most likely explanation is that you screwed up your preprocessing (mis-imported the data, mis-aligned the labels, mixed up the X-Y-Z dimensions etc).

If you can’t classify someone staring at a blank screen vs a screen with something on it, it’s probably something like this, since visual input is pretty much the strongest and most wide-spread signal in the brain – your whole posterior cortex lights up in response to high-salience images (like faces and places).
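Here’s a minimal sketch of that kind of sanity check. It runs on made-up data rather than real fMRI, and everything in it is an assumption for illustration: the simulated "voxels", the signal size, and the choice of a simple nearest-centroid classifier. The point is only that an easy contrast should classify well when the labels line up with the scans, and should fall to roughly chance when a preprocessing bug scrambles them.

```python
import random

def simulate_scans(n_scans, n_voxels, signal, rng):
    """Made-up data: label 1 (something on screen) adds `signal` to every voxel."""
    labels = [i % 2 for i in range(n_scans)]
    scans = [[rng.gauss(signal * label, 1.0) for _ in range(n_voxels)]
             for label in labels]
    return scans, labels

def centroid(rows):
    """Mean pattern across a list of scans."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two patterns."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def split_half_accuracy(scans, labels):
    """Train a nearest-centroid classifier on the first half, test on the second."""
    half = len(scans) // 2
    train = list(zip(scans[:half], labels[:half]))
    blank = centroid([x for x, y in train if y == 0])
    image = centroid([x for x, y in train if y == 1])
    hits = sum((sq_dist(x, image) < sq_dist(x, blank)) == (y == 1)
               for x, y in zip(scans[half:], labels[half:]))
    return hits / (len(scans) - half)

rng = random.Random(0)
scans, labels = simulate_scans(n_scans=200, n_voxels=20, signal=1.0, rng=rng)

# Sanity check on correctly aligned labels: this should be easy.
aligned_accuracy = split_half_accuracy(scans, labels)

# Simulated preprocessing bug: the labels no longer line up with the scans.
scrambled = labels[:]
rng.shuffle(scrambled)
scrambled_accuracy = split_half_accuracy(scans, scrambled)

print(f"aligned: {aligned_accuracy:.2f}, scrambled: {scrambled_accuracy:.2f}")
```

If the easy contrast scores like the scrambled case, go looking for the bug before you start doubting the brain.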

In the time I spent writing this, Abe had already figured out what I meant :)

Brain orchestras and fMRI analyses

[With help from David Weiss]

I spent much of my PhD working on algorithms for making sense of gigabytes of brain data from fMRI scanners, especially on a fairly new approach called Multi-variate Pattern Analysis (MVPA). I want to show you how the MVPA approach is useful for tackling certain kinds of questions.

Think of the brain as a kind of orchestra. You have lots of separate instruments playing at the same time, and you can subdivide them in lots of different ways, e.g.

  • You can subdivide the orchestra into parts by location – the 1st violins, the brass, the percussion etc.
  • Or you could organize them by what they’re doing. Say the 2nd violins, the oboes and the trumpets have the melody, while the clarinets and the tubas have the harmony. [The harps are doing their own thing and the bassoonist is drunk.]

Likewise, there are all kinds of things going on at once in the brain.

  • You can subdivide the brain by location – frontal, temporal, parietal, occipital lobes.
  • Or you could organize the sub-parts by what they’re doing – vision, language, executive control, motor etc.

Let’s go back to thinking about how the multivariate approach differs in the kinds of questions it can address.

Standard univariate analysis is useful if you want to tell which instruments are involved in one case rather than another, e.g.

  • violins are more active in Beethoven than Mozart, but for trumpets it’s the other way around
  • one part of the brain is more active when looking at houses than faces, but for another part it’s the other way around

In contrast, a multivariate analysis might be useful if you want to know:

  • is this Mozart or Beethoven?
  • is this the brain of someone looking at faces or houses?
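To make the two kinds of question concrete, here’s a toy sketch with two brain regions and a handful of scans. The numbers are invented, and the nearest-mean-pattern rule is just a stand-in for a real classifier. The univariate question is answered region by region; the multivariate question takes a whole pattern and asks which condition it came from.

```python
from statistics import mean

# Hypothetical activity in two brain regions (columns) across scans (rows).
faces = [[2.1, 0.9], [1.8, 1.2], [2.0, 1.0]]
houses = [[1.0, 2.2], [1.1, 1.9], [0.9, 2.1]]

def region_means(scans):
    """Average activity of each region across scans."""
    return [mean(col) for col in zip(*scans)]

face_means = region_means(faces)
house_means = region_means(houses)

# Univariate question: which regions are more active for faces than houses?
# A positive entry means "more active for faces", a negative one "for houses".
differences = [f - h for f, h in zip(face_means, house_means)]

# Multivariate question: given a whole pattern, is this faces or houses?
def classify(scan):
    """Pick the condition whose mean pattern is closest to the scan."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    if sq_dist(scan, face_means) < sq_dist(scan, house_means):
        return "faces"
    return "houses"

guess = classify([1.9, 1.1])
print(differences, guess)
```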

Now, let’s introduce one more concept: dimensionality reduction is an attempt to boil down many instruments (or brain regions) into a few key themes/groups:

Take the famous da-da-da-dum of Beethoven’s Fifth, where the entire orchestra is one voice – one could more or less describe the entire orchestra’s activity in terms of just one theme/process. In contrast, for Bach or something more complex and interwoven, it might be very hard to summarize what’s going on with fewer than 10 themes.

Likewise, maybe it’s straightforward to summarize the brain’s activity with just one or two processes when you’re doing a very simple task like looking at faces vs houses, but if you’re doing something more complicated (like watching a movie) then multiple processes are interacting in complex ways.

David Weiss’s PACA algorithm boils down the brain’s activity over time into just a few themes. Once you’ve summarized the 50,000 readouts we get from fMRI every few seconds into 50, it’s much more feasible to try and compare different cognitive processes – just as it’s much easier to compare Mozart and Beethoven by looking at the scores of a few key instruments than looking at the full orchestral scores.

PACA was inspired by a bunch of existing dimensionality reduction algorithms that could equally be applied to problems like voice, face or handwriting recognition.

But its magic involves adding a few constraints that are particularly relevant to the brain. Here’s one example of a constraint: it doesn’t allow its estimate of a theme’s presence at a given moment to go below zero. Think of it like this – when was the last time you heard an anti-violin? Or had an anti-thought? In other words, PACA breaks the manifold streams of activity in the brain down to just a few that are all present to a greater or lesser degree at each moment.
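I’m not going to reproduce PACA here. As a stand-in, this is a minimal sketch of plain non-negative matrix factorization using the multiplicative update rule of Lee and Seung, which shares the one constraint described above: the estimated presence of each theme never goes below zero. The tiny data matrix, the number of themes and the iteration count are all made up for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def squared_error(v, w, h):
    """Sum of squared differences between v and its reconstruction w @ h."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))

def nmf(v, n_themes, n_iter=200, seed=0, eps=1e-9):
    """Factor v (time x voxels) into w (time x themes) and h (themes x voxels).

    Both factors start positive and are only ever multiplied by
    non-negative ratios, so neither can go below zero.
    """
    rng = random.Random(seed)
    n_time, n_vox = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_themes)] for _ in range(n_time)]
    h = [[rng.random() + 0.1 for _ in range(n_vox)] for _ in range(n_themes)]
    for _ in range(n_iter):
        # h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_vox)]
             for i in range(n_themes)]
        # w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_themes)]
             for i in range(n_time)]
    return w, h

# Made-up "brain": two themes, each driving a different pair of voxels.
themes = [[1, 1, 0, 0], [0, 0, 1, 1]]
presence = [[1, 0], [2, 0], [0, 1], [0, 3], [1, 1], [2, 1]]
v = matmul(presence, themes)

start_error = squared_error(v, *nmf(v, n_themes=2, n_iter=0))
w, h = nmf(v, n_themes=2)
final_error = squared_error(v, w, h)
print(f"error before: {start_error:.3f}, after: {final_error:.3f}")
```

Each row of `w` is the estimated presence of the two themes at one moment, and no anti-violins are allowed.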

P.S. If you hated this, you might also hate How to beat an fMRI lie detector.

A hivemind with a sense of humor

I’m a little obsessed by the notion of a noosphere, a humming hivemind – not a humdrum, roaring average, but rather a superlinear interwoven sum of wits.

The Bible has this quality, with its sea of voices that are unabashedly inconsistent and yet superhumanly wise. But what I find most unsettling about the Bible is its lack of humor. To my knowledge, there’s not a jot of wit, humor or silliness in the whole thing. Perhaps this befits something with a purpose greater than simply sublime literature, but that dehumanizes it to me.

So, it’s endearing and cheering to see that Google can giggle [from autocompleteme.com]:

Though more soberingly, see what boyfriends and girlfriends search for.

How to beat an fMRI lie detector

In a not-so-distant dystopia, you might be placed in a brain scanner to test whether you’re telling the truth. Here’s how to cheat.

The polygraph

First, you’ll need some background on old-school lie-detection technology. [This is a simplified story – see polygraphs for a richer account.] Polygraphs are seismographs for the nervous system. They measure physiological responses such as heart rate, blood pressure, sweatiness through skin conductance, and breathing. When you’re anxious, angry, randy, in pain, or otherwise emotionally aroused, these measures spike automatically. The effort and stress of lying also causes them to spike.

Of course, if you’re trapped in a windowless room on trial for murder, these measures will probably be pretty high to begin with. So you’ll first be asked a few control questions to assess your baseline levels when lying and telling the truth, against which your physiological response to the important questions will be compared.

So, if you want to beat a polygraph, you either need to keep your physiological responses stable when you lie (which is difficult), or you need to artificially elevate your baseline response when telling the truth. The age-old technique is to place a thumb-tack in your shoe, and press on it painfully with your toe when telling the truth, spiking your physiological responses, and providing a misleading control so that your lies don’t seem higher relatively.

Functional magnetic resonance imaging

Now, on to fMRI. Simplifying again, the fMRI brain scanner takes a reading of the level of metabolic activity at thousands of locations around your brain every couple of seconds. Activity in a number of brain areas tends to be elevated when we lie, perhaps because we have to work harder to invent and keep track of the extra information involved in a lie, and override the default responses in the rest of the brain. Under laboratory conditions, accuracy at distinguishing truth from lie approaches 100%.

The modern machine learning algorithms used to make sense of the richer neural data are more sophisticated than those used in a polygraph. And they’re measuring your brain activity (albeit indirectly), so it might feel as though there’s no way to deceive them. But ultimately, they work in an analogous way to the polygraph, by comparing your neural response to the important questions with your neural response to the baseline questions. That means that they can be gamed in an analogous way – as you’re being asked the baseline questions, wiggle your head, take a deep breath, do some simple arithmetic or tell a lie in your head. Each of these will elevate the neural response artificially. By disrupting the baseline response, you disrupt the comparison.
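The logic of disrupting the comparison can be shown with a deliberately crude toy. This is not how any real polygraph or fMRI classifier works: the single "response" number, the fixed margin and all of the values are assumptions for illustration. The detector simply flags a lie when the response to the important question is well above the average response to truthful baseline answers.

```python
from statistics import mean

def looks_like_lie(response, baseline_truth, margin=0.5):
    """Toy detector: flag a lie if the response exceeds the truthful baseline
    by more than `margin` (a hypothetical threshold)."""
    return response - mean(baseline_truth) > margin

# Hypothetical response while lying on the important question.
lie_response = 2.0

# Cooperative subject: truthful baseline answers produce low responses.
caught = looks_like_lie(lie_response, baseline_truth=[1.0, 1.1, 0.9])

# Countermeasure: inflate the response during truthful baseline answers
# (thumb-tack, deep breath, mental arithmetic).
caught_after_gaming = looks_like_lie(lie_response, baseline_truth=[2.0, 2.2, 2.1])

print(caught, caught_after_gaming)
```

The lie itself is identical in both cases. Only the baseline it gets compared against has changed.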

Possible flaws in this argument

This simplified account of how to cheat an fMRI lie detector has some issues.

Firstly, it rests on the idea that we’ll still use some kind of comparison between baseline and important questions. In the case of most recent fMRI analyses, this is certainly true. Although they use modern machine learning classification algorithms to compare against baseline, they still seem subject to the same problems as the simpler statistical tests used in polygraphs.

Above, I suggested taking a deep breath, doing simple arithmetic or telling a lie in your head during the baseline questions. Taking a deep breath increases the BOLD response measured by fMRI throughout your brain. The idea behind doing arithmetic or telling a lie in your head is to engage the brain areas whose activity changes when lying: those involved in detecting internal mental conflict (between areas of the brain that are pulling in different directions), executive control (over the rest of your brain), and working memory. As far as I know, all of the studies on lie detection seem to use naive participants, and no one has yet tested the efficacy of these counter-measures.

I have also assumed that the analysis would be run ‘within subject’. In other words, the machine learning classifier algorithms would be making a comparison between baseline and important questions for the *same person*. However, there have been attempts to train the algorithms on a corpus of data from multiple participants beforehand, and then apply them to a new brain. This approach is considerably and inherently less accurate (less than 90% as opposed to nearly 100%) since everyone’s brain is different, and since brain activity will probably vary for different kinds of lies. Indeed, there appears to be variability in the areas that have been identified by different experiments.

There are alternative experimental paradigms to the basic questioning approach described here. For instance, one might show someone the scene of a crime, and look to see whether their brain registers familiarity. I haven’t looked into this approach. But fundamentally, this familiarity assessment is much more limited in the kinds of questions that can be asked, and furthermore, you only get one chance to assess someone’s familiarity (after which the stimulus is, by definition, familiar). That single response simply might not be enough data to go on.

All of the studies so far have employed ‘willing’ participants. In other words, the participants kept their heads still, told the truth when they were asked to, and lied when they were asked to. An uncooperative participant might move around more (blurring the image), show generally elevated levels of arousal that could skew their data, be in worse mental or physical condition, and come from a different population than the predominantly white, young, relaxed, intelligent and willing undergraduate participants. We don’t know how these factors change things, and it’s difficult to see how we might collect reliable experimental data to better understand them.

I haven’t considered alternative imaging methodologies here (such as EEG or infrared imaging). Mostly though, fMRI appears to be leading the field in terms of accuracy and effort spent, and all of these arguments should apply to EEG and other methods equally.

Why am I writing this?

There are a number of fMRI-based lie detection startups attracting government funding and attempting to charge for their services. I don’t begrudge them their entrepreneurial ambition, but I am dismayed by their hyperbolic avowals of success.

In truth, this is a new, mostly unproven technology that seems to work fairly well in laboratory conditions. But it’s subject to the same sensitivity/specificity tradeoffs that plague medical tests and traditional lie detection technologies. The allure of an ostensibly direct window into the mind with the shiny veneer of scientific infallibility is a beguiling combination.
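For anyone unfamiliar with the tradeoff, here’s a small worked example. The scores are invented, and the rule "call anything above the threshold a lie" is a simplifying assumption. Sensitivity is the fraction of lies that get caught, and specificity is the fraction of truths that are correctly cleared.

```python
def sensitivity_specificity(truth_scores, lie_scores, threshold):
    """Scores above `threshold` are called lies."""
    sensitivity = sum(s > threshold for s in lie_scores) / len(lie_scores)
    specificity = sum(s <= threshold for s in truth_scores) / len(truth_scores)
    return sensitivity, specificity

# Hypothetical detector scores for statements known to be true or false.
truth_scores = [0.2, 0.4, 0.5, 0.7]
lie_scores = [0.6, 0.8, 0.9, 1.1]

# A low threshold catches every lie but accuses one truth-teller.
low_threshold = sensitivity_specificity(truth_scores, lie_scores, threshold=0.55)

# A high threshold clears every truth-teller but misses half the lies.
high_threshold = sensitivity_specificity(truth_scores, lie_scores, threshold=0.85)

print(low_threshold, high_threshold)
```

Whenever the two sets of scores overlap, moving the threshold can only trade one kind of error for the other.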

Eventually, the limitations of this technology will be realized. I’d prefer to see this techno-myth punctured and caution exercised now, rather than after costly mistakes have been made. Cheeringly, the courts appear to take the same view (at least so far).

My credentials

I’m finishing my PhD in the psychology and neuroscience of human forgetting at Princeton. I’ve worked on the application of machine learning methods to fMRI for the last few years, was part of the prize-winning team in the Pittsburgh fMRI mind-reading competition, and led the development of a popular software toolbox for applying these algorithms for scientific analysis. However, I have no expertise in the neuroscience of cognitive control, lie detection or law.

So I apologize if I’m wrong or out of date anywhere here. If so, I’d be glad to see this pointed out and to amend things.