Why has Google open sourced TensorFlow?

I was sitting in a sun-warmed pizza restaurant in London last week talking about deep learning libraries. Everyone had their favourites. I was betting on TensorFlow, the new kid in town released by Google in late 2015. In response, a Torch fan pointed out that Google may invest in building up TensorFlow internally, but there’s no reason for them to invest in the shared, external version.

This got me thinking – why has Google open sourced TensorFlow?

Naively, I usually assume that companies keep their most crown jewels proprietary while open sourcing the periphery. In other words, keep your secret sauce close to your chest – but share the stuff that’s more generic, since it builds brand and goodwill, others may contribute helpfully, and you’re not straightforwardly giving a leg-up to your direct competitors.

Google’s approach to open source has been a little more strategic than this. Look at a handful of their major open source projects – Android, Chromium, Angular, Go, Dart, V8, Wave, WebM. The motivations behind them are various:

  • Android, Angular, Chromium, V8, Wave, WebM – creating a new version of an existing technology (free, better engineered, or faster) to disrupt an incumbent, or increase usage and thus drive revenue for Google’s core businesses.
  • Go, Dart and the long tail of minor projects are peripheral to their goals and serve less direct strategic interest.

For TensorFlow to make sense and be worthy of long-term support from Google, it needs to fall in the former category.

It is indeed a new version of an existing technology – it’s free, it’s better engineered, though not yet faster.

So, is it intended to either disrupt an incumbent, or to increase usage and thus drive revenue for core Google businesses? I can only think of two possibilities:

  1. TensorFlow is intended to be a major strategic benefit for Android. Machine learning is going to power a wave of new mobile applications, and many of them need to run locally rather than as a client-server app, whether for efficiency, responsiveness or bandwidth reasons. If TensorFlow makes it easier to develop cross-platform, efficient mobile machine learning solutions for Android but not for iOS, that could give the Android app market a major boost.
  2. TensorFlow is intended to be a major strategic benefit for Google’s platform/hosting, and to disrupt AWS. Right now, it’s pretty difficult and expensive to set up a cloud GPU instance. TensorFlow opens up the possibility of a granularly-scalable approach to machine learning that allows us to finally ignore the nitty-gritty of CUDA installations, Python dependencies, and multiple GPUs. Just specify the size of network you want, and TensorFlow allocates and spreads it across hardware as needed. This is why TensorBoard was part of the original implementation, and why AWS support was an afterthought. “Pay by the parameter”. If I had to guess, I’d say this is the major reason for open sourcing TensorFlow.

I want something like the above to be true, because I want there to be a strategic reason for Google to invest in TensorFlow, and I want it to get easier and easier to develop interesting and complex deep learning apps.

Todo Zero

What if I suggested that you finish each day with nothing left on your todo list? This is the only rule of Todo Zero.

You might find yourself biting back some choice words. This sounds like unhelpful advice from someone with a much simpler life than yours.

Not so fast. Picture a world-class juggler with half-a-dozen balls in motion. How many balls do they have in their hands at once? None, one, or two. Never more than two. The remainder are in the air.

By analogy, work on just one or two things at a time. The remainder can be scheduled for some time in the future. In this way, it’s very possible to finish what’s currently on your list.

Otherwise, all of the competing priorities of a long list clamour for your attention. They clutter one another, making it impossible to focus. When you’re pulled in many directions, you’ll end up immobilized and demotivated.

At least that’s what has happened to me. My implicit solution was to procrastinate until panic seized me, and then enjoy its temporary clarity of focus.

So, here’s a recipe for Todo Zero that will take an hour or two to start with:

  • Go through your todo list and pull out anything that’s going to take less than 10 minutes.
  • Pick out the one or two jobs that you really want to tackle – these should be the most important or urgent things on your list. Break them down into pieces that you could tackle today if you really put your mind to it, and note them down.
  • Schedule everything else as future events in your calendar (I usually just assign them to a date without a time). Give yourself enough room before the deadline to finish them without rushing. Don’t be over-optimistic about how many or how quickly you can work through them.

So, that leaves you with quick tasks that take less than 10 minutes, along with the one or two most urgent/important jobs for today.

Marvel at your wonderfully shortened todo list. Look away, take a deep breath. Do not look at your email. Make a coffee. Feel a little calmer than you did, and enjoy it.

Now, let’s do the same for your email.

  • Find any emails that are going to take less than 10 minutes to reply to, and boomerang them for 2 hours’ time.
  • Pull out one or two emails that are urgent or important, and boomerang them for 1 hour’s time.
  • If you have the energy, boomerang each of your remaining emails for future times individually (tomorrow, a week away or a month away, depending on urgency). If you don’t have the energy, just boomerang them wholesale for tomorrow morning.

Stand up, and take a deep breath. Walk around for a few minutes, and make a cup of coffee. This is going really well.

  • By the time you get back, you should be staring at a short todo list and a pretty clear inbox. [If anything new has landed, or any have boomeranged back, send them away for an hour. We need a clear head]
  • Now, let’s dispatch the less-than-ten-minute odds & ends tasks. Do some of them, most of them, all of them, it doesn’t matter. Just a few, to get back a sense of momentum.
  • Your most urgent emails have boomeranged back. Deal with them.

Take a break.

At this point, you’re close to the point where you have a clean slate, and just your important tasks. You probably have some meetings and stuff. Have lunch. Refresh.

  • Now, it’s time to tackle those one or two important high-priority tasks-for-today.
  • Picture yourself at the end of the day, leaning back in your chair with your hands knitted behind your head, smugly. For that to happen, double down on those one or two most important things, and the rest can wait. You will feel great.
  • Don’t do anything else today. Don’t check your email if you can avoid it. Your goal is to boomerang away (by email or calendar) anything but them.

With any luck, you made progress on those one or two most important tasks.

Armed with this approach, you can triage your own life. You can choose to focus on the most urgent or important things first, and ignore the rest. They’ll shamble back when their time has come, and then you can dispatch them in turn.

P.S. There are a few tools that will help:

  • Google Calendar – add a new ‘Todo’ calendar, whose notifications are set by default to email you at the time of the event.
  • Any simple todo list app or text editor of your choosing. It doesn’t matter.

P.P.S. One final note. I can’t juggle two balls, let alone six. So take that into account, seasoned with a pinch of salt, in reading this.

P.P.P.S. Of course, there is nothing that’s original here. It’s a death-metal-mashup of Inbox Zero and GTD. It’s not always feasible to work like this. If you don’t procrastinate, you probably don’t need it. Etc.

Two-level tagging

Have you ever had trouble deciding where to store a file on your hard disk? Or worse, had trouble finding it later?

When you store a file on your hard disk, you have to decide which folder to put it in. That folder can in turn live inside other folders. This results in a hierarchy, known in computer science as a *tree*.

The main problem with trees is that sometimes you want things to live in multiple places.

Tagging provides an alternate system. Tags are a lot like folders, except that things can belong to multiple tags. However, but the tags can’t themselves belong to anything. So you have just one level of organisation with no nesting.

The main problem with single-level tagging is that it’s too simple. We want to be able to use fine-grained categories (e.g. ‘lesser spotted greeb’) that themselves belong to higher-level categories (e.g. ‘greeb’, or even ‘bird’ or ‘animal’). But we said that tags can’t themselves belong to tags.

Described like this, perhaps the solution will seem obvious to you too. We want things to belong to multiple tags, and for those tags to sometimes belong to other tags.

I built this into Emacs Freex, my note-taking system.

For instance, I have tagged this blog post with ‘data structure’ and ‘blogme’. In turn ‘data structure’ is tagged with ‘computer science’ and ‘blogme’ is tagged with ‘writing’. So I can find this blog post later in various ways, including by intersecting ‘computer science’ and ‘writing’.

This gives you the best of both worlds: things belong to multiple categories, along with a hierarchy of categories.

It has to be easy, and worth it, for you to add tags

Whoever adopted the idea that “there’s a place for everything, and everything in its place” when it came to organizing files and ideas on a computer suffered from a failure of imagination. Or maybe they were just over-wedded to the desktop and filing cabinet metaphors. Fortunately, the idea of ‘tagging’ (or ‘labels’ in Google’s parlance) blew that whole banal tidiness away. In short, tagging lets you assign things to multiple categories, or if you prefer, put things in multiple places. Rashmi describes this well – tagging is popular because there’s a lower cognitive cost when you can put things in multiple categories, rather than having to decide on just one.

We’ve only just started to scratch the surface of how categorization schemes could work. I’m going to propose a few ways in which things might grow from here, focusing on the restricted case where you’re tagging your own files privately, ignoring all the interesting goodness that happens when those tags are available to others, delicious-style.

N.B. I’m going to use the term ‘category’ rather than ‘tag’, since it’s easier to think of things belonging to categories than being labelled with a tag. The key notion is that things can belong to multiple categories simultaneously.

The more tags the better

Jon Udell has a great post on building up a taxonomy of categories by hand, starting with a smallish corpus of documents, and just letting the taxonomy emerge, combined with a little judicious weeding. The dataset he has in mind is pretty small, and so he’s aiming for 15-40 categories. The kinds of datasets I have in mind are much larger.

For instance, I have a few thousand text files with notes on topics ranging from Ubuntu troubleshooting to the symptoms of schizophrenia to my travel arrangements for the summer. I could maybe try and shoe-horn things into a few tens of categories, with each category holding many items, and each item belonging to maybe one or two categories. But I very quickly found this to be unsatisfying. We want to be able to differentiate things more finely than that. For instance, how would I categorize a document containing hotel bookings in Florence last summer for the HBM conference? Just by ‘travel’? Or also ‘Florence’, ‘conference’, ‘hotel’, ‘HBM’, and ‘2007’. Remember the argument about lower cognitive cost though – it’s much less effort just to include all those categories. If I do that, I’ll end up with many hundreds or even thousands of categories, some of which will have tens or hundreds of members and some of which might only have one or two members. I think one might raise two main objections to this approach:

  • Can you really be bothered to add a bunch of categories each time you write something?
  • How do you begin to find anything now? Sometimes filtering by a category doesn’t help because it returns way too many members, and sometimes it doesn’t help because it returns hardly any. Where’s Goldilocks when you need her?

I’ll address these in turn.

Can you be bothered to add a bunch of categories each time?

People are lazy. Any system that requires people to be assiduous book-keepers while they’re writing is doomed. Dave Winer talks about how he should be categorizing all his posts, and yet he doesn’t do it – and this makes him feel guilty. He knows that he won’t be able to trust the categories to find that thing later. The value of the whole system has dropped. Squirrels wouldn’t go to the effort of hoarding nuts for the winter if they knew that they wouldn’t remember where those nuts are when they need them. So what’s the point of hoarding nuts any more? All of a sudden, the system has broken down. We need to find a way to make the system less brittle.

Let’s look at Dave Winer’s guilty confession a little more closely:

“I have a very easy category routing system built-in to my blogging software. To route an item to a category, I just right-click and choose a category from a hierarchy of menus. I can’t imagine that it could be easier. Yet I don’t do it.”

If you ask me, that’s not easy enough. Navigating hierarchical menus with a mouse is slow and distracting. Blogger does it right – there’s a ‘labels’ text box that you can tab to, into which you can write a comma-delimited list of tags. As you type, it auto-suggests – pressing ‘return’ fills in the rest of the tag and puts a comma and space after for you. So that’s step 1.

But it should be even easier. What should happen is that the machine should automatically throw up a list of tags that it thinks might be appropriate for this post. It should put the ones it’s most confident about to the left, and less confident ones to the right, with the cursor positioned at the end to make it easy for the user to delete false positives and add new categories it missed. And if you’re feeling lazy, then you can just accept the machine’s suggestions without glancing at them. The cost of a false positive is low, so it’ll deliberately suggest too many. This brings us neatly to our second concern.

But then how do you find anything?

So now every document belongs to a bajillion categories, none of which is particularly useful on its own. But a conjunction of categories should narrow things down nicely. If I’m trying to find that hotel booking in Florence, I don’t have to worry about remembering whether it’s tagged with ‘travel’, ‘hotel’, ‘Florence’, ‘2007’ or ‘HBM conference’, since it’s tagged with all of them. So I’ll try filtering by the conjunction of ‘hotel’+’Florence’+’2007’ and that’ll probably winnow things down sufficiently for me to pick the file out manually (see also: make tags not trees). .

But maybe we never made a ‘Florence’ category. It seems like such a natural cue to use now, but at the time, ‘Florence’ didn’t spring to mind as a salient category, despite our liberal categorizing policy. If the system auto-completes in a handy way, we’d already know this, and our fingers would already be backspacing and trying ‘Italy’ or ‘HBM conference’. There are many points of failure, but there are also many points of entry. If we make it easy enough to cue for conjunctions of categories, then there’s a very low cognitive cost to having to backtrack once or twice, since our brain effortlessly supplies us with so many possible cues to use.

We could make things even less brittle in lots and lots of ways. Perhaps the system notices that only one item in the whole database is tagged with ‘Florence’, so it’s probably too restrictive a category. No matter. It could just ignore ‘Florence’, or suggest that we omit ‘Florence’ from our search. Better still, and less intrusively, it could now grep through all the files that match one or more of the tags to see if ‘Florence’ appears in the text, and automatically suggest any matches as partial matches.


I keep coming back to the same feeling – for the most part, people don’t write notes because they don’t think they’ll be able to find those notes later when they need them – so why bother writing the notes in the first place?

All of these suggestions are geared towards:

  • Reducing the cognitive cost at both writing and retrieval. If it’s less effort, you’ll feel less lazy about adding category metadata.
  • Making the system less brittle, so that if you were lazy about your category metadata, you still have a good chance of finding things later. This is the key to ensuring that you don’t end up losing faith and give up on writing things down in a structured way altogether.

Taken together, I hope that it will become easier to categorize your notes in a way that helps you find them later, which is going to make you much more likely to write them down in the first place.

Collaborative filtering and how it’s going to help us consume

In the future, we will routinely employ some product that will probably be called Microsoft MyLife (1) to manage our reading, news, entertainment and shopping. What will it do? Let’s start with the present and build forwards. For my money, amazon.com is the best site in the business. It takes the only shopping activity I enjoy, book-shopping, and manages to make it even better online.

Shopping with Amazon is so pleasurable and fruitful because it first leads me by the hand towards things that I’m genuinely interested in and then provides me with the 3rd-party reviews and ratings feedback that I always find myself hungering for when buying something. It’s like having Virgil for a librarian. It’s shopping by democracy, where your candidate always wins. But it’s still pretty limited. I want to be able to head to the recommendations page and choose to be recommended books of a certain kind only, rather than having my interests in neuroscience, programming and sci-fi lumped indiscriminately together. I may want it to weight recent purchases heavily, or to only find books by authors I’ve never read. But the possibilities for tinkering with the recommendations parameters are sadly limited.

I dream of a ‘How lucky do you feel, punk?’ slider, that ranges from conservative to adventurous. Perhaps today I’m tired and I want something I’m certain I’ll like. If I’ve bought the first 35 of David Gemmell’s Waylander books, Amazon can be pretty sure I’ll like the 36th, since there’s no way to tell them apart. But maybe tomorrow I’ll be high on redbull and tractor fluid, and I’ll want something new and unexpected. Perhaps initial impressions indicate that I’ll like some new author who’s making waves, or perhaps Amazon’s crazy collaborative filtering algorithm thinks that David Foster Wallace + David Sedaris = Tom Robbins, and recommends something to me out of left field accordingly. After all, I want help choosing a book, but part of the reason I like browsing for fiction arranged alphabetically is that you never know what’ s going to catch your eye. I can choose on a given day whether to browse only for names I know, or to open myself up for something fresh and unexpected.

Secondly, I want to be able to ask for recommendations for someone else. Let’s say it’s my dad’s birthday. I want to ask Amazon, ‘What would I like if I was a middle-aged man who likes John Le Carre, Tom Peters and Hoagie Carmichael?’. I want to be able to create a persona for my dad, and for it to make some guesses. Even if they’re terrible, maybe they’ll give me ideas, or maybe I just need to give the system a little more information. At this point, things could get interesting. It would be pretty easy to integrate this with my dad’s actual Amazon account, if he chooses to let me, so that it could take his purchasing history and wishlist into account as extra information. It would know what books he’s bought recently, and so might remind me of some interests of his that i’ve forgotten, or of some burgeoning interests that I can sneakily anticipate. And i’m prepared to bet that it could do this with just a broad sprinkling of sample purchases to guide things. You can think of the adventurousness slider bar mentioned above as titrating from Marks & Spencer pullovers to gift vouchers at Stringfellows. The point is that i want to be able to tap my guide on the shoulder, shake my head, and point in a different direction. ‘Yes’ to the Herend china, but ‘no’ to the Chinese Hentai. The current collaborative filtering algorithm that they use to make recommendations works brilliantly, but is amazingly restrictive in the way that you can tweak it.

Let’s say that Jeff Bezos reads this, slaps his forehead at the obvious genius of it all, and immediately engages a few of his platoons of elite Bonobo chimps trained from birth in arcane RDBMS lore to implement all of this. What next?

It knows what books I like. Why stop at books? Amazon sells everything except the kitchen sink. It probably does, in fact, sell legions of kitchen sinks too. But let’s stick to books, music and films for now. It seems obvious to me that what I read, what I listen to, and what I watch are going to be predictive of each other. In fact, the broader the information the system knows about you, the better I would imagine it could triangulate on what you like and generalize to useful recommendations. It should be relatively effortless for Amazon to generalize from books to music to films, or vice versa, and I’d be astonished if they weren’t already doing that. It’s not so clear how your furniture purchases might be dictated by your reading habits, but it’s not ridiculous either to think that a young 20-something male with money to burn who likes Friends might very easily be persuaded to buy a Lay-Z-Boy comfy chair (as featured on the series) if a few DVDs from the series (that he doesn’t have) get bundled free.

Walmart are starting to use this kind of data-mining in all kinds of ingenious, insidious ways with their product placement, but I’m talking narrowcast, baby. I’m talking about a one-time offer for you and you alone, brought to you direct by the system. I don’t really care all that much about Amazon knowing all this about me as long as they promise promise promise not to sell it, and as long as they continue to help me buy great shit cheaply without actually having to shop for it.

So they can tell me what DVDs and furniture to buy based on what I read. What if Amazon bought Ticketmaster.com tomorrow? Then, they could send me an email telling me that there’s a crazy new concert/play/demonstration/sewing circle next week, and would I like tickets? It knows that I can’t tell richly-developed fictional characters from a rotting horse’s arse, and that I like Dan Brown and art, so it cobbles together a deal with lastminute.com to send me to Paris, Rome, London and Roslyn at highly discounted rates.

How does it know all this? Because other people who like the things I like – they liked that. Sure, I’m being shepherded, but if I could have my own personal shepherd who keeps pointing out great unsigned bands in intimate venues, movies from Chile that don’t have Stephen Segal in them, and books that make me cry, then sign me up to be a sheep.

It’s pretty easy to see where this is going. TV’s going the way of the dodo, and even Tivo’s a bit tovo. I don’t want anyone to ever tell me that I have to watch the West Wing one episode at a time, once a god-damned-week. I want to buy 50 TV meals and watch them back-to-back without sleep. And there probably aren’t that many people quite like me, but there are a few, and that’s exactly what they like to do, so it shouldn’t be too hard for my collaborative augmentation shepherd to have my TV meals frisbeed through my open window at regular intervals by a supermarket delivery man.

How far might the system be able to generalize across domains for a given person? If it knows about my book, music and film tastes, could it start to guess what kinds of plays I would like, or magazines I would read? Pretty soon, it could start recommending clothes and events and articles.

If you start to map individuals to their locations and movements, then you could start to make recommendations about where to shop or visit. What could be more useful than knowing where my dad goes to shop, if I’m trying to buy him a birthday present? Actually, I can think of one thing more useful than that – knowing where people looking for presents for their dad went when they went shopping… It could plan out routes, take me to little one-of-a-kind shops tucked away, either because I tell it about them, by keeping track of my credit record, following my movements with something like GPS.

Eventually, you could see how this could improve, or invade, every aspect of our life. All of the information you consume would be customized to your tastes, or if you prefer sometimes, to someone else’s tastes. It seems critical though that you’d always be able to tweak the knobs when you’re feeling adventurous, because it would be so easy for us to habitually tread the same well-worn paths, hearing only the opinions that we’ve trained the system we want to hear.


(1) The name’s so catchy, trite, and alarmingly intrusive-sounding that I couldn’t pass it up. There will probably be an open source version called GNU Memacs.

[Update: I think they already have a MyLifeBits project that focuses on collecting all the data amassed over your life together, but that’s not really what’s being discussed here. That’s about retrieving information. This is about proactively suggesting new stuff from the cloud.]

Make tags not trees – filesystem idea based on tags instead of hierarchical directories

Until recently, it was easier to find something amidst the five zillion pages on the web than it was to find something on your own hard disk. It would be faster to Google for something than to burrow through subdirectories looking for it.

Could this be because the files on my hard disk are poorly organized? Bah. Maybe so. But that’s not my fault – it’s more or less inevitable once you have a lot of files, because hierarchical filesystems require each file to live in a single location. If I download a paper on memory for a class, should I organize by:

  • the context, e.g. the name of the class, lumping together all my writings and reading materials from that context together – ~/psy330/reading/
  • or by the type – things I’ve written vs reading materials – ~/reading/psy330/
  • or by good/bad or date produced or something else entirely?

Whichever decision you make, there’ll be times when you’ll wish things were organized some other way. This is why tagging is so popular. It’s because things inherently belong to multiple categories. And, because tagging is easy.
Google Desktop, Spotlight, Beagle and other offerings have helped considerably with all this. If you want to locate a single file, and you can’t remember where you put it, then full-text search is the way to go. But let’s consider the case where you have files that you want to treat as related, even if their contents aren’t obviously similar. We want this all the time. Take the reading list for a particular course or project as an example. This is why we needed directories and filing cabinets in the first place.
My proposal here is to replace the hierarchical filesystem with a completely flat space and lots of tags. Each file would be tagged with one or more tags, just like on http://del.icio.us/. The ‘save as’ dialog would look a little different. Instead of a list of directories that you can burrow into, there’d be a list of tags. When saving a file, you’d select as few or as many as you like, give the file a name just as now, and you’re done. To open a document, you filter using some tags, watching the list of files that match being winnowed down, and select from an alphabetized list. Or, use wildcards to winnow down by filename directly. Or some combination.
Converting an existing hierarchical filesystem would be easy in most cases. You could just grab all the subdirectory names in a path and treat them like unordered words in a bag. Let’s keep the same ‘/’ file separator we’re used to, but change its implicit meaning from ‘contains-this-directory’ to ‘and-also-this-tag’, so:

  • ~/reading/psy330/hippocampus/blah.pdf

would now be equally accessible from:

  • ~/reading/psy330/hippocampus/blah.pdf
  • ~/psy330/reading/hippocampus/blah.pdf
  • ~/reading/hippocampus/psy330/blah.pdf

All these locations would end up meaning the same thing. In this way, a subdirectory is really a conjunction of tags. In our simple example of storing .doc and .pdf files for documents and reading materials for a class, we’d simply tag some of them ‘doc’ and some of them ‘reading’, and give them both the ‘psy330’ tag for the class.
Upon looking at this, it’s clear you’ve lost some information, but I don’t think it’s information we’d miss much. The assumption underlying a lot of this is that where we now have hierarchy, we could manage just as well with intersecting sets, which would require considerably less effort to memorize.
There are, inevitably, unanswered questions and lurking gotchas.

  • I think we’d probably want to create a default/preferred way of expressing things, so that tags with more items or that are more discriminative go on the left, or something akin.
  • You shouldn’t need to specify all the tags for a given file. Just enough to specify it uniquely, given its filename. So, if there are no other blah.pdf files in the ‘reading’ tag, then you should probably be able to access it straightforwardly at ~/reading/blah.pdf though this has the unfortunate implication that if you were to add a new blah.pdf that also had a ‘reading’ tag, the above location would become ambiguous.
    If there are multiple blah.pdf files in the reading tag, then the system would need to prompt you with a list of tags that would help disambiguate them. Wikipedia’s interface might have some lessons about disambigation that could be learned from.
  • At this stage, a tags-not-trees system seems better-suited for home directories (‘My Documents’ for Windows users) than system directories. In home directories, most of the organization is human-generated and needs to be human-readable, whereas /etc directories are mostly machine-generated to be uncomplicatedly machine-readable.
  • The only way metadata-entry systems work is if they require little work on the user’s part. The nice thing about tagging is that it should be relatively easy for the computer to make guesses about which tags you’ll want to put something in, based on your tagging of previous files. So when you click ‘save as’, it will prompt you with a list of tags that it thinks you’ll want to use, ordered in terms of certainty. You delete a couple, add a couple more, and leave the rest in place.
    This is not a trivial problem, but you’ll have a large corpus from which to do your Bayesian learning (or whatever). And you can seed the corpus from day one with information from the existing file hierarchy, and with some clustering applied to the full text of the files.
    This is the kind of problem that machine learning can really help with. There’s a decent amount of data, it’s getting feedback on each guess from the user and it’s doesn’t matter if it’s occasionally off-base because it’s only making suggestions.

I like this idea. I even think it might work, though I admit to feeling a little unsettled by the notion that all the files on my hard disk would effectively live in one place. Well, that’s not strictly true. Our notion of ‘space’ in filesystems would have to warp a little. It’s easy enough to imagine a filesystem now as a ramifying rabbit warren. This would require us to think of file locations in terms of boolean queries, and I can’t come up with a nice metaphor. I think it’s easy enough to grasp, but there’s nothing outside the computer that implements tags, because they inherently incorporate the idea of superposition (one thing existing in multiple places).
I would love to see a FUSE implementation of this. It would have to be open source and run on Linux, and I’d consider trying it. The closest I’ve seen (from this list) are:

  • OpenomyFS – propietary and web-based. Otherwise, looks interesting
  • TagsFs – seems to be focused on mp3 tags
  • RelFS – a full relational database
  • LFS – the most interesting of the bunch

If it turns out that any of those projects are alive and easy to try, I’d be pretty gung-ho about it.

UPDATE: there are some great links and comments below, and also at:

The Turing tournament – a proposal for a reformulation of the Turing Test

  1. Introduction
  2. Describing the Turing Tournament
  3. Comparing the Turing Test and the Turing Tournament
  4. Devising new rules, and non-linguistic competitors
  5. But is it intelligent?

MH: Are you a computer?

Dell: Nope.

MH: You’d be surprised how many fall for that one.

Dell: Not me.


MH: What’s fifty-six times thirty-three?

Dell: One thousand eight hundred forty-eight.

MH: You’re pretty fast!

Dell: Those are my favorite numbers.

— from http://home.sprynet.com/~owl1/turing.htm


The Turing Test was designed to be an operational test of whether a machine can think. In Stuart Shieber‘s words:

“How do you test if something is a meter long? You compare it with an object postulated to be a meter long. If the two are indistinguishable with regard to the pertinent property, their length, then you can conclude that the tested object is the given length. Now, how do you tell if something is intelligent? You compare it with an entity postulated to be intelligent. If the two are indistinguishable with regard to the pertinent properties, then you can conclude that the tested entity is intelligent (pg 1).”

In order for a machine to be deemed intelligent according to the Turing Test, we would determine whether human judges could reliably distinguish a human from the machine after some lengthy text-only conversation. I don’t think a machine is going to pass it any time soon, and when it does, it’ll be pretty self-evident that we’re dealing with a machine that can think.

Anyone who disagrees that a full and proper Turing Test is a stringent enough test of intelligence should read Robert French‘s excellent discussion of the kinds of very human and culturally rooted subcognitive processes that would have to going on in the machine in order for it to pass. His criticism is that the Turing Test “provides a guarantee not of intelligence but of culturally-oriented human intelligence”, i.e. that it sets the bar too high, or too narrowly. This is a subtler variant of the obvious point that human beings who don’t speak English would fail a Turing Test with English-speaking judges. In other words, the Turing Test is a necessary but not sufficient test of intelligence, because you would have to have a certain subcognitive makeup in order to pass it, on top of being intelligent.

The beautiful thing about the Turing Test is that there’s nothing about it that’s specific to machines. Indeed, Turing’s original idea for the Imitation Game, as he termed it, was based on a parlour game where the judge attempted to distinguish male from female players. This essay is an attempt to broaden the scope of the Turing Test from being a binary and culturally-rooted test of human intelligence to something vaguer and less unidimensional.

Let’s make this idea somewhat more concrete, and considerably more vivid. Imagine that a small, slimy green-headed alien lands on your lawn right now, travelling in a spaceship the size of a Buick. Assume that the alien demonstrates its extraterrestrial credentials to your satisfaction by whisking you to its home planet and back before breakfast. It bats away the impact of a few .357 rounds with its forcefield and patiently replicates household objects for your amusement. It would seem niggardly to refuse a being that has mastered faster-than-light travel the ascription of intelligence when most humans can’t tie their shoelaces in the morning without a dose of caffeine. So we might be moved to patch the Turing Test in some ad hoc manner to read:

“Any entity that cannot be reliably distinguished from a human after a lengthy text-only conversation, OR that has mastered faster-than-light travel and can withstand a .357 round at close up, can be considered to be intelligent.”

It’s clear that this lacks the pithy generality of Turing’s original formulation, and we’d have to do quite a lot more work to restrict the scope of the above to exclude asteroids. Perhaps over time, our super-intelligent alien will learn to speak English with a flawless cockney accent, and will pass the standard Turing Test, rendering this discussion moot. But in the meantime, before it has learned to speak a human language, we are faced with a manifestly intelligent being that fails our gold standard test for intelligence. The background aim of this whole essay will be to consider a new version of the Turing Test that overcomes the inherent human- and language-specific parochialism of the original. That way, our intelligent alien might pass, without having to learn to speak English with a cockney accent.

Along the way, it may be that our reformulated test provides a more constructive goal and yardstick by which to direct and evaluate progress in AI research than the standard Turing Test. Perhaps its primary limitation is that it’s difficult to restrict the difficulty or scope without losing everything that’s interesting about the test. And since even our current best efforts are a long way from success, the gradient of improvement is almost flat in every direction, making it difficult to discern when progress is being made in the right direction. This makes it difficult for machines to bootstrap themselves by training against each other, requiring lots of labour-intensive profiling against humans. Finally, the current test is very language-orientated, and undesirably emphasizes domain knowledge,

Describing the Turing Tournament

I’ll term this new version of the Turing Test the ‘Turing Tournament’, to reflect its competitive round robin form. Like the original Turing Test, the Turing Tournament will not yield a definitive, objective yes/no answer, but rather a ranking of the entrants, where the human players provide a benchmark. A lot of the details I’m proposing will probably need considerable refinement. Here are the organizing principles of a Turing Tournament:

  • The organizers of each tournament decide what the domain of play will be, e.g. a chessboard, text, a paint program, a 3D virtual reality environment, binary numbers, or some multidimensional analogue stimuli.
  • Every ‘player’ (within which I’m subsuming both human and machine variants) is competing in a round robin competition, and will play every other player twice, once as the ‘teacher’ and once as the ‘student’.
  • Every bout will have two players, a teacher and a student. Play proceeds in turns, with the teacher going first. Play terminates when the allotted time has been exceeded, or when some terminating criterion specified by the teacher has been satisfied. Neither player will know the identity of the other player.
  • Before the bouts begin, every player is given access to the domain of play so that they can construct their own set of rules that will operate when they are the teachers in a bout.
  • The organizers of each tournament determine the scoring for bouts that terminate relative to bouts whose time elapses. We will consider some possible scoring systems later.

These sound like strange rules. What kind of games could be played? Why does each teacher get to set their own rules? Do teachers get rewarded or punished if a student is able to reach criterion for their bout?

I think the easiest way to illustrate what I have in mind is with a concrete example. Imagine the following scenario:

  • A big room with lots of people sitting at computers. The people are the human players. The machine players are sitting inside a big server at the back of the room.
  • The domain for this competition is a Go board, a 19×19 checkerboard with black and white pieces. Although all bouts in this tournament will take place on a Go board, the rules and goals of each bout will be up the teacher of that bout.
  • Let us peer over the shoulder of a human player, currently in the role of student, trying to determine what the rules of the bout are, and play so that the bout terminates before running out of time. Neither we nor they know whether the other player is human or machine.
  • The board is blank initially.
  • As always, the teacher makes the first move. They place a horizontal line of 19 black pieces in the bottom row of the board.
  • Now it is the student’s turn. They have no idea how the bout is scored, what the aim is, what constitutes a legal move, how many moves there will be or whether there will be multiple sub-bouts. All of that is up to the teacher.

    Working on the assumption that the teacher wants the student to play white, the student lays down a single white piece in the top left corner.

  • The teacher removes the white piece, and replaces it with a horizontal row of white pieces just above the existing horizontal black line, and another horizontal row of black pieces above that. So now there are three rows of pieces filling up from the bottom of the board: black, white and then black.
  • The student decides that the removal of their white piece in the corner was a signal that its future moves should consist of placing an entire row of pieces on the board at a time. The student tries placing an entire row of white pieces in the top row of the board.
  • The teacher again removes all the student’s pieces, and replaces them with another row of white pieces and another row of black pieces. The bottom of the board consists of black, white, black, white and black stripes.
  • The student reasons that its next move should be to place a row of white pieces above the most recent row of black pieces to continue the stripy pattern.
  • Gratifyingly, the teacher leaves the row of white pieces in place, and adds a black row above it, as expected.
  • The two players continue to take turns until all but the top row has been filled with alternating black and white rows.

    Now, it is once more the teacher’s turn, and the student wonders whether the last row will be filled in. Instead, the board blanks again, and the teacher places a vertical column of white pieces on the right hand side.

  • The student tries tentatively to place an adjacent column of black pieces, deciding that this sub-bout involves creating black and white vertical stripes, with the black and white players reversed.
  • As it turns out, this assumption appears to be correct, since the teacher does not remove the student’s pieces, and together they quickly build up an alternating vertical stripe that moves leftwards.
  • When only the last column remains to be filled in by the teacher, the bout has reached criterion, and the student moves on to the next bout, with a different player.
  • Upon inspecting the scores later, our human player (the ‘student’ in the bout just described) finds that they had scored highly on that bout, but not as high as some. Some of the machine players had failed to see a pattern at all, and had been putting pieces down more or less at random. These players did the worst, since the scoring for this tournament is a function of the total number of turns taken to finish the bout, as well as the number of errors made by the student. (need to clarify???)

    Like our hero, the best players at this bout had also quickly deduced that the pattern involved stripes. Their extra insight came after a few turns, where they tried placing multiple stripes down at once. As it turns out, there was nothing in the rules set by the teacher prohibiting this, and so they finished more quickly, earning a higher score.

    It seems reasonable to imagine that most humans would quickly figure out the stripy pattern, and some would eventually think to lay down multiple stripes at a time. Might a machine? Perhaps soon.

This is intended as a toy example. The rules of the bout are pretty simple, but I think they would discriminate somewhat between intelligent and not-so-intelligent players. The key point to note is that each player would play twice against every other player, once as the teacher and once as the student playing within the teacher’s rules. Perhaps some bouts are too hard, and some are too easy. But en masse, the rankings should discriminate quite finely between players, even between human players. The exact details of the scoring, especially how teachers are scored, and how teachers pre-specify their rules, are clearly going to be crucial. It will suffice for now to say that students should probably get points for satisfying the criterion of a bout quickly, and teachers should be rewarded for devising discriminative games, that is, games that only intelligent players can solve. I will defer further discussion of these topics until later.

Comparing the Turing Test and the Turing Tournament

In discussing the idea of an Inverted Turing Test (more below) Robert French states that:

“All variations of the original Turing Test, at least all of those of which I am currently aware, that attempt to make it more powerful, more subtle, or more sensitive can, in fact, be done within the framework of the original Turing Test.”

Is the same true of the Turing Tournament? I think the answer is both yes and no. In fact, you could think of a Turing Tournament as a kind of generalization of the Turing Test. That is, the original Turing Test could be treated (more or less) as a Turing Tournament where the domain of play is restricted to text, and the bouts terminate if the teachers/judges are satisfied they are talking to a human. It wouldn’t be quite the same, since here the players would double up as judges and the judges would double up as players. In other words, the machines would also themselves be making judgements about the humanness of both each other and the humans – an ‘Inverted Turing Test’. In its current formulation, where every player plays every other player as both teacher and student (i.e. judge and player), a Turing Tournament would really be a strange hybrid of both the Inverted and standard Turing Tests.

The idea of an Inverted Turing Test has been proposed before:

“Instead of evaluating a system’s ability to deceive people, we should test to see if a system ascribes intelligence to others in the same way that people do … by building a test that puts the system in the role of the observer … [A] system passes [this Inverted Turing Test] if it is itself unable to distinguish between two humans, or between a human and a machine that can pass the normal Turing Test, but which can discriminate between a human and a machine that can be told apart by a normal Turing Test with a human observer.”

French ingeniously showed that this Inverted Turing Test could be simulated within a standard (if somewhat convoluted) Turing Test. In contrast, it seems clear that an unrestricted Turing Tournament could not be fully simulated by a Turing Test because the potential domains of play are limitless. So although one might imagine instantiating the Go domain by communicating using grid references within a standard Turing test, it seems clear that there would be no way to run a domain of play such as a 3D virtual reality environment within a standard Turing Test using text alone. The principle advantage of widening the domain of play from text-only in this way is to allow players to pass some kinds of Turing Tournaments without speaking English, or any language at all. As a result, it seems reasonable to think of the Turing Tournament as (more or less) a superset of the Turing Test, or if the reader prefers, at least a redescription of it with unrestricted domains of play. I find this Ouroborean quality quite pleasing. [is it really ouroborean???] Either way, we can agree that most of the original Test’s merits and stringency should still be present in the Tournament version, depending on the way a particular Tournament’s domain of play and restrictions are set up. This does raise the important question of whether a Tournament victory would be as convincing a demonstration of intelligence as a victory in a standard Turing Test – I will return to this below.

French also shows that the Inverted Turing Test could be passed by a simple and mindless program that would take advantage of the very subcognitive demands that make the original test so parochial and difficult to pass. In short, the machine could have a pre-prepared list of questions that have been shown to weed out machines in the past, such as

“On a scale of 0 (completely implausible) to 10 (completely plausible), please rate:

  • ‘Flugblogs’ as a name Kellogg’s would give to a new breakfast cereal.
  • ‘Flugblogs’ as the name of a new computer company.
  • ‘Flugblogs’ as the name of big, air-filled bags worn on the feet and used to walk on water.
  • ‘Flugly’ as the name a student might give its favorite teddy bear.
  • ‘Flugly’ as the surname of a bank accountant in a W.C. Fields movie.
  • ‘Flugly’ as the surname of a glamorous female movie star.”

By pre-testing lots of humans and machines to figure out what kinds of things people say, and machines fail to say, a simple but well-prepared machine could draw up a ‘Human Subcognitive Profile’. By comparing this to the responses of players, it would be an extremely effective judge in an Inverted Turing Test. There are two reasons why this strategy would not work in a Turing Tournament:

a) In the above specification, none of the players know which domain they will be playing in until the competition begins officially (after which the designer is barred from tweaking his machine). As a result, it would be impossible for the designer to create Human Subcognitive Profiles for every possible domain that the machine might find itself playing in a Tournament.

This same effect could perhaps be wrought in a standard Test by restricting the domain of conversation, but not telling the players before the competition begins what that domain will be.

b) In order to be successful, players have to be good as both teachers and students. As mentioned above, this is akin to holding both a standard and Inverted Turing Test. Even if the domain was known in advance, and even if it was possible to draw up a Human Subcognitive Profile for that domain somehow, such a machine would be exposed as a student.

Lastly, French asks whether the standard Turing Test might be modified to forbid the kind of subcognitive questions that underly its cultural and species-specific parochialism. He concludes that the kinds of questions that probe “intelligence in general … are the very questions that will allow us, unfailingly, to unmask the computer”.

He may well be right. However, it may be that moving out of the text domain will dramatically reduce the scope of possible subcognitive shibboleths that human teachers could employ. Having said that, there will still be many possibilities for rules that would place human student-players at a big advantage. For instance, in the case of the Go domain, a cunning human teacher could choose to play by the rules of Connect4, which other humans might be much quicker to fathom. In the case of some kind of Photoshop canvas domain, humans could spell out words cursively, outwitting even the most seasoned OCR software. If there’s any kind of free-text involved, any of the subcognitive tricks designed for the standard Turing Test might be employed. In the case of a 3D virtual environment, human student-players will have a huge edge, though perhaps 2D or high-D worlds would level the playing field. One might hope that imaginative specification of domains could minimize such advantages, and that after 10 years of such competitions, machine programmers will almost certainly know to build in pre-loaded expert knowledge of Connect4, for instance, but the problem will clearly still remain.

[N.B. In order to ensure that the scales aren’t conversely weighted too heavily against human players, it seems reasonable to allow all human players the use of a laptop throughout the Tournament.]

Maybe instead we should accept the possibility of subcognitive shibboleths, and embrace their utility instead as a means of cataloguing different kinds of conceptual schemes. There is a presumption inherent in the standard Turing Test that smartness can be measured on a one-dimensional continuum ranging from rocks to rocket scientists. In the case of the aliens that have travelled 4 million light years in a space ship built out of genetically-engineered quantum nanobits and powered by fermented mango juice, we could be pretty sure they’re intelligent, even if they were never to get the hang of English. It’s just that their conceptual schemes are different. In this case, we may find that there are cases where they think more like machines than like humans. Or possibly more like dolphins, or African grey parrots, or white mice or marmosets. If we’re able to set up a domain in a Tournament that everyone can play in, then we can expect that human student-players may not necessarily come out on top in all respects, even within the animal kingdom. We will return to this idea when we discuss Turing Tournaments between groups of individuals.

Devising new rules, and non-linguistic competitors

Besides extending the domain of play beyond text, the principle innovation of the Turing Tournament is in casting every player as both student and teacher.

It is clear enough what is required of the student player. When the bout begins, they have some idea of the kinds of interactions, puzzles and patterns that the domain presents. By interacting with the teacher player, they have to somehow fathom what the aim (i.e. terminating criterion) of the current bout is, and attempt to satisfy that. It might involve placing pieces on the board in some complex pattern, learning the structure of a maze, guessing at the next number in a sequence or optimizing some noisy function. Depending on the tournament, they may or may not be given feedback after each move:

  • If they’re given a running score, they can attempt to learn how to maximise that reward.
  • If no reward is given, but the teacher corrects incorrect moves, then the learning by imitation can be seen as a kind of supervised mapping or reconstructive learning problem.
  • There may even be cases where no feedback is given whatsoever, such as when the bout requires the student to guess the next number in some sequence.

It is the teacher’s job to come up with new and inventive rules for bouts that challenge the student-players, and also to perhaps lead the student in the right direction. For the Tournament to work as intended, teachers should be intending to come up with the most discriminative bout rules they can.

Getting the incentive structure for the teachers right is therefore key. I expect that early scoring structures will contain loopholes that ingenious machine designers can exploit, but that over time, scoring structures that serve their purpose in a robust way will emerge. If our goal is to discriminate humans from machines, then this simple scoring system may work:

  • If the student ‘wins’ (i.e. satisfies the terminating criterion) a bout, whether human or machine, then they get a point, otherwise they get nothing.
  • If a human student wins a bout, then the teacher gets a point, otherwise they get nothing
  • If a machine student wins a bout, then the teacher loses a point, otherwise they get nothing.
  • Total player score: the sum of the player’s scores as a student and their scores as a teacher

    [There may need to be some weighting/normalization if the number of human and machine players is unequal.]

In effect, we’re rewarding players that seem human, and can devise rules that discriminate whether other players are human. This Tournament setup is the combo standard/Inverse Turing Test described above, that would not differ all that wildly in principle from the standard Turing Test if played in a text domain. Such Tournaments would encourage the kinds of subcognitive or culturally-rooted human-parochialism that we’re trying to avoid.

Perhaps this more general scheme will work instead:

  • If the student wins, whether human or machine, then they get a point, otherwise they get nothing.
  • To calculate the student’s score: at the end of all the bouts, count the number of bouts that each player won as a student. Calculate the mean number of bouts won. For each student, subtract this mean value from the number of bouts they won. This will mean that a very average player will have zero student points, a good player will have a positive number of points, and a poor player will actually have negative student points.

    In other words:

    c_m = sigma_n^N( W_nm ) – sigma_n^N( sigma_m^N( W_nm )) / 2N


    c_m = the overall student score for player m

    N = the number of players

    W_nm = 1 if student m won in their bout with teacher n

    W_nm = 0 if student m lost in their bout with teacher n

  • To calculate the teacher’s score: if the student wins a bout, then add the student’s student score (which may be negative) to the teacher’s teacher score. If the student loses the bout, then the teacher gets nothing.

    In other words:

    p_n = sigma_m^N( W_nm * c_m )


    p_n = the overall teacher score for player n

    N = the number of players

    W_nm = 1 if student m won in their bout with teacher n

    W_nm = 0 if student m lost in their bout with teacher n

    [It may be that W_nm should be -1 if the student lost]

  • Total player score: the sum of the player’s teacher score and student score

    [There may need be some normalization to ensure that the teacher and student scores are weighted equally.]

What’s the point of all this complexity? If you’re a teacher, then you do best if you can design your rules such that only above-average players (whether human or machine) win in your bouts. There’s actually a penalty if you make your rules so easy that everyone can figure them out, and you’ll get zero points if no one can figure them out at all. When you’re a student, you want to be as smart as you can, and when you’re a teacher, you want to be as discriminative as you can. En masse, the community of competitors are striving to do as well as they can and to evaluate each other as well as they can.

Inventively devising rules to favour intelligent over non-intelligent participants requires sufficient representational power to understand, let alone manipulate, your own rules, a rich theory of mind, as well as a generative good taste. Consider a Tournament played in the simple domain consisting solely of letterstring analogy problems, where the student is faced with problems such as:

“I change abc into abd. Can you ‘do the same thing’ to ijk?”

or in non-linguistic terms:

abc —> abd; ijk —> ?

Reasonable responses include ijl, ijd, or even abd.

Let us imagine that a player as cunning as Douglas Hofstadter has devised the following problem:

abc —> abd; mrrjjj —> ?

Peer at this for a moment – you won’t appreciate that this is somewhat fiendish unless you try it for a while yourself. Any ideas?

There’s no obvious pattern to the letters chosen on the right hand side, so mrrkkk seems kind of lame, and abd always feels lame. Well, how about if you try this one first:

abc —> abd; abbccc —> ?

Though your first thought may have been abbddd, doesn’t abbdddd seem so much nicer? It’s as though the successorship sequence of letters needs to be reflected in the increasing length of the letter groups (to use the FARG’s terminology). Now, let us turn back to:

abc —> abd; mrrjjj —> ?

Doesn’t mrrjjjj seem like a nice, reasonable solution now? Would you have considered it so nice before the previous example. Probably almost as nice. Would you have thought of it on your own, without the previous example? Probably, given some head-scratching.

The point of this digression is to point out how an imaginative teacher can guide, plant ideas, manipulate, prime, coax and lead the student by example in such a way that an intelligent player would almost certainly get the right answer, but there are almost no extant machines that would stand a chance. Besides having the sheer representational flexibility to deal with even barebones analogies such as the one above, a really intelligent player would be using the first few turns to gauge the teacher, get a sense of what kinds of solutions are admissible, and would probably be relying on Gricean maxims wherever possible.

What if your alien doesn’t know anything about Gricean maxims? Or doesn’t understand concepts like tournaments, rules, intelligence, machines, scores or games? We’ve finished outlining how a Tournament might be run that might require less domain knowledge and linguistic ability than the standard Turing test. But one striking pragmatic problem remains, which becomes apteacher when we consider our newly-arrived green visitor. If the alien doesn’t speak English, how are we going to explain the idea of the Turing Tournament to him so that he can participate?

Following Minsky, I think that we will be able to converse with aliens to some degree, provided they are motivated to cooperate, because we’ll both think in similar ways in spite of our different origins. Every evolving intelligence operates within spatial and temporal constraints, suffers from a scarcity of resources (and presumably, competition), must develop symbols and rules, and must have thought about computation and machine learning in order to be able build spaceships. Perhaps notions of games, intelligence, scores and tournaments are only relevant in a society of individual entities that compete with each other for resources, and that maybe a hive mind or single monolithic being or other unimaginable entity might not need such concepts. In that case, you wouldn’t have any more luck using the standard Turing Test on such a being.

Will we have much more luck with machines? Not unless we start small. At the moment, the state of the art in artificial intelligence wouldn’t do very well in most of the domains we’ve discussed, and would struggle especially when trying to generate new rules of its own. Sadly, very few researchers have focused on generative heuristics to curiously discover things that are interesting simply for their own sake, such as Lenat‘s Automated Mathematician that sought interesting mathematical concepts. In order to stand a chance in a Turing Tournament, much work needs to be done on curiously discovering interesting things that could serve as the basis for a rule set in a new domain. Good, that is, discriminative, rule sets for a Turing Tournament bout might consist of a difficult but ultimately guessable sequence of numbers based on a funny arithmetical pattern, or the kind of letterstring analogy problem that elicits an ‘aha’. Better still, teacher players that can lead an intelligent student player down a suggestive road towards the terminating criterion through tutorial or warm-up sub-bout problems will be at a tremendous advantage, where half the problem for the student consists of figuring out what their goal is supposed to be.

But is it intelligent?

Let us recall Shieber’s pithy test for intelligence:

“Now, how do you tell if something is intelligent? You compare it with an entity postulated to be intelligent.”

We’ve replaced that with an intellectual obstacle course. As teachers devising rules for their bouts, we are effectively asking players to define their own micro-test of intelligence (since being able to do this is surely a sign of good taste?). They must then be able to convey the parameters of that test such that other intelligent student players can figure out how to pass it, perhaps by creating lead-up sub-bouts, internalizing what the student player is probably thinking and so guiding the student players’ intuitions in the right direction. Finally, as students, the players must demonstrate in turn that they can flexibly assimilate what their goal should be, and then be able to get to it.

So although we might imagine some narrow machines that could best humans in certain kinds of puzzles or computation, but it seems less likely that a brute force machine player would also do well on Bongard problems, letterstring analogies, or be able to devise ingenious, fun and discriminative rules for bouts. This new generative aspect is intended to tap into the kind of creative, generative, playful, inventive or aesthetic faculty that humans display, as well as the ability to form a rich internal model of the student player’s state of confusion, and guide them towards a solution. In this respect, it borrows the idea of a dialog or gentle interrogation from the original Test, but allows for the translation of that dialog to new domains.

Bringing this back to Turing’s original question, we can finally ask, ‘if a machine were to score higher than some of the humans in a Turing Tournament, would we definitely be willing to call it intelligent?’ The answer could depend on a few factors:

  • Let us assume that the Tournament is well-planned, that the human competitors are well-chosen, that no independent experts can find any scoring loopholes or weaknesses in the organization of the Tournament, and that the result is replicable. If any of these conditions are not met, we will not consider the Tournament to be well-run.
  • If the domain is too restrictive, then there may be a dearth of interesting rule sets that can be devised. In this case, a good player won’t do much better than a poor player, and this wouldn’t be an interesting result.
  • Even if the domain is a rich one, such as letterstring analogy problems, it could be that a highly specialized program like Copycat could outperform many humans. Unless the success is relatively domain-general, then you’ve shown what you probably knew already, i.e. the machine is exhibiting some domain-specific proto-intelligence.
  • At that point, we would probably want to analyze the machine’s performance. Did it do better as a teacher or student? Was it simply very good at certain kinds of problems? Was there some simple trick to its way of devising problems which, once exposed, would clue in future human players in a rerun of the Tournament?
  • Could it pass a standard Turing Test?

Let us imagine that a machine is designed which is a poor teacher player, but a good student player, particularly in a couple of abstract limited-interaction domains like letter strings, number sequences, Go boards and cryptography, but that it can’t pass the Turing Test. Is it intelligent? Somewhat? We’ve forfeited the no-frills and no-free-parameters yes/no answer that you get from a Turingf Test, but we now have a much richer set of data with which to try and place this machine in the space of all possible minds. We have a more finely-graded multi-dimensional scale. Our machines can bootstrap themselves by competing amongst themselves without human intervention – specialist teacher machines that are good at discovering generative heuristics can be used to train specialist student machines that are good at problem solving, and vice versa. So in forfeiting our neat yes/no answer, we’ve gained a great deal.

Perhaps most importantly for the field of AI, we can now attempt to scale the enormous subcognitive iceberg of the mind incrementally, using ever more complex Turing Tournaments as yardsticks. In time, perhaps this will lead back towards the Turing Test as the final summit.

see also: AlienIntelligenceLinks