Why has Google open sourced TensorFlow?

I was sitting in a sun-warmed pizza restaurant in London last week talking about deep learning libraries. Everyone had their favourites. I was betting on TensorFlow, the new kid in town released by Google in late 2015. In response, a Torch fan pointed out that Google may invest in building up TensorFlow internally, but there’s no reason for them to invest in the shared, external version.

This got me thinking – why has Google open sourced TensorFlow?

Naively, I usually assume that companies keep their crown jewels proprietary while open sourcing the periphery. In other words, keep your secret sauce close to your chest – but share the stuff that’s more generic, since it builds brand and goodwill, others may contribute helpfully, and you’re not straightforwardly giving a leg-up to your direct competitors.

Google’s approach to open source has been a little more strategic than this. Look at a handful of their major open source projects – Android, Chromium, Angular, Go, Dart, V8, Wave, WebM. The motivations behind them are various:

  • Android, Angular, Chromium, V8, Wave, WebM – creating a new version of an existing technology (free, better engineered, or faster) to disrupt an incumbent, or increase usage and thus drive revenue for Google’s core businesses.
  • Go, Dart and the long tail of minor projects are peripheral to their goals and serve less direct strategic interest.

For TensorFlow to make sense and be worthy of long-term support from Google, it needs to fall in the former category.

It is indeed a new version of an existing technology – it’s free, it’s better engineered, though not yet faster.

So, is it intended to either disrupt an incumbent, or to increase usage and thus drive revenue for core Google businesses? I can only think of two possibilities:

  1. TensorFlow is intended to be a major strategic benefit for Android. Machine learning is going to power a wave of new mobile applications, and many of them need to run locally rather than as a client-server app, whether for efficiency, responsiveness or bandwidth reasons. If TensorFlow makes it easier to develop cross-platform, efficient mobile machine learning solutions for Android but not for iOS, that could give the Android app market a major boost.
  2. TensorFlow is intended to be a major strategic benefit for Google’s platform/hosting, and to disrupt AWS. Right now, it’s pretty difficult and expensive to set up a cloud GPU instance. TensorFlow opens up the possibility of a granularly-scalable approach to machine learning that allows us to finally ignore the nitty-gritty of CUDA installations, Python dependencies, and multiple GPUs. Just specify the size of network you want, and TensorFlow allocates and spreads it across hardware as needed. This is why TensorBoard was part of the original implementation, and why AWS support was an afterthought. “Pay by the parameter”. If I had to guess, I’d say this is the major reason for open sourcing TensorFlow.

I want something like the above to be true, because I want there to be a strategic reason for Google to invest in TensorFlow, and I want it to get easier and easier to develop interesting and complex deep learning apps.

Todo Zero

What if I suggested that you finish each day with nothing left on your todo list? This is the only rule of Todo Zero.

You might find yourself biting back some choice words. This sounds like unhelpful advice from someone with a much simpler life than yours.

Not so fast. Picture a world-class juggler with half-a-dozen balls in motion. How many balls do they have in their hands at once? None, one, or two. Never more than two. The remainder are in the air.

By analogy, work on just one or two things at a time. The remainder can be scheduled for some time in the future. In this way, it’s very possible to finish what’s currently on your list.

Otherwise, all of the competing priorities of a long list clamour for your attention. They clutter one another, making it impossible to focus. When you’re pulled in many directions, you’ll end up immobilized and demotivated.

At least that’s what has happened to me. My implicit solution was to procrastinate until panic seized me, and then enjoy its temporary clarity of focus.

So, here’s a recipe for Todo Zero that will take an hour or two to start with:

  • Go through your todo list and pull out anything that’s going to take less than 10 minutes.
  • Pick out the one or two jobs that you really want to tackle – these should be the most important or urgent things on your list. Break them down into pieces that you could tackle today if you really put your mind to it, and note them down.
  • Schedule everything else as future events in your calendar (I usually just assign them to a date without a time). Give yourself enough room before the deadline to finish them without rushing. Don’t be over-optimistic about how many or how quickly you can work through them.

So, that leaves you with quick tasks that take less than 10 minutes, along with the one or two most urgent/important jobs for today.

Marvel at your wonderfully shortened todo list. Look away, take a deep breath. Do not look at your email. Make a coffee. Feel a little calmer than you did, and enjoy it.

Now, let’s do the same for your email.

  • Find any emails that are going to take less than 10 minutes to reply to, and boomerang them for 2 hours’ time.
  • Pull out one or two emails that are urgent or important, and boomerang them for 1 hour’s time.
  • If you have the energy, boomerang each of your remaining emails for future times individually (tomorrow, a week away or a month away, depending on urgency). If you don’t have the energy, just boomerang them wholesale for tomorrow morning.

Stand up, and take a deep breath. Walk around for a few minutes, and make a cup of coffee. This is going really well.

  • By the time you get back, you should be staring at a short todo list and a pretty clear inbox. [If anything new has landed, or any have boomeranged back, send them away for an hour. We need a clear head.]
  • Now, let’s dispatch the less-than-ten-minute odds & ends tasks. Do some of them, most of them, all of them, it doesn’t matter. Just a few, to get back a sense of momentum.
  • Your most urgent emails have boomeranged back. Deal with them.

Take a break.

At this point, you’re close to a clean slate: just your important tasks remain. You probably have some meetings and stuff. Have lunch. Refresh.

  • Now, it’s time to tackle those one or two important high-priority tasks-for-today.
  • Picture yourself at the end of the day, leaning back in your chair with your hands knitted behind your head, smugly. For that to happen, double down on those one or two most important things, and the rest can wait. You will feel great.
  • Don’t do anything else today. Don’t check your email if you can avoid it. Your goal is to boomerang away (by email or calendar) anything but them.

With any luck, you made progress on those one or two most important tasks.

Armed with this approach, you can triage your own life. You can choose to focus on the most urgent or important things first, and ignore the rest. They’ll shamble back when their time has come, and then you can dispatch them in turn.

P.S. There are a few tools that will help:

  • Google Calendar – add a new ‘Todo’ calendar, whose notifications are set by default to email you at the time of the event.
  • Any simple todo list app or text editor of your choosing. It doesn’t matter.

P.P.S. One final note. I can’t juggle two balls, let alone six. So take that into account, seasoned with a pinch of salt, in reading this.

P.P.P.S. Of course, there is nothing that’s original here. It’s a death-metal-mashup of Inbox Zero and GTD. It’s not always feasible to work like this. If you don’t procrastinate, you probably don’t need it. Etc.

Sanity checks as data sidekicks

Abe Gong asked for good examples of ‘data sidekicks’.

I still haven’t got the hang of distilling complex thoughts into 140 characters, and so I was worried my reply might have been compressed into cryptic nonsense.

Here’s what I was trying to say:

Let’s say you’re trying to do a difficult classification on a dataset that has had a lot of preprocessing/transformation, like fMRI brain data. There are a million reasons why things could be going wrong.

(sorry, Tolstoy)

Things could be failing for meaningful reasons, e.g.:

  • the brain doesn’t work the way you think, so you’re analysing the wrong brain regions or representing things in a different way
  • there’s signal there but it’s represented at a finer-grained resolution than you can measure.

But the most likely explanation is that you screwed up your preprocessing (mis-imported the data, mis-aligned the labels, mixed up the X-Y-Z dimensions etc).

If you can’t classify someone staring at a blank screen vs a screen with something on it, it’s probably something like this, since visual input is pretty much the strongest and most widespread signal in the brain – your whole posterior cortex lights up in response to high-salience images (like faces and places).
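To make the sanity check concrete, here’s a toy sketch (with made-up numbers, not real fMRI data): threshold each trial’s mean posterior-cortex signal and see whether you can tell blank from stimulus. If even this easy classification fails, suspect the preprocessing.

```python
def sanity_check(trials, labels):
    """trials: mean posterior-cortex signal per trial; labels: 'blank'/'stim'.
    Classify by thresholding at the overall mean, return accuracy."""
    threshold = sum(trials) / len(trials)
    preds = ['stim' if t > threshold else 'blank' for t in trials]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# Well-preprocessed data: stimulus trials carry more signal.
good = sanity_check([0.1, 0.2, 0.9, 1.0], ['blank', 'blank', 'stim', 'stim'])
# Labels misaligned by one position (a classic preprocessing bug):
bad = sanity_check([0.1, 0.2, 0.9, 1.0], ['stim', 'blank', 'blank', 'stim'])
print(good, bad)  # 1.0 0.5 – near-chance on the easy case means check the pipeline
```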

In the time I spent writing this, Abe had already figured out what I meant 🙂

Two-level tagging

Have you ever had trouble deciding where to store a file on your hard disk? Or worse, had trouble finding it later?

When you store a file on your hard disk, you have to decide which folder to put it in. That folder can in turn live inside other folders. This results in a hierarchy, known in computer science as a *tree*.

The main problem with trees is that sometimes you want things to live in multiple places.

Tagging provides an alternate system. Tags are a lot like folders, except that things can belong to multiple tags. However, the tags can’t themselves belong to anything. So you have just one level of organisation with no nesting.

The main problem with single-level tagging is that it’s too simple. We want to be able to use fine-grained categories (e.g. ‘lesser spotted greeb’) that themselves belong to higher-level categories (e.g. ‘greeb’, or even ‘bird’ or ‘animal’). But we said that tags can’t themselves belong to tags.

Described like this, perhaps the solution will seem obvious to you too. We want things to belong to multiple tags, and for those tags to sometimes belong to other tags.

I built this into Emacs Freex, my note-taking system.

For instance, I have tagged this blog post with ‘data structure’ and ‘blog’. In turn ‘data structure’ is tagged with ‘computer science’ and ‘blog’ is tagged with ‘writing’. So I can find this blog post later in various ways, including by intersecting ‘computer science’ and ‘writing’.

This gives you the best of both worlds: things belong to multiple categories, along with a hierarchy of categories.
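Here’s a toy sketch of the idea in Python, using the tags from the example above. The code is illustrative only – it’s not Emacs Freex’s actual implementation.

```python
from collections import defaultdict

class TagStore:
    def __init__(self):
        self.item_tags = defaultdict(set)    # item -> its direct tags
        self.tag_parents = defaultdict(set)  # tag -> tags it belongs to

    def tag_item(self, item, *tags):
        self.item_tags[item].update(tags)

    def tag_tag(self, child, *parents):
        self.tag_parents[child].update(parents)

    def expand(self, tag):
        """All tags reachable upwards from `tag`, including itself."""
        seen, stack = set(), [tag]
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(self.tag_parents[t])
        return seen

    def items_under(self, tag):
        """Items whose direct tags lead (transitively) up to `tag`."""
        return {item for item, tags in self.item_tags.items()
                if any(tag in self.expand(t) for t in tags)}

    def intersect(self, *tags):
        result = None
        for tag in tags:
            found = self.items_under(tag)
            result = found if result is None else result & found
        return result

store = TagStore()
store.tag_item('this blog post', 'data structure', 'blog')
store.tag_tag('data structure', 'computer science')
store.tag_tag('blog', 'writing')

print(store.intersect('computer science', 'writing'))  # → {'this blog post'}
```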

Blogging with WordPress and Emacs

When it comes to tools, I am a hedgehog rather than a fox. I like to have a small number of tools, and to know them well.

I recently resolved to start writing again. But I decided that I needed to sharpen my pencils first.

I have plans on how publishing and sharing should work. Grand plans. Too grand, perhaps.

So for now, I wrote something simple for myself. Now I can type away, press buttons… publish.

If you like Emacs, Python and WordPress, this might be interesting to you too. If not, it certainly won’t be.

wordpress-python-emacs GitHub repository

Most of the work is being done by this great Python/WordPress library. Thank you.

I wrote some simple Python scripts. One grabs all my existing blog posts. One looks through their titles, and checks them against the filename to see if this is a new post.

And then there’s a very simple Emacs function that calls them to save/publish the current text file.
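The new-vs-existing check might look something like this sketch. The function and variable names here are made up for illustration – see the repository for the real thing.

```python
from pathlib import Path

def slugify(title):
    """Normalise a post title the way the filename would be."""
    return ''.join(c if c.isalnum() else '_' for c in title.lower()).strip('_')

def is_new_post(filename, existing_titles):
    """True if no existing post's title matches this file's name."""
    stem = Path(filename).stem.lower()
    return all(slugify(t) != stem for t in existing_titles)

existing = ['Two-level tagging', 'Breaking the seal']
print(is_new_post('breaking_the_seal.txt', existing))  # False: update in place
print(is_new_post('todo_zero.txt', existing))          # True: create a new post
```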

I could add more things: deleting posts, or a proper workflow for moving from draft to published. Maybe later.

I wrote this post, then hit M-x wordpress-publish-this-file.

Breaking the seal

I’ve wanted to try daily writing for an impossibly long time, but the first words didn’t want to be dragged out.

In my case, they were unstoppered by a day in London. I pinballed from train platforms to coffee shops, oblivious bustle all around me, far away from the furrowed-browed finger-pecking at Memrise HQ. That context-shift provided a firebreak from the quotidian, and I was finally in the mood to mentally roll up my sleeves and rub my hands together.

Writing’s like peeing – once you break the seal, the words just spill forth all evening.

I was able to decant a dozen half-thoughts that I queued up like toy soldiers, to be birthed one by one over the following week. It’s now rather fun to receive a blog post from my previous self every day.

Entangling the ground and cloud

The cloud and the ground

Most of you will have an idea of what I mean by ‘living in the cloud’. The cloud is the internet, the web, ssh servers, http protocols and text boxes in browsers, accessible from anywhere, hosted on a server or many servers somewhere. Wikis, blogs, and bookmarking services like del.icio.us are the cloud. We travel to the cloud on a flying TCP/IP rug, whisking us from anywhere in the world to wherever it is the cloud is. The cloud is available to us everywhere, as long as we have internet access. Its non-local ubiquity is its virtue.

In contrast, the ground is our hard disk, .doc files and desktop applications, our personal computer, local, physical, heavy, awkward, something we carry around with us. Distance means something on the ground, because if you don’t have your laptop with you, you can’t get to your ground-home.

Perhaps I’m being crystal clear about what I mean by ‘cloud’ and ‘ground’. In case I’m not, we can boil the distinction down to this: the cloud is everything that we need the internet to access. The ground is everything that lives on our personal computer. The tradeoffs are obvious. The cloud is always there as long as you have internet. The ground is always there if you’re willing to tote around your computer.

The filesystem as carpet

We talked earlier of the network whisking us to the cloud like a many-threaded flying rug, a wizardly tapestry of protocols. But if the network is a rug that whooshes us from one location to the other, then perhaps we can think of the filesystem as akin to an everyday carpet – the flat, underfoot, non-vehicular kind that sits there as you wander about, so permanent as to remain quite unnoticed. On the ground, all of our files rest on the homely, woven fundament of the filesystem carpet. If you want to edit a document of any kind, you’re editing a file on the filesystem. This is what gives the ground its feeling of stable proximity. You can’t fall off the ground.

But what if, in the future, the faded paisley pattern of your local filesystem carpet hid a kind of quantum entanglement, where “the quantum states of two or more objects have to be described with reference to each other, even though the individual objects may be spatially separated”? What if the carpet in our home on the ground and the carpet in our home in the cloud were entangled? A long-standing, two-way, lightning strike beanstalk linking cloud to ground, and ground to cloud. I would be able to pootle about on the ground, knowing that the cloud is synchronized. So that when I leave my laptop home at home, the cloud has mirrored everything I need. I want my toothbrush to always be there, waiting for me, in my home and my home-away-from-home.

Every file appearing to rest so solidly on the filesystem-carpet would also sit on the cloud-carpet above. And vice versa. Every time you edit a file on the filesystem-carpet, the changes are propagated automatically to the simulacrum on the cloud-carpet, and vice versa. You think you’re editing C:\blog\my_new_post.doc, but when you save, http://gregdetre.blogspot.com/ updates. You just tagged a page with ‘superduper’ on http://del.icio.us/gdetre, and a new .xml file appeared in ~/del.icio.us/superduper/ at the same time. In this hopeful future, where our home on the ground changes in lockstep with our home in the cloud, the question of whether we live in one or the other becomes meaningless to the user – when you light one candle from another, which one carries the original flame? Equally both. Either. When I edit a document, on ground or cloud, I want that document to exist equally, indistinguishably on cloud and ground.

If and when the ground- and cloud-carpets do become temporarily untangled, while on a plane or because of a wireless malfunction, then any changes made in either cloud or ground are stored up to be replicated at the very first opportunity, bi-directionally, automatically, and in the background.

What would it be like to use an entangled filesystem?

Everyone will have some space of their own in the cloud. When you first install your operating system, it will ask you for your cloud-home Open ID address. When you post a comment on Slashdot, the sign-on process causes a new directory to appear on your hard disk. All of a sudden, ~/slashdot.org/article/492308/comments/2309109.txt appears on the ground.

Every directory on your hard disk will be mounted with a cloud-address that automatically provides a ground-cloud mapping schema for that directory. All websites will provide a unique URL for every state or view into the system – it’ll be the job of the ground-cloud mapping schema to map those URLs to files and directories, specify permissions, decide which file format your content will appear as, define the effect filesystem actions have, etc. So perhaps editing 2309109.txt would change the comment on the actual Slashdot page… unless the ground-mapping schema says that comments can’t be modified once written, in which case 2309109.txt will appear as read-only once it has been moved from the ‘drafts’ directory where you created it.
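The simplest possible ground-cloud mapping might look like this sketch, using the Slashdot example above. It’s purely illustrative – a real mapping schema would also handle permissions, file formats and the effects of filesystem actions.

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

def url_to_path(url, home='~'):
    """Map a cloud URL to its mirrored location on the ground."""
    u = urlparse(url)
    return str(PurePosixPath(home) / u.netloc / u.path.lstrip('/'))

def path_to_url(path, home='~'):
    """Map a ground path back to its cloud address."""
    rel = PurePosixPath(path).relative_to(home)
    host, *rest = rel.parts
    return f'https://{host}/' + '/'.join(rest)

p = url_to_path('https://slashdot.org/article/492308/comments/2309109.txt')
print(p)  # ~/slashdot.org/article/492308/comments/2309109.txt
```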

Right now, the browser is the main gate through which we pass to get into the cloud, but it needn’t be. No one wants to have to type a whole new Wikipedia entry in a poxy text box. Much nicer to fire up MS Word or Emacs, and work on c:\wikipedia\Carpets.wiki in comfort (effortlessly mounted as a Wikipedia service).

We’re so close already. On the one hand, tools like Subversion and Unison handle synchronizing pretty well. On the other hand, we can mount remote filesystems over SSH with Fuse SSHFS or Emacs Tramp, so that they almost feel local. So there are tools that sync, and there are tools that make the remote feel local, but there are no tools that do both: automatically sync the remote and make it feel local … running in the background in perfect imitation of a filesystem.

Tim Bray sounds a stirring clarion call for ‘a “Publish” button on everything’. That’s a solution at the level of the interface. I’d much rather we build publishing into the very heart of the system. I don’t need a “Publish” button. Just let me save my files to a ground-cloud entangled filesystem, and have them simultaneously published for me.

It has to be easy, and worth it, for you to add tags

Whoever adopted the idea that “there’s a place for everything, and everything in its place” when it came to organizing files and ideas on a computer suffered from a failure of imagination. Or maybe they were just over-wedded to the desktop and filing cabinet metaphors. Fortunately, the idea of ‘tagging’ (or ‘labels’ in Google’s parlance) blew that whole banal tidiness away. In short, tagging lets you assign things to multiple categories, or if you prefer, put things in multiple places. Rashmi describes this well – tagging is popular because there’s a lower cognitive cost when you can put things in multiple categories, rather than having to decide on just one.

We’ve only just started to scratch the surface of how categorization schemes could work. I’m going to propose a few ways in which things might grow from here, focusing on the restricted case where you’re tagging your own files privately, ignoring all the interesting goodness that happens when those tags are available to others, delicious-style.

N.B. I’m going to use the term ‘category’ rather than ‘tag’, since it’s easier to think of things belonging to categories than being labelled with a tag. The key notion is that things can belong to multiple categories simultaneously.

The more tags the better

Jon Udell has a great post on building up a taxonomy of categories by hand, starting with a smallish corpus of documents, and just letting the taxonomy emerge, combined with a little judicious weeding. The dataset he has in mind is pretty small, and so he’s aiming for 15-40 categories. The kinds of datasets I have in mind are much larger.

For instance, I have a few thousand text files with notes on topics ranging from Ubuntu troubleshooting to the symptoms of schizophrenia to my travel arrangements for the summer. I could maybe try and shoe-horn things into a few tens of categories, with each category holding many items, and each item belonging to maybe one or two categories. But I very quickly found this to be unsatisfying. We want to be able to differentiate things more finely than that. For instance, how would I categorize a document containing hotel bookings in Florence last summer for the HBM conference? Just by ‘travel’? Or also ‘Florence’, ‘conference’, ‘hotel’, ‘HBM’, and ‘2007’? Remember the argument about lower cognitive cost though – it’s much less effort just to include all those categories. If I do that, I’ll end up with many hundreds or even thousands of categories, some of which will have tens or hundreds of members and some of which might only have one or two members. I think one might raise two main objections to this approach:

  • Can you really be bothered to add a bunch of categories each time you write something?
  • How do you begin to find anything now? Sometimes filtering by a category doesn’t help because it returns way too many members, and sometimes it doesn’t help because it returns hardly any. Where’s Goldilocks when you need her?

I’ll address these in turn.

Can you be bothered to add a bunch of categories each time?

People are lazy. Any system that requires people to be assiduous book-keepers while they’re writing is doomed. Dave Winer talks about how he should be categorizing all his posts, and yet he doesn’t do it – and this makes him feel guilty. He knows that he won’t be able to trust the categories to find that thing later. The value of the whole system has dropped. Squirrels wouldn’t go to the effort of hoarding nuts for the winter if they knew that they wouldn’t remember where those nuts are when they need them. So what’s the point of hoarding nuts any more? All of a sudden, the system has broken down. We need to find a way to make the system less brittle.

Let’s look at Dave Winer’s guilty confession a little more closely:

“I have a very easy category routing system built-in to my blogging software. To route an item to a category, I just right-click and choose a category from a hierarchy of menus. I can’t imagine that it could be easier. Yet I don’t do it.”

If you ask me, that’s not easy enough. Navigating hierarchical menus with a mouse is slow and distracting. Blogger does it right – there’s a ‘labels’ text box that you can tab to, into which you can write a comma-delimited list of tags. As you type, it auto-suggests – pressing ‘return’ fills in the rest of the tag and puts a comma and space after for you. So that’s step 1.

But it should be even easier. What should happen is that the machine should automatically throw up a list of tags that it thinks might be appropriate for this post. It should put the ones it’s most confident about to the left, and less confident ones to the right, with the cursor positioned at the end to make it easy for the user to delete false positives and add new categories it missed. And if you’re feeling lazy, then you can just accept the machine’s suggestions without glancing at them. The cost of a false positive is low, so it’ll deliberately suggest too many. This brings us neatly to our second concern.
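As a toy illustration of confidence-ordered suggestions (not a real tagging engine), you could score each known tag by how often its words appear in the draft, and list the best scorers first:

```python
import re

def suggest_tags(text, known_tags, max_suggestions=5):
    """Rank known tags by crude term frequency in the draft, best first."""
    words = re.findall(r'[a-z]+', text.lower())
    def score(tag):
        return sum(words.count(w) for w in tag.lower().split())
    ranked = sorted(known_tags, key=score, reverse=True)
    return [t for t in ranked if score(t) > 0][:max_suggestions]

draft = ("Booked the hotel in Florence for the HBM conference. "
         "The hotel is near the conference centre.")
print(suggest_tags(draft, ['travel', 'hotel', 'Florence', 'conference', 'emacs']))
# → ['hotel', 'conference', 'Florence']
```

A real system would use something smarter than word counts (the author’s suggestion is deliberately over-generous with false positives), but the interface idea – most confident on the left, cursor at the end – is independent of the scoring.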

But then how do you find anything?

So now every document belongs to a bajillion categories, none of which is particularly useful on its own. But a conjunction of categories should narrow things down nicely. If I’m trying to find that hotel booking in Florence, I don’t have to worry about remembering whether it’s tagged with ‘travel’, ‘hotel’, ‘Florence’, ‘2007’ or ‘HBM conference’, since it’s tagged with all of them. So I’ll try filtering by the conjunction of ‘hotel’+’Florence’+’2007’ and that’ll probably winnow things down sufficiently for me to pick the file out manually (see also: make tags not trees).

But maybe we never made a ‘Florence’ category. It seems like such a natural cue to use now, but at the time, ‘Florence’ didn’t spring to mind as a salient category, despite our liberal categorizing policy. If the system auto-completes in a handy way, we’d already know this, and our fingers would already be backspacing and trying ‘Italy’ or ‘HBM conference’. There are many points of failure, but there are also many points of entry. If we make it easy enough to cue for conjunctions of categories, then there’s a very low cognitive cost to having to backtrack once or twice, since our brain effortlessly supplies us with so many possible cues to use.

We could make things even less brittle in lots and lots of ways. Perhaps the system notices that only one item in the whole database is tagged with ‘Florence’, so it’s probably too restrictive a category. No matter. It could just ignore ‘Florence’, or suggest that we omit ‘Florence’ from our search. Better still, and less intrusively, it could now grep through all the files that match one or more of the tags to see if ‘Florence’ appears in the text, and automatically suggest any matches as partial matches.
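Here’s a toy sketch of that forgiving search: filter by the conjunction of tags, and when a tag matches nothing, fall back to grepping the text for the word instead. Illustrative only, with made-up filenames.

```python
def find(notes, *query_tags):
    """notes: {name: (set_of_tags, full_text)}. Intersect tag matches,
    falling back to a full-text search for any tag nobody ever used."""
    results = set(notes)
    for tag in query_tags:
        by_tag = {n for n, (tags, _) in notes.items() if tag in tags}
        if not by_tag:  # tag never used: grep the text instead
            by_tag = {n for n, (_, text) in notes.items()
                      if tag.lower() in text.lower()}
        results &= by_tag
    return results

notes = {
    'hotel_booking.txt': ({'travel', 'hotel', '2007'},
                          'Hotel booked in Florence for HBM.'),
    'flight_times.txt':  ({'travel', '2007'}, 'Flights to Italy.'),
}
# 'Florence' was never used as a tag, so it falls back to the full text:
print(find(notes, 'hotel', 'Florence', '2007'))  # → {'hotel_booking.txt'}
```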


I keep coming back to the same feeling – for the most part, people don’t write notes because they don’t think they’ll be able to find those notes later when they need them – so why bother writing the notes in the first place?

All of these suggestions are geared towards:

  • Reducing the cognitive cost at both writing and retrieval. If it’s less effort, you’ll feel less lazy about adding category metadata.
  • Making the system less brittle, so that if you were lazy about your category metadata, you still have a good chance of finding things later. This is the key to ensuring that you don’t end up losing faith and give up on writing things down in a structured way altogether.

Taken together, I hope that it will become easier to categorize your notes in a way that helps you find them later, which is going to make you much more likely to write them down in the first place.

Collaborative filtering and how it’s going to help us consume

In the future, we will routinely employ some product that will probably be called Microsoft MyLife (1) to manage our reading, news, entertainment and shopping. What will it do? Let’s start with the present and build forwards. For my money, amazon.com is the best site in the business. It takes the only shopping activity I enjoy, book-shopping, and manages to make it even better online.

Shopping with Amazon is so pleasurable and fruitful because it first leads me by the hand towards things that I’m genuinely interested in and then provides me with the 3rd-party reviews and ratings feedback that I always find myself hungering for when buying something. It’s like having Virgil for a librarian. It’s shopping by democracy, where your candidate always wins. But it’s still pretty limited. I want to be able to head to the recommendations page and choose to be recommended books of a certain kind only, rather than having my interests in neuroscience, programming and sci-fi lumped indiscriminately together. I may want it to weight recent purchases heavily, or to only find books by authors I’ve never read. But the possibilities for tinkering with the recommendations parameters are sadly limited.

I dream of a ‘How lucky do you feel, punk?’ slider, that ranges from conservative to adventurous. Perhaps today I’m tired and I want something I’m certain I’ll like. If I’ve bought the first 35 of David Gemmell’s Waylander books, Amazon can be pretty sure I’ll like the 36th, since there’s no way to tell them apart. But maybe tomorrow I’ll be high on redbull and tractor fluid, and I’ll want something new and unexpected. Perhaps initial impressions indicate that I’ll like some new author who’s making waves, or perhaps Amazon’s crazy collaborative filtering algorithm thinks that David Foster Wallace + David Sedaris = Tom Robbins, and recommends something to me out of left field accordingly. After all, I want help choosing a book, but part of the reason I like browsing for fiction arranged alphabetically is that you never know what’s going to catch your eye. I can choose on a given day whether to browse only for names I know, or to open myself up for something fresh and unexpected.
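The slider itself is easy to sketch: blend a predicted-liking score with a novelty score, weighted by an adventurousness parameter. All the numbers and titles below are made up for illustration; a real recommender would get the scores from its collaborative filtering model.

```python
def recommend(candidates, adventurousness):
    """candidates: list of (title, predicted_liking, novelty), each in [0, 1].
    adventurousness: 0 = play it safe, 1 = surprise me."""
    def blended(item):
        _, liking, novelty = item
        return (1 - adventurousness) * liking + adventurousness * novelty
    return max(candidates, key=blended)[0]

books = [
    ('Waylander XXXVI', 0.95, 0.05),  # safe bet
    ('Tom Robbins',     0.60, 0.90),  # out of left field
]
print(recommend(books, adventurousness=0.1))  # Waylander XXXVI – tired today
print(recommend(books, adventurousness=0.9))  # Tom Robbins – feeling lucky
```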

Secondly, I want to be able to ask for recommendations for someone else. Let’s say it’s my dad’s birthday. I want to ask Amazon, ‘What would I like if I was a middle-aged man who likes John le Carré, Tom Peters and Hoagy Carmichael?’. I want to be able to create a persona for my dad, and for it to make some guesses. Even if they’re terrible, maybe they’ll give me ideas, or maybe I just need to give the system a little more information. At this point, things could get interesting. It would be pretty easy to integrate this with my dad’s actual Amazon account, if he chooses to let me, so that it could take his purchasing history and wishlist into account as extra information. It would know what books he’s bought recently, and so might remind me of some interests of his that I’ve forgotten, or of some burgeoning interests that I can sneakily anticipate. And I’m prepared to bet that it could do this with just a broad sprinkling of sample purchases to guide things. You can think of the adventurousness slider bar mentioned above as titrating from Marks & Spencer pullovers to gift vouchers at Stringfellows. The point is that I want to be able to tap my guide on the shoulder, shake my head, and point in a different direction. ‘Yes’ to the Herend china, but ‘no’ to the Chinese hentai. The current collaborative filtering algorithm that they use to make recommendations works brilliantly, but is amazingly restrictive in the way that you can tweak it.

Let’s say that Jeff Bezos reads this, slaps his forehead at the obvious genius of it all, and immediately engages a few of his platoons of elite Bonobo chimps trained from birth in arcane RDBMS lore to implement all of this. What next?

It knows what books I like. Why stop at books? Amazon sells everything except the kitchen sink. It probably does, in fact, sell legions of kitchen sinks too. But let’s stick to books, music and films for now. It seems obvious to me that what I read, what I listen to, and what I watch are going to be predictive of each other. In fact, the broader the information the system knows about you, the better I would imagine it could triangulate on what you like and generalize to useful recommendations. It should be relatively effortless for Amazon to generalize from books to music to films, or vice versa, and I’d be astonished if they weren’t already doing that. It’s not so clear how your furniture purchases might be dictated by your reading habits, but it’s not ridiculous either to think that a young 20-something male with money to burn who likes Friends might very easily be persuaded to buy a La-Z-Boy comfy chair (as featured on the series) if a few DVDs from the series (that he doesn’t have) get bundled free.

Walmart are starting to use this kind of data-mining in all kinds of ingenious, insidious ways with their product placement, but I’m talking narrowcast, baby. I’m talking about a one-time offer for you and you alone, brought to you direct by the system. I don’t really care all that much about Amazon knowing all this about me as long as they promise promise promise not to sell it, and as long as they continue to help me buy great shit cheaply without actually having to shop for it.

So they can tell me what DVDs and furniture to buy based on what I read. What if Amazon bought Ticketmaster.com tomorrow? Then, they could send me an email telling me that there’s a crazy new concert/play/demonstration/sewing circle next week, and would I like tickets? It knows that I can’t tell richly-developed fictional characters from a rotting horse’s arse, and that I like Dan Brown and art, so it cobbles together a deal with lastminute.com to send me to Paris, Rome, London and Roslyn at highly discounted rates.

How does it know all this? Because other people who like the things I like – they liked that. Sure, I’m being shepherded, but if I could have my own personal shepherd who keeps pointing out great unsigned bands in intimate venues, movies from Chile that don’t have Steven Seagal in them, and books that make me cry, then sign me up to be a sheep.

It’s pretty easy to see where this is going. TV’s going the way of the dodo, and even Tivo’s a bit tovo. I don’t want anyone to ever tell me that I have to watch the West Wing one episode at a time, once a god-damned-week. I want to buy 50 TV meals and watch them back-to-back without sleep. And there probably aren’t that many people quite like me, but there are a few, and that’s exactly what they like to do, so it shouldn’t be too hard for my collaborative augmentation shepherd to have my TV meals frisbeed through my open window at regular intervals by a supermarket delivery man.

How far might the system be able to generalize across domains for a given person? If it knows about my book, music and film tastes, could it start to guess what kinds of plays I would like, or magazines I would read? Pretty soon, it could start recommending clothes and events and articles.

If you start to map individuals to their locations and movements, then you could start to make recommendations about where to shop or visit. What could be more useful than knowing where my dad goes to shop, if I’m trying to buy him a birthday present? Actually, I can think of one thing more useful than that – knowing where people looking for presents for their dad went when they went shopping… It could plan out routes, and take me to little one-of-a-kind shops tucked away – either because I tell it about them, or by keeping track of my credit record and following my movements with something like GPS.

Eventually, you could see how this could improve, or invade, every aspect of our life. All of the information you consume would be customized to your tastes, or if you prefer sometimes, to someone else’s tastes. It seems critical though that you’d always be able to tweak the knobs when you’re feeling adventurous, because it would be so easy for us to habitually tread the same well-worn paths, hearing only the opinions that we’ve trained the system to feed us.


(1) The name’s so catchy, trite, and alarmingly intrusive-sounding that I couldn’t pass it up. There will probably be an open source version called GNU Memacs.

[Update: I think they already have a MyLifeBits project that focuses on collecting all the data amassed over your life together, but that’s not really what’s being discussed here. That’s about retrieving information. This is about proactively suggesting new stuff from the cloud.]