Sanity checks as data sidekicks

Abe Gong asked for good examples of ‘data sidekicks’.

I still haven’t got the hang of distilling complex thoughts into 140 characters, and so I was worried my reply might have been compressed into cryptic nonsense.

Here’s what I was trying to say:

Let’s say you’re trying to do a difficult classification on a dataset that has had a lot of preprocessing/transformation, like fMRI brain data. There are a million reasons why things could be going wrong.

(sorry, Tolstoy)

Things could be failing for meaningful reasons, e.g.:

  • the brain doesn’t work the way you think, so you’re analysing the wrong brain regions, or the brain represents things differently from how you assumed
  • there’s signal there but it’s represented at a finer-grained resolution than you can measure.

But the most likely explanation is that you screwed up your preprocessing (mis-imported the data, mis-aligned the labels, mixed up the X-Y-Z dimensions etc).

If you can’t even classify someone staring at a blank screen vs a screen with something on it, the problem is probably a preprocessing mistake like these. Visual input is pretty much the strongest and most widespread signal in the brain: your whole posterior cortex lights up in response to high-salience images (like faces and places).
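Here’s a minimal sketch of that kind of sanity check, using made-up data rather than real fMRI. Everything in it is an assumption for illustration: the fake ‘blank vs image’ dataset, the toy nearest-centroid classifier, and the function names. The idea is just that an easy contrast should classify almost perfectly, and if it drops to chance (as it does here when the labels are deliberately misaligned), suspect the pipeline before the brain.

```python
import random


def nearest_centroid_accuracy(samples, labels, n_train):
    """Train a nearest-centroid classifier on the first n_train samples
    and return its accuracy on the remaining samples."""
    train = list(zip(samples[:n_train], labels[:n_train]))
    test = list(zip(samples[n_train:], labels[n_train:]))

    # Mean feature vector (centroid) for each class seen in training.
    centroids = {}
    for label in set(labels[:n_train]):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Pick the class whose centroid is closest (squared Euclidean distance).
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    correct = sum(predict(x) == y for x, y in test)
    return correct / len(test)


def make_fake_data(n_samples=200, n_features=20, signal=2.0, seed=0):
    """Hypothetical stand-in for preprocessed data with a strong, easy signal."""
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n_samples):
        label = rng.choice([0, 1])  # 0 = blank screen, 1 = image shown
        samples.append([rng.gauss(signal * label, 1.0) for _ in range(n_features)])
        labels.append(label)
    return samples, labels


samples, labels = make_fake_data()
easy_accuracy = nearest_centroid_accuracy(samples, labels, n_train=100)

# Simulate a classic preprocessing bug: labels no longer line up with the data.
misaligned = labels[:]
random.Random(1).shuffle(misaligned)
broken_accuracy = nearest_centroid_accuracy(samples, misaligned, n_train=100)

print(f"easy contrast: {easy_accuracy:.2f}, misaligned labels: {broken_accuracy:.2f}")
```

In a real analysis you’d swap in your own loader and classifier; the sidekick is the cheap, should-be-easy contrast that you run alongside the hard one.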

In the time I spent writing this, Abe had already figured out what I meant :)

Two-level tagging

Have you ever had trouble deciding where to store a file on your hard disk? Or worse, had trouble finding it later?

When you store a file on your hard disk, you have to decide which folder to put it in. That folder can in turn live inside other folders. This results in a hierarchy, known in computer science as a *tree*.

The main problem with trees is that sometimes you want things to live in multiple places.

Tagging provides an alternate system. Tags are a lot like folders, except that things can belong to multiple tags. However, the tags can’t themselves belong to anything. So you have just one level of organisation with no nesting.

The main problem with single-level tagging is that it’s too simple. We want to be able to use fine-grained categories (e.g. ‘lesser spotted greeb’) that themselves belong to higher-level categories (e.g. ‘greeb’, or even ‘bird’ or ‘animal’). But we said that tags can’t themselves belong to tags.

Described like this, perhaps the solution will seem obvious to you too. We want things to belong to multiple tags, and for those tags to sometimes belong to other tags.

I built this into Emacs Freex, my note-taking system.

For instance, I have tagged this blog post with ‘data structure’ and ‘blogme’. In turn ‘data structure’ is tagged with ‘computer science’ and ‘blogme’ is tagged with ‘writing’. So I can find this blog post later in various ways, including by intersecting ‘computer science’ and ‘writing’.

This gives you the best of both worlds: things belong to multiple categories, along with a hierarchy of categories.
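As a rough illustration of the idea (not how Emacs Freex actually implements it), here is a small Python sketch. The `TagStore` class and its method names are hypothetical. Items and tags share one namespace, anything can be tagged, and a lookup follows tags-of-tags upwards. The sketch happens to follow more than two levels, which is a generalisation on my part.

```python
from collections import defaultdict


class TagStore:
    """Toy store where both items and tags can themselves be tagged."""

    def __init__(self):
        # Maps each item or tag name to the set of tags applied directly to it.
        self.parents = defaultdict(set)

    def tag(self, name, *tags):
        """Apply one or more tags directly to an item or to another tag."""
        self.parents[name].update(tags)

    def all_tags(self, name):
        """Return every tag reachable from name, directly or via other tags."""
        seen = set()
        stack = list(self.parents.get(name, ()))
        while stack:
            tag = stack.pop()
            if tag not in seen:  # also guards against accidental cycles
                seen.add(tag)
                stack.extend(self.parents.get(tag, ()))
        return seen

    def find(self, *tags):
        """Return names whose direct or inherited tags include all given tags."""
        wanted = set(tags)
        return {name for name in list(self.parents) if wanted <= self.all_tags(name)}


store = TagStore()
store.tag("this blog post", "data structure", "blogme")
store.tag("data structure", "computer science")
store.tag("blogme", "writing")

# Intersect two higher-level tags, neither of which is on the post directly.
matches = store.find("computer science", "writing")
print(matches)
```

The post is only tagged with ‘data structure’ and ‘blogme’, but the intersection of ‘computer science’ and ‘writing’ still finds it, because the lookup walks up through the tags’ own tags.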

Blogging with WordPress and Emacs

When it comes to tools, I am a hedgehog rather than a fox. I like to have a small number of tools, and to know them well.

I recently resolved to start writing again. But I decided that I needed to sharpen my pencils first.

I have plans on how publishing and sharing should work. Grand plans. Too grand, perhaps.

So for now, I wrote something simple for myself. Now I can type away, press buttons… publish.

If you like Emacs, Python and WordPress, this might be interesting to you too. If not, it certainly won’t be.

wordpress-python-emacs GitHub repository

Most of the work is being done by this great Python/WordPress library. Thank you.

I wrote some simple Python scripts. One grabs all my existing blog posts. One looks through their titles, and checks them against the filename to see if this is a new post.

And then there’s a very simple Emacs function that calls them to save/publish the current text file.
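The title-versus-filename check could look something like the sketch below. This is not the code from the repository: the function names and the matching rule (ignore case, punctuation and the file extension) are my assumptions for illustration, and the list of titles is hard-coded here rather than fetched from WordPress.

```python
from pathlib import Path


def normalise(text):
    """Lower-case and keep only letters and digits, so that
    'Two-level tagging' and 'two_level_tagging' compare equal."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def find_existing_post(filename, existing_titles):
    """Return the matching existing title, or None if this looks like a new post."""
    stem = normalise(Path(filename).stem)  # drop the directory and extension
    for title in existing_titles:
        if normalise(title) == stem:
            return title
    return None


# Hypothetical titles, standing in for whatever the 'grab all posts' script returns.
titles = ["Two-level tagging", "Blogging with WordPress and Emacs"]

existing = find_existing_post("notes/two_level_tagging.txt", titles)
new = find_existing_post("notes/sanity_checks.txt", titles)
print(existing, new)
```

If a match comes back, the publish step would update that post; if not, it would create a new one.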

I could add more things: deleting posts, or a proper workflow for moving from draft to published. Maybe later.

I wrote this post, then hit M-x wordpress-publish-this-file.

The iPhone apps my cold, dead hands would cling most rigidly to

Instapaper – combine this with the Instachrome extension, and whenever I see a webpage I want to read later, it’ll be waiting for me when I’m waiting for a train
Light – it’s bright! No more torches. If you lived in Hanborough, you’d need this too.
Trainline – faster than my laptop and/or a speeding bullet for checking train times in the UK
PlainText – write notes on your laptop, have them appear on your phone instantly through Dropbox and vice versa. Oh, and Dropbox of course, too.
Dictionary.com – etymologies, pronunciations, the works
Remote – control Keynote presentations from your phone.
Glympse – let other people know where you are.
Skype – I can call Mia for free while walking the streets
iTrans Tube and Tube Status – for planning London Underground journeys
Snaptell – red laser black magic. Point at a book, and have elves whisper about it to you.
Angry Birds – the most popular mobile game of all time
Spotify – all the music in the world on the go. Requires a Spotify subscription
Shazam – tells you the name of songs that are currently playing

Startup thinking – the people who have most influenced my thinking on startups.

Steve Blank and Eric Ries

Steve Blank teaches entrepreneurship at the Haas School of Business at UC Berkeley, but has a pretty serious pedigree as a tech entrepreneur himself. I’m ashamed to admit that I still haven’t read The Four Steps to the Epiphany, but I’ve read most of what he’s posted online.

Eric Ries is a protege of Steve Blank’s who applies and develops many of the same ‘lean’ ideas, focusing specifically on web startups.

See the links for both Steve Blank and Eric Ries here.

—-

Paul Graham

He’s a tour de force and a hero of mine. One of the early proponents of ‘release early, iterate often’.

http://www.paulgraham.com/start.html

http://www.foxbusiness.com/search-results/m/25897600/funding-tech-start-ups.htm

—-

Joel Spolsky

These are my favourites as they relate to HR:

http://www.joelonsoftware.com/articles/GuerrillaInterviewing3.html
http://www.joelonsoftware.com/navLinks/fog0000000262.html