Sanity checks as data sidekicks

Abe Gong asked for good examples of ‘data sidekicks’.

I still haven’t got the hang of distilling complex thoughts into 140 characters, and so I was worried my reply might have been compressed into cryptic nonsense.

Here’s what I was trying to say:

Let’s say you’re trying to do a difficult classification on a dataset that has had a lot of preprocessing/transformation, like fMRI brain data. There are a million reasons why things could be going wrong.

(sorry, Tolstoy)

Things could be failing for meaningful reasons, e.g.:

  • the brain doesn’t work the way you think, so you’re analysing the wrong brain regions or representing things in a different way
  • there’s signal there but it’s represented at a finer-grained resolution than you can measure.

But the most likely explanation is that you screwed up your preprocessing (mis-imported the data, mis-aligned the labels, mixed up the X-Y-Z dimensions etc).

If you can’t classify someone staring at a blank screen vs a screen with something on it, it’s probably something like this, since visual input is pretty much the strongest and most wide-spread signal in the brain – your whole posterior cortex lights up in response to high-salience images (like faces and places).
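Here’s a minimal sketch of that kind of sanity check, on made-up data rather than real fMRI. Everything in it is an assumption for illustration: the one-dimensional ‘signal’, the effect size, the nearest-centroid classifier and the function names. It simulates a very strong contrast (blank screen vs. something on screen), then shows what happens when the labels are shifted by one trial, which is one of the preprocessing mistakes mentioned above.

```python
import random
from statistics import mean


def make_trials(n, effect=4.0, seed=0):
    """Simulate n trials: label 0 = blank screen, 1 = stimulus (hypothetical data)."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    # A strong effect: stimulus trials sit `effect` noise-SDs above blank trials.
    signal = [effect * y + rng.gauss(0.0, 1.0) for y in labels]
    return signal, labels


def nearest_centroid_accuracy(signal, labels):
    """Train on the first half of the trials, test on the second half."""
    half = len(signal) // 2
    train_x, train_y = signal[:half], labels[:half]
    test_x, test_y = signal[half:], labels[half:]
    centroids = {
        c: mean(x for x, y in zip(train_x, train_y) if y == c) for c in (0, 1)
    }
    predictions = [
        min(centroids, key=lambda c: abs(x - centroids[c])) for x in test_x
    ]
    return mean(int(p == y) for p, y in zip(predictions, test_y))


signal, labels = make_trials(400)

# Correctly aligned labels: the easy contrast should be classified almost perfectly.
aligned = nearest_centroid_accuracy(signal, labels)

# Labels shifted by one trial: a simulated mis-alignment bug.
shifted = nearest_centroid_accuracy(signal, labels[1:] + labels[:1])

print(f"aligned labels: {aligned:.2f}")
print(f"shifted labels: {shifted:.2f}")
```

If the easy contrast comes out near chance, as it should for the shifted labels here, the pipeline is the first suspect, not the brain.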

In the time I spent writing this, Abe had already figured out what I meant :)

Two-level tagging

Have you ever had trouble deciding where to store a file on your hard disk? Or worse, had trouble finding it later?

When you store a file on your hard disk, you have to decide which folder to put it in. That folder can in turn live inside other folders. This results in a hierarchy, known in computer science as a *tree*.

The main problem with trees is that sometimes you want things to live in multiple places.

Tagging provides an alternate system. Tags are a lot like folders, except that things can belong to multiple tags. However, the tags can’t themselves belong to anything. So you have just one level of organisation with no nesting.

The main problem with single-level tagging is that it’s too simple. We want to be able to use fine-grained categories (e.g. ‘lesser spotted greeb’) that themselves belong to higher-level categories (e.g. ‘greeb’, or even ‘bird’ or ‘animal’). But we said that tags can’t themselves belong to tags.

Described like this, perhaps the solution will seem obvious to you too. We want things to belong to multiple tags, and for those tags to sometimes belong to other tags.

I built this into Emacs Freex, my note-taking system.

For instance, I have tagged this blog post with ‘data structure’ and ‘blogme’. In turn ‘data structure’ is tagged with ‘computer science’ and ‘blogme’ is tagged with ‘writing’. So I can find this blog post later in various ways, including by intersecting ‘computer science’ and ‘writing’.

This gives you the best of both worlds: things belong to multiple categories, along with a hierarchy of categories.
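As a rough sketch of the idea (not how Emacs Freex actually stores things, which isn’t described here), you could keep a single mapping from each name to the tags it carries, and follow those links upwards when searching. The dictionary layout and the function names `all_tags` and `find` are my own assumptions for illustration; the tags are the ones from the example above.

```python
def all_tags(name, tags_of):
    """Return every tag reachable from `name`, directly or via other tags."""
    seen = set()
    stack = list(tags_of.get(name, ()))
    while stack:
        tag = stack.pop()
        if tag not in seen:  # guards against revisiting (and against cycles)
            seen.add(tag)
            stack.extend(tags_of.get(tag, ()))
    return seen


def find(items, tags_of, *wanted):
    """Return the items that carry all of the wanted tags, at any level."""
    return [item for item in items if set(wanted) <= all_tags(item, tags_of)]


# Both posts and tags live in the same mapping: name -> tags it belongs to.
tags_of = {
    "two-level tagging post": {"data structure", "blogme"},
    "data structure": {"computer science"},
    "blogme": {"writing"},
}

posts = ["two-level tagging post"]
print(find(posts, tags_of, "computer science", "writing"))
```

The post is never tagged with ‘computer science’ or ‘writing’ directly; it turns up in the intersection because its own tags are.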

Blogging with WordPress and Emacs

When it comes to tools, I am a hedgehog rather than a fox. I like to have a small number of tools, and to know them well.

I recently resolved to start writing again. But I decided that I needed to sharpen my pencils first.

I have plans on how publishing and sharing should work. Grand plans. Too grand, perhaps.

So for now, I wrote something simple for myself. Now I can type away, press buttons… publish.

If you like Emacs, Python and WordPress, this might be interesting to you too. If not, it certainly won’t be.

wordpress-python-emacs GitHub repository

Most of the work is being done by this great Python/WordPress library. Thank you.

I wrote some simple Python scripts. One grabs all my existing blog posts. One looks through their titles, and checks them against the filename to see if this is a new post.
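The title check might look roughly like this. It’s a sketch, not the code from the repository: the function names, the idea of deriving a title from the filename, and the normalisation rules are all assumptions, and it leaves out fetching the existing titles from WordPress altogether.

```python
from pathlib import Path


def normalise(text):
    """Lower-case and treat hyphens, underscores and extra spaces alike."""
    return " ".join(text.replace("-", " ").replace("_", " ").lower().split())


def is_new_post(filename, existing_titles):
    """True if no existing post title matches the file's name (minus extension)."""
    known = {normalise(title) for title in existing_titles}
    return normalise(Path(filename).stem) not in known


# Hypothetical list of titles already on the blog.
titles = ["Two-level tagging", "Eroding our minds"]

print(is_new_post("drafts/two-level-tagging.txt", titles))  # already published
print(is_new_post("drafts/sanity-checks.txt", titles))      # not seen before
```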

And then there’s a very simple Emacs function that calls them to save/publish the current text file.

I could add more things: deleting posts, or a proper workflow for moving from draft to published. Maybe later.

I wrote this post, then hit M-x wordpress-publish-this-file.

Setting up your own domain name and website

Someone asked me recently about getting their own domain name and setting up a website. I’m not very good at this stuff, but I have been through it once or twice, so I thought I’d at least offer this up in case it’s useful.

Let’s say you want to buy a domain name, and set it up as a series of static informational pages about Example Business Inc, along with email addresses. There are (at least) 3 ways you can go:

option 1) the standard method

– Grab the domain from GoDaddy (or any other domain name registrar – they all do basically the same thing). It’ll cost you $10-20 for a year or two

– Then you need to find a place to host your site (e.g. Rackspace, Dreamhost). You’d spend c. $10/month to rent space on a server, point your new domain name to the server’s IP address, write and upload some html and images, and away you go.

– You then need to set up email addresses. If it’s GoDaddy, I think you’ll be able to set it up to forward your email to an existing account without too much trouble.

– This is what I had to do with Memrise because I wanted control over everything. Honestly, figuring it all out was much, much more complicated than I had anticipated.

option 2) use Google

– Use Google to register your domain name

– I think that’ll automatically set you up with Google Apps (custom Gmail, Calendar, Sites, Blogger, Docs etc.) for free.

– Then you can set up the design and content of the pages of your website with Google Sites.

– So then you’d all use a custom Gmail interface to check your email at your own domain, and have access to the rest of the apps. I’m a big fan of Google Apps.

option 3) Weebly (or some equivalent competitor)

– Besides Google, there are a variety of companies that help build a site. I’ve heard good things about Weebly, but haven’t closely investigated it for myself.

– Much like Google Sites, it looks like they’ll do most of what you’d want: help with design templates, deal with the hosting, potentially help with domain names, and a bunch of other stuff. Nice!


– As long as your needs are simple, I would consider the Google/Weebly approach, since I think it’ll be the most straightforward.

– Down the line, if you decide that you want to build something more complicated and interactive, you can always hire a programmer and switch from Google/Weebly to your own hosting set up.

– If you have someone to help who enjoys techie stuff or has set up their own site before, then setting things up with GoDaddy and your own hosting will probably go smoothly. But otherwise, go with a company like Google or Weebly that’ll do 90% of the work for you, so you can focus on building a great site :)

When I am famous, I will decline interviews

Reading the 5-page staged and glossy magazine interview in a hotel room with a famous actor has always filled me with a peculiar kind of existential dread. There’s something a little horrifying about an hour of conversation in cold type, bereft of the intonation, expression, context and rapport that make anything one says out loud bearable. And at the end of it all, to be distilled, distorted, interpreted and weighed by the pen of a stranger… Who could have the strength of character to read about but not become their own caricature?

In contrast, the last page of the Sunday Times magazine features ‘a life in the day of’ a happy array of personalities and professions. I like the concreteness of a single day as a window into someone else’s micro challenges and achievements. I realize that these days are probably fictionalized composites – but fiction makes for a sweet, concentrated and memorable pill. And at the end of it, there is no distillation, no weighing – just the reality of a daily rhythm.

When I am famous, I will decline interviews.

P.S. That said, I still remember being stopped in my tracks when a fashion photographer relative asked me sweetly ‘what did you do today?’ in the midst of my PhD. My day had consisted of:

  • 2 hours debugging a misplaced comma
  • so that I could finish the 3-day long project of rearchitecting my non-parametric statistics to work across-subjects
  • in order to get a better sense of whether results from the latest in a long line of experiments were actually better than chance
  • so that we could tell whether reminding people and distracting them at the same time was causing them to forget
  • to test our computational theory that half-remembering a memory actually weakens it
  • which would have deep implications for our understanding of how the brain learns and self-organizes

But really, I’d been comma-hunting, and it seemed hard to fit that into the kind of response usually expected from ‘what did you do today?’.

Eroding our minds

I said that I thought “there’s something irresponsible about making money from advertising”.

Matt Weber was right to point out that although people hate the idea of targeted ads, they can be genuinely useful. Though I don’t think a very large proportion of the available advertising real estate offers the possibility for really great targeting.

[Of course, good advertising can be an art form in itself. And by funding most of our software and reading materials, advertising adds tremendous value to our lives.]

But even on the internet, most advertising still feels as though it’s about increasing our familiarity with the brand.

Think of advertising in terms of cognitive fluency, i.e. how easy we find something to process. There are lots of ways to make something fluent – make it easy to read, easy to pronounce, write it in a simple font, or in high contrast.

Things that are fluent (easy to process) get processed faster. We tend to like fluent things better, and find fluent statements more valid. We think companies with fluent names are more valuable.

Advertisers have (implicitly) known this for a long time. By incessantly dinging our minds with an advert over and over, they are gently branding that brand upon our minds, making it easier to process, more familiar, and making us unwittingly and unjustifiably like it more. Like the banks of a river worn smooth by the ceaseless flow, advertising erodes our minds.

If you are not paying for it, you’re not the customer; you’re the product being sold.