“Superintelligence: Paths, Dangers, Strategies”

Nick Bostrom
This was, simultaneously, one of the driest and most terrifying books I have ever read.
Really, the conclusion summarized it well:

“Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. Such is the mismatch between the power of our plaything and the immaturity of our conduct. Superintelligence is a challenge for which we are not ready now and will not be ready for a long time. We have little idea when the detonation will occur, though if we hold the device to our ear we can hear a faint ticking sound.”

It is what the title says: a list of ways we can achieve superintelligence (including, I’d note, a discussion of the fact that it’s both necessary and inevitable), a harrowing discussion of exactly how many ways it can go wrong, and some things we can start trying to do to keep it from going all Skynet on us. Or, as is more likely, wiping out humanity without really noticing, because we were a convenient source of raw materials.
Like I said: terrifying.
But valuable. I’m also convinced this book should be required reading for any AI course. And, y’know, a good chunk of the population beyond that: I count AI as one of the three most likely existential threats out there.1
So hey, want to somehow be a little bored and scared out of your mind at the same time? Read it.

  1. I’ve got it tied with “Global War, Nuclear” and “Climate Change.” Lower on the list are “A Pandemic With 100% Transmission Rate and 90-Plus Percent Lethality” and “Something From Space.” 

“Personalized Hey Siri”

Apple Machine Learning Journal:

In addition to the speaker vectors, we also store on the phone the “Hey Siri” portion of their corresponding utterance waveforms. When improved transforms are deployed via an over-the-air update, each user profile can then be rebuilt using the stored audio.

The most Apple-like way to continuously improve that I can think of. More interesting, though, is this bit later on:

The network is trained using the speech vector as an input and the corresponding 1-hot vector for each speaker as a target.

To date, ‘personalized Hey Siri’ has meant “the system is trained to recognize only one voice.” That quote, though, sounds like they’re working on multiple-user support, which, with the HomePod, they really should be.
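A 1-hot vector is just a list with a single 1 in it. Here is a minimal sketch of what that target looks like. It is not Apple’s code: the three-speaker setup and the probabilities are invented, and cross-entropy is an assumption about the loss (it is the usual choice for this kind of classifier). Each training speaker gets one slot, and the loss only cares how much probability the network gave to the correct speaker:

```python
import math

def one_hot(index, size):
    """A target vector: 1.0 in the speaker's slot, 0.0 everywhere else."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def cross_entropy(probs, target):
    """Small when the network puts high probability on the right speaker.

    Only the slot where the target is 1.0 contributes, because every
    other target entry is zero.
    """
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0)

# Hypothetical setup: three training speakers; this utterance is from speaker 1.
target = one_hot(1, 3)
confident = cross_entropy([0.05, 0.90, 0.05], target)
unsure = cross_entropy([0.40, 0.30, 0.30], target)
```

Training nudges the network until its outputs look like the `confident` case for every speaker in the training set.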


Tidbits from Apple’s Machine Learning Journal

A short while ago, Apple launched a journal on machine learning; the general consensus on why they did it is that AI researchers want their work to be public, although as some have pointed out, the articles don’t have a byline. Still, getting the work out at all, even if unattributed, is an improvement over their normal secrecy.
They’ve recently published a few new articles, and I figured I’d grab some interesting tidbits to share.
In one, they talked about their use of deep neural networks to power the speech recognition used by Siri; in expanding to new languages, they’ve been able to decrease training time by transferring over the trained networks from existing language recognition systems to new languages.1 Probably my favorite part, though, is this throwaway line:

While we wondered about the role of the linguistic relationship between the source language and the target language, we were unable to draw conclusions.

I’d love to see an entire paper exploring that; hopefully that’ll show up eventually. You can read the full article here.
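Transferring a trained network mostly means reusing its weights as a starting point. A minimal sketch, assuming a network stored as a plain stack of weight matrices (the layer sizes and the `transfer` helper are hypothetical, not from Apple’s article): keep the hidden layers, replace only the output layer, since the new language has its own set of sounds to predict, and then continue training on the new language’s data.

```python
import copy
import random

def init_layer(n_in, n_out, rng):
    """Small random weights; one row per output unit, one column per input."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def transfer(source_layers, n_new_outputs, rng):
    """Start a new-language network from a trained one.

    Hidden layers are copied as-is (they already encode a lot about speech
    in general); only the output layer is re-initialized for the new
    language's output units. Training on new-language data would follow.
    """
    new_layers = copy.deepcopy(source_layers[:-1])
    hidden_size = len(source_layers[-1][0])  # number of inputs to the old output layer
    new_layers.append(init_layer(hidden_size, n_new_outputs, rng))
    return new_layers

rng = random.Random(0)
# Hypothetical "trained" network: 40 input features -> 64 -> 64 -> 50 outputs.
source = [init_layer(40, 64, rng), init_layer(64, 64, rng), init_layer(64, 50, rng)]
new_lang = transfer(source, n_new_outputs=35, rng=rng)
```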
Another discusses the reverse – the use of machine learning technology for audio synthesis, specifically the voices of Siri. Google has done something similar,2 but as Apple mentions, it’s pretty computationally expensive to do it that way, and they can’t exactly roll out a version of Siri that burns through 2% of your iPhone’s battery every time it has to talk. So, rather than generate the entirety of the audio on-device, the Apple team went with a hybrid approach – traditional speech synthesis, based on playing back chunks of audio recordings, but using machine learning techniques to better select which chunks to play based, basically, on how good they’ll sound when they’re stitched together. The end of the article includes a table of audio samples comparing the Siri voices in iOS 9, 10, and 11; it’s a cool little example to play with.
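The chunk-picking half of that hybrid is usually framed as a search: every position in the sentence has several recorded candidates, each candidate has a target cost (how well it fits what should be said there), and each pair of neighbors has a join cost (how well they stitch together). A minimal dynamic-programming (Viterbi) sketch, assuming toy units described only by a pitch value; in the approach the article describes, machine learning would supply the costs, whereas the two cost functions here are hypothetical stand-ins:

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one recorded unit per position so the total cost is as low as possible."""
    # best holds (total cost, path) for the cheapest sequence ending at each candidate.
    best = [(target_cost(0, unit), [unit]) for unit in candidates[0]]
    for i in range(1, len(candidates)):
        new_best = []
        for unit in candidates[i]:
            # Cheapest way to arrive at this unit from any unit in the previous slot.
            cost, path = min(
                ((prev_cost + join_cost(prev_path[-1], unit), prev_path)
                 for prev_cost, prev_path in best),
                key=lambda option: option[0],
            )
            new_best.append((cost + target_cost(i, unit), path + [unit]))
        best = new_best
    return min(best, key=lambda option: option[0])

# Toy units: (name, pitch in Hz). Both cost functions are hypothetical stand-ins.
desired_pitch = [100, 110, 120]
candidates = [
    [("a1", 100), ("a2", 130)],
    [("b1", 112), ("b2", 90)],
    [("c1", 121), ("c2", 150)],
]

def target_cost(position, unit):
    return abs(unit[1] - desired_pitch[position])

def join_cost(previous, unit):
    return 0.5 * abs(unit[1] - previous[1])

total, chosen = select_units(candidates, target_cost, join_cost)
```

Keeping only the cheapest path into each candidate is what keeps this fast: the work grows with the number of positions times the square of the candidates per position, rather than with every possible combination of chunks.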
The last of the three new articles discusses how Siri (or the dictation system) knows to change “twenty seventeen” into “2017,” along with the various other differences between the spoken and written forms of a language. It’s an interesting look under the hood of some of iOS’s technology, but mostly it made me wonder about the labelling system that powers the ‘tap a date in a text message to create a calendar event’ type of stuff. Dates, specifically, are fairly easy pattern recognition, but the system also does a remarkable job of tagging artist names3 and other things. The names of musical groups are a bigger problem, but the one whose workings I really wonder about is map lookups: I noticed recently that the names of local restaurants were being linked to their Maps info sheets, and that has to involve some kind of live search from the device, because I doubt Apple has a master list of every restaurant in the world that’s getting loaded onto every iOS device.
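Apple’s system learns this mapping, but the year case alone shows why it isn’t trivial: “twenty seventeen” is read as two two-digit numbers, not as 20 + 17. A minimal hand-written sketch for just that one pattern (the word tables and `spoken_year_to_digits` are hypothetical, and it ignores forms like “two thousand seventeen” or “nineteen oh five”):

```python
ONES = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}

def two_digit(words):
    """Parse one spoken number from 0 to 99, e.g. ['eighty', 'four'] -> 84."""
    if len(words) == 1 and words[0] in ONES:
        return ONES[words[0]]
    if len(words) == 1 and words[0] in TENS:
        return TENS[words[0]]
    if len(words) == 2 and words[0] in TENS and 1 <= ONES.get(words[1], 0) <= 9:
        return TENS[words[0]] + ONES[words[1]]
    raise ValueError(f"not a two-digit number: {words}")

def spoken_year_to_digits(phrase):
    """Turn 'twenty seventeen' into '2017'; return None if the pattern doesn't fit."""
    words = phrase.lower().split()
    # Try every way of splitting the words into a "century" half and a "year" half.
    for split in range(1, len(words)):
        try:
            first = two_digit(words[:split])
            second = two_digit(words[split:])
        except ValueError:
            continue
        if first >= 10 and second >= 10:
            return str(first * 100 + second)
    return None
```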
As a whole, it’s very cool to see Apple publishing some of their internal research, especially considering that all three of these were about technologies they’re actually using.

  1. The part in question was specific to narrowband audio, what you get via Bluetooth rather than from the device’s onboard microphones, but as they mention, it’s harder to get sample data for Bluetooth microphones than for iPhone microphones. 
  2. Entertainingly, the Google post is much better designed than the Apple one; Apple’s is good-looking for a scientific journal article, but Google’s includes some nice animated demonstrations of what they’re talking about that make it more accessible to the general public. 
  3. Which it opens, oh-so-helpfully, in Apple Music, rather than iTunes these days. 

“Neural Audio,” or, “What I Did This Summer”

I’ve had a few people1 ask me what, exactly, I was doing all summer, off in Louisiana. As a programmer, being efficient is sort of the goal of everything I do; as such, doing a single write-up here and then sending that link to people makes more sense than answering the question over and over.2
I spent the summer working at a National Science Foundation-funded Research Experience for Undergraduates at the Center for Computation and Technology at Louisiana State University.3 It’s a pretty cool setup they’ve got at the CCT4 – it’s not an academic unit, it’s a research group only. The building has all sorts of handy resources – all of us in the program had access to both a shared workspace for the REU students and our own individual workrooms, which varied depending on our project.5 The exciting new thing for me was the server room, which I had access to.6 There were a few machines of interest in there – HIVE, a cluster-in-progress that was devoted entirely towards art that required high-powered computation, and Titan, a machine designed for use with neural networks.
This is where I lead in to my specific research program, which wound up being titled “Neural Audio,” as above.7 The goal was basically an exploration of the use of deep neural networks for music information retrieval.
Whoops, went a bit jargon-heavy. Let’s break it down.

Deep Neural Networks

You may have heard about this one before – neural networks are the current big thing in artificial intelligence. Google uses them to power a lot of things, but the big one people have heard about is Google Photos, where deep neural networks provide the incredible search features.8 As you might guess from the name, they’re based on the structure of the human brain:9 a bunch of nodes, connected by weighted edges, which are the neurons and synapses of the artificial brain.10 Now, what’s cool about machine learning is the training: instead of sitting down and writing an algorithm to perform a task, you just build up a big data set of questions and their paired answers. Then you feed it into the system, and it learns11 how to answer the questions.
Of course, it’s not that open-ended – you can’t drop the works of Shakespeare in there and expect it to write a paper analyzing his writing style.12 Neural networks work best with categorization – you give them a set number of categories, and the network can tell you either which category something belongs to, or the percentage chance that thing falls into each category.13
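To make “weighted edges” and “it learns” a bit more concrete, here is a minimal sketch of a single node trained with stochastic gradient descent. It’s a hypothetical toy, not code from my project: the node takes a weighted sum of its inputs and squashes it with the logistic function, and the “questions and paired answers” are the four cases of the logical OR of two inputs. With one node the gradient can be written out by hand; bigger networks compute it layer by layer with backpropagation.

```python
import math
import random

def node(weights, bias, inputs):
    """Weighted sum of the inputs, squashed into (0, 1) by the logistic function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_sgd(data, steps=5000, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in data[0][0]]  # random starting weights
    bias = 0.0
    for _ in range(steps):
        inputs, answer = rng.choice(data)  # "stochastic": one random example per step
        error = node(weights, bias, inputs) - answer  # gradient of the cross-entropy loss
        # Step every weight a little in the direction that shrinks the error.
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
        bias -= learning_rate * error
    return weights, bias

# Questions and their paired answers: the logical OR of two inputs.
OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_sgd(OR_DATA)
```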
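That “percentage chance” output typically comes from a function called softmax, which turns the network’s raw scores into probabilities that add up to 1. A minimal sketch, with made-up scores and hypothetical genre labels:

```python
import math

def softmax(scores):
    """Turn raw network scores into probabilities that sum to 1."""
    biggest = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

categories = ["rock", "jazz", "classical"]  # hypothetical labels
scores = [2.0, 1.0, 0.1]                    # made-up raw outputs
probs = softmax(scores)
for name, p in zip(categories, probs):
    print(f"{name}: {p:.0%}")
```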
Beyond that, there’s nothing fancy about neural networks – they’re just a software construct used to do a heck of a lot of math, the end result of which is an algorithm that no human could’ve designed. Cool stuff.

Music Information Retrieval

The field of MIR isn’t new; people have been doing cool things with it for a while. It really does what it says on the tin: the idea is to be able to feed a piece of music into the software and get useful information about the music out. Programs that can recognize the key of a song or identify the speed at which a piece is being performed are good examples of this.14
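Tempo is the easier of those two examples to sketch. Assuming the note onsets have already been detected (the genuinely hard part with real audio, skipped here) and that there’s roughly one onset per beat, the beat length is about the typical gap between onsets. The `estimate_bpm` helper and its input are hypothetical:

```python
from statistics import median

def estimate_bpm(onset_times):
    """Estimate tempo in beats per minute from onset times in seconds."""
    gaps = [later - earlier for earlier, later in zip(onset_times, onset_times[1:])]
    beat_length = median(gaps)  # the median shrugs off an occasional missed or extra onset
    return 60.0 / beat_length

steady = estimate_bpm([0.0, 0.5, 1.0, 1.5, 2.0])
```

Real tempo trackers have to cope with syncopation, rests, and notes that fall between beats, which is where the interesting MIR work lives.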

Combining Them

My work was basically looking into combining these two fields. Machine learning can do some cool stuff, the idea went, so why not try applying it to music?
This took two forms: trying to identify the genre of a piece, and trying to identify the instruments playing in a piece.
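On the genre side, a classifier usually looks at short slices of audio rather than a whole song at once. One simple way to get a single answer for a piece is to average the per-slice softmax outputs, while the per-slice winners show how the network’s opinion shifts over time. This is a sketch under stated assumptions, not the project’s actual code: the `summarize` helper, the labels, and the probabilities are all invented.

```python
def summarize(frame_probs, labels):
    """Combine per-slice softmax outputs into one guess plus a timeline.

    frame_probs: one probability list per short slice of audio
    (hypothetical classifier outputs); labels: the category names.
    """
    n_frames = len(frame_probs)
    # Average probability of each label across all slices.
    mean = [sum(frame[k] for frame in frame_probs) / n_frames for k in range(len(labels))]
    overall = labels[max(range(len(labels)), key=lambda k: mean[k])]
    # The winning label in each slice, in order.
    timeline = [labels[max(range(len(labels)), key=lambda k: frame[k])] for frame in frame_probs]
    return overall, mean, timeline

labels = ["rock", "jazz"]
frames = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
overall, mean, timeline = summarize(frames, labels)
```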
It’s here that I’m going to hand off the explanation to another thing I was working on this summer, though as a test subject rather than a researcher: the digital poster. One of the other research groups at the CCT was working on a system to modernize the poster presentation, a staple of scientific conferences. I had the opportunity to be one of the trial-run students for the digital poster, and wound up putting together an online version as my way of wireframing what the final product would look like. Being me, I made my wireframe look just as good as the ‘official’ one, and wound up posting the whole thing online and providing a QR code on the paper poster15 that linked to the online site.


While the summer, and thus the time I had at LSU, came to an end, the work didn’t. I’m still16 trading emails with my mentor, and I’m hopefully going to be attending another conference at some point to talk about my work. In the interim, I hope to be able to get some additional work done, maybe get some more interesting data out of the machines. It’s a goal, and time will tell how well I’m able to accomplish it.
That’s about all I’m going to write here – if you want to know more, you can check out the digital poster, and if that doesn’t get you enough information, you can fire me an email: it’s grey (at) this site.17

  1. Reasonably 
  2. If I were teaching a computer science course, the first thing I’d say would be along the lines of “‘efficiency’ is just a codeword for ‘laziness that won’t get you fired.’” 
  3. Or “LSU CCT NSF REU” for out-of-order short. 
  4. I hope you read the last footnote, because I’m going to be using these short-forms of the names throughout. Efficiency! 
  5. One person had a few offices shared with graduate students working on the same program; another had a Mac lab to themselves; I was given the key to a media lab on another floor. 
  6. I found this oddly entertaining after I had to let one of the IT staff in there to reboot a server following a power failure. 
  7. I kept trying to make it “neural audio,” because I’m a millennial and thus hate capital letters, but I was overruled by my mentor. Probably for the best. 
  8. The fact that I can search for someone’s name and have it accurately spit out a list of every photo I’ve taken with them in it is seriously impressive. The fact that I can ask for stuff like “mountain” or “car” and also get accurate results? Mind-blowing. 
  9. Though, it’s important to note that they’re not based on an accurate/current idea of how the human brain works; we’re computer scientists, not biologists. 
  10. The weighting of the edges is important, as that’s where all the magic happens. Each node, simplified down, adds up all of its inputs and passes the total through a simple ‘squashing’ function. The output is then passed along the edges and multiplied by the weight of each edge, creating the new input for the next node. 
  11. Using a technique called Stochastic Gradient Descent, which I find to be a very elegant solution to the problem. (I recommend reading the previous footnote before this one.) Learning, via training, goes like this: you feed the network an input, and the randomized initial weights do the processing and spit out an answer. That’s probably not the right answer, so the network measures how far off it was and works out, using calculus (a process called backpropagation), which direction to nudge each weight to shrink that error. It then takes a small step in that direction and tries again. The ‘stochastic’ part is that each step is based on a small random sample of the training data rather than the whole set. The process of training is just repeating that operation over and over and over again. 
  12. Although, entertainingly, you can drop the entire works of Shakespeare into a neural network and have it make a spirited attempt at creating a new work in the style of Shakespeare. 
  13. That’s called softmax, and it’s pretty handy. I looked at using the way softmax results change over time as a way of extracting metadata from music. 
  14. Entertainingly, some of the best examples of MIR arguably aren’t MIR at all: Gracenote, for example, the system that allows the ‘smart’ stereo systems in cars to figure out what CD you’ve just put in, is based on a ‘CD fingerprint’ that looks at the length of the tracks and when each one starts. It is possible, with a lot of effort, to design a CD that will show up as being something entirely different than it actually is. 
  15. We were all required to make traditional paper posters, regardless of our use of digital posters. 
  16. Infrequently, because time zones. 
  17. I’m not dumb enough to put my email address up on the open web, c’mon. I already get way too much spam email. 
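Gracenote grew out of CDDB, and the classic CDDB/freedb disc ID is a good illustration of the “CD fingerprint” from footnote 14: it is built only from when each track starts and how long the disc is. A minimal sketch of that classic ID (whether Gracenote’s current lookup still works exactly this way is an assumption, and the track offsets below are made up). Offsets are in CD frames, 75 per second, including the standard 150-frame lead-in:

```python
def digit_sum(n):
    """Add up the decimal digits of n, e.g. 266 -> 14."""
    return sum(int(d) for d in str(n))

def cddb_disc_id(track_offsets, leadout_offset):
    """Classic CDDB/freedb disc ID from a CD's table of contents."""
    starts = [offset // 75 for offset in track_offsets]  # track start times, whole seconds
    checksum = sum(digit_sum(s) for s in starts) % 255
    total_seconds = leadout_offset // 75 - starts[0]     # playing length of the disc
    return (checksum << 24) | (total_seconds << 8) | len(track_offsets)

disc_id = cddb_disc_id([150, 20000, 40000], 60000)
print(f"{disc_id:08x}")  # prints 1b031e03
```

Since nothing about the audio itself goes into the ID, any disc with the same track layout gets the same fingerprint, which is why a carefully designed CD can masquerade as a different album.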

Finding a simple algorithm for intelligence

Michael Nielsen:

I don’t believe we’ll ever find a simple algorithm for intelligence. To be more concrete, I don’t believe we’ll ever find a really short Python (or C or Lisp, or whatever) program – let’s say, anywhere up to a thousand lines of code – which implements artificial intelligence. Nor do I think we’ll ever find a really easily-described neural network that can implement artificial intelligence. But I do believe it’s worth acting as though we could find such a program or network. That’s the path to insight, and by pursuing that path we may one day understand enough to write a longer program or build a more sophisticated network which does exhibit intelligence. And so it’s worth acting as though an extremely simple algorithm for intelligence exists.

Making progress is all about dreaming big.