notes.husk.org. scribblings by Paul Mison.

2014-01-02

Netflix Genres and Literary Creators

text 18:33:00

Today, Alexis Madrigal posted “How Netflix Reverse Engineered Hollywood” at The Atlantic. In it, he describes how he, Sarah Pavis, and Ian Bogost scraped the 76,897 genres that Netflix uses, and then goes on to pull out some insights (as well as including discussions with Todd Yellin, who created the system).

While the Atlantic team were doing this, George Oates was talking to me. On the same weekend that Alexis was collecting his spreadsheet, I’d also figured out that the genre IDs were incremental and discoverable, and I started to pull down the pages and extract titles. After a quick drink, over which we discussed what we could do with the data, Nick Sweeney joined in and took over the bulk of the downloading, while I kept pulling things together into a SQLite database.
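The approach described above (walk the incremental genre IDs, extract each page's title, store the result in SQLite) can be sketched roughly as follows. This is a minimal illustration, not the code we actually ran: the `fetch` callable, the table layout, the assumption that the genre name sits in the page's `<title>` element, and the sample pages are all hypothetical, and no real Netflix URL is used.

```python
import re
import sqlite3
from typing import Callable, Optional

# Assumption: the genre name can be read from the page's <title> element.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html: str) -> Optional[str]:
    """Return the whitespace-normalised <title> text, or None if absent or empty."""
    match = TITLE_RE.search(html)
    if not match:
        return None
    title = " ".join(match.group(1).split())
    return title or None


def collect_genres(
    fetch: Callable[[int], Optional[str]],
    first_id: int,
    last_id: int,
    db_path: str = ":memory:",
) -> sqlite3.Connection:
    """Walk genre IDs first_id..last_id and store any names found in SQLite.

    `fetch` is a hypothetical callable that returns a page's HTML for a
    genre ID, or None when no genre exists at that ID.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genres "
        "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    for genre_id in range(first_id, last_id + 1):
        html = fetch(genre_id)
        if html is None:
            continue  # gap in the ID sequence
        name = extract_title(html)
        if name:
            conn.execute(
                "INSERT OR REPLACE INTO genres (id, name) VALUES (?, ?)",
                (genre_id, name),
            )
    conn.commit()
    return conn


if __name__ == "__main__":
    # Made-up pages standing in for downloaded HTML.
    sample_pages = {
        1: "<html><title>Example Genre One</title></html>",
        3: "<html><title>Example Genre Three</title></html>",
    }
    db = collect_genres(sample_pages.get, 1, 3)
    for row in db.execute("SELECT id, name FROM genres ORDER BY id"):
        print(row)
```

Passing the downloader in as a function keeps the storage step separate from the fetching step, which suits a split where one person downloads and another assembles the database.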

Unfortunately, day jobs and the desire for a relaxing Christmas break intervened, but here’s one snippet I found interesting that Madrigal didn’t include. Netflix has genres for “directed by” and “starring”, and the article covers “the Perry Mason mystery”: those lists seem generally sensible, except that the stars and director of the Perry Mason films are oddly prominent.


What isn’t covered is the set of series listed as “created by”, of which there are 103. There are also six people who have “based on a work by” genres instead, because they didn’t themselves write films or series. They’re an odd mix.

The first three are fairly obvious English literature giants: William Shakespeare, Jane Austen, and Charles Dickens. A fourth only has one genre associated (“Movies based on a work by…”, rather than any subgenres), and that’s Oscar Wilde.

So, the other two? Agatha Christie and Stephen King. Never let it be said that writing genre fiction won’t get you viewers, if not immortality.

2011-05-24


photo 23:28:04
Dead Yet Alive - the top five historical figures mentioned on BBC TV. (From “Ladies and Gentleman, this is the BBC” at technogoggles, wherein subtitles are grist for data mining.)

