Today, Alexis Madrigal posted “How Netflix Reverse Engineered Hollywood” at the Atlantic. In it, he describes how he, Sarah Pavis, and Ian Bogost scraped the 76,897 genres that Netflix uses. He then pulls out some insights and includes discussions with Todd Yellin, who created the system.
While the Atlantic team were doing this, George Oates was talking to me. On the same weekend that Madrigal was collecting his spreadsheet, I had also figured out that the genre IDs were incremental and discoverable, and I started to pull down the pages and extract titles. After a quick drink, over which we discussed what we could do with the data, Nick Sweeney joined in and took over the bulk of the downloading, while I kept pulling things together into a SQLite database.
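The approach described above could be sketched roughly as follows. This is an illustrative sketch rather than the code actually used. It assumes that each genre page can be retrieved by its numeric ID and that the genre name sits in the page's `<title>` element, neither of which is spelled out here. The names `fetch_page`, `extract_title`, and `collect_genres` are hypothetical, and the downloading itself is left to whatever function is passed in.

```python
import re
import sqlite3
from typing import Callable, Iterable, Optional


def extract_title(html: str) -> Optional[str]:
    """Return the text of the first <title> element, or None if there is none.

    Assumption: the genre name lives in <title>; the real pages may differ.
    """
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    if not match:
        return None
    # Collapse runs of whitespace and trim the ends.
    title = " ".join(match.group(1).split())
    return title or None


def collect_genres(
    fetch_page: Callable[[int], Optional[str]],
    genre_ids: Iterable[int],
    db_path: str = ":memory:",
) -> sqlite3.Connection:
    """Walk a range of genre IDs and store each discovered name in SQLite.

    fetch_page is a hypothetical callable that returns the page HTML for an ID,
    or None when that ID has no page.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genres (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    for genre_id in genre_ids:
        html = fetch_page(genre_id)
        if html is None:
            continue  # gap in the ID sequence
        name = extract_title(html)
        if name is None:
            continue  # page exists but no usable title was found
        conn.execute(
            "INSERT OR REPLACE INTO genres (id, name) VALUES (?, ?)",
            (genre_id, name),
        )
    conn.commit()
    return conn
```

Passing the fetcher in as a parameter keeps the downloading separate from the bookkeeping, which loosely mirrors how the work was split between two people.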
Unfortunately, day jobs and the desire for a relaxing Christmas break intervened, but here’s one interesting snippet that Madrigal didn’t include. Netflix has genres for “directed by” and “starring”, and the article covers “the Perry Mason mystery”: those lists seem generally sensible, except that the stars and director of the Perry Mason films are oddly prominent.
What the article doesn’t cover are the series listed as “created by”, of which there are 103. There are also six people who instead get “based on a work by”, because they didn’t themselves write films or series. They’re an odd mix.
The first three are fairly obvious giants of English literature: William Shakespeare, Jane Austen, and Charles Dickens. A fourth, Oscar Wilde, has only one associated genre (“Movies based on a work by…”, with no subgenres).