The Bestseller Code

The Bestseller Code, by Jodie Archer and Matthew L. Jockers, describes the results of a five-year computer analysis of over 20,000 novels. The authors wanted to figure out what differentiates the 500 or so New York Times bestsellers in their corpus from the rest of the titles that didn’t make the bestseller list.

The subtitle, Anatomy of a Blockbuster Novel, states the book’s goal: to describe the common traits of bestsellers, revealing some hidden and unexpected characteristics along the way. The book does not pretend to offer a formula for writing bestsellers. A similar book in the world of sports might reveal insights into what made Michael Jordan, Kobe Bryant, and LeBron James such great basketball players, but that doesn’t mean you or I could follow the formula and become the next NBA superstar. This book is a descriptive anatomy, not a prescriptive how-to.

The authors use highly customized Natural Language Processing (NLP) software to analyze thousands of data points within each book, including the frequency of different words and word types, sentence length, which topics appear and with what frequency, and where the emotional high and low points of the plot occur.

One thing you should know about NLP software is that it enables the computer to describe a text, but it does not enable the machine to understand the text. For example, in reading Harry Potter, NLP software will point out that virtually every paragraph that mentions Voldemort is full of words that express negative sentiment (words like evil, terrible, fearful, etc.). From this, the software can infer that Voldemort is the villain. However, NLP software does not understand what it reads the way a human does. It cannot answer complex questions like, “How does Mrs. de Winter’s understanding of her world change when Maxim says, ‘I never loved her’?”
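To make the Voldemort example concrete, here is a minimal sketch of that kind of counting. It is not the authors' software: the word list, the sample paragraphs, and the function name `negative_share` are all made up for illustration, and real sentiment lexicons run to thousands of entries.

```python
import re

# Hypothetical mini-lexicon, invented for this sketch.
NEGATIVE_WORDS = {"evil", "terrible", "fearful", "dark", "cruel"}

def negative_share(paragraphs, name):
    """Fraction of negative words across the paragraphs that mention `name`."""
    total = 0
    negative = 0
    for paragraph in paragraphs:
        words = re.findall(r"[a-z']+", paragraph.lower())
        if name.lower() not in words:
            continue  # skip paragraphs that never mention the character
        total += len(words)
        negative += sum(1 for word in words if word in NEGATIVE_WORDS)
    return negative / total if total else 0.0

# Made-up sample text, not quoted from any novel.
sample = [
    "Voldemort was evil and his cruel plans were terrible.",
    "Harry ate breakfast with his friends.",
]

print(negative_share(sample, "Voldemort"))
print(negative_share(sample, "Harry"))
```

Notice that the program only counts. It can flag a character who keeps showing up next to negative words, but it has no idea why he is frightening, which is exactly the describe-versus-understand gap.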

Most of Archer and Jockers’ findings make perfect sense, and will be familiar to people who read a lot of novels and to those who read the advice of successful authors on the craft of writing. Readers prefer an active protagonist to a passive one. Readers prefer language that’s close to the everyday vernacular over the more formal type of writing that appears in essays. Bestselling authors do not overload their sentences with adjectives and adverbs. They convey meaning with nouns and verbs, which makes the reading experience smoother and more fluid.

Bestsellers tend to focus on a few topics within each work, rather than trying to hit on every theme the author can think of. Typically, the three major topics of a bestseller account for about 30-40 percent of the total topical matter of the work. Certain topics are more likely to make a bestseller: technology, work, family life, close interpersonal relationships. Surprisingly, sex is not one of those topics, and bestsellers in general tend to feature less sex than non-bestsellers.
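The "three major topics" figure is simple arithmetic once a topic model has assigned a weight to each topic in a novel. The sketch below assumes those weights already exist; the topic names and numbers are invented for illustration and do not come from the book.

```python
def top_topic_share(topic_weights, k=3):
    """Share of the total topical weight held by the k largest topics."""
    total = sum(topic_weights.values())
    if total == 0:
        return 0.0
    largest = sorted(topic_weights.values(), reverse=True)[:k]
    return sum(largest) / total

# Hypothetical topic weights for one novel (they sum to 100).
novel_topics = {
    "work": 16, "family life": 12, "technology": 8,
    "nature": 7, "travel": 7, "money": 6, "school": 6,
    "weather": 5, "music": 5, "law": 5, "medicine": 5, "sports": 5,
    "religion": 4, "politics": 4, "cars": 3, "pets": 2,
}

print(top_topic_share(novel_topics))  # the top three topics here are 16 + 12 + 8 out of 100
```

In this made-up example the three biggest topics account for 36 percent of the novel, which falls inside the 30-40 percent range described above.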

Archer and Jockers identify seven structural patterns common to the plots of all bestsellers. They present the structural patterns as line graphs, which give a clear picture of the story’s emotional highs and lows. The summary of the classic love story, for example, is “Boy meets girl, boy loses girl, boy gets girl back.” If you were to draw that as a line graph, there would be a high point near the beginning when the boy first meets the girl, followed by a low point in the middle when he loses her, and another high point at the end, when he gets her back.
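The "boy meets girl" line graph can be sketched in a few lines. The per-chapter sentiment scores below are invented, and the moving average is only one plausible way to smooth them; the book does not say this is how the authors draw their curves.

```python
def moving_average(scores, window=3):
    """Smooth a list of scores by averaging each point with its neighbors."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        start = max(0, i - half)
        end = min(len(scores), i + half + 1)
        chunk = scores[start:end]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical per-chapter sentiment: meets girl, loses girl, gets girl back.
love_story = [0.1, 0.6, 0.8, 0.3, -0.4, -0.7, -0.2, 0.4, 0.9]

arc = moving_average(love_story)
early_high = arc.index(max(arc[:5]))  # best moment in the first half
low_point = arc.index(min(arc))       # the "boy loses girl" trough
final_high = arc.index(max(arc))      # the reunion at the end

print(early_high, low_point, final_high)
```

Plotting `arc` would give the high-low-high shape described above: a peak near the beginning, a dip in the middle, and the highest point in the final chapter.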

One of the book’s surprising findings is that the emotional curves of all bestsellers follow one of these seven graphs. The plot lines hold for trashy romances, far-fetched thrillers, and revered literary prizewinners. The Bestseller Code even charts the plots of some of these seemingly disparate novels together on the same graph to show you how similar they are.

Much of the value of this book comes from its clear and well-described insights into what readers respond to, and from the authors diving into a number of texts to provide illuminating examples of the generalized patterns that the computer has revealed. Jockers has more of a traditional English-lit background, and will occasionally touch on books by Virginia Woolf and James Joyce. Archer comes from the publishing world, and her discussions focus on more current works.

One work they dig into is the unexpected phenomenon Fifty Shades of Grey, which, on its surface, seems to defy all the rules. It was written by an indie author with no marketing to propel it, one of its primary topics is sex (and not mainstream vanilla sex, but BDSM), which is not part of the “bestseller DNA”, and both readers and critics mocked the quality and style of its writing. But for all that, it sold more than a hundred million copies. Why?

This is where the computer analysis really shines, as it points out characteristic patterns of the work without the baggage of emotional or aesthetic judgement that a human reader would bring. The analysis showed that E. L. James, despite what some might call a lack of style, had hit on every element of the blockbuster novel, from topical makeup to plot structure to character. The analysis also showed that, based on the number of paragraphs devoted to each topic, the book was more about close interpersonal relationships than sex in general. And “close personal relationships,” the authors remind us, is one of the top themes common to all bestsellers.

While the overall public discussion of Fifty Shades tended to focus more on the sex, the computer was able to see that readers were experiencing, perhaps on a less conscious level, the same sorts of interpersonal relations that fascinate them in the genres of mystery, thriller, and historical drama.

Even more interesting, Archer and Jockers point out, is the plot structure of Fifty Shades, which is a subtle and unusual variation of one of the seven basic structures common to all bestsellers. James’ novel generally follows “Plotline 4,” which Christopher Booker, in The Seven Basic Plots, calls “Rebirth.” Archer and Jockers note that “these plots tend to see the main characters experience change, renewal, and some sort of transformation.”

The twist that James put on this basic plot is that, instead of following the plot’s typical emotional pattern of beginning-high-low-high-low-end, she created a series of highs and lows throughout the book, which occurred at such regular intervals that the graph of them looks almost like a perfect sine wave. Archer and Jockers refer to this pattern as the emotional rhythm of the plot.
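One rough way to check for that kind of rhythm is to find where an emotional curve turns (each peak and trough) and measure the spacing between the turns. This is an illustrative sketch, not the authors' method; the curve below is a synthetic sine wave standing in for a novel's sentiment scores.

```python
import math

def turning_points(curve):
    """Indices where the curve switches between rising and falling."""
    points = []
    for i in range(1, len(curve) - 1):
        before = curve[i] - curve[i - 1]
        after = curve[i + 1] - curve[i]
        if before * after < 0:  # slope changes sign: a peak or a trough
            points.append(i)
    return points

def gaps(indices):
    """Distances between consecutive turning points."""
    return [b - a for a, b in zip(indices, indices[1:])]

# Synthetic stand-in for a novel: a sine wave that repeats every 8 scenes.
rhythmic = [math.sin(2 * math.pi * i / 8) for i in range(40)]

turns = turning_points(rhythmic)
print(turns)
print(gaps(turns))
```

For the synthetic wave, every gap is the same length. A real novel's curve would be noisier, but the more uniform the gaps between its highs and lows, the closer it comes to the steady beat described above.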

Outside of the Harry Potter series, which was primarily aimed at young readers, the only “adult” book in the past twenty years whose sales numbers compare to Fifty Shades is Dan Brown’s The Da Vinci Code. The authors point out that although The Da Vinci Code’s basic plot differs from Fifty Shades, the two books share an almost identical emotional rhythm. Page 106 of The Bestseller Code shows a graph of the two plot lines, with the high points, low points, and inflection points of both novels appearing almost in lockstep. Now who would have thought to compare those two books, or even mention them in the same breath?

Both were runaway bestsellers, and on an emotional level, both provided a strikingly similar reading experience despite their differences in topic, style, tone, and genre. That insight about emotional experience reminded me of a question and answer I read recently on Quora. A reader asked why the Harry Potter series was so popular despite the fact that its plots were not necessarily new and other writers had more interesting styles and were better at world building. An author named Nick Travers offered this response:

I used to think the same as you. I even thought, ‘I can write as well as that,’ so I started to write a novel to prove it.

Four novels later and I can write as well as J.K.Rowling, but the quality of my stories pale into insignificance compared to hers. What I’ve learned is that J.K. Rowling is not a particularly good writer, but she is a master story teller. When she tells a story it sparkles with magic in a way that draws people (especially youngsters) into her world. I wish I could tell stories like that.

The Bestseller Code maps out some of the characteristics and common traits of great storytelling in illuminating ways, and the authors’ commentary on a diverse body of well-known works makes it fun and interesting. If you’re interested in understanding what drives readers to buy books, this one is worth a read.
