I just stumbled on this article by Matthew Sparkes, published in The Daily Telegraph on 9 January 2014:
Scientists have developed an algorithm which can analyse a book and predict with 84 per cent accuracy whether or not it will be a commercial success. A technique called statistical stylometry, which mathematically examines the use of words and grammar, was found to be “surprisingly effective” in determining how popular a book would be.
The group of computer scientists from Stony Brook University in New York said that a range of factors determine whether or not a book will enjoy success, including “interestingness”, novelty, style of writing, and how engaging the storyline is, but admit that external factors such as luck can also play a role.
By downloading classic books from the Project Gutenberg archive (a library of over 50,000 free e-books), they were able to analyse texts with their algorithm and compare its predictions to historical information on the success of the work. Everything from science fiction to classic literature and poetry was included. It was found that the predictions matched the actual popularity of the book 84 per cent of the time. They identified several trends common to successful books, including heavy use of conjunctions such as “and” and “but” and large numbers of nouns and adjectives.
Less successful work tended to include more verbs and adverbs and relied on words that explicitly describe actions and emotions such as “wanted”, “took” or “promised”, while more successful books favoured verbs that describe thought processes such as “recognised” or “remembered”. To find “less successful” books for their tests, the researchers scoured Amazon for low-ranking books in terms of sales. They also included Dan Brown’s The Lost Symbol, despite its commercial success, because of “negative critiques it had attracted from media”.
“Predicting the success of literary works poses a massive dilemma for publishers and aspiring writers alike,” said Assistant Professor Yejin Choi, one of the authors of the paper published by the Association for Computational Linguistics. “To the best of our knowledge, our work is the first that provides quantitative insights into the connection between the writing style and the success of literary works. Previous work has attempted to gain insights into the ‘secret recipe’ of successful books. But most of these studies were qualitative, based on a dozen books, and focused primarily on high-level content – the personalities of protagonists and antagonists and the plots. Our work examines a considerably larger collection – 800 books – over multiple genres, providing insights into lexical, syntactic, and discourse patterns that characterise the writing styles commonly shared among the successful literature.”
What I find surprising about this study is the statistical link between ‘writing style’ and popularity. An accuracy of eighty-four percent points to a strong connection! Conjunctions tend to keep the action moving, hence their frequent use. We’ve heard for some time that adverbs are to be avoided, and that it is far better to choose a more accurate and descriptive verb. The frequent use of adjectives makes sense; after all, we’re trying to paint a picture in the reader’s mind, and well-chosen adjectives will improve the clarity of the picture. It is interesting that verbs which convey action or emotion are less successful than verbs which convey thought processes. Perhaps this is because it is easier for a reader to ‘tune in’ to thought processes than it is for him or her to feel the action or the emotion. Is the corollary of this proposition a finding that thoughtful characters are more popular than active or emotional characters? No. I think this would be carrying the thought process too far.
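For anyone curious what this kind of word counting might look like in practice, here is a minimal Python sketch. It is only my own illustration, not the researchers' algorithm: the word lists are simply the examples quoted in the article, and the function name style_profile and the sample sentence are made up for the purpose. It measures what share of the words in a passage fall into each group.

```python
import re
from collections import Counter

# Illustrative word lists only: these are the examples quoted in the article,
# not the feature set the researchers actually used.
CONJUNCTIONS = {"and", "but"}
ACTION_VERBS = {"wanted", "took", "promised"}
THOUGHT_VERBS = {"recognised", "remembered"}


def style_profile(text):
    """Return the share of words in each illustrative category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return {
        "conjunctions": sum(counts[w] for w in CONJUNCTIONS) / total,
        "action_verbs": sum(counts[w] for w in ACTION_VERBS) / total,
        "thought_verbs": sum(counts[w] for w in THOUGHT_VERBS) / total,
    }


# Hypothetical sample sentence, made up for this example.
sample = "She remembered the house and the garden, but he wanted to leave."
profile = style_profile(sample)
for name, share in profile.items():
    print(f"{name}: {share:.1%}")
```

A real stylometric analysis would presumably tag every word with its part of speech rather than rely on short hand-picked lists, so treat the numbers from a sketch like this as a curiosity rather than a prediction.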
It would be interesting if the algorithm were able to spot clichés or commonly used phrases, because these are thought to be a real turn-off for readers.
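A crude version of such a cliché spotter is easy enough to imagine. The Python sketch below is purely my own assumption about how one might start: the phrase list and the function name find_cliches are hypothetical, and nothing in the article suggests the Stony Brook algorithm does anything like this.

```python
import re

# Hypothetical starter list; a real checker would need a far larger one.
CLICHES = (
    "at the end of the day",
    "avoid like the plague",
    "dead as a doornail",
)


def find_cliches(text, phrases=CLICHES):
    """Count how often each listed phrase appears in the text."""
    # Lower-case the text and collapse runs of whitespace so that
    # line breaks inside a phrase do not hide a match.
    normalised = re.sub(r"\s+", " ", text.lower())
    hits = {}
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        count = len(re.findall(pattern, normalised))
        if count:
            hits[phrase] = count
    return hits


# Hypothetical sample text, made up for this example.
print(find_cliches("At the end of the day, he was dead as a doornail."))
```

A fixed list like this would miss any variation in wording, which is probably why spotting clichés reliably is harder than it sounds.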
What do you think?