
Why literature is the ultimate big-data challenge


Readers' comments



Actually, those findings are not so surprising. The close ties of the early history plays to Marlowe have often been noted. And among Shakespeare's histories, the Henry VI plays have always been considered the weakest and least characteristic.

That other authors had a hand in some of the late plays has also been suspected for a long time.

I welcome these technologies that can bring additional insight to these questions. But I don't expect big surprises. Many excellent scholars with some sense of style already made such assumptions, which are now just confirmed.

great uncle clive

In today's terms... Shakespeare was a screaming drag queen... You do realise that... The problem with Shakespeare isn't WHO was he? but WHAT was he?
The historical Shakespeare could not write... That's the first problem... It took an age for him to write his name, his handwriting was so laboured... Solution... He dictated his output
When you think of Shakespeare, imagine him in the back room of a London tavern with his "high-strung" theatrical chums... And put a few drinks into him, and the iambs come pouring out in a pentametric torrent... And it all gets taken down in Elizabethan shorthand, and shipped off to the printers unedited... Shakespeare ad libbed his plays, acting out the various parts, grabbing bits of women's costumes for the female roles, egged on by Marlowe and Middleton, screaming at the top of his voice, HIS voice, Shakespeare's voice which never varies
Many of his plays revolve around drag
And when he wasn't dragging it up, he was the most insufferable prig, fawning before his superiors and overbearing his subordinates... a man who would not allow his daughters to be educated... who bought a baronetcy for his father... who sued a workman for a shilling
When you think Shakespeare, think Tom Hulce as Mozart in the film of Amadeus... an over-the-top genius but a bit strange... You revere him, but want to keep him at arm's length
Incidentally there is a life-time portrait of Shakespeare carbon-dating to the 1590s... the Sanders portrait (You can google it)... which shows him as a somewhat effete histrionic youth NOT how we like to see our best playwright... (It also shows him with an attached earlobe as in the First Folio woodcut... a dead giveaway)... That's my two cents' worth


Flimsy 'proof'.

Computer algorithms are based upon assumptions. To the degree any of the assumptions are flawed -- or if they are all correct but don't fit perfectly and completely together -- the conclusion becomes less valid.


The credulous attribution of merit to the "theory" of how "function" words provide "scientific" insight into authorship exposes much of the folly (in all its senses) of this type of inquiry into literary creation. For one, it completely ignores that the language being analyzed is poetic (specifically in this case, I mean that it generally adheres to a regular meter). These allegedly dispositive aspects of Shakespearean language can often be understood much more convincingly as tools to preserve the meter and mellifluousness of what is (let us not forget) intended as spoken language. (Simply contrast it to prose intended only to be read to oneself, in which most authors would omit such words to make the language seem more forceful, the honey-like flow of language being of considerably less importance -- as a general matter -- than in Elizabethan (and other) dramatic speech.)
I also agree with Shaun Cutts's point about it being "a stretch to characterize [a factual understanding of authorship] as helping us to understand" the plays. I would demur only (and very mildly) from his courteous inclusion of the word "seems." Perhaps, in this context, it would not be inexcusable for me to borrow from the Bard (whoever he, she or they may be): "'Seems,' [sir]? Nay, it is. I know not 'seems.'"


"Nevertheless, it represents a strain of academic objectivity, rarely found in the field of Shakespeare studies. Surely that’s an idea that both Bernard and the maligned “maths mob” can endorse."

Why? I mean, why are we supposed to accept that "objectivity" is worthwhile on this topic without some sort of argument (objective or subjective) that this is so? And don't call me Shirley.

non-juror in reply to JPoff

What is the point of literary criticism, analysis etc? It is all subjective and without rationality. My 'exposure' to (adult, not children's) literature, as far as I, at 74, can remember, was Xmas Carol, O Twist and Pickwick in my junior-school days, and my enthusiasm for such fiction, in English, French and Spanish, is undiminished. 'Studying' it for 5/7 years at school wasted time that could have been spent reading. There should have been book-lists for each year, from which pupils would make guided and changeable choices, on which, throughout each year, they would give their own, reasoned opinions, not marked for 'right/wrong' but for the observational and analytical skills they showed, which would be discussed in class.
There was even less reason for such literary studies in Uni foreign-language courses. In the time spent on them over 4 years, I could have become satisfactorily competent in the 4 skills in two more languages than the three I took my degree in.
Why a PhD in such subjects is 'equal' (in what sense?) to one obtained in Maths, Science, Medicine and other disciplines that have important impacts on our real lives, I really do not know.

Shaun Cutts

"By understanding that their authorship is messy, contested and symbiotic, we can better understand the plays themselves."
Understanding that the authorship is messy and symbiotic can in principle help us understand the plays, but why does understanding that the authorship is contested help us understand the plays? It may caution us not to overinterpret, but it seems a stretch to characterize this as helping us to understand.


I highly doubt that computers can tell authorship, since it takes human understanding, with all its subtle intricacies and irrational ambiance, to understand or misunderstand other humans ... computers, although extremely valuable, are just tools unless they actually achieve artificial intellect, which means they become self-aware ... another example is when some guy used IBM's Watson to examine speeches by Obama and Trump in the hope that Trump would look foolish ... to his disappointment, it turned out that Watson did not recognise any difference between the two speeches ... computers are just tools with algorithms and processing power for more advanced analysis, but they are subject to human understanding of words, and to human understanding and insight that are still struggling ...

rachel novak

It seems odd to me to treat as novel the idea that Elizabethan plays were the product of incredibly collaborative writers. I'm a Classicist, but even I know this stuff is old hat to the field of English lit and those historians who deal with historical literary culture. Besides, you'll generally find that the field (of any of the humanities that deal with literature) is increasingly critical of the idea that you can break collaborative works down into individual contributions, or that it even matters who wrote what.
Using algorithms to posit which exact writers wrote which exact lines will not at all deflate the myth of the unrivaled genius. Doing so is effectively a question of "have we been lauding the wrong genius" not "maybe we need to re-evaluate our approach to authorship."


The general rule is: The Economist loves statistics and statistical analysis of every variety. The author's arguments here are unconvincing. Using computers to try to determine authorship does not help us to better understand the history plays. Readers and performers of these plays over the past four centuries did not fail to understand them for lack of statistical analysis performed by computers. Nor are these analyses immune to the general computer ailment of "garbage in, garbage out."