Quantitative analysis of literary styles

Roger D. Peng, Nicolas W. Hengartner

Research output: Contribution to journal › Article › peer-review

57 Scopus citations


Writers are often viewed as having an inherent style that can serve as a literary fingerprint. By quantifying relevant features related to literary style, one may hope to classify written works and even attribute authorship to newly discovered texts. Beyond its intrinsic interest, the study of literary styles presents the opportunity to introduce and motivate many standard multivariate statistical techniques. Today the statistical analysis of literary styles is made much simpler by the wealth of real data readily available from the Internet. This article presents an overview and brief history of the analysis of literary styles. In addition, we use canonical discriminant analysis and principal component analysis to identify structure in the data and distinguish authorship.
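As a rough illustration of the kind of analysis the abstract describes, the sketch below turns texts into function-word rates and projects them onto a first principal component. It is a minimal sketch under stated assumptions, not the authors' procedure: the word list `FUNCTION_WORDS`, the helper names, and the toy sentences are all hypothetical, the component is found by simple power iteration, and canonical discriminant analysis is not covered.

```python
import math
import re
from collections import Counter

# Hypothetical function-word list, chosen for illustration only;
# the list used in the article is not given in this record.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a"]


def function_word_rates(text, words=FUNCTION_WORDS):
    """Return the rate of each function word per 1000 tokens of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [1000.0 * counts[w] / total for w in words]


def first_principal_component(rows, iterations=200):
    """Return (unit direction, scores) for the first principal component.

    rows is a list of equal-length feature vectors (at least two rows).
    The direction is approximated by power iteration on the sample
    covariance matrix; its sign is arbitrary.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p), using the n - 1 denominator.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Non-uniform start vector, to lower the chance of starting
    # orthogonal to the leading eigenvector.
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all rows identical: no variance to explain
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


if __name__ == "__main__":
    # Toy sentences standing in for real texts (illustrative only).
    texts = [
        "The history of the town and the people of the valley.",
        "To go in a boat to a city in a storm.",
        "The end of the war and the return of the king.",
    ]
    rates = [function_word_rates(t) for t in texts]
    direction, scores = first_principal_component(rates)
    print(direction)
    print(scores)
```

In practice one would compute rates over long passages from each author and plot the first two or three component scores to look for clustering by author; a linear algebra library would normally replace the hand-written power iteration.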

Original language: English (US)
Pages (from-to): 175-185
Number of pages: 11
Journal: American Statistician
Issue number: 3
State: Published - Aug 2002
Externally published: Yes


Keywords

  • Authorship
  • Canonical discriminant analysis
  • Data visualization
  • Function words
  • High-dimensional data
  • Principal component analysis

ASJC Scopus subject areas

  • Statistics and Probability
  • Mathematics(all)
  • Statistics, Probability and Uncertainty

