Abstract
Writers are often viewed as having an inherent style that can serve as a literary fingerprint. By quantifying relevant features related to literary style, one may hope to classify written works and even attribute authorship to newly discovered texts. Beyond its intrinsic interest, the study of literary styles presents the opportunity to introduce and motivate many standard multivariate statistical techniques. Today the statistical analysis of literary styles is made much simpler by the wealth of real data readily available from the Internet. This article presents an overview and brief history of the analysis of literary styles. In addition, we use canonical discriminant analysis and principal component analysis to identify structure in the data and distinguish authorship.
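The abstract does not spell out the procedure, so the following is only a hedged illustration of one common way principal component analysis is applied to function-word counts, not the article's own method. The function-word list, the rate-per-1000-tokens scaling, the helper names (`function_word_rates`, `covariance_matrix`, `first_principal_component`, `pc1_scores`), and the use of power iteration to find the leading eigenvector are all assumptions of this sketch. It uses only the Python standard library.

```python
import math
import re
from collections import Counter

# Hypothetical function-word list for illustration; not the list used in the article.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "upon"]


def function_word_rates(text, words=FUNCTION_WORDS):
    """Rate per 1000 tokens of each function word (an assumed normalization)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    counts = Counter(tokens)  # missing words count as 0
    return [1000.0 * counts[w] / len(tokens) for w in words]


def covariance_matrix(rows):
    """Column means and sample covariance matrix of equal-length rows (needs >= 2 rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return means, cov


def first_principal_component(cov, iters=500):
    """Leading eigenvector and eigenvalue of a symmetric PSD matrix via power iteration.

    The start vector (1, 2, ..., p) is an arbitrary choice; power iteration
    fails only if it happens to be orthogonal to the leading eigenvector.
    """
    p = len(cov)
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero matrix: every direction is an eigenvector
        v = [x / norm for x in w]
    # Rayleigh quotient v'Cv equals the eigenvalue when v is a unit eigenvector.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigenvalue


def pc1_scores(rows, means, v):
    """Project each mean-centered row onto the first principal component."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]
```

In use, one would build `rows = [function_word_rates(t) for t in texts]` for texts of known and disputed authorship and compare their first-component scores. PCA on the covariance matrix lets the most frequent words dominate, so standardizing each column first (PCA on the correlation matrix) is a common alternative; canonical discriminant analysis would instead use the known author labels to choose the projection.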
| Field | Value |
|---|---|
| Original language | English (US) |
| Pages (from-to) | 175-185 |
| Number of pages | 11 |
| Journal | American Statistician |
| Volume | 56 |
| Issue number | 3 |
| DOIs | |
| State | Published - Aug 2002 |
| Externally published | Yes |
Keywords
- Authorship
- Canonical discriminant analysis
- Data visualization
- Function words
- High-dimensional data
- Principal component analysis
ASJC Scopus subject areas
- Statistics and Probability
- Mathematics(all)
- Statistics, Probability and Uncertainty