The field of “culturomics” promises humanities researchers a robust quantitative tool to analyze cultural trends back to the 1500s
Can culture be decoded like a genome? A team from Harvard University has joined forces with Google to crack the spines of 5,195,769 digitized books spanning five centuries of the printed word, in the hope of giving the humanities a more quantitative research tool.
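The Google Books Ngram Viewer, launched online December 16 and described in a paper in Science, lets Web users chart how often words and phrases in their areas of interest appear over time. It is built on n-grams, contiguous sequences of n words ("evolution" is a 1-gram; "save the world" is a 3-gram), a standard way of modeling sequences in natural language.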
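To make the n-gram idea concrete, here is a minimal Python sketch. It assumes a tiny made-up corpus keyed by year and naive lowercase, whitespace-only tokenization; the function names and data are hypothetical illustrations, not Google's code. For each year it computes a phrase's share of all n-grams of the same length, roughly the kind of normalized frequency the viewer plots.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous n-word sequence in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def yearly_frequency(corpus, phrase):
    """Share of all same-length n-grams in each year's texts that match `phrase`.

    `corpus` maps a year to a list of texts. This is toy data with naive
    whitespace tokenization (an assumption); the real viewer draws on
    millions of scanned books and its own tokenization rules.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    shares = {}
    for year, texts in sorted(corpus.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        # Normalize by the total number of n-grams of this length in that year.
        shares[year] = counts[target] / total if total else 0.0
    return shares

# Hypothetical two-year corpus, for illustration only.
corpus = {
    1900: ["the hero must save the world", "a quiet year in the village"],
    2000: ["they save the world again", "save the world or lose it"],
}
shares = yearly_frequency(corpus, "save the world")
print(shares)
```

In this toy corpus, 1900 yields 1 matching trigram out of 8 (0.125) and 2000 yields 2 out of 7 (about 0.286). Dividing by the yearly total, rather than reporting raw counts, keeps years with more printed text from looking more "interested" in a phrase simply because more was published.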
• How ingrained has Einstein really been in the cultural consciousness?
• Has interest in evolution been on a steady increase over the past 150 years?
• Have superheroes always been out to “save the world”?
Questions such as these have launched reams of undergraduate and graduate papers, which have traditionally required long hours searching the stacks (or JSTOR) for mentions to tally by hand, plus plenty of close reading.
But a growing movement aims to bring more quantitative analysis to the humanities, such as using cognitive science and MRIs in English department research at Yale University, as The New York Times reported in April. Social scientists and humanities scholars have dipped their toes in the quantitative research waters via Perseus and WordHoard. And as in the physical sciences, more (and better) data can lead to more robust results. “We can think fruitfully about culture by collecting large sets of information,” says Erez Lieberman Aiden, an investigator in the Laboratory-at-Large at Harvard’s School of Engineering and Applied Sciences and a member of Harvard’s Society of Fellows. “Having collected the data set, we can apply very analytic and high-throughput tools to understand [it].”
The Harvard team is calling its analysis “culturomics,” based on the notion that culture “is something you can study like evolution in biology,” says Jean-Baptiste Michel, a postdoctoral researcher in Harvard’s psychology department and in the Program for Evolutionary Dynamics, who helped lead the charge with Aiden. Just as a gene or phenotype changes over time, so, too, the researchers propose, do cultural sensibilities.
The tool will be “like biology in the sense that you can formulate questions that are quantitative, and you can obtain quantitative answers to them,” Aiden says. But as with a genome-wide association study (GWAS), the findings are often just a starting point.
What’s in a word?
Many humanities scholars are meeting this and other quantitative approaches with a mix of excitement and trepidation. “Word frequency is a tool with enormous potential,” says Nicholas Dames, associate chair of the Department of English and Comparative Literature at Columbia University. But he has reservations about using frequencies alone to address “more nuanced questions, particularly about semantics.”