Sep 08 2011

“Culturomics” is an emerging field of study into human culture that relies on the collection and analysis of large amounts of data.

An earlier culturomics study used Google’s culturomics tool to examine a dataset made up of the text of about 5.2 million books, quantifying cultural trends across seven languages and three centuries. Now a new research project has used a supercomputer to examine a dataset made up of a quarter-century of worldwide news coverage to forecast and visualize human behavior. Using the tone and location of news coverage, the study was able to retroactively predict the recent Arab Spring and estimate the final location of Osama bin Laden to within 200 km (124 miles).

The research was carried out on Nautilus, a large shared-memory supercomputer that is part of the National Institute for Computational Sciences (NICS) network of advanced computing resources at Oak Ridge National Laboratory (ORNL). Nautilus boasts 1,024 cores and 4 terabytes of global shared memory. The dataset was formed by combining three massive news archives totaling more than 100 million articles worldwide: the complete New York Times (NYT) from 1945 to 2005, the unclassified edition of the Summary of World Broadcasts (SWB) from 1979 to 2010, and an archive of English-language Google News articles spanning 2006 to 2011. Together, these archives provided a cross-section of the U.S. media spanning half a century and of the global media over a quarter-century.

Using this data, Kalev Leetaru of the University of Illinois at Urbana-Champaign, the study’s author, applied advanced tonal, geographic, and network analysis methods to produce a 2.4-petabyte network containing more than 10 billion people, places, things, and activities linked by over 100 trillion relationships, providing a cross-section of Earth as seen through the news media. Leetaru let the supercomputer find interesting patterns in the bulk of the data, which he then reproduced using a more traditional, targeted, smaller-scale approach. In this way, Leetaru was able to produce real-time forecasts of human behavior, such as national conflicts and the movements of specific individuals.

Tone

Leetaru says that examining the tone of a news story is one of the most important aspects of his version of culturomics and the most reliable metric for conflict. He cites the example of the Foreign Broadcast Information Service (FBIS), a news-monitoring service that produced an analytical report on December 6, 1941, the day before the bombing of Pearl Harbor, noting that the bitterness of Japanese radio broadcasts toward the U.S. had increased and that appeals for peace had ceased.

“They recognized the most valuable part about the news was not the factual parts, but the latent parts – the tone, the emotion,” said Leetaru.

“Almost every Fortune 500 company monitors the tone of news and social media coverage about their products,” Leetaru added. “There’s been a huge amount of research coming out of the business literature on the power of news tone to predict economic behavior, yet there hasn’t been as much work in using it to predict social behavior.”

To create a numeric measurement of a document’s overall tone, Leetaru used an algorithm that counts the “positive” and “negative” words appearing in it and assigns the document a positive or negative value. Using dictionaries of pre-assigned positive and negative words, Leetaru applied two tone-mining methods. The first calculated the density of positive and negative words, then subtracted one from the other to get a measure of overall tone. The second used a dictionary that numerically rated each word from extremely negative to extremely positive, then averaged the scores of all the words found in the story for a more nuanced result.
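To make the two tone-mining methods concrete, here is a minimal Python sketch. It is not Leetaru’s code: the article does not publish his dictionaries or normalization, so the tiny word lists, the percentage scale used for the density method, and the -5 to +5 rating range are all hypothetical choices made for illustration.

```python
import re

# Hypothetical toy lexicons for illustration only; the study's actual
# dictionaries are not described in the article.
POSITIVE = {"peace", "agreement", "calm", "progress"}
NEGATIVE = {"bitter", "attack", "crisis", "violence"}

# Hypothetical graded lexicon: -5 (extremely negative) to +5 (extremely positive).
RATED = {
    "peace": 3.0, "agreement": 2.0, "calm": 1.5, "progress": 2.5,
    "bitter": -2.5, "attack": -4.0, "crisis": -3.0, "violence": -4.5,
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def density_tone(text):
    """Method 1: percentage of positive words minus percentage of negative words."""
    words = tokenize(text)
    if not words:
        return 0.0
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return 100.0 * (pos - neg) / len(words)


def rated_tone(text):
    """Method 2: mean rating of the words that appear in the graded lexicon."""
    scores = [RATED[w] for w in tokenize(text) if w in RATED]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


if __name__ == "__main__":
    sample = "Bitter broadcasts replaced appeals for peace as the crisis deepened"
    print(density_tone(sample))          # prints -10.0 (1 positive, 2 negative, 10 words)
    print(round(rated_tone(sample), 3))  # prints -0.833 (mean of -2.5, 3.0, -3.0)
```

The density method only registers whether words are positive or negative, so "calm" and "peace" count equally; the rated method lets a strongly loaded word such as "violence" pull the average further than a mild one, which is the extra nuance the second approach is meant to capture.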
