Princeton University has developed a method that offers a solution to data overload
Advances in computer storage have created collections of data so huge that researchers often have trouble uncovering critical patterns in connections among individual items, making it difficult for them to realize fully the power of computing as a research tool.
Now, computer scientists at Princeton University have developed a method that offers a solution to this data overload. Using a mathematical method that calculates the likelihood of a pattern repeating throughout a subset of data, the researchers have been able to cut dramatically the time needed to find patterns in large collections of information such as social networks. The tool allows researchers to identify quickly the connections between seemingly disparate groups such as theoretical physicists who study intermolecular forces and astrophysicists researching black holes.
“The data we are interested in are graphs of networks like friends on Facebook or lists of academic citations,” said David Blei, an associate professor of computer science and co-author on the research, which was published Sept. 3 in the Proceedings of the National Academy of Science. “These are vast data sets and we want to apply sophisticated statistical models to them in order to understand various patterns.”
Finding patterns in the connections among points of data can be critical for many applications. For example, checking citations to scientific papers can provide insights to the development of new fields of study or show overlap between different academic disciplines. Links between patents can map out groups that indicate new technological developments. And analysis of social networks can provide information about communities and allow predictions of future interests.