System that replaces human intuition with algorithms outperforms 615 of 906 human teams
Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which “features” of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
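To make the sales-promotion example concrete, here is a minimal sketch of what deriving such features by hand might look like. The record layout, field names, and figures are all hypothetical and purely illustrative; they are not drawn from the researchers' system.

```python
from datetime import date

# Hypothetical raw records: each promotion has a start date, an end date,
# and the weekly profits recorded while it ran. All values are made up.
promotions = [
    {"start": date(2015, 3, 2), "end": date(2015, 3, 16),
     "weekly_profits": [1200.0, 1500.0]},
    {"start": date(2015, 6, 1), "end": date(2015, 6, 29),
     "weekly_profits": [900.0, 1100.0, 1000.0, 1400.0]},
]


def derive_features(promo):
    """Turn raw dates and profits into the derived features described above."""
    # The span between the dates, rather than the dates themselves.
    span_days = (promo["end"] - promo["start"]).days
    # The average profit across the span, rather than the total.
    profits = promo["weekly_profits"]
    avg_weekly_profit = sum(profits) / len(profits) if profits else 0.0
    return {"span_days": span_days, "avg_weekly_profit": avg_weekly_profit}


features = [derive_features(p) for p in promotions]
for row in features:
    print(row)
```

Neither derived column exists in the raw data. Deciding to compute them is the step that usually falls to human intuition.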
MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers’ “Data Science Machine” finished ahead of 615.
In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.
“We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”
Between the lines
Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.
Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.
“What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering,” Veeramachaneni says. “The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas.”
In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT’s online-learning platform MITx doesn’t record either of those statistics, but it does collect data from which they can be inferred.
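As a rough illustration of how such indicators could be inferred from data a platform does record, the sketch below derives both from timestamped logs. The log layout, student names, and deadline are assumptions made for this example; the article does not describe MITx's actual data format, and this is not the Data Science Machine's method.

```python
from datetime import datetime

# Hypothetical deadline and logs; none of this reflects MITx's real schema.
DEADLINE = datetime(2015, 10, 16, 23, 59)

# (student, timestamp) for each recorded interaction with the problem set.
problem_set_events = [
    ("alice", datetime(2015, 10, 14, 23, 59)),
    ("alice", datetime(2015, 10, 15, 9, 0)),
    ("bob", datetime(2015, 10, 16, 20, 59)),
]

# (student, login, logout) for each visit to the course website.
sessions = [
    ("alice", datetime(2015, 10, 14, 22, 0), datetime(2015, 10, 15, 1, 0)),
    ("alice", datetime(2015, 10, 15, 9, 0), datetime(2015, 10, 15, 11, 0)),
    ("bob", datetime(2015, 10, 16, 20, 0), datetime(2015, 10, 16, 21, 40)),
]


def hours_before_deadline(events, deadline):
    """Hours between each student's first problem-set activity and the deadline."""
    first_seen = {}
    for student, timestamp in events:
        if student not in first_seen or timestamp < first_seen[student]:
            first_seen[student] = timestamp
    return {s: (deadline - t).total_seconds() / 3600 for s, t in first_seen.items()}


def relative_time_on_site(visits):
    """Each student's total minutes on the site divided by the class average."""
    totals = {}
    for student, login, logout in visits:
        minutes = (logout - login).total_seconds() / 60
        totals[student] = totals.get(student, 0.0) + minutes
    class_mean = sum(totals.values()) / len(totals)
    return {s: t / class_mean for s, t in totals.items()}


lead_time = hours_before_deadline(problem_set_events, DEADLINE)
relative_time = relative_time_on_site(sessions)
print(lead_time)
print(relative_time)
```

Neither statistic appears in the logs as a column. Both are composed from timestamps the platform already stores, which is the kind of idea generation Veeramachaneni describes.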
Read more: Automating big-data analysis