System that replaces human intuition with algorithms outperforms 615 of 906 human teams
Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which “features” of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
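To make the sales-promotion example concrete, here is a minimal sketch in Python. The records, field names, and the `derive_features` helper are all hypothetical, invented for illustration; they are not taken from the researchers' system. The sketch only shows how a span and an average can be derived from raw dates and profits.

```python
from datetime import date

# Hypothetical promotion records: start date, end date, and the weekly
# profits recorded while each promotion ran. Values are made up.
promotions = [
    {"start": date(2015, 3, 2), "end": date(2015, 3, 16),
     "weekly_profits": [1200.0, 1500.0]},
    {"start": date(2015, 6, 1), "end": date(2015, 6, 29),
     "weekly_profits": [900.0, 1100.0, 1000.0, 1000.0]},
]


def derive_features(promo):
    """Turn raw dates and profits into the derived quantities the text mentions."""
    profits = promo["weekly_profits"]
    return {
        # The span between the dates, rather than the dates themselves.
        "span_days": (promo["end"] - promo["start"]).days,
        # The raw total, kept for comparison.
        "total_profit": sum(profits),
        # The average across the span, rather than the total.
        "avg_weekly_profit": sum(profits) / len(profits),
    }


features = [derive_features(p) for p in promotions]
for row in features:
    print(row)
```

In this made-up data the first promotion has the smaller total profit but the higher weekly average, which is the kind of distinction a raw total would hide.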
MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers’ “Data Science Machine” finished ahead of 615, or about 68 percent.
In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.
“We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”
Between the lines
Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.
Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.
“What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering,” Veeramachaneni says. “The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas.”
In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT’s online-learning platform MITx doesn’t record either of those statistics, but it does collect data from which they can be inferred.
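As a rough illustration of how such indicators might be inferred from logged activity, the following Python sketch derives both from a small event log. The log layout, student names, deadline, and numbers are assumptions made up for this example; the article does not describe what MITx actually records or how the researchers computed these values.

```python
from datetime import datetime
from statistics import mean

# Hypothetical problem-set deadline and activity log (student, event, time).
deadline = datetime(2015, 10, 9, 23, 59)
events = [
    ("alice", "problem_open", datetime(2015, 10, 6, 23, 59)),
    ("alice", "problem_submit", datetime(2015, 10, 7, 10, 0)),
    ("bob", "problem_open", datetime(2015, 10, 9, 20, 59)),
]
# Hypothetical site visits: (student, minutes spent in that visit).
sessions = [("alice", 120.0), ("alice", 180.0), ("bob", 100.0)]

students = sorted({student for student, _, _ in events})

# Indicator 1: hours between a student's first recorded activity on the
# problem set and the deadline.
lead_hours = {}
for student in students:
    first_seen = min(ts for s, _, ts in events if s == student)
    lead_hours[student] = (deadline - first_seen).total_seconds() / 3600

# Indicator 2: total time on the site, relative to the class average.
total_minutes = {
    student: sum(m for s, m in sessions if s == student) for student in students
}
class_average = mean(total_minutes.values())
relative_time = {s: total_minutes[s] / class_average for s in students}

print(lead_hours)
print(relative_time)
```

Neither value is stored directly in the log; each is composed from timestamps and durations that are.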