System that replaces human intuition with algorithms outperforms 615 of 906 human teams
Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which “features” of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
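To make the promotion example concrete, here is a minimal sketch in Python of deriving such features by hand. The records, field names, and the `derive_features` helper are hypothetical illustrations, not part of the researchers' system; the point is only that the useful columns (span length, average profit over the span) are computed from the raw ones rather than stored in the database.

```python
from datetime import date

# Hypothetical raw records: promotion start/end dates plus weekly profits.
promotions = [
    {"name": "spring_sale", "start": date(2015, 3, 2), "end": date(2015, 3, 22),
     "weekly_profits": [1200.0, 1500.0, 900.0]},
    {"name": "summer_sale", "start": date(2015, 6, 1), "end": date(2015, 6, 14),
     "weekly_profits": [2000.0, 1000.0]},
]

def derive_features(promo):
    """Compute derived features from one raw promotion record (illustrative only)."""
    # Span between the dates, counted inclusively, rather than the dates themselves.
    span_days = (promo["end"] - promo["start"]).days + 1
    # Average profit across the span, rather than only the total.
    total_profit = sum(promo["weekly_profits"])
    avg_weekly_profit = total_profit / len(promo["weekly_profits"])
    return {
        "name": promo["name"],
        "span_days": span_days,
        "total_profit": total_profit,
        "avg_weekly_profit": avg_weekly_profit,
    }

features = [derive_features(p) for p in promotions]
for row in features:
    print(row)
```

In this toy data the summer promotion has the lower total profit but the higher weekly average, which is the kind of distinction a hand-picked feature is meant to expose.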
MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers’ “Data Science Machine” finished ahead of 615.
In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.
“We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”
Between the lines
Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.
Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.
“What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering,” Veeramachaneni says. “The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas.”
In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT’s online-learning platform MITx doesn’t record either of those statistics, but it does collect data from which they can be inferred.
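As a rough illustration of how such indicators might be inferred from logged activity, the Python sketch below works from a made-up event log. The log layout, student names, and deadline are assumptions for the example; the article does not describe MITx's actual schema or the researchers' code.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical deadline for one problem set.
deadline = datetime(2015, 10, 20, 23, 59)

# Hypothetical event log: (student, kind of page, timestamp, seconds on page).
events = [
    ("alice", "problem_set", datetime(2015, 10, 17, 23, 59), 1800),
    ("alice", "lecture", datetime(2015, 10, 18, 10, 0), 3600),
    ("bob", "lecture", datetime(2015, 10, 19, 9, 0), 600),
    ("bob", "problem_set", datetime(2015, 10, 20, 17, 59), 1200),
]

first_start = {}                 # earliest problem-set activity per student
time_on_site = defaultdict(int)  # total seconds on the course site per student

for student, kind, timestamp, seconds in events:
    time_on_site[student] += seconds
    if kind == "problem_set":
        if student not in first_start or timestamp < first_start[student]:
            first_start[student] = timestamp

# Class-wide average, used to express each student's time relative to classmates.
class_mean = sum(time_on_site.values()) / len(time_on_site)

features = {}
for student, seconds in time_on_site.items():
    started = first_start.get(student)
    features[student] = {
        # Hours between first problem-set activity and the deadline (None if never started).
        "hours_before_deadline": (
            (deadline - started).total_seconds() / 3600 if started else None
        ),
        # Time on site as a multiple of the class average.
        "relative_time_on_site": seconds / class_mean,
    }

print(features)
```

Neither feature appears as a column in the log; both are computed from timestamps and durations that the log does record.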
Read more: Automating big-data analysis