Berkeley Lab and UC Berkeley researchers say “iterative Random Forests” will deliver powerful scientific insights
While it may be the era of supercomputers and “big data,” without smart methods to mine all that data, it’s only so much digital detritus. Now researchers at the Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab) and UC Berkeley have come up with a novel machine learning method that enables scientists to derive insights from systems of previously intractable complexity in record time.
In a paper published recently in the Proceedings of the National Academy of Sciences (PNAS), the researchers describe a technique called “iterative Random Forests,” which they say could have a transformative effect on any area of science or engineering with complex systems, including biology, precision medicine, materials science, environmental science, and manufacturing, to name a few.
“Take a human cell, for example. There are 10^170 possible molecular interactions in a single cell. That creates considerable computing challenges in searching for relationships,” said Ben Brown of Berkeley Lab’s Environmental Genomics and Systems Biology Division. “Our method enables the identification of interactions of high order at the same computational cost as main effects – even when those interactions are local with weak marginal effects.”
Brown and Bin Yu of UC Berkeley are lead senior authors of “Iterative Random Forests to Discover Predictive and Stable High-Order Interactions.” The co-first authors are Sumanta Basu (formerly a joint postdoc of Brown and Yu and now an assistant professor at Cornell University) and Karl Kumbier (a Ph.D. student of Yu in the UC Berkeley Statistics Department). The paper is the culmination of three years of work that the authors believe will transform the way science is done. “With our method we can gain radically richer information than we’ve ever been able to gain from a learning machine,” Brown said.
The needs of machine learning in science are different from those of industry, where machine learning has been used for things like playing chess, making self-driving cars, and predicting the stock market.
“The machine learning developed by industry is great if you want to do high-frequency trading on the stock market,” Brown said. “You don’t care why you’re able to predict the stock will go up or down. You just want to know that you can make the predictions.”
But in science, questions surrounding why a process behaves in certain ways are critical. As a result, machine learning for science needs to peer inside the black box and understand why and how a model reached the conclusions it reached. The long-term goal is to use that understanding to model, or even engineer, processes to attain a desired outcome.
In highly complex systems – whether it’s a single cell, the human body, or even an entire ecosystem – there are a large number of variables interacting in nonlinear ways. That makes it difficult if not impossible to build a model that can determine cause and effect. “Unfortunately, in biology, you come across interactions of order 30, 40, 60 all the time,” Brown said. “It’s completely intractable with traditional approaches to statistical learning.”
The method developed by the team led by Brown and Yu, iterative Random Forests (iRF), builds on an algorithm called random forests, a popular and effective predictive modeling tool, translating the internal states of the black box learner into a human-interpretable form. Their approach allows researchers to search for complex interactions by decoupling the order, or size, of interactions from the computational cost of identification.
“There is no difference in the computational cost of detecting an interaction of order 30 versus an interaction of order two,” Brown said. “And that’s a sea change.”
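The core idea, fitting a random forest and then iteratively concentrating the learner on the features it found important, can be sketched in a few lines. The sketch below is a simplified illustration, not the authors’ implementation: published iRF reweights the feature-sampling distribution inside tree growing and adds a stability analysis over bootstrap replicates, whereas this toy version simply refits on a shrinking set of top-ranked features. The function name `iterative_rf` and the hard `keep_frac` cutoff are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def iterative_rf(X, y, n_iter=3, keep_frac=0.5, seed=0):
    """Toy sketch of the iRF idea: refit a random forest, each time
    narrowing attention to the features the previous forest ranked
    as important.  (The published method instead softly reweights
    feature sampling during tree growth and assesses the stability
    of recovered interactions across bootstrap samples.)"""
    active = np.arange(X.shape[1])  # indices of surviving features
    for _ in range(n_iter):
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X[:, active], y)
        # rank surviving features by importance, keep the top fraction
        order = np.argsort(model.feature_importances_)[::-1]
        n_keep = max(1, int(len(active) * keep_frac))
        active = active[order[:n_keep]]
    # final fit restricted to the surviving feature set
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X[:, active], y)
    return model, active

# Synthetic demo: 5 informative features hidden among 50
X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=5, random_state=0)
model, active = iterative_rf(X, y)
print("surviving features:", sorted(active))
```

In the full method, interactions are then read off the fitted trees: features that repeatedly co-occur on decision paths are candidate high-order interactions, which is how the cost of finding an order-30 interaction stays comparable to finding an order-2 one.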
In the PNAS paper, the scientists demonstrated their method on two genomics problems, the role of gene enhancers in the fruit fly embryo and alternative splicing in a human-derived cell line. In both cases, using iRF confirmed previous findings while also uncovering previously unidentified higher-order interactions for follow-up study.
Brown said they’re now using their method for designing phased array laser systems and optimizing sustainable agriculture systems.
“We believe this is a different paradigm for doing science,” said Yu, a professor in the departments of Statistics and Electrical Engineering & Computer Science at UC Berkeley. “We do prediction, but we introduce stability on top of prediction in iRF to more reliably learn the underlying structure in the predictors.”
“This enables us to learn how to engineer systems for goal-oriented optimization and more accurately targeted simulations and follow-up experiments,” Brown added.