Berkeley Lab and UC Berkeley researchers say “iterative Random Forests” will deliver powerful scientific insights
While it may be the era of supercomputers and “big data,” without smart methods to mine all that data, it’s only so much digital detritus. Now researchers at the Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab) and UC Berkeley have come up with a novel machine learning method that enables scientists to derive insights from systems of previously intractable complexity in record time.
In a paper published recently in the Proceedings of the National Academy of Sciences (PNAS), the researchers describe a technique called “iterative Random Forests,” which they say could have a transformative effect on any area of science or engineering with complex systems, including biology, precision medicine, materials science, environmental science, and manufacturing, to name a few.
“Take a human cell, for example. There are 10^170 possible molecular interactions in a single cell. That creates considerable computing challenges in searching for relationships,” said Ben Brown of Berkeley Lab’s Environmental Genomics and Systems Biology Division. “Our method enables the identification of interactions of high order at the same computational cost as main effects – even when those interactions are local with weak marginal effects.”
Brown and Bin Yu of UC Berkeley are lead senior authors of “Iterative Random Forests to Discover Predictive and Stable High-Order Interactions.” The co-first authors are Sumanta Basu (formerly a joint postdoc of Brown and Yu and now an assistant professor at Cornell University) and Karl Kumbier (a Ph.D. student of Yu in the UC Berkeley Statistics Department). The paper is the culmination of three years of work that the authors believe will transform the way science is done. “With our method we can gain radically richer information than we’ve ever been able to gain from a learning machine,” Brown said.
The needs of machine learning in science are different from those of industry, where machine learning has been used for things like playing chess, making self-driving cars, and predicting the stock market.
“The machine learning developed by industry is great if you want to do high-frequency trading on the stock market,” Brown said. “You don’t care why you’re able to predict the stock will go up or down. You just want to know that you can make the predictions.”
But in science, questions about why a process behaves in certain ways are critical. As a result, machine learning for science needs to peer inside the black box and understand why and how computers reached their conclusions. The long-term goal is to use this kind of information to model or even engineer systems to improve or attain a desired outcome.
In highly complex systems – whether it’s a single cell, the human body, or even an entire ecosystem – there are a large number of variables interacting in nonlinear ways. That makes it difficult if not impossible to build a model that can determine cause and effect. “Unfortunately, in biology, you come across interactions of order 30, 40, 60 all the time,” Brown said. “It’s completely intractable with traditional approaches to statistical learning.”
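To give a rough sense of why high-order interactions overwhelm exhaustive search, the sketch below counts how many distinct feature subsets of a given size exist. The feature count of 1,000 is an arbitrary assumption chosen for illustration, not a figure from the paper.

```python
from math import comb


def candidate_interactions(num_features: int, order: int) -> int:
    """Number of distinct feature subsets of size `order`."""
    return comb(num_features, order)


# Hypothetical feature count, chosen only to show the growth.
NUM_FEATURES = 1000

for order in (2, 3, 10, 30):
    count = candidate_interactions(NUM_FEATURES, order)
    # Scientific notation keeps the larger counts readable.
    print(f"order {order:>2}: {count:.3e} candidate interactions")
```

Testing every subset one by one quickly becomes impossible as the order grows, which is the intractability Brown describes.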
The method developed by the team led by Brown and Yu, iterative Random Forests (iRF), builds on random forests, a popular and effective predictive modeling algorithm, and translates the internal states of the black-box learner into a human-interpretable form. Their approach allows researchers to search for complex interactions by decoupling the order, or size, of interactions from the computational cost of identification.
“There is no difference in the computational cost of detecting an interaction of order 30 versus an interaction of order two,” Brown said. “And that’s a sea change.”
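One way to picture that decoupling is the toy sketch below. It is not the authors' implementation: the decision paths are hypothetical, hand-written feature sets (`x1`, `x2`, and so on) standing in for the features a forest might split on along each root-to-leaf path, and `recurring_feature_sets` is an illustrative helper. Intersecting two paths is a single set operation whether the shared set holds two features or thirty, so the work is governed by how many paths are compared and how long they are, not by the order of the interaction.

```python
from collections import Counter
from itertools import combinations


def recurring_feature_sets(paths, min_size=2):
    """Intersect every pair of paths and count the shared feature sets."""
    counts = Counter()
    for a, b in combinations(paths, 2):
        shared = frozenset(a) & frozenset(b)
        # Keep only intersections large enough to count as an interaction.
        if len(shared) >= min_size:
            counts[shared] += 1
    return counts


# Hypothetical decision paths; each set lists the features used on one path.
paths = [
    {"x1", "x2", "x3", "x7"},
    {"x1", "x2", "x3", "x5"},
    {"x1", "x2", "x3", "x9"},
    {"x4", "x6"},
    {"x2", "x8"},
]

counts = recurring_feature_sets(paths)
for feature_set, hits in counts.most_common():
    print(sorted(feature_set), hits)
```

Here the order-three set survives every comparison among the first three paths, and it is found with exactly the same operation that would surface a pair.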
In the PNAS paper, the scientists demonstrated their method on two genomics problems: the role of gene enhancers in the fruit fly embryo and alternative splicing in a human-derived cell line. In both cases, using iRF confirmed previous findings while also uncovering previously unidentified higher-order interactions for follow-up study.
Brown said they’re now using their method for designing phased array laser systems and optimizing sustainable agriculture systems.
“We believe this is a different paradigm for doing science,” said Yu, a professor in the departments of Statistics and Electrical Engineering & Computer Science at UC Berkeley. “We do prediction, but we introduce stability on top of prediction in iRF to more reliably learn the underlying structure in the predictors.”
“This enables us to learn how to engineer systems for goal-oriented optimization and more accurately targeted simulations and follow-up experiments,” Brown added.
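The stability Yu describes can be pictured as asking how often the same interaction reappears when the analysis is repeated on resampled data. The sketch below is a simplified illustration under that assumption, not the scoring used in the paper: `replicate_results` is a hypothetical, hand-written list of the feature sets recovered in each resampled run, and the score is simply the fraction of runs in which a set appears.

```python
from collections import Counter


def stability_scores(replicate_results):
    """Fraction of replicates in which each feature set was recovered."""
    counts = Counter()
    for found in replicate_results:
        # set() guards against counting a set twice within one replicate.
        counts.update(set(found))
    n = len(replicate_results)
    return {feature_set: hits / n for feature_set, hits in counts.items()}


# Hypothetical results from four resampled runs.
replicate_results = [
    {frozenset({"x1", "x2", "x3"}), frozenset({"x4", "x6"})},
    {frozenset({"x1", "x2", "x3"})},
    {frozenset({"x1", "x2", "x3"}), frozenset({"x2", "x8"})},
    {frozenset({"x1", "x2", "x3"})},
]

scores = stability_scores(replicate_results)
for feature_set, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(sorted(feature_set), score)
```

A set that shows up in every run is a stronger candidate for follow-up experiments than one that appears only once.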