Researchers at Columbia University, Princeton University and Harvard University have developed a new approach for analyzing big data that can drastically improve the ability to make accurate predictions about medicine, complex diseases, social science phenomena, and other issues.
In a study published in the December 13 issue of Proceedings of the National Academy of Sciences (PNAS), the authors introduce the Influence score, or “I-score,” a statistic that correlates with how much a set of variables can inherently predict, or its “predictivity,” and that can consequently be used to identify highly predictive variables.
“In our last paper, we showed that significant variables may not necessarily be predictive, and that good predictors may not appear statistically significant,” said principal investigator Shaw-Hwa Lo, a professor of statistics at Columbia University. “This left us with an important question: how can we find highly predictive variables then, if not through a guideline of statistical significance? In this article, we provide a theoretical framework from which to design good measures of prediction in general. Importantly, we introduce a variable set’s predictivity as a new parameter of interest to estimate, and provide the I-score as a candidate statistic to estimate variable set predictivity.”
Current approaches to prediction generally include using a significance-based criterion to select the variables that go into a model, and evaluating variables and models simultaneously for prediction using cross-validation or independent test data.
“Using the I-score prediction framework allows us to define a novel measure of predictivity based on observed data, which in turn enables assessing variable sets for, preferably high, predictivity,” Lo said. He added that, although the idea is intuitively obvious, too little attention has been paid to treating predictivity as a parameter of interest to estimate. Motivated by the needs of current genome-wide association studies (GWAS), the study authors provide such a discussion.
In the paper, the authors describe the predictivity of a variable set and show that estimating it directly from the sample does not provide usable information for the prediction-oriented researcher. They go on to demonstrate that the I-score can be used to compute a measure that asymptotically approaches predictivity. The I-score can effectively differentiate between noisy and predictive variables, Lo explained, making it helpful in variable selection. A further benefit is that, whereas usual approaches rely heavily on cross-validation or test data to evaluate predictors, the I-score approach relies on such data much less.
“We offer simulations and an application of the I-score on real data to demonstrate the statistic’s predictive performance on sample data,” he said. “These show that the I-score can capture highly predictive variable sets, estimates a lower bound for the theoretical correct prediction rate, and correlates well with the out of sample correct rate. We suggest that using the I-score method can aid in finding variable sets with promising prediction rates, however, further research in the avenue of sample-based measures of predictivity is needed.”
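The idea that a partition-based score can separate predictive variable sets from noisy ones can be illustrated with a small simulation. The article does not state the I-score formula, so the sketch below assumes a form often given in the partition-based literature: the sum over partition cells of n_j² (Ȳ_j − Ȳ)², here divided by n times the sample variance of Y. That normalisation, the function name `i_score`, and the simulated data are all assumptions for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


def i_score(rows, y, cols):
    """Assumed I-score form: sum_j n_j^2 (Ybar_j - Ybar)^2 / (n * s^2).

    rows: list of tuples of discrete variable values
    y:    list of numeric outcomes, same length as rows
    cols: indices of the variables whose joint values define the cells
    """
    n = len(y)
    y_bar = sum(y) / n
    s2 = sum((v - y_bar) ** 2 for v in y) / n  # sample variance of Y
    if s2 == 0:
        return 0.0  # a constant outcome carries nothing to predict
    # Group outcomes by the joint value of the chosen variables.
    cells = defaultdict(list)
    for row, v in zip(rows, y):
        cells[tuple(row[c] for c in cols)].append(v)
    total = sum(
        len(vals) ** 2 * (sum(vals) / len(vals) - y_bar) ** 2
        for vals in cells.values()
    )
    return total / (n * s2)


# Hypothetical data: five binary variables; the outcome is the XOR of the
# first two, flipped with probability 0.1. The other three are pure noise.
rng = random.Random(0)
n = 400
rows = [tuple(rng.randint(0, 1) for _ in range(5)) for _ in range(n)]
y = [(r[0] ^ r[1]) if rng.random() > 0.1 else 1 - (r[0] ^ r[1]) for r in rows]

strong = i_score(rows, y, (0, 1))  # the jointly predictive pair
noise = i_score(rows, y, (2, 3))   # a pair unrelated to the outcome
single = i_score(rows, y, (0,))    # one member of the pair on its own

print(f"pair (0, 1): {strong:.2f}")
print(f"pair (2, 3): {noise:.2f}")
print(f"variable 0 alone: {single:.2f}")
```

In this toy setup the outcome depends on the first two variables only jointly, so scoring them together should give a much larger value than scoring either one alone or scoring a pair of unrelated variables.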
The authors conclude that the I-score would be useful in many applications: for example, in formulating predictions about diseases from high-dimensional data, such as gene datasets; in the social sciences, for text prediction; and in predicting terrorism, civil war, elections and financial markets.
“We’re hoping to impress upon the scientific community the notion that for those of us who might be interested in predicting an outcome of interest, possibly with rather complex or high dimensional data, we might gain by reconsidering the question as one of how to search for highly predictive variables (or variable sets) and using statistics that measure predictivity to help us identify those variables to then predict well,” Lo said. “For statisticians in particular, we’re hoping this opens up a new field of work that would focus on designing new statistics that measure predictivity.”