Early influenza detection and the ability to predict outbreaks are critical to public health. Reliable estimates of when influenza will peak can guide the timing of flu shots and keep health systems from being blindsided by unexpected surges, as happened in the 2012-2013 flu season.
The Centers for Disease Control and Prevention collects accurate data, but with a time lag of one to two weeks. Google Flu Trends began offering real-time data in 2008, based on people’s Internet searches for flu-related terms. But it ultimately failed, at least in part because not everyone who searches “flu” is actually sick. Since last year, Google has instead sent its search data to scientists at the CDC, Columbia University and Boston Children’s Hospital.
Now, a Boston Children’s-led team demonstrates a more accurate way to pick up flu trends in near-real-time — at least a week ahead of the CDC — by harnessing data from electronic health records (EHRs).
As Mauricio Santillana, PhD, John Brownstein, PhD, and colleagues describe in Scientific Reports, the team combined EHR data, historical patterns of flu activity and a machine-learning algorithm to interpret the data. This clinical “big data” approach produced predictions of national and local influenza activity that closely matched the CDC’s subsequent reporting.
“Our study shows the true value of considering multiple data streams in disease surveillance,” says Brownstein, the study’s senior investigator and Chief Innovation Officer at Boston Children’s Hospital. “While Google data provide incredible real-time, population-wide information, clinical data add a more accurate and precise assessment of disease state.”
Crunching EHR data
Instrumental to the study were data from collaborator Athenahealth, encompassing more than 72,000 healthcare providers and EHRs for more than 23 million patients.
The investigators first trained their flu-prediction algorithm, called ARES, with data captured from June 2009 through January 2012: weekly total visit counts, visit counts for flu and flu-like illness, visit counts for flu vaccination and more. ARES then used what it had learned from that period to estimate flu activity over the next three years, through June 2015.
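The train-then-estimate setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ARES algorithm itself: the article does not specify the model, so ordinary least squares stands in for it, and the function names, feature choices and synthetic numbers below are all hypothetical.

```python
import numpy as np


def build_features(total_visits, flu_visits, vaccine_visits):
    """Convert weekly visit counts into rates, plus an intercept column."""
    total = np.asarray(total_visits, dtype=float)
    flu_rate = np.asarray(flu_visits, dtype=float) / total
    vaccine_rate = np.asarray(vaccine_visits, dtype=float) / total
    return np.column_stack([np.ones_like(total), flu_rate, vaccine_rate])


def train_then_estimate(features, reference, n_train):
    """Fit on the first n_train weeks only, then estimate every later week."""
    coef, *_ = np.linalg.lstsq(features[:n_train], reference[:n_train], rcond=None)
    return features[n_train:] @ coef


# Synthetic weekly data (invented for illustration, not real surveillance data).
weeks = np.arange(120)
total = 10_000 + 10 * weeks
flu = total * (0.02 + 0.01 * np.sin(2 * np.pi * weeks / 52))
vaccine = total * (0.05 + 0.02 * np.cos(2 * np.pi * weeks / 52))

X = build_features(total, flu, vaccine)
# Stand-in for the official figures the model is trained against (assumption).
reference = 2.0 + 50.0 * X[:, 1] - 10.0 * X[:, 2]

# Train on the first 80 weeks, estimate the remaining 40.
estimates = train_then_estimate(X, reference, n_train=80)
print(estimates[:3], reference[80:83])
```

The point of the split is that the weeks being estimated never influence the fitted coefficients, which mirrors the article's description of training on one period and estimating the following years.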
The team showed that ARES’ estimates of national and regional flu activity had error rates two to three times lower than those of earlier predictive models. ARES also correctly estimated the timing and magnitude of the national flu “peak week.” It was slightly less accurate in predicting regional peak weeks, but clearly outperformed Google Flu Trends on all measures.
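To make the two comparisons above concrete, here is a minimal Python sketch of how an error rate and a peak week might be computed for competing sets of estimates. The article does not say which error metric the study used, so root-mean-square error is an assumption here, and the numbers and names (`model_a`, `model_b`) are invented for illustration.

```python
import numpy as np


def rmse(estimates, reference):
    """Root-mean-square error between estimates and the reference series."""
    e = np.asarray(estimates, dtype=float)
    r = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((e - r) ** 2)))


def peak_week(series):
    """Index of the week with the highest activity in the series."""
    return int(np.argmax(series))


# Invented weekly activity levels (not real data).
reference = [1.0, 2.0, 4.0, 3.0, 1.5]   # stand-in for later official figures
model_a = [1.1, 2.1, 3.8, 3.1, 1.4]     # hypothetical, tracks the reference closely
model_b = [1.5, 3.0, 3.5, 4.0, 2.0]     # hypothetical, overshoots and peaks late

for name, est in [("model_a", model_a), ("model_b", model_b)]:
    print(name, "RMSE:", round(rmse(est, reference), 3),
          "peak week:", peak_week(est),
          "reference peak:", peak_week(reference))
```

A model can score well on overall error yet still miss the peak week, which is why the two measures are reported separately.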
Capturing data directly from health care encounters makes sense, provided such data can be liberated from proprietary, HIPAA-bound healthcare IT systems. “As EHR data become more ubiquitously available, we will see major leaps in our ability to monitor and track disease outbreaks,” says Brownstein.
“Having access to near-real-time aggregated EHR information has enabled us to significantly improve our flu tracking and forecasting systems,” agrees Santillana, a member of Boston Children’s Computational Health Informatics Program (CHIP), and also affiliated with Harvard Medical School and the Harvard Institute for Applied Computational Sciences. “Real-time tracking will enable local public health officials to better prepare for unusual flu activity and potentially save lives.”