A mathematical technique called “differential privacy” gives researchers access to vast repositories of personal data while meeting a high standard for privacy protection.
In 1997, when Massachusetts began making health records of state employees available to medical researchers, the government removed patients’ names, addresses, and Social Security numbers. William Weld, then the governor, assured the public that identifying individual patients in the records would be impossible.
Within days, an envelope from a graduate student at the Massachusetts Institute of Technology arrived at Weld’s office. It contained the governor’s health records.
Although the state had removed all obvious identifiers, it had left each patient’s date of birth, sex and ZIP code. By cross-referencing this information with voter-registration records, Latanya Sweeney was able to pinpoint Weld’s records.
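The cross-referencing step can be sketched in a few lines of Python. This is only an illustration of the general idea, not a reconstruction of Sweeney's actual method: the field names, the `reidentify` helper, and every record below are invented for the example, and it assumes both data sets store date of birth, sex, and ZIP code in the same format.

```python
from collections import defaultdict

# Fields left in the "anonymized" data that also appear in public records.
# The field names are hypothetical, chosen only for this sketch.
QUASI_IDENTIFIERS = ("birth_date", "sex", "zip_code")


def reidentify(health_records, voter_records, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for health records that match
    exactly one voter on all of the quasi-identifier fields."""
    # Index the public voter list by the combination of quasi-identifiers.
    voters_by_key = defaultdict(list)
    for voter in voter_records:
        voters_by_key[tuple(voter[k] for k in keys)].append(voter["name"])

    matches = []
    for record in health_records:
        names = voters_by_key.get(tuple(record[k] for k in keys), [])
        # Only an unambiguous match pins a record to one person.
        if len(names) == 1:
            matches.append((names[0], record["diagnosis"]))
    return matches


# Entirely made-up toy data.
health_records = [
    {"birth_date": "1950-01-01", "sex": "M", "zip_code": "00001",
     "diagnosis": "condition X"},
    {"birth_date": "1962-03-04", "sex": "F", "zip_code": "00002",
     "diagnosis": "condition Y"},
]
voter_records = [
    {"name": "Voter A", "birth_date": "1950-01-01", "sex": "M",
     "zip_code": "00001"},
    {"name": "Voter B", "birth_date": "1962-03-04", "sex": "F",
     "zip_code": "00002"},
    {"name": "Voter C", "birth_date": "1962-03-04", "sex": "F",
     "zip_code": "00002"},
]

print(reidentify(health_records, voter_records))
```

In this toy data, the first health record matches a single voter and is re-identified, while the second matches two voters and stays ambiguous. The more unusual a person's combination of birth date, sex, and ZIP code, the more likely the match is unique.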
Sweeney’s work, along with other notable privacy breaches over the past 15 years, has raised questions about the security of supposedly anonymous information.
“We’ve learned that human intuition about what is private is not especially good,” said Frank McSherry of Microsoft Research Silicon Valley in Mountain View, Calif. “Computers are getting more and more sophisticated at pulling individual data out of things that a naive person might think are harmless.”
As awareness of these privacy concerns has grown, many organizations have clamped down on their sensitive data, uncertain about what, if anything, they can release without jeopardizing the privacy of individuals. But this attention to privacy has come at a price, cutting researchers off from vast repositories of potentially invaluable data.
Medical records, like those released by Massachusetts, could help reveal which genes increase the risk of developing diseases like Alzheimer’s, how to reduce medical errors in hospitals or what treatments are most effective against breast cancer. Government-held information from Census Bureau surveys and tax returns could help economists devise policies that best promote income equality or economic growth. And data from social media websites like Facebook and Twitter could offer sociologists an unprecedented look at how ordinary people go about their lives.
The question is: How do we get at these data without revealing private information? A body of work a decade in the making is now starting to offer a genuine solution.
“Differential privacy,” as the approach is called, allows for the release of data while meeting a high standard for privacy protection. A differentially private data release algorithm allows researchers to ask practically any question about a database of sensitive information and provides answers that have been “blurred” so that they reveal virtually nothing about any individual’s data — not even whether the individual was in the database in the first place.
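One common way to produce such “blurred” answers is to add random noise to the true answer of a counting query, a construction usually called the Laplace mechanism. The article does not name a specific algorithm, so the sketch below is an assumption about what a simple differentially private release might look like; the `private_count` function, its parameters, and the toy records are hypothetical.

```python
import random


def private_count(records, predicate, epsilon, rng=random):
    """Return a noisy count of the records that satisfy `predicate`.

    `epsilon` is the privacy parameter: smaller values add more noise
    and give stronger privacy. This is a minimal sketch, not a
    production-grade implementation.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")

    true_count = sum(1 for record in records if predicate(record))

    # Adding or removing one person changes a count by at most 1,
    # so the noise scale is 1 / epsilon.
    scale = 1.0 / epsilon

    # The difference of two independent exponential draws with mean
    # `scale` follows a Laplace distribution with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Made-up toy database: did each (hypothetical) patient have a condition?
patients = [
    {"has_condition": True},
    {"has_condition": False},
    {"has_condition": True},
    {"has_condition": True},
]

answer = private_count(patients, lambda p: p["has_condition"], epsilon=0.5)
print(f"Noisy count: {answer:.2f}")
```

Because the noise is about as large as the change any single person could make to the count, the released number looks nearly the same whether or not a given individual is in the database. Averaged over many records, the answer is still useful to a researcher.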
via Scientific American – Simons Science News