What Big Data is seeing now looks like the classic industrial curve
The business of Big Data, which involves collecting large amounts of data and then searching it for patterns and new revelations, is the result of cheap storage, abundant sensors and new software. It has become a multibillion-dollar industry in less than a decade. With growth that fast, it is easy to miss how much remains to be done before the industry has proven standards. Until then, lots of customers are probably wasting much of their money.
There is essential work to be done training a core of people in very hard problems, like advanced statistics and software that ensures data quality and operational efficiency. Broad-based literacy in the uses of data should probably happen too, along with new kinds of management, better tools for reading the information, and privacy safeguards for corporate and personal information.
That such a huge number of tasks are taking place is a good indicator that, even with the hype, Big Data is a big deal. Last Friday, a number of technologists gathered at a forum hosted by the University of California, Berkeley, iSchool and talked about ways many of these jobs are being done. (Disclosure: I lecture at the iSchool, which is the school of information science, and moderated several panels there.) They talked about the progress so far, and identified a number of good ideas and businesses left to pursue.
In some ways, Big Data is about managing all kinds of weird new data, like social media updates from a mobile phone. Such data is hard to categorize in the first place, and may be used in lots of different ways, from advertising to traffic management. The so-called unstructured database of choice is by now pretty clearly Hadoop.
Cloudera, a leading software producer, is now training 1,500 people a month, mostly online, in how to use both the database and associated applications. According to Amr A. Awadallah, Cloudera’s chief technical officer, over 10,000 people have been trained on its system.
Data quality from new diverse sources is still a big problem, as is persuading companies and organizations to let others see data that might be more valuable in a commonly shared algorithm. “I’ve tried paying money for it, but it’s easier for companies to decide not to share,” said Gil Elbaz, the founder of Factual, a company that seeks to hold lots of online data. “The only way that works is to get them to take risks in exchange for data that is valuable to them.”
Much of the fear about exposing data, he said, has to do with competitors learning secrets. Mr. Elbaz thinks there is a good business in developing “de-identifiers” that can make data anonymous, and privacy insurers specializing in covering the costs of exposure.
On a personal level, others think the government or a trusted private institution will hold the personal identifiers for things like medical data, releasing that data to trusted parties. “It’s a little scary that right now a cab driver using Uber knows more about you than a doctor, who has to take all of your information for the first time,” said Peter Skomoroch, principal data scientist at LinkedIn.
via New York Times – QUENTIN HARDY