Something dramatic has happened that will forever change the way I approach data
It’s a sad but true fact that most data that’s generated or collected—even with considerable effort—never gets any kind of serious analysis. But in a sense that’s not surprising. Because doing data science has always been hard. And even expert data scientists usually have to spend lots of time wrangling code and data to do any particular analysis.
I myself have been using computers to work with data for more than a third of a century. And over that time my tools and methods have gradually evolved. But this week, with the release of Wolfram|Alpha Pro, something dramatic has happened that will forever change the way I approach data.
The key idea is automation. The concept in Wolfram|Alpha Pro is that I should just be able to take my data in whatever raw form it arrives, and throw it into Wolfram|Alpha Pro. And then Wolfram|Alpha Pro should automatically do a whole bunch of analysis, and then give me a well-organized report about my data. And if my data isn’t too large, this should all happen in a few seconds.
And what’s amazing to me is that it actually works. I’ve got all kinds of data lying around: measurements, business reports, personal analytics, whatever. And I’ve been feeding it into Wolfram|Alpha Pro. And Wolfram|Alpha Pro has been showing me visualizations and coming up with analyses that tell me all kinds of useful things about the data.
In the past, when I’d really been motivated, I’d take some data here or there, read it into Mathematica, and use some of the powerful tools there to do some analysis or another. But what’s new and exciting with Wolfram|Alpha Pro is that it is all so automatic. On a whim I can throw my data in, and expect to see something useful come out.
The basic idea is very much in line with the whole core mission of Wolfram|Alpha: to take expert-level knowledge, and create a system that can apply it automatically whenever and wherever it’s needed. Here the expert-level knowledge is the collection of methods that a team of good data scientists would have, and what Wolfram|Alpha Pro does is to take that knowledge and use it to analyze whatever data you feed in.
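As a toy illustration of what "automatically analyzing whatever data you feed in" can mean at its very simplest, here is a small Python sketch that classifies each column as numeric or text and then produces a per-column summary. This is not how Wolfram|Alpha Pro works internally; the function name `summarize_columns`, the all-cells-parse-as-float rule for numeric columns, and the choice of statistics in the report are all assumptions made for this example.

```python
import statistics
from collections import Counter


def summarize_columns(header, rows):
    """Produce a tiny per-column report from parsed rows of strings.

    Hypothetical helper: a column counts as numeric if every non-missing
    cell parses as a float, otherwise it is treated as text.
    Missing cells are represented as None.
    """
    report = {}
    for i, name in enumerate(header):
        # Collect the non-missing cells for this column.
        cells = [row[i] for row in rows if i < len(row) and row[i] is not None]
        missing = len(rows) - len(cells)
        try:
            values = [float(c) for c in cells]
        except ValueError:
            values = None  # at least one cell is not a number
        if values:  # numeric column with at least one value
            report[name] = {
                "type": "numeric",
                "missing": missing,
                "min": min(values),
                "max": max(values),
                "mean": statistics.mean(values),
            }
        else:  # text column (or a column with no values at all)
            top = Counter(cells).most_common(1)
            report[name] = {
                "type": "text",
                "missing": missing,
                "distinct": len(set(cells)),
                "most_common": top[0][0] if top else None,
            }
    return report
```

A real system would go much further (dates, units, categorical detection, correlations, plots), but even this classify-then-summarize pattern shows why the analysis can run without the user specifying anything up front.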
There are many challenges, and we’re still at an early stage in addressing all of them. But with the whole Wolfram|Alpha technology stack, as well as with the underlying Mathematica language, we were able to start from a very strong foundation. And in the course of building Wolfram|Alpha Pro we’ve invented all kinds of new methods.
There are several pieces to the whole problem. The first is just to get the data into Wolfram|Alpha in any kind of well-structured form. And as anyone who’s actually worked with real data knows, that’s often not as easy as it sounds.
You think you’ve got data that’s arranged in columns. But what about those weird separators? What about those headers? What about those delimiters that occur inside data elements? What about those missing elements? What about those lines that were stripped when copying from a browser? What about that second table in the same spreadsheet? And so on.
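The questions above are what any data-ingestion step has to answer before analysis can begin. As a rough illustration only (Wolfram|Alpha Pro's own parser is far more elaborate and is not shown here), this Python sketch leans on the standard library's `csv.Sniffer` heuristics to guess the separator and whether the first row is a header. It keeps quoted cells that contain the separator intact, drops blank lines, and pads short rows with `None` for missing elements. The function name `parse_messy_table` and the candidate delimiter set are assumptions made for this example.

```python
import csv
import io


def parse_messy_table(text, delimiters=",;\t|"):
    """Guess the delimiter and header row of raw tabular text, then parse it.

    Hypothetical helper built on csv.Sniffer heuristics, not a general
    grammar for structured data. Blank lines are dropped; empty cells and
    cells missing from short rows become None.
    """
    sample = text[:4096]  # the sniffer only inspects a prefix of the data
    sniffer = csv.Sniffer()
    try:
        dialect = sniffer.sniff(sample, delimiters=delimiters)
    except csv.Error:
        dialect = csv.excel  # fall back to plain comma-separated values
    try:
        has_header = sniffer.has_header(sample)
    except csv.Error:
        has_header = False

    # csv.reader honours quoting, so a quoted cell such as "Smith; J."
    # stays one cell even when ';' is the separator.
    rows = [row for row in csv.reader(io.StringIO(text), dialect)
            if any(cell.strip() for cell in row)]
    header = rows.pop(0) if (has_header and rows) else None
    width = max((len(r) for r in rows), default=0)
    table = [[cell.strip() or None for cell in r] + [None] * (width - len(r))
             for r in rows]
    return header, table
```

For example, feeding it `'name;city;score\n"Smith; J.";Boston;91\n\nLee;;87\nNg;Austin\n'` should yield the header `['name', 'city', 'score']` and three rows, with `None` standing in for Lee's missing city and Ng's missing score. Heuristics like these break down quickly on real-world input (a second table in the same spreadsheet, for instance), which is why a learned grammar is attractive.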
It’s a little like what Wolfram|Alpha has to do in understanding free-form natural language, with all its variations and redundancies. But the grammar for structured data is different, and in some ways less forgiving. And just as in the original development of Wolfram|Alpha, what we’ve done is to take a large corpus of examples, and try to deduce the appropriate grammar from what we see—with the knowledge that as we get large volumes of actual queries, we’ll gradually be able to improve this. (Needless to say, we use the analysis capabilities of Wolfram|Alpha Pro itself to do much of this analysis.)