Helping computers learn to tackle big-data problems outside their comfort zones
Imagine combing through thousands of mugshots desperately looking for a match. If time is of the essence, the faster you can do this, the better. A*STAR researchers have developed a framework that could help computers learn how to process and identify these images both faster and more accurately [1].
Peng Xi of the A*STAR Institute for Infocomm Research notes that the framework can be used for numerous applications, including image segmentation, motion segmentation, data clustering, hybrid system identification and image representation.
A conventional way that computers process data is called representation learning. This involves identifying a feature that allows the program to quickly extract relevant information from the dataset and categorize it — a bit like a shortcut. Supervised and unsupervised learning are two of the main methods used in representation learning. Unlike supervised learning, which relies on costly labeling of data prior to processing, unsupervised learning involves grouping or ‘clustering’ data in a similar manner to our brains, explains Peng.
Subspace clustering is a form of unsupervised learning that seeks to fit each data point into a low-dimensional subspace to find an intrinsic simplicity that makes complex, real-world data tractable. Existing subspace clustering methods struggle to handle ‘out-of-sample’, or unknown, data points and the large datasets that are common today.
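The core geometric idea — that a point belonging to a cluster should lie close to that cluster's low-dimensional subspace — can be illustrated with a short sketch. This is not the authors' code; the function name and toy data are invented for illustration, assuming the distance of a point to a subspace is measured by its orthogonal residual:

```python
import numpy as np

def subspace_residual(points, basis):
    """Distance of each point to the subspace spanned by the rows of `basis`.

    The residual is the norm of each point's component orthogonal to the
    subspace; points lying exactly in the subspace get residual 0.
    """
    q = np.linalg.qr(basis.T)[0]       # orthonormal basis as columns
    projection = points @ q @ q.T      # orthogonal projection onto the subspace
    return np.linalg.norm(points - projection, axis=1)

# Two points in 3-D, measured against a 1-D subspace (the x-axis)
pts = np.array([[2.0, 0.0, 0.0],   # exactly on the axis
                [1.0, 0.5, 0.0]])  # 0.5 away from the axis
axis = np.array([[1.0, 0.0, 0.0]])
print(subspace_residual(pts, axis))  # → [0.  0.5]
```

A clustering method built on this idea assigns each point to whichever candidate subspace gives the smallest residual.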
“One of the challenges of the big-data era is to organize out-of-sample data using a machine learning model based on ‘in-sample’, or known, observational data,” explains Peng, who, with his colleagues, has proposed three methods as part of a unified framework to tackle this issue. These methods differ in how they implement representation learning; one focuses on sparsity, while the other two focus on low rank and grouping effects. “By solving the large-scale data and out-of-sample clustering problems, our method makes big-data clustering and online learning possible,” notes Peng.
The framework devised by the team splits input data into ‘in-sample’ and ‘out-of-sample’ portions during an initial ‘sampling’ step. Next, the in-sample data is grouped into subspaces during the ‘clustering’ step, after which each out-of-sample point is assigned to the nearest subspace and designated a member of that cluster.
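The two-stage pipeline described above can be sketched in a few lines. This is a simplified illustration, not the team's published algorithm: it assumes the in-sample points have already been clustered, fits a best-fit linear subspace to each cluster via SVD, and then assigns out-of-sample points to the subspace with the smallest reconstruction residual. All function names and the toy data are invented for illustration:

```python
import numpy as np

def fit_subspaces(X, labels, dim=1):
    """Fit a `dim`-dimensional linear subspace (via SVD) to each in-sample cluster."""
    bases = {}
    for k in np.unique(labels):
        pts = X[labels == k]
        # The top right-singular vectors span the best-fit subspace for the cluster
        _, _, vt = np.linalg.svd(pts, full_matrices=False)
        bases[k] = vt[:dim]
    return bases

def assign_out_of_sample(X_new, bases):
    """Assign each new point to the cluster whose subspace leaves the smallest residual."""
    assigned = []
    for x in X_new:
        residuals = {k: np.linalg.norm(x - (x @ b.T) @ b) for k, b in bases.items()}
        assigned.append(min(residuals, key=residuals.get))
    return np.array(assigned)

# In-sample data: two 1-D subspaces in the plane (the x-axis and the y-axis)
X_in = np.array([[1, 0], [2, 0], [3, 0],
                 [0, 1], [0, 2], [0, 3]], dtype=float)
labels_in = np.array([0, 0, 0, 1, 1, 1])
bases = fit_subspaces(X_in, labels_in)

# Out-of-sample points are assigned without re-clustering the whole dataset
print(assign_out_of_sample(np.array([[4.0, 0.1], [0.1, 4.0]]), bases))  # → [0 1]
```

Because the expensive clustering runs only on the in-sample subset, the cost of handling each new point is just a handful of projections — which is what makes the approach attractive for large-scale and online settings.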
The team tested their approach on a range of datasets spanning different types of information, from facial images and text — both handwritten and digital — to poker hands and forest cover. They found that their methods outperformed existing algorithms, reducing the computational complexity (and hence running time) of the task while still ensuring cluster quality.
Learn more: Thinking outside the sample