Helping computers learn to tackle big-data problems outside their comfort zones
Imagine combing through thousands of mugshots desperately looking for a match. If time is of the essence, the faster you can do this, the better. A*STAR researchers have developed a framework that could help computers learn how to process and identify these images both faster and more accurately.
Peng Xi of the A*STAR Institute for Infocomm Research notes that the framework can be used for numerous applications, including image segmentation, motion segmentation, data clustering, hybrid system identification and image representation.
A conventional way that computers process data is called representation learning. This involves identifying a feature that allows the program to quickly extract relevant information from the dataset and categorize it — a bit like a shortcut. Supervised and unsupervised learning are two of the main methods used in representation learning. Unlike supervised learning, which relies on costly labeling of data prior to processing, unsupervised learning involves grouping or ‘clustering’ data in a similar manner to our brains, explains Peng.
Subspace clustering is a form of unsupervised learning that seeks to fit each data point into a low-dimensional subspace to find an intrinsic simplicity that makes complex, real-world data tractable. Existing subspace clustering methods struggle to handle ‘out-of-sample’, or unknown, data points and the large datasets that are common today.
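To make the idea concrete, here is a minimal toy sketch (not the team's algorithm) of what "fitting data points into a low-dimensional subspace" means: points scattered near a line in 3-D space can be recovered as essentially one-dimensional by a truncated singular value decomposition. All names and data below are illustrative.

```python
import numpy as np

# Toy data: 50 points lying near a 1-D subspace (a line) in 3-D space.
rng = np.random.default_rng(42)
direction = np.array([1.0, 2.0, -1.0])
direction /= np.linalg.norm(direction)
points = np.outer(rng.standard_normal(50), direction)  # points on the line
points += 0.01 * rng.standard_normal(points.shape)     # small noise

# The best-fitting k-dimensional subspace through the origin is spanned
# by the top-k right singular vectors of the data matrix.
_, singular_values, vt = np.linalg.svd(points, full_matrices=False)
basis = vt[:1].T  # (3, 1) orthonormal basis of the fitted line
```

Because the leading singular value dominates the others, the data is revealed to be intrinsically one-dimensional, which is the kind of "intrinsic simplicity" subspace clustering exploits.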
“One of the challenges of the big-data era is to organize out-of-sample data using a machine learning model based on ‘in-sample’, or known, observational data,” explains Peng who, with his colleagues, has proposed three methods as part of a unified framework to tackle this issue. These methods differ in how they implement representation learning; one focuses on sparsity, while the other two focus on low rank and grouping effects. “By solving the large-scale data and out-of-sample clustering problems, our method makes big-data clustering and online learning possible,” notes Peng.
The framework devised by the team splits input data into ‘in-sample’ and ‘out-of-sample’ data during an initial ‘sampling’ step. Next, the in-sample data is grouped into subspaces during the ‘clustering’ step, after which each out-of-sample data point is assigned to the nearest subspace and designated a member of that cluster.
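The assignment step above can be sketched as follows. This is a hedged illustration, assuming the in-sample clustering has already produced an orthonormal basis for each subspace; the function and variable names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def nearest_subspace(point, bases):
    """Assign a point to the subspace whose orthonormal basis
    reconstructs it with the smallest residual."""
    residuals = []
    for U in bases:  # U: (d, k) orthonormal basis of one subspace
        projection = U @ (U.T @ point)  # project point onto the subspace
        residuals.append(np.linalg.norm(point - projection))
    return int(np.argmin(residuals))

# Toy setup: two 1-D subspaces (the x-axis and y-axis) in 3-D space,
# standing in for bases fitted during the in-sample clustering step.
line_a = np.array([1.0, 0.0, 0.0])
line_b = np.array([0.0, 1.0, 0.0])
bases = [line_a.reshape(3, 1), line_b.reshape(3, 1)]

# Out-of-sample step: assign an unseen, slightly noisy point.
rng = np.random.default_rng(0)
new_point = 2.5 * line_b + 0.01 * rng.standard_normal(3)
cluster = nearest_subspace(new_point, bases)  # assigns to cluster 1
```

Because the assignment needs only a projection per subspace rather than re-running the full clustering, new points can be handled cheaply, which is what makes this style of framework attractive for large-scale and online settings.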
The team tested their approach on a range of datasets spanning different types of information: facial images, handwritten and digital text, poker hands, and forest coverage. They found that their methods outperformed existing algorithms, reducing the computational complexity (and hence running time) of the task while still ensuring cluster quality.
Learn more: Thinking outside the sample