Helping computers learn to tackle big-data problems outside their comfort zones
Imagine combing through thousands of mugshots, desperately looking for a match, when time is of the essence. A*STAR researchers have developed a framework that could help computers learn to process and identify such images both faster and more accurately1.
Peng Xi of the A*STAR Institute for Infocomm Research notes that the framework can be used for numerous applications, including image segmentation, motion segmentation, data clustering, hybrid system identification and image representation.
A conventional way that computers process data is called representation learning. This involves identifying a feature that allows the program to quickly extract relevant information from the dataset and categorize it — a bit like a shortcut. Supervised and unsupervised learning are two of the main methods used in representation learning. Unlike supervised learning, which relies on costly labeling of data prior to processing, unsupervised learning involves grouping or ‘clustering’ data in a similar manner to our brains, explains Peng.
Subspace clustering is a form of unsupervised learning that seeks to fit each data point into a low-dimensional subspace to find an intrinsic simplicity that makes complex, real-world data tractable. Existing subspace clustering methods struggle to handle ‘out-of-sample’, or unknown, data points and the large datasets that are common today.
“One of the challenges of the big-data era is to organize out-of-sample data using a machine learning model based on ‘in-sample’, or known, observational data,” explains Peng who, with his colleagues, has proposed three methods as part of a unified framework to tackle this issue. These methods differ in how they implement representation learning; one focuses on sparsity, while the other two focus on low rank and grouping effects. “By solving the large-scale data and out-of-sample clustering problems, our method makes big-data clustering and online learning possible,” notes Peng.
The framework devised by the team splits input data into ‘in-sample’ data or ‘out-of-sample’ data during an initial ‘sampling’ step. Next, the in-sample data is grouped into subspaces during the ‘clustering’ step, after which the out-of-sample data is assigned to the nearest subspace. These points are then designated as cluster members.
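The sampling, clustering and assignment steps described above can be sketched in a simplified form. This is an illustrative toy, not the authors' published algorithm: it assumes the in-sample cluster labels are already known and fits a one-dimensional subspace (a line through the origin) per cluster via SVD, then assigns each out-of-sample point to the subspace with the smallest reconstruction residual. The variable names and the two-line toy dataset are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy in-sample data: noisy points lying near two 1-D subspaces in 2-D.
line1 = np.array([1.0, 0.2]); line1 /= np.linalg.norm(line1)
line2 = np.array([0.2, 1.0]); line2 /= np.linalg.norm(line2)
t = rng.uniform(1, 5, size=50)
in_sample = np.vstack([np.outer(t, line1), np.outer(t, line2)])
in_sample += rng.normal(scale=0.05, size=in_sample.shape)
labels = np.repeat([0, 1], 50)  # assumed known here, for simplicity

def fit_subspace(points):
    # Leading right singular vector = best-fit direction through the origin.
    _, _, vt = np.linalg.svd(points, full_matrices=False)
    return vt[0]

# "Clustering" step: one basis vector per in-sample cluster.
bases = [fit_subspace(in_sample[labels == k]) for k in (0, 1)]

def assign(point, bases):
    # "Out-of-sample" step: pick the subspace with the smallest residual
    # (distance between the point and its projection onto the line).
    residuals = [np.linalg.norm(point - (point @ b) * b) for b in bases]
    return int(np.argmin(residuals))

new_point = 3.0 * line2 + rng.normal(scale=0.05, size=2)
print(assign(new_point, bases))  # lands in cluster 1
```

Because only the fitted basis vectors are needed to place a new point, the expensive clustering step never has to be re-run as out-of-sample data arrives, which is what makes this style of two-stage approach attractive for large or streaming datasets.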
The team tested their approach on a range of datasets spanning different types of information: facial images, text (both handwritten and digital), poker hands and forest coverage. They found that their methods outperformed existing algorithms, reducing the computational complexity (and hence running time) of the task while still ensuring cluster quality.
Learn more: Thinking outside the sample