Helping computers learn to tackle big-data problems outside their comfort zones
Imagine combing through thousands of mugshots desperately looking for a match. When time is of the essence, the faster you can do this, the better. A*STAR researchers have developed a framework that could help computers learn how to process and identify these images both faster and more accurately [1].
Peng Xi of the A*STAR Institute for Infocomm Research notes that the framework can be used for numerous applications, including image segmentation, motion segmentation, data clustering, hybrid system identification and image representation.
A conventional way that computers process data is called representation learning. This involves identifying a feature that allows the program to quickly extract relevant information from the dataset and categorize it — a bit like a shortcut. Supervised and unsupervised learning are two of the main methods used in representation learning. Unlike supervised learning, which relies on costly labeling of data prior to processing, unsupervised learning involves grouping or ‘clustering’ data in a similar manner to our brains, explains Peng.
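To make the idea of unsupervised clustering concrete, here is a toy sketch (not the A*STAR method): a minimal one-dimensional k-means that groups unlabeled points around cluster centers, with no costly labeling step.

```python
# Toy 1-D k-means: groups unlabeled points with no prior labels.
# Illustrative sketch only -- not the researchers' algorithm.

def kmeans_1d(points, k=2, iters=20):
    centers = points[:k]  # naive initialization from the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

data = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
centers, clusters = kmeans_1d(data)
# The two centers settle near the two natural groups of points.
```

The key point is that the grouping emerges from the data itself, which is what distinguishes unsupervised from supervised learning.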
Subspace clustering is a form of unsupervised learning that seeks to fit each data point into a low-dimensional subspace to find an intrinsic simplicity that makes complex, real-world data tractable. Existing subspace clustering methods struggle to handle ‘out-of-sample’, or unknown, data points and the large datasets that are common today.
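The core geometric idea can be sketched with a deliberately small example (a hypothetical simplification, not the paper's formulation): two candidate one-dimensional subspaces in the plane, with each point assigned to the subspace that leaves the smallest projection residual.

```python
import math

# Toy subspace assignment in 2-D. Each candidate subspace is a line
# through the origin, given by a unit direction vector. A point is
# assigned to the subspace with the smallest projection residual.
# Illustrative sketch only.

def residual(point, direction):
    # Distance from `point` to the line spanned by `direction`.
    px, py = point
    dx, dy = direction
    proj = px * dx + py * dy              # scalar projection coefficient
    rx, ry = px - proj * dx, py - proj * dy
    return math.hypot(rx, ry)

subspaces = [(1.0, 0.0), (0.0, 1.0)]      # the x-axis and the y-axis
points = [(3.0, 0.1), (0.2, 5.0)]
labels = [min(range(len(subspaces)), key=lambda i: residual(p, subspaces[i]))
          for p in points]
# labels -> [0, 1]: the first point lies near the x-axis subspace,
# the second near the y-axis subspace.
```

Real subspace clustering must also discover the subspaces themselves from high-dimensional data, which is where the difficulty with out-of-sample points and large datasets arises.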
“One of the challenges of the big-data era is to organize out-of-sample data using a machine learning model based on ‘in-sample’, or known, observational data,” explains Peng, who, with his colleagues, has proposed three methods as part of a unified framework to tackle this issue. These methods differ in how they implement representation learning: one focuses on sparsity, while the other two focus on low rank and grouping effects. “By solving the large-scale data and out-of-sample clustering problems, our method makes big-data clustering and online learning possible,” notes Peng.
The framework devised by the team splits input data into ‘in-sample’ and ‘out-of-sample’ data during an initial ‘sampling’ step. Next, the in-sample data is grouped into subspaces during the ‘clustering’ step, after which each out-of-sample data point is assigned to its nearest subspace and designated a member of that cluster.
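The three steps above can be sketched as follows, using cluster centroids as a stand-in for the learned subspaces (a hypothetical simplification for illustration, not the team's implementation):

```python
# Sketch of the sampling / clustering / out-of-sample pipeline.
# Hypothetical simplification: centroids stand in for subspaces.

def nearest_center(point, centers):
    return min(range(len(centers)), key=lambda i: abs(point - centers[i]))

data = [1.0, 1.1, 0.9, 9.0, 9.2, 8.8, 1.05, 9.1]

# 1. Sampling step: split into in-sample and out-of-sample data.
in_sample, out_sample = data[:6], data[6:]

# 2. Clustering step: group the in-sample points (here, a fixed
#    two-way split for illustration) and compute each center.
clusters = [in_sample[:3], in_sample[3:]]
centers = [sum(c) / len(c) for c in clusters]

# 3. Out-of-sample step: assign each new point to its nearest
#    cluster without re-running the clustering on all the data.
for p in out_sample:
    clusters[nearest_center(p, centers)].append(p)
```

Because the expensive clustering step runs only on the in-sample data, new points can be absorbed cheaply, which is what makes the approach attractive for large-scale and online settings.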
The team tested their approach on a range of datasets spanning different types of information, from facial images and text (both handwritten and digital) to poker hands and forest coverage. They found that their methods outperformed existing algorithms and successfully reduced the computational complexity (and hence running time) of the task while still ensuring cluster quality.
Learn more: Thinking outside the sample