Helping computers learn to tackle big-data problems outside their comfort zones
Imagine combing through thousands of mugshots looking for a match when time is of the essence. A*STAR researchers have developed a framework that could help computers learn to process and identify such images both faster and more accurately [1].
Xi Peng of the A*STAR Institute for Infocomm Research notes that the framework can be used for numerous applications, including image segmentation, motion segmentation, data clustering, hybrid system identification and image representation.
A conventional way that computers process data is called representation learning. This involves identifying a feature that allows the program to quickly extract relevant information from the dataset and categorize it — a bit like a shortcut. Supervised and unsupervised learning are two of the main methods used in representation learning. Unlike supervised learning, which relies on costly labeling of data prior to processing, unsupervised learning involves grouping or ‘clustering’ data in a similar manner to our brains, explains Peng.
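To make the distinction concrete, here is a minimal, self-contained sketch of unsupervised clustering in Python, using k-means from scikit-learn as an illustrative stand-in; it is not the method developed by Peng's team.

```python
# Minimal illustration of unsupervised learning: k-means groups unlabeled
# points by proximity, with no costly hand-labeling required beforehand.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two unlabeled point clouds standing in for, say, images of two faces.
data = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                  rng.normal(5.0, 0.5, (50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
print(labels[:5], labels[-5:])  # the two groups are discovered automatically
```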
Subspace clustering is a form of unsupervised learning that assigns each data point to a low-dimensional subspace, uncovering the intrinsic simplicity that makes complex, real-world data tractable. Existing subspace clustering methods struggle with ‘out-of-sample’, or unknown, data points and with the large datasets that are common today.
“One of the challenges of the big-data era is to organize out-of-sample data using a machine learning model based on ‘in-sample’, or known, observational data,” explains Peng, who, with his colleagues, has proposed three methods within a unified framework to tackle this issue. These methods differ in how they implement representation learning: one focuses on sparsity, while the other two focus on low rank and grouping effects. “By solving the large-scale data and out-of-sample clustering problems, our method makes big-data clustering and online learning possible,” notes Peng.
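As a rough sketch of how representation-based methods of this family tend to work (the paper's exact objectives differ, and the formulation and `lam` value below are assumptions), each sample is expressed as a combination of the other samples, X ≈ XC, and the penalty placed on C is what separates the three variants: an l1 penalty encourages sparsity, a nuclear-norm penalty encourages low rank, and an l2 penalty produces a grouping effect. Only the l2 case, which has a closed-form solution, is shown.

```python
# Hedged sketch of self-expressive representation learning: reconstruct each
# column of X from the other samples, X ~= X @ C. The penalty on C is what
# distinguishes the variants: l1 -> sparsity, nuclear norm -> low rank,
# l2 (used here because it has a closed form) -> grouping effect.
import numpy as np

def self_expression_l2(X, lam=0.1):
    """Solve min_C ||X - X C||_F^2 + lam * ||C||_F^2 in closed form."""
    n = X.shape[1]                     # columns of X are the samples
    gram = X.T @ X
    C = np.linalg.solve(gram + lam * np.eye(n), gram)
    np.fill_diagonal(C, 0.0)           # discourage trivial self-representation
    return C

def affinity(C):
    """Symmetrized affinity that a spectral clustering step could consume."""
    return np.abs(C) + np.abs(C).T
```

The affinity matrix built from C would then typically be fed to a spectral clustering step to produce the subspace labels.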
The framework devised by the team splits input data into ‘in-sample’ and ‘out-of-sample’ data during an initial ‘sampling’ step. Next, the in-sample data is grouped into subspaces during the ‘clustering’ step, after which each out-of-sample data point is assigned to the nearest subspace and designated a member of that cluster.
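A hedged sketch of that sampling, clustering and assignment pipeline follows; k-means stands in for the in-sample clustering step and PCA for the per-cluster subspace fit, both illustrative assumptions rather than the paper's algorithms. The key point is that each out-of-sample point only needs to be compared against a handful of fitted subspaces, not against the whole dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def cluster_with_out_of_sample(X_in, X_out, n_clusters=2, dim=1):
    """Cluster the in-sample rows of X_in, then assign the rows of X_out."""
    # Clustering step: group the in-sample data (stand-in: k-means).
    in_labels = KMeans(n_clusters=n_clusters, n_init=10,
                       random_state=0).fit_predict(X_in)

    # Fit a low-dimensional subspace (stand-in: PCA) to each cluster.
    bases = [PCA(n_components=dim).fit(X_in[in_labels == k])
             for k in range(n_clusters)]

    # Assignment step: an out-of-sample point joins whichever subspace
    # reconstructs it with the smallest residual.
    out_labels = [
        int(np.argmin([np.linalg.norm(
                x - p.inverse_transform(p.transform(x.reshape(1, -1))))
            for p in bases]))
        for x in X_out
    ]
    return in_labels, np.array(out_labels)
```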
The team tested their approach on a range of datasets spanning different types of information: facial images, handwritten and digital text, poker hands and forest coverage. They found that their methods outperformed existing algorithms, reducing the computational complexity (and hence running time) of the task while still ensuring cluster quality.
Learn more: Thinking outside the sample