Helping computers learn to tackle big-data problems outside their comfort zones
Imagine combing through thousands of mugshots, desperately looking for a match. When time is of the essence, speed matters. A*STAR researchers have developed a framework that could help computers learn how to process and identify these images both faster and more accurately1.
Peng Xi of the A*STAR Institute for Infocomm Research notes that the framework can be used for numerous applications, including image segmentation, motion segmentation, data clustering, hybrid system identification and image representation.
A conventional way that computers process data is called representation learning. This involves identifying a feature that allows the program to quickly extract relevant information from the dataset and categorize it — a bit like a shortcut. Supervised and unsupervised learning are two of the main methods used in representation learning. Unlike supervised learning, which relies on costly labeling of data prior to processing, unsupervised learning involves grouping or ‘clustering’ data in a similar manner to our brains, explains Peng.
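The 'clustering' that Peng describes can be illustrated with a toy k-means pass in NumPy. This is a generic sketch of unsupervised learning for intuition only, not the team's method; the two-blob data and the choice of initial centres are invented for the demo:

```python
import numpy as np

def kmeans(X, centers, iters=20):
    """Toy k-means: group unlabeled points around `len(centers)` centres."""
    centers = centers.astype(float).copy()
    for _ in range(iters):
        # Assign each point to its nearest centre (no labels are ever supplied).
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        # Move each centre to the mean of the points assigned to it.
        centers = np.array([X[labels == j].mean(axis=0)
                            for j in range(len(centers))])
    return labels, centers

# Two well-separated 2-D blobs; the algorithm sees only the raw points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(5.0, 0.5, (50, 2))])
# Initialise with one point from each end of the dataset for a deterministic demo.
labels, centers = kmeans(X, X[[0, len(X) - 1]])
```

With no labels provided, the two blobs are recovered purely from the geometry of the data, which is the sense in which unsupervised learning avoids the costly labeling step.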
Subspace clustering is a form of unsupervised learning that seeks to fit each data point into a low-dimensional subspace to find an intrinsic simplicity that makes complex, real-world data tractable. Existing subspace clustering methods struggle to handle ‘out-of-sample’, or unknown, data points and the large datasets that are common today.
“One of the challenges of the big-data era is to organize out-of-sample data using a machine learning model based on ‘in-sample’, or known, observational data,” explains Peng, who, with his colleagues, has proposed three methods as part of a unified framework to tackle this issue. These methods differ in how they implement representation learning; one focuses on sparsity, while the other two focus on low rank and grouping effects. “By solving the large-scale data and out-of-sample clustering problems, our method makes big-data clustering and online learning possible,” notes Peng.
The framework devised by the team splits input data into ‘in-sample’ data or ‘out-of-sample’ data during an initial ‘sampling’ step. Next, the in-sample data is grouped into subspaces during the ‘clustering’ step, after which the out-of-sample data is assigned to the nearest subspace. These points are then designated as cluster members.
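The sampling, clustering and assignment steps above can be sketched as follows. This is a hedged toy illustration, not the authors' algorithm: the in-sample clusters are given rather than learned, each cluster's low-dimensional subspace is fitted with a plain SVD, and an out-of-sample point simply joins the subspace with the smallest residual; all names and data are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy 1-D subspaces (lines) embedded in 3-D stand in for two clusters
# of in-sample data.
basis_a = np.array([1.0, 0.0, 0.0])
basis_b = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)
in_a = np.outer(rng.normal(size=40), basis_a)
in_b = np.outer(rng.normal(size=40), basis_b)

def fit_subspace(X, dim=1):
    """Fit a `dim`-dimensional affine subspace to one cluster via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim]          # top principal directions form the basis

def residual(x, mean, basis):
    """Distance from point x to the affine subspace (mean, basis)."""
    d = x - mean
    return np.linalg.norm(d - basis.T @ (basis @ d))

# 'Clustering' step, stubbed: we use the known grouping of the in-sample data.
subspaces = [fit_subspace(in_a), fit_subspace(in_b)]

def assign(x):
    """Out-of-sample step: the point joins the nearest subspace's cluster."""
    return int(np.argmin([residual(x, m, B) for m, B in subspaces]))
```

Because only the fitted subspaces are consulted, a new point is clustered without re-running the in-sample clustering, which is what lets this style of pipeline scale to out-of-sample and streaming data.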
The team tested their approach on a range of datasets spanning different types of information: facial images, handwritten and digital text, poker hands, and forest coverage. They found that their methods outperformed existing algorithms and successfully reduced the computational complexity (and hence running time) of the task while still ensuring cluster quality.
Learn more: Thinking outside the sample