Helping computers learn to tackle big-data problems outside their comfort zones
Imagine combing through thousands of mugshots desperately looking for a match. If time is of the essence, the faster you can do this, the better. A*STAR researchers have developed a framework that could help computers learn how to process and identify these images both faster and more accurately.
Peng Xi of the A*STAR Institute for Infocomm Research notes that the framework can be used for numerous applications, including image segmentation, motion segmentation, data clustering, hybrid system identification and image representation.
A conventional way that computers process data is called representation learning. This involves identifying a feature that allows the program to quickly extract relevant information from the dataset and categorize it — a bit like a shortcut. Supervised and unsupervised learning are two of the main methods used in representation learning. Unlike supervised learning, which relies on costly labeling of data prior to processing, unsupervised learning involves grouping or ‘clustering’ data in a similar manner to our brains, explains Peng.
Subspace clustering is a form of unsupervised learning that seeks to fit each data point into a low-dimensional subspace to find an intrinsic simplicity that makes complex, real-world data tractable. Existing subspace clustering methods struggle to handle ‘out-of-sample’, or unknown, data points and the large datasets that are common today.
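The core idea can be illustrated with a toy example (not the team's actual method): points in a high-dimensional space that actually lie near one of a few low-dimensional subspaces can be grouped by measuring how far each point is from its projection onto each subspace. The data, subspaces, and `residual` helper below are all hypothetical, chosen only to make the geometry concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 1-D subspaces (lines through the origin) embedded in 3-D space.
basis_a = np.array([1.0, 0.0, 0.0])
basis_b = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)

# Sample points from each line (coefficients bounded away from zero),
# then perturb them with a little noise.
points = np.vstack([
    np.outer(rng.uniform(0.5, 2.0, size=50), basis_a),
    np.outer(rng.uniform(0.5, 2.0, size=50), basis_b),
]) + 0.01 * rng.normal(size=(100, 3))

def residual(x, basis):
    """Distance from x to its orthogonal projection onto span(basis)."""
    return np.linalg.norm(x - np.dot(x, basis) * basis)

# Assign each point to the subspace it sits closest to.
labels = np.array([
    0 if residual(x, basis_a) < residual(x, basis_b) else 1
    for x in points
])
```

Real subspace clustering methods must discover the subspaces themselves from unlabeled data; this sketch only shows why fitting points to low-dimensional subspaces makes an otherwise messy 3-D cloud separable.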
“One of the challenges of the big-data era is to organize out-of-sample data using a machine learning model based on ‘in-sample’, or known, observational data,” explains Peng, who, with his colleagues, has proposed three methods as part of a unified framework to tackle this issue. The methods differ in how they implement representation learning: one enforces sparsity, while the other two exploit low-rankness and the grouping effect, respectively. “By solving the large-scale data and out-of-sample clustering problems, our method makes big-data clustering and online learning possible,” notes Peng.
The framework devised by the team splits input data into ‘in-sample’ and ‘out-of-sample’ data during an initial ‘sampling’ step. Next, the in-sample data is grouped into subspaces during the ‘clustering’ step, after which each out-of-sample point is assigned to the nearest subspace. These points are then designated as cluster members.
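The sampling–clustering–assignment pipeline described above can be sketched in miniature. This is a simplified stand-in, not the paper's algorithm: the in-sample groups are assumed to be already clustered, each cluster's subspace is fitted with a plain SVD, and out-of-sample points are routed by smallest projection residual. All names (`fit_subspace`, `assign`) and the toy data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_subspace(points, dim):
    """Fit a `dim`-dimensional subspace through the origin to a cluster,
    using the top right singular vectors of the data matrix."""
    _, _, vt = np.linalg.svd(points, full_matrices=False)
    return vt[:dim].T  # orthonormal basis, shape (ambient_dim, dim)

def assign(x, bases):
    """Route a point to the subspace with the smallest projection residual."""
    residuals = [np.linalg.norm(x - b @ (b.T @ x)) for b in bases]
    return int(np.argmin(residuals))

# --- 'Sampling' step: toy in-sample data, two 1-D subspaces in 5-D ---
d = 5
dirs = [np.eye(d)[0], np.eye(d)[1]]
in_sample = [np.outer(rng.uniform(0.5, 2.0, size=40), u) for u in dirs]

# --- 'Clustering' step (stand-in): one fitted subspace per group ---
bases = [fit_subspace(group, dim=1) for group in in_sample]

# --- Out-of-sample step: a new point joins the nearest subspace's cluster ---
new_point = 1.5 * dirs[1] + 0.01 * rng.normal(size=d)
cluster = assign(new_point, bases)
```

The key efficiency property the article highlights falls out of this structure: once the in-sample subspaces are fitted, each new point costs only a handful of small matrix products to place, so the expensive clustering step never has to be rerun on the full, growing dataset.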
The team tested their approach on a range of datasets including different types of information, from facial images to text — both handwritten and digital — poker hands and forest coverage. They found that their methods outperformed existing algorithms and successfully reduced the computational complexity (and hence running time) of the task while still ensuring cluster quality.
Learn more: Thinking outside the sample