Helping computers learn to tackle big-data problems outside their comfort zones
Imagine combing through thousands of mugshots desperately looking for a match. If time is of the essence, the faster you can do this, the better. A*STAR researchers have developed a framework that could help computers learn how to process and identify these images both faster and more accurately1.
Peng Xi of the A*STAR Institute for Infocomm Research notes that the framework can be used for numerous applications, including image segmentation, motion segmentation, data clustering, hybrid system identification and image representation.
A conventional way that computers process data is called representation learning. This involves identifying a feature that allows the program to quickly extract relevant information from the dataset and categorize it — a bit like a shortcut. Supervised and unsupervised learning are two of the main methods used in representation learning. Unlike supervised learning, which relies on costly labeling of data prior to processing, unsupervised learning involves grouping or ‘clustering’ data in a similar manner to our brains, explains Peng.
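The 'clustering' idea can be illustrated with a minimal k-means sketch. Everything here is an illustrative assumption, not the paper's method: two synthetic Gaussian blobs stand in for real data, and the algorithm groups them without ever seeing labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic groups of 2-D points; the algorithm never sees labels.
data = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                  rng.normal(5.0, 0.5, (50, 2))])

def kmeans(points, k, iters=25):
    """Plain k-means: alternately assign points to the nearest centroid
    and move each centroid to the mean of its assigned points."""
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):              # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels

labels = kmeans(data, k=2)
```

With the two blobs well separated, the recovered labels split the data back into its original groups, despite no labels being provided, which is the essence of unsupervised learning.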
Subspace clustering is a form of unsupervised learning that seeks to fit each data point into a low-dimensional subspace to find an intrinsic simplicity that makes complex, real-world data tractable. Existing subspace clustering methods struggle to handle ‘out-of-sample’, or unknown, data points and the large datasets that are common today.
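The low-dimensional-subspace intuition can be sketched via the self-expressiveness idea that underlies many subspace clustering methods: each point is written as a combination of the other points, and strong coefficients tend to link points from the same subspace. The ridge-regularised least-squares representation below is a hypothetical stand-in for the sparse and low-rank solvers the paper actually studies, and the two synthetic lines in 3-D are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Points drawn from two 1-D subspaces (lines) embedded in 3-D.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
X = np.hstack([np.outer(u1, rng.uniform(1, 2, 20)),
               np.outer(u2, rng.uniform(1, 2, 20))])   # shape (3, 40)

n = X.shape[1]
C = np.zeros((n, n))
lam = 1e-3
for i in range(n):
    others = np.delete(np.arange(n), i)
    A = X[:, others]            # express x_i using all the *other* points
    # ridge-regularised least squares in place of a sparse solver
    c = np.linalg.solve(A.T @ A + lam * np.eye(n - 1), A.T @ X[:, i])
    C[others, i] = c

W = np.abs(C) + np.abs(C.T)     # symmetric affinity between points
W[W < 1e-6] = 0.0               # drop negligible cross-subspace links

# Cluster = connected component of the affinity graph.
labels = -np.ones(n, dtype=int)
cur = 0
for s in range(n):
    if labels[s] >= 0:
        continue
    stack = [s]
    labels[s] = cur
    while stack:
        v = stack.pop()
        for w in np.nonzero(W[v])[0]:
            if labels[w] < 0:
                labels[w] = cur
                stack.append(w)
    cur += 1
```

Because the two subspaces here are orthogonal, the representation coefficients only connect points lying on the same line, so the affinity graph falls apart into one connected component per subspace.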
“One of the challenges of the big-data era is to organize out-of-sample data using a machine learning model based on ‘in-sample’, or known, observational data,” explains Peng, who, with his colleagues, has proposed three methods as part of a unified framework to tackle this issue. The methods differ in how they implement representation learning: one exploits sparsity, while the other two exploit low rank and grouping effects. “By solving the large-scale data and out-of-sample clustering problems, our method makes big-data clustering and online learning possible,” notes Peng.
The framework devised by the team first splits the input data into ‘in-sample’ and ‘out-of-sample’ subsets during an initial ‘sampling’ step. Next, the in-sample data is grouped into subspaces during the ‘clustering’ step. Finally, each out-of-sample point is assigned to its nearest subspace and designated a member of that cluster.
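Under stated assumptions, the sampling / clustering / out-of-sample assignment pipeline described above might be sketched as follows. This is not the paper's implementation: the data is synthetic, the in-sample clustering step is given by construction so the focus stays on out-of-sample assignment, and each subspace basis is recovered with an SVD.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for real features: points from two 2-D subspaces of R^5.
def sample_subspace(basis, n):
    return basis @ rng.normal(size=(basis.shape[1], n))

B1 = np.linalg.qr(rng.normal(size=(5, 2)))[0]
B2 = np.linalg.qr(rng.normal(size=(5, 2)))[0]

# Step 1 (sampling): split into in-sample and out-of-sample data.
# The in-sample clustering (step 2) is given by construction here.
in_clusters = [sample_subspace(B1, 30), sample_subspace(B2, 30)]
out_sample = np.hstack([sample_subspace(B1, 10), sample_subspace(B2, 10)])

# Fit a low-dimensional basis to each in-sample cluster via SVD.
bases = []
for cluster in in_clusters:
    U, _, _ = np.linalg.svd(cluster, full_matrices=False)
    bases.append(U[:, :2])       # top singular vectors span the subspace

# Step 3: assign each out-of-sample point to the subspace whose
# projection residual is smallest.
def assign(x, bases):
    residuals = [np.linalg.norm(x - U @ (U.T @ x)) for U in bases]
    return int(np.argmin(residuals))

labels = np.array([assign(out_sample[:, i], bases)
                   for i in range(out_sample.shape[1])])
```

Because clustering is only run on the smaller in-sample set, and each new point needs just one projection per subspace, out-of-sample points can be processed cheaply and even one at a time, which is what makes the large-scale and online settings tractable.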
The team tested their approach on a range of datasets spanning different types of information, from facial images to handwritten and printed text, poker hands and forest-cover records. Their methods outperformed existing algorithms, reducing the computational complexity (and hence running time) of the task while still ensuring cluster quality.
Learn more: Thinking outside the sample