Helping computers learn to tackle big-data problems outside their comfort zones
Imagine combing through thousands of mugshots desperately looking for a match. If time is of the essence, the faster you can do this, the better. A*STAR researchers have developed a framework that could help computers learn how to process and identify these images both faster and more accurately [1].
Peng Xi of the A*STAR Institute for Infocomm Research notes that the framework can be used for numerous applications, including image segmentation, motion segmentation, data clustering, hybrid system identification and image representation.
One conventional way for computers to process data is called representation learning. This involves identifying a feature that allows the program to quickly extract relevant information from the dataset and categorize it, a bit like a shortcut. Supervised and unsupervised learning are two of the main methods used in representation learning. Unlike supervised learning, which relies on costly labeling of data prior to processing, unsupervised learning involves grouping or ‘clustering’ data in a similar manner to our brains, explains Peng.
Subspace clustering is a form of unsupervised learning that seeks to fit each data point into a low-dimensional subspace to find an intrinsic simplicity that makes complex, real-world data tractable. Existing subspace clustering methods struggle to handle ‘out-of-sample’, or unknown, data points and the large datasets that are common today.
“One of the challenges of the big-data era is to organize out-of-sample data using a machine learning model based on ‘in-sample’, or known, observational data,” explains Peng who, with his colleagues, has proposed three methods as part of a unified framework to tackle this issue. These methods differ in how they implement representation learning; one focuses on sparsity, while the other two focus on low rank and grouping effects. “By solving the large-scale data and out-of-sample clustering problems, our method makes big-data clustering and online learning possible,” notes Peng.
The framework devised by the team splits input data into ‘in-sample’ and ‘out-of-sample’ data during an initial ‘sampling’ step. Next, the in-sample data is grouped into subspaces during the ‘clustering’ step, after which each out-of-sample data point is assigned to its nearest subspace and becomes a member of the corresponding cluster.
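The three steps described above (sampling, clustering, then assigning out-of-sample points to the nearest subspace) can be illustrated with a minimal Python sketch. It is not the team's implementation: the clustering step here is a simple stand-in that alternates between assigning points to lines and refitting those lines, rather than any of the sparsity, low-rank or grouping-effect methods, and every subspace is assumed to be a line through the origin. All function names, the toy data and the parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def residual(x, u):
    """Distance from point x to the line through the origin with unit direction u."""
    proj = dot(x, u)
    return math.sqrt(max(0.0, dot(x, x) - proj * proj))


def fit_direction(points, iters=50):
    """Dominant direction of a point set, via power iteration on its scatter matrix."""
    u = normalize(points[0])
    for _ in range(iters):
        v = [0.0] * len(u)
        for x in points:
            c = dot(x, u)
            v = [vj + c * xj for vj, xj in zip(v, x)]
        u = normalize(v)
    return u


def nearest(x, directions):
    """Index of the line (1-D subspace) closest to x."""
    return min(range(len(directions)), key=lambda k: residual(x, directions[k]))


def cluster_in_sample(points, k, rounds=10):
    """Stand-in clustering step: alternately assign points to lines and refit the lines."""
    # Farthest-first initialisation: each new line starts at the worst-fit point so far.
    directions = [normalize(points[0])]
    while len(directions) < k:
        far = max(points, key=lambda x: min(residual(x, u) for u in directions))
        directions.append(normalize(far))
    for _ in range(rounds):
        labels = [nearest(x, directions) for x in points]
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # keep the old direction if a cluster is empty
                directions[j] = fit_direction(members)
    return directions


def sample_cluster_assign(data, k, n_in, seed=0):
    rng = random.Random(seed)
    # 1. Sampling: pick a small random in-sample subset.
    in_idx = set(rng.sample(range(len(data)), n_in))
    in_sample = [data[i] for i in sorted(in_idx)]
    # 2. Clustering: learn the subspaces from the in-sample points only.
    directions = cluster_in_sample(in_sample, k)
    # 3. Assignment: label every point, including all out-of-sample
    #    points, by its nearest learned subspace.
    labels = [nearest(x, directions) for x in data]
    return labels, in_idx


def make_two_lines(n_per_line=100, noise=0.005, seed=1):
    """Toy data: noisy points scattered along two lines in 3-D."""
    rng = random.Random(seed)
    dirs = [normalize([1.0, 1.0, 0.0]), normalize([0.0, 1.0, 1.0])]
    data, truth = [], []
    for label, d in enumerate(dirs):
        for _ in range(n_per_line):
            t = rng.choice([-1, 1]) * rng.uniform(1.0, 2.0)
            data.append([t * dj + rng.gauss(0.0, noise) for dj in d])
            truth.append(label)
    return data, truth


if __name__ == "__main__":
    data, truth = make_two_lines()
    labels, in_idx = sample_cluster_assign(data, k=2, n_in=20)
    agree = sum(l == t for l, t in zip(labels, truth))
    print("points matching ground truth (up to label swap):",
          max(agree, len(data) - agree), "of", len(data))
```

The point of the design is that the expensive clustering step only ever sees the small in-sample subset, while each remaining point costs just one distance computation per subspace. This is one way to read the reduction in computational complexity reported for the framework.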
The team tested their approach on a range of datasets covering different types of information: facial images, handwritten and digital text, poker hands and forest coverage. They found that their methods outperformed existing algorithms and reduced the computational complexity (and hence running time) of the task while still ensuring cluster quality.
Learn more: Thinking outside the sample