Helping computers learn to tackle big-data problems outside their comfort zones
Imagine combing through thousands of mugshots desperately looking for a match. If time is of the essence, the faster you can do this, the better. A*STAR researchers have developed a framework that could help computers learn how to process and identify these images both faster and more accurately [1].
Peng Xi of the A*STAR Institute for Infocomm Research notes that the framework can be used for numerous applications, including image segmentation, motion segmentation, data clustering, hybrid system identification and image representation.
One conventional way for computers to process data is representation learning. This involves identifying a feature that allows the program to quickly extract relevant information from the dataset and categorize it, a bit like a shortcut. Supervised and unsupervised learning are two of the main methods used in representation learning. Unlike supervised learning, which relies on costly labeling of data prior to processing, unsupervised learning involves grouping or ‘clustering’ data in a similar manner to our brains, explains Peng.
Subspace clustering is a form of unsupervised learning that seeks to fit each data point into a low-dimensional subspace to find an intrinsic simplicity that makes complex, real-world data tractable. Existing subspace clustering methods struggle to handle ‘out-of-sample’, or unknown, data points and the large datasets that are common today.
“One of the challenges of the big-data era is to organize out-of-sample data using a machine learning model based on ‘in-sample’, or known, observational data,” explains Peng who, with his colleagues, has proposed three methods as part of a unified framework to tackle this issue. These methods differ in how they implement representation learning; one focuses on sparsity, while the other two focus on low rank and grouping effects. “By solving the large-scale data and out-of-sample clustering problems, our method makes big-data clustering and online learning possible,” notes Peng.
The framework devised by the team splits input data into ‘in-sample’ data or ‘out-of-sample’ data during an initial ‘sampling’ step. Next, the in-sample data is grouped into subspaces during the ‘clustering’ step, after which the out-of-sample data is assigned to the nearest subspace. These points are then designated as cluster members.
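The three steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's implementation: the function names, the toy data and the subspace dimension are all hypothetical, each subspace is fitted with a plain singular value decomposition, and the clustering step is stubbed out by reusing known labels for the in-sample points, where the article's methods would instead use sparse, low-rank or grouping-based representations.

```python
import numpy as np


def split_samples(n_points, n_in_sample, rng):
    """Sampling step: randomly choose which points are treated as in-sample."""
    order = rng.permutation(n_points)
    return order[:n_in_sample], order[n_in_sample:]


def fit_subspace_bases(X_in, labels, dim):
    """Fit an orthonormal basis to each cluster of in-sample points via SVD."""
    bases = {}
    for k in np.unique(labels):
        members = X_in[labels == k]            # shape (n_k, d)
        # The leading right singular vectors span the cluster's subspace.
        _, _, vt = np.linalg.svd(members, full_matrices=False)
        bases[int(k)] = vt[:dim].T             # shape (d, dim)
    return bases


def assign_to_nearest_subspace(X_out, bases):
    """Give each out-of-sample point the label of its nearest subspace."""
    keys = sorted(bases)
    # Residual = distance between a point and its projection onto a subspace.
    residuals = np.stack(
        [
            np.linalg.norm(X_out - (X_out @ bases[k]) @ bases[k].T, axis=1)
            for k in keys
        ],
        axis=1,
    )                                          # shape (n_out, n_clusters)
    return np.array(keys)[residuals.argmin(axis=1)]


def demo(seed=0):
    """Run the pipeline on toy data: two orthogonal 2-D planes in 4-D space."""
    rng = np.random.default_rng(seed)
    coeffs = rng.normal(size=(200, 2))
    X = np.zeros((200, 4))
    X[:100, :2] = coeffs[:100]                 # plane 0 uses coordinates 0-1
    X[100:, 2:] = coeffs[100:]                 # plane 1 uses coordinates 2-3
    true_labels = np.repeat([0, 1], 100)

    in_idx, out_idx = split_samples(len(X), 40, rng)
    # ASSUMPTION: stand-in for the clustering step. A real run would cluster
    # X[in_idx] without labels; here the known labels are reused.
    in_labels = true_labels[in_idx]

    bases = fit_subspace_bases(X[in_idx], in_labels, dim=2)
    predicted = assign_to_nearest_subspace(X[out_idx], bases)
    return predicted, true_labels[out_idx]


if __name__ == "__main__":
    predicted, expected = demo()
    print("out-of-sample accuracy:", (predicted == expected).mean())
```

The point of the design is that only the small in-sample set goes through the expensive clustering step; each remaining point needs just one projection per subspace, which is where the savings in running time would come from.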
The team tested their approach on a range of datasets covering different types of information: facial images, handwritten and digital text, poker hands and forest coverage. They found that their methods outperformed existing algorithms and successfully reduced the computational complexity (and hence running time) of the task while still ensuring cluster quality.
Learn more: Thinking outside the sample