Jul 18, 2013


Google Research has just released details of a machine vision technique that might bring high-powered visual recognition to simple desktop and even mobile computers. It claims to be able to recognize 100,000 different types of object within a photo in a few minutes, and there isn’t a deep neural network (DNN) mentioned.

There has always been a basic split in machine vision work. The engineering approach tries to solve the problem by treating it as a signal detection task using standard engineering techniques. The more “soft” approach has been to try to build systems that are more like the way humans do things. Recently it has been this human approach that seems to have been on top, with DNNs managing to learn to recognize important features in sample videos. This is very impressive and very important, but as is often the case the engineering approach also has a trick or two up its sleeve.

In this case we have improvements to the fairly standard technique of applying convolutional filters to an image to pick out objects of interest. The big problem with convolutional filters is that you need at least one per object type you are looking for: there has to be a cat filter, a dog filter, a human filter and so on. Given that the time it takes to apply a filter doesn’t scale well with image size, most approaches that use this method are limited to a small number of categories of object.
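To make the cost concrete, here is a minimal sketch of the brute-force approach, assuming each image window and each filter is a flattened list of numbers. The names `brute_force_detect`, `filters` and `windows` are hypothetical and chosen only for illustration; real detectors use far larger filters and feature vectors rather than raw values.

```python
from typing import Dict, List, Tuple

Patch = List[float]  # a flattened image window (hypothetical representation)


def dot(a: Patch, b: Patch) -> float:
    """Linear filter response: multiply element-wise and sum."""
    return sum(x * y for x, y in zip(a, b))


def brute_force_detect(
    windows: List[Patch], filters: Dict[str, Patch]
) -> Tuple[List[str], int]:
    """Score every window against every category filter.

    Returns the best label per window and the number of multiply-adds used.
    """
    labels = []
    multiply_adds = 0
    for window in windows:
        best_label, best_score = None, float("-inf")
        for label, weights in filters.items():
            score = dot(window, weights)
            multiply_adds += len(weights)  # one multiply-add per filter weight
            if score > best_score:
                best_label, best_score = label, score
        labels.append(best_label)
    return labels, multiply_adds


# Toy data: three 4-element "filters" and two windows.
filters = {
    "cat": [1.0, 0.0, 0.0, 1.0],
    "dog": [0.0, 1.0, 1.0, 0.0],
    "human": [1.0, 1.0, 0.0, 0.0],
}
windows = [[0.9, 0.1, 0.0, 0.8], [0.1, 0.9, 0.7, 0.0]]

labels, cost = brute_force_detect(windows, filters)
print(labels, cost)
```

The work is windows × filters × filter size, so adding categories multiplies the cost of every window. That is why this style of detector has usually been limited to a handful of object types.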

This year’s winner of the CVPR Best Paper Award, co-authored by Googlers Tom Dean, Mark Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan and Jay Yagnik, describes technology that speeds things up so that many thousands of object categories can be used and the results can be produced in a few minutes with a standard computer.

The technique is complicated, but in essence it makes use of hashing to avoid having to compute everything each time. Locality-sensitive hashing is used to look up the results of each step of the convolution: instead of applying a mask to the pixels and summing the result, the pixels are hashed and the hash is used as a key into a table of results. They also use a rank ordering method which indicates which filter is likely to be the best match for further evaluation. The use of ordinal convolution to replace linear convolution seems to be as important as the use of hashing.
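The idea of hashing a window and looking up likely filters can be sketched roughly as follows. This is not the paper's implementation. It assumes a simple rank-based ("winner-take-all" style) hash, where each hash digit records which of a few randomly chosen entries is largest, and every function name and parameter here (`ordinal_hash`, `build_table`, `candidate_filters`, `k`, `band_size`) is hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Key = Tuple[int, Tuple[int, ...]]  # (band start, slice of the hash code)


def make_permutations(dim: int, n_hashes: int, seed: int = 0) -> List[List[int]]:
    """One random ordering of the vector indices per hash digit."""
    rng = random.Random(seed)
    return [rng.sample(range(dim), dim) for _ in range(n_hashes)]


def ordinal_hash(vec: Vector, perms: List[List[int]], k: int) -> Tuple[int, ...]:
    """For each permutation, record which of the first k permuted entries is largest.

    Only the rank order of the values matters, not their magnitudes.
    """
    code = []
    for perm in perms:
        head = [vec[i] for i in perm[:k]]
        code.append(head.index(max(head)))
    return tuple(code)


def bands(code: Tuple[int, ...], band_size: int) -> List[Key]:
    """Split a hash code into fixed-size bands, each used as a table key."""
    return [(b, code[b:b + band_size]) for b in range(0, len(code), band_size)]


def build_table(filters: Dict[str, Vector], perms, k: int, band_size: int):
    """Index every filter once, ahead of time, under each of its bands."""
    table = defaultdict(list)
    for label, weights in filters.items():
        for key in bands(ordinal_hash(weights, perms, k), band_size):
            table[key].append(label)
    return table


def candidate_filters(window: Vector, table, perms, k: int, band_size: int) -> Counter:
    """Hash the window and count how many bands it shares with each filter."""
    votes = Counter()
    for key in bands(ordinal_hash(window, perms, k), band_size):
        votes.update(table.get(key, []))
    return votes


filters = {
    "cat": [0.9, 0.1, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6],
    "dog": [0.2, 0.8, 0.6, 0.1, 0.9, 0.3, 0.7, 0.4],
    "human": [0.5, 0.3, 0.9, 0.2, 0.6, 0.1, 0.8, 0.7],
}
perms = make_permutations(dim=8, n_hashes=16)
table = build_table(filters, perms, k=4, band_size=4)

# A brighter, higher-contrast copy of the cat pattern keeps the same rank order.
window = [2 * x + 1 for x in filters["cat"]]
votes = candidate_filters(window, table, perms, k=4, band_size=4)
print(votes.most_common())
```

Per window, the cost is one hash plus a few table lookups, regardless of how many filters are indexed. Filters with the most votes would then be passed on for exact evaluation, which is one plausible reading of the rank-ordering step described above. Because the hash depends only on rank order, the rescaled window still matches the cat filter in every band.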

The result of the change to the basic algorithm is a speed up of around 20,000 times, which is astounding.


via I Programmer
