Poor eyesight remains one of the main obstacles to letting robots loose among humans. But it is improving, in part by aping natural vision
ROBOTS are getting smarter and more agile all the time. They disarm bombs, fly combat missions, put together complicated machines, even play football. Why, then, one might ask, are they nowhere to be seen, beyond war zones, factories and technology fairs? One reason is that they themselves cannot see very well. And people are understandably wary of purblind contraptions bumping into them willy-nilly in the street or at home.
All that a camera-equipped computer “sees” is lots of picture elements, or pixels. A pixel is merely a number reflecting how much light has hit a particular part of a sensor. The challenge has been to devise algorithms that can interpret such numbers as scenes composed of different objects in space. This comes naturally to people and, barring certain optical illusions, takes no time at all as well as precious little conscious effort. Yet emulating this feat in computers has proved tough.
In natural vision, after an image is formed in the retina it is sent to an area at the back of the brain, called the visual cortex, for processing. The first nerve cells it passes through react only to simple stimuli, such as edges slanting at particular angles. They fire up other cells, further into the visual cortex, which react to simple combinations of edges, such as corners. Cells in each subsequent area discern ever more complex features, with those at the top of the hierarchy responding to general categories like animals and faces, and to entire scenes comprising assorted objects. All this takes less than a tenth of a second.
The outline of this process has been known for years and in the late 1980s Yann LeCun, now at New York University, pioneered an approach to computer vision that tries to mimic the hierarchical way the visual cortex is wired. He has been tweaking his “convolutional neural networks” (ConvNets) ever since.
Seeing is believing
A ConvNet begins by swiping a number of software filters, each several pixels across, over the image, pixel by pixel. Like the brain’s primary visual cortex, these filters look for simple features such as edges. The upshot is a set of feature maps, one for each filter, showing which patches of the original image contain the sought-after element. A series of transformations is then performed on each map in order to enhance it and improve the contrast. Next, the maps are swiped again, but this time rather than stopping at each pixel, the filter takes a snapshot every few pixels. That produces a new set of maps of lower resolution. These highlight the salient features while reining in computing power. The whole process is then repeated, with several hundred filters probing for more elaborate shapes rather than just a few scouring for simple ones. The resulting array of feature maps is run through one final set of filters. These classify objects into general categories, such as pedestrians or cars.
The Latest Streaming News: Computer vision updated minute-by-minute