Jun 27, 2014


Algorithm lets independent agents collectively produce a machine-learning model without aggregating data.

Machine learning, in which computers learn new skills by looking for patterns in training data, is the basis of most recent advances in artificial intelligence, from voice-recognition systems to self-parking cars. It’s also the technique that autonomous robots typically use to build models of their environments.

That type of model-building gets complicated, however, in cases in which clusters of robots work as teams. The robots may have gathered information that, collectively, would produce a good model but which, individually, is almost useless. If constraints on power, communication, or computation mean that the robots can’t pool their data at one location, how can they collectively build a model?

At the Uncertainty in Artificial Intelligence conference in July, researchers from MIT’s Laboratory for Information and Decision Systems will answer that question. They present an algorithm in which distributed agents — such as robots exploring a building — collect data and analyze it independently. Pairs of agents, such as robots passing each other in the hall, then exchange analyses.

In experiments involving several different data sets, the researchers’ distributed algorithm actually outperformed a standard algorithm that works on data aggregated at a single location.
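The researchers' algorithm is more elaborate than this, but the heart of the idea (pairs of agents meeting and merging what they know, with no central server) can be sketched as gossip-style averaging of model parameters. This is a toy stand-in for illustration, not the authors' actual method:

```python
import random

def gossip_average(local_estimates, rounds=200, seed=0):
    """Pairwise exchange: two agents meet and average their model
    parameters. Over many random meetings every agent converges toward
    the global average, without any central aggregation of raw data."""
    rng = random.Random(seed)
    params = list(local_estimates)
    for _ in range(rounds):
        i, j = rng.sample(range(len(params)), 2)  # two robots "pass in the hall"
        avg = (params[i] + params[j]) / 2.0
        params[i] = params[j] = avg
    return params

# Each robot starts with a parameter estimated from its own local data;
# after repeated pairwise exchanges, all agents agree on the global mean.
merged = gossip_average([1.0, 3.0, 5.0, 7.0])
```

In the real setting each "parameter" would be a full model (the paper works with Bayesian nonparametric models), but the pairwise-merge structure is the same.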

Take me to the complete story . . .



Aug 02, 2013


A new approach allows “smart” machines to understand sounds other than speech

Robots can already discern and react to speech thanks to voice-recognition software such as the iPhone’s Siri. But “smart” machines still struggle with most other sounds. “In some sense, it’s almost a simpler problem, but there hasn’t been a lot of work on noise in the environment,” says roboticist Joseph Romano of Rethink Robotics in Boston. “It hasn’t been in the loop for robotic feedback.”

Now Romano is letting robots listen in on more than our conversations. He and his collaborators at the University of Pennsylvania have created a software tool called ROAR (short for robotic operating system open-source audio recognizer) that allows roboticists to train machines to respond to a much wider range of sounds. As described in a recent issue of Autonomous Robots, the tool’s chief requirement is a microphone.

To begin training, the robot’s microphone first captures ambient sounds, which ROAR scrubs of noisy static. Next the operator teaches ROAR to recognize key sounds by repeatedly performing a specific action—such as shutting a door or setting off a smartphone alarm—and tagging the unique audio signature while the robot listens. Finally, the program creates a general model of the sound of each action from that set of training clips.
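ROAR's internals aren't detailed in the excerpt, but the train-then-recognize loop it describes (collect tagged clips per action, build a general model per action, match new sounds against the models) can be sketched as follows, with a deliberately toy feature extractor standing in for real audio features:

```python
import math

def features(clip):
    """Toy stand-in for an audio feature extractor: summarize a clip
    (a list of samples) by its energy and zero-crossing rate."""
    energy = sum(s * s for s in clip) / len(clip)
    crossings = sum(1 for a, b in zip(clip, clip[1:]) if a * b < 0)
    return (energy, crossings / len(clip))

def train(tagged_clips):
    """Build one 'general model' per tagged action by averaging the
    feature vectors of its training clips."""
    model = {}
    for label, clips in tagged_clips.items():
        feats = [features(c) for c in clips]
        model[label] = tuple(sum(f[i] for f in feats) / len(feats) for i in range(2))
    return model

def recognize(model, clip):
    """Label a new sound with the closest trained model."""
    f = features(clip)
    return min(model, key=lambda lbl: math.dist(f, model[lbl]))

# Loud, jagged "door slam" vs. quiet, smooth background hum.
slam = [((-1) ** i) * 5.0 for i in range(100)]
hum = [0.1] * 100
model = train({"door": [slam], "hum": [hum]})
```

A production recognizer would use richer spectral features and far more training clips per action, but the pipeline shape is the same.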

Read more . . .

via Scientific American


Jan 07, 2013
Simultaneous translation by computer is getting closer

IN “STAR TREK”, a television series of the 1960s, no matter how far across the universe the Starship Enterprise travelled, any aliens it encountered would converse in fluent Californian English. It was explained that Captain Kirk and his crew wore tiny, computerised Universal Translators that could scan alien brainwaves and simultaneously convert their concepts into appropriate English words.

Science fiction, of course. But the best sci-fi has a habit of presaging fact. Many believe the flip-open communicators also seen in that first “Star Trek” series inspired the design of clamshell mobile phones. And, on a more sinister note, several armies and military-equipment firms are working on high-energy laser weapons that bear a striking resemblance to phasers. How long, then, before automatic simultaneous translation becomes the norm, and all those tedious language lessons at school are declared redundant?

Not, perhaps, as long as language teachers, interpreters and others who make their living from mutual incomprehension might like. A series of announcements over the past few months from sources as varied as mighty Microsoft and string-and-sealing-wax private inventors suggest that workable, if not yet perfect, simultaneous-translation devices are now close at hand.

Over the summer, Will Powell, an inventor in London, demonstrated a system that translates both sides of a conversation between English and Spanish speakers—if they are patient, and speak slowly. Each interlocutor wears a hands-free headset linked to a mobile phone, and sports special goggles that display the translated text like subtitles in a foreign film.

In November, NTT DoCoMo, the largest mobile-phone operator in Japan, introduced a service that translates phone calls between Japanese and English, Chinese or Korean. Each party speaks consecutively, with the firm’s computers eavesdropping and translating his words in a matter of seconds. The result is then spoken in a man’s or woman’s voice, as appropriate.

Microsoft’s contribution is perhaps the most beguiling. When Rick Rashid, the firm’s chief research officer, spoke in English at a conference in Tianjin in October, his peroration was translated live into Mandarin, appearing first as subtitles on overhead video screens, and then as a computer-generated voice. Remarkably, the Chinese version of Mr Rashid’s speech shared the characteristic tones and inflections of his own voice.

Read more . . .

via The Economist


Dec 20, 2012
Research at the University of Gothenburg and Chalmers University of Technology has resulted in a new type of machine that sorts used batteries by means of artificial intelligence (AI).

One machine is now being used in the UK, sorting one-third of the country’s recycled batteries.

‘I got the idea at home when I was sorting rubbish. I thought it should be possible to do it automatically with artificial intelligence,’ says Claes Strannegård, who is an AI researcher at the University of Gothenburg and Chalmers University of Technology.

Strannegård contacted the publicly owned recycling company Renova in Gothenburg, which responded positively to an R&D project on the automatic sorting of collected batteries. The collaboration resulted in a machine that uses computerised optical recognition to sort up to ten batteries per second.

The sorting is made possible by the machine’s so-called neural network, which can be thought of as an artificial nervous system. Just like a human brain, the neural network must be trained to do what it is supposed to do. In this case, the machine has been trained to recognise about 2,000 different types of batteries by taking pictures of them from all possible angles.

As the batteries are fed into the machine via a conveyor belt, they are ‘visually inspected’ by the machine via a camera. The neural network identifies the batteries in just a few milliseconds by comparing the picture taken with pictures taken earlier. The network is self-learning and robust, making it possible to recognise batteries even if they are dirty or damaged. Once the batteries have been identified, compressed air separates them into different containers according to chemical content, such as nickel-cadmium or lithium.
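The core recognition step, comparing a new picture against stored examples and routing by the closest match, can be illustrated with a minimal nearest-template sketch. The real system uses a trained neural network over photos taken from many angles; the "fingerprints" below are invented:

```python
def match_battery(image, templates):
    """Return the label of the stored template closest to the camera image,
    where images are flattened lists of pixel intensities."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda label: dist(image, templates[label]))

templates = {
    "nickel-cadmium": [0.9, 0.1, 0.8, 0.2],
    "lithium":        [0.2, 0.8, 0.1, 0.9],
}
# A dirty or damaged battery still lands nearest its true template,
# which is the robustness the neural network provides at scale.
label = match_battery([0.8, 0.2, 0.7, 0.3], templates)
```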

‘For each single battery, the system stores and spits out information about for example brand, model and type. This allows the recycler to tell a larger market exactly what types of material it can offer, which we believe may increase the value through increased competition,’ says Hans-Eric Melin, CEO of the Gothenburg-based company Optisort, which has developed the machine.
This means that besides the environmental benefits of the machine, there are commercial benefits. Today the collection and sorting companies are actually paying money to get rid of the batteries. But Melin thinks that real-time battery data could spark a new market for battery waste, where large volumes are traded online.

Read more . . .

via University of Gothenburg – THOMAS MELIN


Apr 02, 2012

A voice-enabled future

VLAD SEJNOHA is talking to the TV again.

O.K., maybe you’ve done that, too. But here’s the weird thing: His TV is listening.

“Dragon TV,” Mr. Sejnoha says to the screen, “find movies with Meryl Streep.” Up pops a list of films like “Out of Africa” and “It’s Complicated.”

“Dragon TV, change to CNN,” he says. Presto — the channel flips to CNN.

Mr. Sejnoha is sitting in what looks like a living room but is, in fact, a sort of laboratory inside Nuance Communications, the leading force in voice technology, and the speech-recognition engine behind Siri, the virtual personal assistant on the Apple iPhone 4S.

Here, Mr. Sejnoha, the company’s chief technology officer, and other executives are plotting a voice-enabled future where human speech brings responses from not only smartphones and televisions, cars and computers, but also coffee makers, refrigerators, thermostats, alarm systems and other smart devices and appliances.

It is a wildly disruptive idea. But such systems are already beginning to change the way we interact with the world and, for better and worse, how we think about technology. Until now, after all, we’ve talked only to one another. What if we begin talking to all sorts of machines, too — and, like Siri, those machines respond as if they were human?

Granted, people have been talking into machines and at machines since the days of Edison’s phonograph. By the 1980s, commercial speech recognition systems had become sophisticated enough to transcribe spoken words into text. Today, voice technology is a fixture of many companies’ customer-service operations, albeit an occasionally maddening one.

But now the race is on to make the voice the sought-after new interface between us and our technology. The results could rival innovations like the computer mouse and the graphic icon and, some experts say, eventually pose challenges for giants like Google by bypassing their traditional search engines.

Read more . . .

via New York Times – NATASHA SINGER



Feb 18, 2012

Utter! is able to process your various requests and send them to different apps

New app Utter!, currently in beta, uses voice recognition to tap the functionality of apps already installed on your smartphone. This approach differs from that of software such as the iPhone 4S's voice assistant Siri, which relies on its own internal functionality to provide results. As a result, Utter! can not only send messages and set calendar events but also report on your phone's battery life, reboot into the bootloader, change your processor's clock speed, and much more.

Utter! is able to process your various requests and send them to different apps in order to retrieve answers, which should enable you to ask it a wider variety of queries.


Dec 12, 2011

Getting prepared to work with autistic children and the elderly

Remember NAO, the robot that stole the show at the recent Robotville event? Well, NAO's already impressive set of abilities has just been extended, with Aldebaran Robotics releasing a new version of its cute little humanoid robot. Around two thousand NAOs are used for research and education purposes all around the world, but now that the NAO Next Gen is ready, the founder and chairman of Aldebaran Robotics, Bruno Maisonnier, hopes to see it become useful to humans in a more direct sense. Its new abilities make it even more versatile and, among other things, prepare it for working with autistic children and the elderly.

We covered NAO in detail in 2009, and since then not much has changed on the outside. We are glad this is the case because NAO could not get more likable. However, the interesting part is what’s hiding under the hood. For one thing, the Next Gen now has more computing power at its disposal and handles multitasking much better thanks to an on-board 1.6 GHz Atom processor. Also, NAO’s vision has received an upgrade with two HD cameras and the ability to process two video streams simultaneously. This improves face and object recognition even under changing lighting conditions. There is also a sonar distance sensor, two infrared emitters and receivers, nine tactile and eight pressure sensors. All in all, a pretty impressive set of tools that enable the robot to better navigate its environment.

NAO Next Gen has four microphones that allow it to pinpoint where a voice command, or any other noise for that matter, is coming from. New voice recognition software from Nuance, coupled with a "word spotting" functionality, allows the robot to recognize sentences or isolate a single word from a whole string of words in a sentence. Add text-to-speech capability and fluency in eight languages, and NAO becomes an interesting partner for discussions.
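The "word spotting" idea, isolating known command words from a longer utterance, is easy to illustrate once speech has been transcribed. This is a hypothetical sketch, not Aldebaran's implementation:

```python
def spot_keywords(transcript, keywords):
    """Keep only the known command words from a whole string of words,
    in the order they were spoken; filler words are ignored."""
    return [w for w in transcript.lower().split() if w in keywords]

# The robot acts only on the words it was trained to spot.
heard = spot_keywords("NAO could you please sit down now", {"sit", "stand", "down"})
```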

And this is not the end of software related advancements. The new NAO now boasts a system to prevent limb/body collisions, an improved torque control mechanism and an adaptive walking algorithm. Should the 23 inch (59 cm) tall humanoid robot be pushed off balance, the fall will be automatically cushioned. Once a worrying shift in NAO’s center of mass is detected, all motion-related actions are instantly put on hold and the robot uses its limbs to protect itself against the impact, just like a human would.

Read more . . .


Oct 13, 2011

What’s in a name?

A lot, apparently. Apple’s new iPhone is called the iPhone 4S. But what people really wanted was the iPhone 5.

The rumors online had predicted the second coming — or, rather, the fifth coming. It would be wedge-shaped! It would be completely transparent! It would clean your basement, pick you up at the airport and eliminate unsightly blemishes!

Instead, what showed up was a new iPhone that looks just like the last one: black or white, glass front and back, silver metal band around the sides. And on paper, at least, the new phone does only four new things.

THING 1: There’s a faster chip, the same one that’s in the iPad 2. More speed is always better, of course. But it’s not like people were complaining about the previous iPhone’s speed.

THING 2: A much better, faster camera — among the best on a phone. It has a resolution of eight megapixels, which doesn’t matter much, and a new, more light-sensitive sensor, which does. Its photos are crisp and clear, with beautiful color. The low-light photos and 1080p high-definition video are especially impressive for a phone. There’s still no zoom and only a tiny LED flash — but otherwise, this phone comes dangerously close to displacing a $200 point-and-shoot digital camera.

THING 3: The iPhone 4S is a world phone. As of Friday, you will be able to buy it from AT&T, Verizon and, for the first time, Sprint ($200, $300 or $400 for the 16-, 32- or 64-gigabyte models). But even if you get your iPhone 4S from Verizon, whose CDMA network is incompatible with the GSM networks used in most other countries, you’ll still be able to make calls overseas, either through Verizon or by inserting another carrier’s SIM card. Call ahead for details.

Each carrier has its selling points. Sprint is the only one with an unlimited iPhone data plan (example: $110 a month for unlimited calling, texting and Internet). AT&T says it has the fastest download speeds. But if you care about calling coverage, Verizon is the way to go.

THING 4: Speech recognition. Crazy good, transformative, category-redefining speech recognition.

Exactly as on Android phones, a tiny microphone button appears on the on-screen keyboard; whenever you have an Internet connection, you can tap it when you want to dictate instead of typing. After a moment, the transcription appears. The sometimes frustrating on-screen keyboard is now a glorified Plan B.

Apple won’t admit that it’s using a version of Dragon Dictation, the free iPhone app, but there doesn’t seem to be much doubt; it works and behaves identically. (For example, it occasionally seems to process your utterance but then types nothing at all, just as the Dragon app does.) This version is infinitely better, though, because it’s a built-in keyboard button, not a separate app.

But dictation is only half the story — no, one-tenth of the story. Because in 2010, Apple bought a start-up called Siri, whose technology it has baked into the iPhone 4S.

Siri is billed as a virtual assistant: a crisply accurate, astonishingly understanding, uncomplaining, voice-commanded minion. No voice training or special syntax is required; you don’t even have to hold the phone up to your head. You just hold down the phone’s Home button until you hear a double beep, and then speak casually.

You can say, “Wake me up at 7:35,” or “Change my 7:35 alarm to 8.” You can say, “What’s Gary’s work number?” Or, “How do I get to the airport?” Or, “Any good Thai restaurants around here?” Or, “Make a note to rent ‘Ishtar’ this weekend.” Or, “How many days until Valentine’s Day?” Or, “Play some Beatles.” Or, “When was Abraham Lincoln born?”

In each case, Siri thinks for a few seconds, displays a beautifully formatted response and speaks in a calm female voice.

It’s mind-blowing how inexact your utterances can be. Siri understands everything from, “What’s the weather going to be like in Tucson this weekend?” to “Will I need an umbrella tonight?” (She has various amusing responses for “What is the meaning of life?”)

Read more . . .

Sep 29, 2011

The Lonely Planet Offline Translator

The Lonely Planet Offline Translator apps, which are initially being launched in eight languages, are essentially a rebranding of the Jibbigo mobile language translation apps first released in September 2009 with a Spanish-English translator. Drawing on libraries of over 40,000 travel-centric words, the apps allow users to enter words, phrases and whole sentences either by typing text or by speaking into the device's microphone.

Like all current speech recognition apps, results will vary and work best in a quiet environment. In my short time with the French-English app the speech recognition generally works pretty well, successfully recognizing the vast majority of words most of the time. However, there are times when the app struggles with a particular word or phrase which won’t be recognized no matter how many times it is repeated. Keeping the phrases simple and travel related definitely helps in this regard. For those words incorrectly identified it’s easy to select the relevant word and type in the correct one. Oft-used names and places can also be added to the dictionary to improve accuracy.

The apps display the original words or phrase in the top window and the translation in the window below, speaking the translation so you don't have to embarrass yourself with faltering pronunciation. There's also a searchable dictionary for looking up individual words, and the apps are bi-directional, translating both to and from English.
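A bi-directional lookup over an on-device library can be sketched in a few lines. The entries here are invented, and the real apps translate full sentences statistically rather than by exact lookup:

```python
# A tiny on-device phrasebook; no network connection is required.
phrasebook = {
    "where is the station": "où est la gare",
    "thank you": "merci",
}
# Build the reverse direction once, so lookups work both ways.
reverse = {foreign: english for english, foreign in phrasebook.items()}

def translate(text):
    """Translate to or from English by exact phrase lookup."""
    text = text.lower().strip()
    return phrasebook.get(text) or reverse.get(text) or "(not in phrasebook)"
```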

Because the libraries are stored on the mobile device, no Wi-Fi or 3G connectivity is needed, but the apps take up a fair chunk of memory. The iOS apps range from 163 MB for the Tagalog-English language app up to 276 MB for the Iraqi-English app.

Read more . . .



Apr 13, 2011

You have little trouble hearing what your companion is saying in a noisy cafe, but computers are confounded by this “cocktail party problem.”

New algorithms finally enable machines to tune in to the right speaker, sometimes even better than humans can.

The year is 1974, and Harry Caul is monitoring a couple walking through a crowded Union Square in San Francisco. He uses shotgun microphones to secretly record their conversation, but at a critical point, a nearby percussion band drowns out the conversation. Ultimately Harry has to use an improbable gadget to extract the nearly inaudible words, “He’d kill us if he got the chance,” from the recordings.

This piece of audio forensics was science fiction when it appeared in the movie The Conversation more than three decades ago. Is it possible today?

Sorting out the babble from multiple conversations is popularly known as the “cocktail party problem,” and researchers have made many inroads toward solving it in the past 10 years. Human listeners can selectively tune out all but the speaker of interest when multiple speakers are talking. Unlike people, machines have been notoriously unreliable at recognizing speech in the presence of noise, especially when the noise is background speech. Speech recognition technology is becoming increasingly ubiquitous and is now being used for dictating text and commands to computers, phones and GPS devices. But good luck getting anything but gibberish if two people speak at once.

A flurry of recent research has focused on the cocktail party problem. In 2006, Martin Cooke of the University of Sheffield in England and Te-Won Lee of the University of California, San Diego, organized a speech separation “challenge,” a task designed to compare different approaches to separating and recognizing the mixed speech of two talkers. Since then, researchers around the world have built systems to compete against one another and against the ultimate benchmark: human listeners.

Read more . . .


Jul 14, 2010

THERE is often something sweet, intimate even, about couples who finish each other’s sentences. But it can also be a source of irritation, especially when they get it wrong. A similar irritation (minus the sweetness) is often felt by users of speech-recognition software, which still manages to garble and twist even the most clearly spoken words. Perhaps the solution lies in a more intimate exchange between user and software.

Modern speech-recognition programs do not merely try to identify individual words as they are spoken; rather, they attempt to match whole chunks of speech with statistical models of phrases and sentences. The rationale is that by knowing statistical rules of thumb for the way in which words are usually put together—an abstract probabilistic approximation of grammar, if you will—it is possible to narrow the search when attempting to identify individual words. For example, a noun-phrase will typically consist of a noun preceded by a modifier, such as an article and possibly also an adjective. So if part of a speech pattern sounds like “ball”, the odds of it actually being “ball” will increase if the utterances preceding it sound like “the” and “bouncy”.
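The "bouncy ball" example can be made concrete with a toy bigram model: acoustically confusable words are re-scored by how likely they are to follow the preceding word. All the probabilities below are invented for illustration:

```python
# Invented bigram probabilities: how likely is word B given word A?
bigram = {
    ("bouncy", "ball"): 0.6,
    ("bouncy", "bawl"): 0.01,
    ("babies", "bawl"): 0.3,
    ("babies", "ball"): 0.02,
}

def pick_word(prev_word, acoustic_scores):
    """Combine acoustic evidence with bigram context and keep the best word."""
    def score(word):
        return acoustic_scores[word] * bigram.get((prev_word, word), 1e-6)
    return max(acoustic_scores, key=score)

# The microphone can't tell "ball" from "bawl" (equal acoustic scores),
# but the preceding word resolves the ambiguity.
sounds_like = {"ball": 0.5, "bawl": 0.5}
after_bouncy = pick_word("bouncy", sounds_like)
after_babies = pick_word("babies", sounds_like)
```

Real recognizers search over whole word lattices with much larger n-gram (or neural) language models, but this is the narrowing-the-search principle the article describes.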

Although this so-called continuous speech-recognition approach has indeed improved accuracy, it is by no means infallible. Moreover, when it gets things wrong, it often does so spectacularly. The problem is that, as a direct consequence of this technique, the misidentification of even a single word can take the program off on a completely different path as it tries to predict what the rest of the sentence is likely to be.

Though such errors are inevitable, there may be a way to let speech-recognition programs take the pain out of making corrections. Per Ola Kristensson and Keith Vertanen, at the University of Cambridge’s Computer Laboratory, have developed a method of allowing speech-recognition programs to share their thoughts, as it were, with the user, in order to speed up the correction process. Their solution, called Parakeet, is a touch-screen-based interface for phones and other mobile devices, which not only displays the words, phrases or sentences that scored highest in the program’s statistical model, but also any close contenders. This allows the user to select alternatives easily, with a quick tap of the finger. More subtly, if none of the predicted sentences is entirely correct, yet collectively they contain the words that were spoken, the user can simply slide his finger across the appropriate words to link them up.
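Parakeet's trick of laying out competing hypotheses so the user can tap or slide across words can be sketched by collapsing an n-best list into per-slot alternatives. For simplicity this assumes hypotheses of equal length; the real interface aligns them more carefully:

```python
def word_choices(hypotheses):
    """For each word slot, list the distinct candidates across the n-best
    hypotheses, with the top hypothesis's word listed first."""
    slots = []
    for words in zip(*hypotheses):
        seen = []
        for w in words:
            if w not in seen:
                seen.append(w)
        slots.append(seen)
    return slots

nbest = [
    "the bouncy ball fell".split(),   # recognizer's top hypothesis
    "the bouncy bawl fell".split(),   # close contenders shown alongside it
    "a bouncy ball fell".split(),
]
slots = word_choices(nbest)
# The user taps the correct alternative in any slot, or slides a finger
# across correct words drawn from different hypotheses.
```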

Read more . . .

Jun 21, 2010

Back in the 1990s, typing out “hello” on most cellphones required an exhausting 13 taps on the number keys, like so: 44-33-555-555-666.

That was before the inventor Cliff Kushler, based here in Seattle, and a partner created software called T9, which could bring that number down to three by guessing the word being typed.

Now there is a new challenge to typing on phones. More phones are using virtual keyboards on a touch screen, replacing physical buttons. But pecking out a message on a small piece of glass is not so easy, and typos are common.

Mr. Kushler thinks he has a solution once again. His new technology, which he developed with a fellow research scientist, Randy Marsden, is called Swype, and it allows users to glide a finger across the virtual keyboard to spell words, rather than tapping out each letter.

While many smartphones have features that auto-complete words, correct typos on the fly and add punctuation, Mr. Kushler is aiming for the next level.

“We’ve squeezed the desktop computer, complete with keyboard and mouse, into something that fits in a pocket. The information bandwidth has become very constricted,” he said. “I thought, if we can find a better way to input that information, it could be something that would really take off.”

Mr. Kushler says Swype is a big breakthrough that could reach billions of people. That’s not as ambitious as it sounds. To date, the T9 technology has been built into more than four billion devices worldwide. In 1999 its creators sold it to AOL for a reported $350 million; it is now owned by the speech-recognition company Nuance.

Swype’s software detects where a finger pauses and changes direction as it traces out the pattern of a word. The movements do not have to be precise because the software calculates which words a user is most likely trying to spell.

Capitalization and double letters can be indicated with a pause or squiggle, while spacing and punctuation are automatic. Mr. Kushler, who is chief technology officer of Swype, estimates that the software can improve even the nimblest text-messager’s pace by 20 to 30 percent.
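The matching problem Swype solves, mapping an imprecise finger trace to intended words, can be caricatured with a subsequence test: a word is a candidate if the trace starts and ends on its first and last letters and passes over its letters in order. Real gesture typing scores candidates probabilistically and uses a language model to rank ties:

```python
def trace_candidates(trace, lexicon):
    """Words whose letters appear, in order, along the traced key path,
    and which share the trace's start and end keys."""
    def matches(word):
        if not word or word[0] != trace[0] or word[-1] != trace[-1]:
            return False
        keys = iter(trace)  # membership tests consume the iterator,
        return all(ch in keys for ch in word)  # so letters must appear in order
    return [w for w in lexicon if matches(w)]

# Finger glides h -> e -> l -> l -> o, brushing neighbouring keys on the way.
trace = "hgetrllkio"
candidates = trace_candidates(trace, ["hello", "help", "halo", "hero"])
# Both "hello" and "hero" fit this path; in a real system a language
# model would rank the candidates, not just enumerate them.
```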

Swype is now being used on seven smartphones in the United States, across all major wireless carriers, including the HTC HD2 and the Samsung Omnia II. By the end of the year, the company says its software will be on more than 50 models worldwide.

It does not have a deal with Apple, the king of touch-screen phones, but it is tinkering with software for the iPhone and the iPad and hopes to show it to Apple soon.

Read more . . .
