There is often something sweet, intimate even, about couples who finish each other’s sentences. But it can also be a source of irritation, especially when they get it wrong. A similar irritation (minus the sweetness) is often felt by users of speech-recognition software, which still manages to garble and twist even the most clearly spoken words. Perhaps the solution lies in a more intimate exchange between user and software.
Modern speech-recognition programs do not merely try to identify individual words as they are spoken; rather, they attempt to match whole chunks of speech with statistical models of phrases and sentences. The rationale is that by knowing statistical rules of thumb for the way in which words are usually put together—an abstract probabilistic approximation of grammar, if you will—it is possible to narrow the search when attempting to identify individual words. For example, a noun-phrase will typically consist of a noun preceded by a modifier, such as an article and possibly also an adjective. So if part of a speech pattern sounds like “ball”, the odds of it actually being “ball” will increase if the utterances preceding it sound like “the” and “bouncy”.
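To make the “bouncy ball” example concrete, here is a minimal sketch in Python of how context can tip the balance between similar-sounding words. It assumes a toy two-word (bigram) model. Every name and number in it is invented for illustration and is not drawn from any real recognizer, which would use far larger models.

```python
# All scores below are hypothetical, chosen only to illustrate the idea.

# How well each candidate matches the sound alone (acoustic score).
ACOUSTIC = {"ball": 0.40, "bawl": 0.35, "fall": 0.25}

# Toy bigram model: probability of a word given the word just before it.
BIGRAM = {
    ("bouncy", "ball"): 0.60,
    ("bouncy", "bawl"): 0.01,
    ("bouncy", "fall"): 0.02,
}


def rescore(previous_word, acoustic, bigram, floor=1e-4):
    """Weight each acoustic score by its context probability, then normalise."""
    combined = {
        word: score * bigram.get((previous_word, word), floor)
        for word, score in acoustic.items()
    }
    total = sum(combined.values())
    return {word: value / total for word, value in combined.items()}


posterior = rescore("bouncy", ACOUSTIC, BIGRAM)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

On sound alone the three candidates are close. Once the preceding word is taken into account, “ball” pulls well ahead of the others, which is the narrowing of the search described above.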
Although this so-called continuous speech-recognition approach has indeed improved accuracy, it is by no means infallible. Moreover, when it gets things wrong, it often does so spectacularly. The problem is that, as a direct consequence of this technique, the misidentification of even a single word can take the program off on a completely different path as it tries to predict what the rest of the sentence is likely to be.
Though such errors are inevitable, there may be a way to let speech-recognition programs take the pain out of making corrections. Per Ola Kristensson and Keith Vertanen, at the University of Cambridge’s Computer Laboratory, have developed a method of allowing speech-recognition programs to share their thoughts, as it were, with the user, in order to speed up the correction process. Their solution, called Parakeet, is a touch-screen-based interface for phones and other mobile devices, which not only displays the words, phrases or sentences that scored highest in the program’s statistical model, but also any close contenders. This allows the user to select alternatives easily, with a quick tap of the finger. More subtly, if none of the predicted sentences is entirely correct, yet collectively they contain the words that were spoken, the user can simply slide his finger across the appropriate words to link them up.
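The correction step can be sketched in a few lines of Python. This assumes the recognizer's output is kept as one ranked list of candidates per word position. The article does not say how Parakeet stores its alternatives, so the structure, the function names and the scores here are all hypothetical.

```python
# Hypothetical recognition result: one list of candidates per word slot,
# best guess first. Words and scores are invented for illustration.
SLOTS = [
    [("the", 0.90), ("a", 0.10)],
    [("bouncy", 0.50), ("bounty", 0.40), ("pouncy", 0.10)],
    [("bawl", 0.55), ("ball", 0.45)],
]


def top_hypothesis(slots):
    """The sentence shown by default: the best-scoring word in every slot."""
    return [candidates[0][0] for candidates in slots]


def apply_taps(slots, taps):
    """Build the corrected sentence.

    `taps` maps a slot index to the index of the candidate the user touched.
    Slots the user leaves alone keep their top-scoring word.
    """
    return [
        candidates[taps.get(index, 0)][0]
        for index, candidates in enumerate(slots)
    ]


print(" ".join(top_hypothesis(SLOTS)))      # the bouncy bawl
print(" ".join(apply_taps(SLOTS, {2: 1})))  # the bouncy ball
```

A single tap corresponds to one entry in `taps`. Sliding a finger across several words would amount to supplying several entries at once, one for each slot the finger passes over.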