Linguists, computer scientists use TACC supercomputers to improve natural language processing
It’s not hard to tell the difference between the “charge” of a battery and criminal “charges.” But for computers, distinguishing between the various meanings of a word is difficult.
For more than 50 years, linguists and computer scientists have tried to get computers to understand human language by programming semantics as software. Driven initially by efforts to translate Russian scientific texts during the Cold War (and more recently by the value of information retrieval and data analysis tools), these efforts have met with mixed success. IBM’s Jeopardy-winning Watson system and Google Translate are high-profile, successful applications of language technologies, but the humorous answers and mistranslations they sometimes produce show how difficult the problem remains.
Our ability to easily distinguish between multiple word meanings is rooted in a lifetime of experience. Using the context in which a word is used, an intrinsic understanding of syntax and logic, and a sense of the speaker’s intention, we intuit what another person is telling us.
“In the past, people have tried to hand-code all of this knowledge,” explained Katrin Erk, a professor of linguistics at The University of Texas at Austin focusing on lexical semantics. “I think it’s fair to say that this hasn’t been successful. There are just too many little things that humans know.”
Other efforts have tried to use dictionary meanings to train computers to better understand language, but these attempts have also faced obstacles. Dictionaries have their own sense distinctions, which are crystal clear to the dictionary-maker but murky to the dictionary reader. Moreover, no two dictionaries provide the same set of meanings — frustrating, right?
Watching annotators struggle to make sense of conflicting definitions led Erk to try a different tactic. Instead of hard-coding human logic or deciphering dictionaries, why not mine a vast body of texts (which are a reflection of human knowledge) and use the implicit connections between the words to create a weighted map of relationships — a dictionary without a dictionary?
“An intuition for me was that you could visualize the different meanings of a word as points in space,” she said. “You could think of them as sometimes far apart, like a battery charge and criminal charges, and sometimes close together, like criminal charges and accusations (‘the newspaper published charges…’). The meaning of a word in a particular context is a point in this space. Then we don’t have to say how many senses a word has. Instead we say: ‘This use of the word is close to this usage in another sentence, but far away from the third use.’”
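One common way to make "close together" and "far apart" concrete is to represent each use of a word as a vector of counts of the words around it and compare vectors with cosine similarity. The sketch below illustrates that general idea only; it is not Erk's actual model, and the context words and counts are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical context counts for three uses of "charge" (invented numbers).
battery_use = {"battery": 3, "power": 2, "phone": 1}
criminal_use = {"court": 3, "police": 2, "accused": 1}
accusation_use = {"court": 1, "accused": 2, "newspaper": 2}

far = cosine(battery_use, criminal_use)       # no shared context words
near = cosine(criminal_use, accusation_use)   # some shared context words
```

With these toy vectors, the criminal and accusation uses score higher than the battery and criminal uses, which share no context words at all. A real model would build its vectors from a large text collection rather than from hand-picked counts.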
Creating a model that can accurately reproduce the intuitive ability to distinguish word meanings requires a lot of text and a lot of analytical horsepower.
“The lower end for this kind of a research is a text collection of 100 million words,” she explained. “If you can give me a few billion words, I’d be much happier. But how can we process all of that information? That’s where supercomputers and Hadoop come in.”
Applying Computational Horsepower
Erk initially conducted her research on desktop computers, but around 2009, she began using the parallel computing systems at the Texas Advanced Computing Center (TACC). Access to a special Hadoop-optimized subsystem on TACC’s Longhorn supercomputer allowed Erk and her collaborators to expand the scope of their research. Hadoop is a software framework well suited to text analysis and to mining unstructured data, and it can take advantage of large computer clusters. Computational models that take weeks to run on a desktop computer can run in hours on Longhorn. This opened up new possibilities.
“In a simple case we count how often a word occurs in close proximity to other words. If you’re doing this with one billion words, do you have a couple of days to wait to do the computation? It’s no fun,” Erk said. “With Hadoop on Longhorn, we could get the kind of data that we need to do language processing much faster. That enabled us to use larger amounts of data and develop better models.”
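The "simple case" Erk describes, counting how often a word occurs near other words, can be sketched in a few lines. This is a single-machine illustration under assumed settings, not the Hadoop code used on Longhorn: the function name `cooccurrence_counts`, the window size of two words, and the sample sentence are all hypothetical.

```python
from collections import Counter, defaultdict

def cooccurrence_counts(tokens, window=2):
    """Count, for each word, the words appearing within `window` positions."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the word's own position
                counts[word][tokens[j]] += 1
    return counts

# Invented sample text; a real run would stream millions of documents.
tokens = "the police filed charges and the court heard the charges".split()
counts = cooccurrence_counts(tokens, window=2)
charges_context = counts["charges"]
```

Here `charges_context` records the neighbors of "charges" across both of its occurrences. On a billion-word collection the same counting would be split across many machines, which is the kind of job a cluster framework such as Hadoop is built for.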
Treating words in a relational, non-fixed way corresponds to emerging psychological notions of how the mind deals with language and concepts in general, according to Erk. Instead of rigid definitions, concepts have “fuzzy boundaries” where the meaning, value and limits of the idea can vary considerably according to the context or conditions. Erk takes this idea of language and recreates a model of it from hundreds of thousands of documents.