NSF PR 97-14 - February 18, 1997
This material is available primarily for archival purposes. Telephone
numbers or other contact information may be out of date; please see
current contact information at the NSF media contacts page.
Can Computers Communicate Like People Do?
Imagine two people at a table in a restaurant. Are
they intimately leaning toward each other or sitting
stiffly? Are they gazing dreamily at each other or
avoiding eye contact? How are they gesturing? What
are they saying and with what tone of voice? A mere
glance and a snippet of conversation make it easy
for a person to guess the situation quite accurately:
are they lovers, friends having an argument, or colleagues
at a business meeting?
Humans far exceed computers in their ability to process
many different types of information -- images, words
and intonation, posture and gestures, and written
language -- and to draw conclusions from them. More
"natural" interactions with "smarter" computers will
make them accessible to a broader range of people
(including people with disabilities) in a wider range
of settings while being more useful in helping people
sort through and synthesize the glut of available
information.
A set of 15 awards in a new $10 million program led
by the National Science Foundation -- Speech, Text,
Image and Multimedia Advanced Technology Effort (STIMULATE)
-- will fund university researchers investigating
human communication and seeking to improve our interaction
with computers. Four agencies are participating: NSF,
the National Security Agency Office of Research and
SIGINT Technology, the Central Intelligence Agency Office
of Research and Development, and the Defense Advanced
Research Projects Agency Information Technology Office.
"This program goes well beyond the era of graphical
interfaces with our computers," said Gary Strong,
NSF program manager. "Perhaps some day we can interact
with our computers like we interact with each other,
even having `intelligent' computer assistants. STIMULATE
has the potential for enormous impact on anyone who
must process large amounts of data as well as for
people with disabilities, the illiterate and others
who might not be able to use a computer keyboard."
Funded projects include: a filter for TV, radio and
newspaper accounts that will quickly provide a user
with a synopsis; a computerized translation program;
and a "humanoid" computer that will understand human
communication including facial expressions, gestures
and speech intonation. Other projects include speech
recognition, understanding handwriting, and indexing
and retrieving video.
Attachment: List of STIMULATE
awardees
Attachment
STIMULATE Awards
Contact: Beth Gaston, NSF
(703) 305-1070
Midge Holmes, CIA
(703) 482-6686
Judith Emmel, NSA Public Affairs
(301) 688-6524
- Alfred Aho, Shih Fu Chang and Kathleen McKeown
Columbia University
(212) 939-7004, aho@cs.columbia.edu
An Environment for Illustrated Briefing and
Follow-up Search Over Live Multimedia Information
Researchers seek to provide up-to-the-minute briefings
on topics of interest, linking the user into a
collection of related multimedia documents. On
the basis of a user profile or query, the system
will sort multimedia information to match the
user's interests, retrieving video, images and
text. The system will automatically generate a
briefing on information extracted from the documents
and determined to be of interest to the user.
- James Allan and Allen Hanson
University of Massachusetts, Amherst
(413) 545-3240, allan@cs.umass.edu
Multi-Modal Indexing, Retrieval, and Browsing:
Combining Content-Based Image Retrieval with Text
Retrieval.
In the rapidly emerging area of multimedia information
systems, effective indexing and retrieval techniques
are critically important. In this project, the
Center for Intelligent Information Retrieval will
develop a system to index and retrieve collections
including combinations of images, video and text.
- Jaime Carbonell
Carnegie Mellon University
(412) 268-3064, jgc@cs.cmu.edu
Generalized Example-based Machine Translation
With example-based machine translation, computers
search pre-translated texts for the closest match
to each new sentence being translated. The goal
of this project is to develop generalizations
that will increase the accuracy of translations
and reduce the size of the necessary database.
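
For illustration only, the closest-match idea described above can be
sketched in a few lines of Python. The word-overlap similarity, the
function names and the toy English-Spanish memory below are assumptions
made for the example; they are not the project's actual algorithm or data.

    def word_overlap(a, b):
        # Crude similarity: fraction of words the two sentences share.
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(len(wa | wb), 1)

    def translate_by_example(sentence, memory):
        # Return the target side of the closest pre-translated example.
        source, target = max(memory, key=lambda pair: word_overlap(sentence, pair[0]))
        return target

    # Toy translation memory of pre-translated sentence pairs (hypothetical data).
    memory = [
        ("where is the train station", "donde esta la estacion de tren"),
        ("how much does this cost", "cuanto cuesta esto"),
    ]
    print(translate_by_example("where is the bus station", memory))
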
- Justine Cassell
MIT
(617) 253-4899, justine@media.mit.edu
A Unified Framework for Multimodal Conversational
Behaviors in Interactive Humanoid Agents
Humans communicate using speech with intonation
and modulation, gestures, gaze and facial expression.
Researchers will study how humans interact and
develop a humanoid computer that can produce human-like
communicative behaviors and comprehend complex
communication on the part of humans.
- Charles Fillmore
International Computer Science Institute, UC Berkeley
fillmore@ICSI.Berkeley.edu
Tools for Lexicon Building
This project has two parts: computational
tools for language research, and a thesaurus-like
database of English words giving definitions, how
each word relates to similar words, and the
range of each word's use. The tools and the database
will be useful for researchers studying language
processing and speech recognition.
- James Flanagan, Casimir Kulikowski, Joseph Wilder,
Grigore Burdea and Ivan Marsic
Rutgers University
(908) 445-3443, jlf@caip.rutgers.edu
Synergistic Multimodal Communication in Collaborative
Multiuser Environments
Digital networking and distributed computing open
opportunities for collaborative work by geographically-separated
participants. But participants must communicate
with one another, and with the machines they are
using. The sensory dimensions of sight, sound
and touch, used in combination, are natural modes
of communication for humans. This research establishes computer
interfaces that simultaneously use the modalities
of sight, sound and touch for human-machine communication.
Emerging technologies for image processing, automatic
speech recognition, and force-feedback tactile
gloves support these multimodal interfaces.
- James Glass, Stephanie Seneff and Victor Zue
MIT
(617) 253-1640, glass@mit.edu
A Hierarchical Framework for Speech Recognition
and Understanding
Most current speech recognizers use very simple
representations of words and sentences. In this
project, researchers aim to incorporate additional
sources of linguistic information such as the
syllable, phrase and intonation, into a system
which can be used for understanding conversational
speech. They plan to develop a model that can
be applied to many languages.
- Barbara Grosz and Stuart Shieber
Harvard University
(617) 495-3673, grosz@eecs.harvard.edu
Human-Computer Communication and Collaboration
This project will develop methods for designing
and building software that operates in collaboration
with a human user, rather than as a passive servant.
The aim is to apply theories of how people collaborate
to the design of software, keeping
in mind the differing capabilities of the human
and computer collaborators.
- Jerry Hobbs and Andrew Kehler
SRI International
(415) 859-2229, hobbs@ai.sri.com
Multimodal Access to Spatial Data
This project will focus on enabling computers
to understand what people are referring to as
they use language and gesture while interacting
with computer systems that provide access to geographical
information. The results will enhance the capabilities
and ease of use of future interactive systems,
such as systems for travel planning and crisis
management.
- Fred Jelinek, Eric Brill, Sanjeev Khudanpur and
David Yarowsky
Johns Hopkins University
(410) 516-7730, jelinek@jhu.edu
Exploiting Nonlocal and Syntactic Word Relationships
in Language Models for Conversational Speech Recognition
Interacting with computers by speech or handwriting
will make computers more accessible to people
with disabilities and will allow users to carry
on other tasks, like querying an on-line maintenance
manual while performing mechanical repairs. To
recognize speech or handwriting, most automatic
systems look only at nearby words to identify
unknowns, while people doing the same tasks use
the entire context. This project will focus on
improving the recognition accuracy for spoken
and handwritten language and will provide techniques
applicable to all types of language modeling.
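
The contrast between local word context and full context can be
illustrated with a toy bigram language model, which predicts each word
from only the single preceding word. The tiny corpus and function below
are assumptions for the sake of the sketch, not the project's models.

    from collections import Counter, defaultdict

    corpus = "replace the oil filter then check the oil level".split()

    # Count how often each word follows each preceding word.
    follows = defaultdict(Counter)
    for prev, word in zip(corpus, corpus[1:]):
        follows[prev][word] += 1

    def most_likely_next(prev):
        # Predict the next word from the single preceding word only,
        # ignoring the rest of the sentence and the document's topic.
        return follows[prev].most_common(1)[0][0]

    print(most_likely_next("the"))  # "oil" -- the locally most frequent follower
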
- Kathleen McKeown and Judith Klavans
Columbia University
(212) 939-7118, kathy@cs.columbia.edu
Generating Coherent Summaries of On-Line Documents:
Combining Statistical and Symbolic Techniques
This project will allow computers to analyze the
text from a set of related documents across many
subject areas and summarize the documents. Within
the summary, similarities and differences between
documents will be highlighted, indicating what
each document is about. The research will be part
of a digital library project emphasizing aids
for reducing information overload.
- Mari Ostendorf
Boston University
(617) 353-5430, mo@raven.bu.edu
Modeling Structure in Speech above the Segment
for Spontaneous Speech Recognition
Current speech recognition technology leads to
unacceptably high error rates of 30-50 percent
on natural conversational or broadcast speech,
in large part because current models were developed
on read speech and do not account for variability
in speaking style. This project aims to improve
recognition performance by representing structure
in speech at the level of the syllable and the phrase,
and by accounting for variation in speaking style across speakers.
- Francis Quek and Rashid Ansari
University of Illinois at Chicago
(312) 996-5494, quek@eecs.uic.edu
Gesture, Speech and Gaze in Discourse Management
This project involves experiments to discover
and quantify the cues to human communication,
including the role of gestures, speech intonation
and gaze, and then develop computer programs capable
of recognizing such cues in videos.
- Elizabeth Shriberg and Andreas Stolcke
SRI International
(415) 859-3798, ees@speech.sri.com
Modeling and Automatic Labeling of Hidden Word-Level
Events in Speech
Most computer systems that process natural language
require input that resembles written text, such
as one would read in a newspaper. Spoken discourse,
however, differs from text in ways that present
challenges to computers. One challenge is that
speech does not contain explicit punctuation such
as periods to separate sentences. Another challenge
is that when people speak naturally, they say
things like "um," "uh," "you know," and other
word-level events which interrupt the formal structure
of sentences. This project will use word patterns
as well as the timing and melody of speech to
identify sentence boundaries and nongrammatical
events to help computers better understand natural
speech.
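
A rough sketch of combining word patterns with the timing of speech
appears below: a long pause after a word suggests a sentence boundary,
and filler words are flagged as disfluencies. The pause threshold,
feature choices and sample data are illustrative assumptions, not the
project's models.

    def label_events(words, pause_after, pause_threshold=0.5):
        # words: spoken tokens; pause_after: silence (seconds) following each token.
        labels = []
        for word, pause in zip(words, pause_after):
            if word.lower() in ("um", "uh"):
                labels.append("disfluency")         # filler interrupting the sentence
            elif pause > pause_threshold:
                labels.append("sentence boundary")  # long pause suggests a boundary
            else:
                labels.append("no event")
        return labels

    words = ["we", "left", "early", "um", "it", "was", "raining"]
    pauses = [0.05, 0.05, 0.60, 0.20, 0.05, 0.05, 0.80]
    print(list(zip(words, label_events(words, pauses))))
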
- Yao Wang and Edward Wong
Brooklyn Polytechnic University
(718) 260-3469, yao@vision.poly.edu
Video Scene Segmentation and Classification
Using Motion and Audio Information
A video sequence includes many different types
of information: speech, text, audio,
color patterns and shapes in individual frames,
and the movement of objects as shown by changes between
frames. Humans can quickly interpret such information;
computer understanding of video, however, is still primitive.
The aim of this project is to develop new theory
and techniques for scene segmentation and classification
in a video sequence, which will have direct applications
in information indexing and retrieval in multimedia
databases, spotting and tracking of special events
in surveillance video, and video editing.
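
As a simplified illustration of using motion and audio together, the
sketch below declares a scene boundary wherever both the visual change
between consecutive shots and the change in audio level are large. The
feature values and thresholds are assumptions for the example, not the
project's technique.

    def scene_boundaries(frame_change, audio_change, visual_thresh=0.6, audio_thresh=0.5):
        # frame_change[i] / audio_change[i]: normalized change between clip i and i+1.
        return [i + 1
                for i, (v, a) in enumerate(zip(frame_change, audio_change))
                if v > visual_thresh and a > audio_thresh]

    # Toy per-transition change scores (hypothetical data).
    frame_change = [0.1, 0.2, 0.8, 0.1, 0.7]
    audio_change = [0.0, 0.1, 0.9, 0.2, 0.3]
    print(scene_boundaries(frame_change, audio_change))  # -> [3]
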