Fact Sheets from NIST

ATP Project Brief


2004 General Competition (September 2004)

Syntax- and Rule-Based Decoding for Statistical Machine Translation Systems

Other Information/Computers/Entertainment


Develop an integrated, statistical phrase-based and syntactic rule approach to machine translation to improve grammaticality and accuracy of translated materials, enabling broader use of machine-based translation by national security, government and business organizations.

Sponsor: Language Weaver, Inc.

4640 Admiralty Way
Suite 423
Marina del Rey, CA 90292-6617

 

  • Project duration: 10/1/2004 - 9/30/2007
  • Total project (est.): $3,344,318
  • Requested ATP funds: $1,972,557

 

High-quality machine translation of natural languages has been a goal of researchers for more than 50 years, but it has yet to reach the sophistication and reliability needed for widespread application. The dominant approach for the past 30 years has been to use handcrafted linguistic rules, but such systems are very expensive to build, requiring trained linguists to enter large numbers of rules manually. The approach does not scale up well to a general system, and the resulting translations are often awkward and hard to understand.

In recent years, a newer approach based on statistical models, in which a word or phrase is translated to one of a number of possibilities based on the probability that it would occur in the current context, has achieved marked success. The best examples substantially outperform rule-based systems. Statistics-based machine translation (SMT) also may prove easier and less expensive to expand, if the system can be taught new knowledge domains or languages by giving it large samples of existing human-translated texts. Despite some success, however, severe problems still exist: outputs are often ungrammatical, and the quality and accuracy of translation fall well below those of a human linguist - and well below the demands of all but highly specialized commercial markets.

Language Weaver, a leader in SMT research, proposes to overcome these limitations with a hybrid system that would still be fundamentally statistics-based but would incorporate higher-level abstract syntax rules to arrive at the final translation. Such hybrids have been explored in the research community, but without any real success, because it is difficult to merge the fundamentally different approaches. Language Weaver proposes a more complex and tightly integrated approach. The company will develop new algorithms that exploit knowledge of how words, phrases and patterns should be translated; knowledge of how syntax-based and non-syntax-based translation rules should be applied; and knowledge of how syntactically based target structures should be generated. Cross-lingual parsers of increasing complexity will be developed, as well as methods to choose different syntactic orderings in different situations.

Language Weaver is a small company attempting to expand its existing business in translation software and does not have the resources to pursue this far-reaching research track without ATP support. If the company is successful where others have failed, it will open up a significantly larger share of the $10 billion translation business to high-quality machine translation. Beyond that, there are far-reaching social and economic benefits: quicker translation of intelligence information will aid the war on terrorism; U.S. businesses will be able to boost export sales by translating more sales and product literature; governments worldwide will be better able to provide support and services to non-native language populations; and the translation costs of doing international business will be lowered.
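
The statistical idea described above, in which a phrase is translated to one of several candidates according to how probable it is in the current context, can be illustrated with a toy sketch. This is not Language Weaver's system: the function name, the probability tables and every number in them are hypothetical, invented purely for illustration. The sketch combines a translation probability with a context probability and picks the highest-scoring candidate.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
# P(target phrase | source phrase): a tiny "translation model".
TRANSLATION_PROBS = {
    "banco": {"bank": 0.6, "bench": 0.4},
}

# P(candidate | previous target word): a tiny context model.
CONTEXT_PROBS = {
    ("the", "bank"): 0.5,
    ("the", "bench"): 0.5,
    ("park", "bank"): 0.1,
    ("park", "bench"): 0.9,
}


def choose_translation(source_phrase, previous_word, floor=1e-6):
    """Return the candidate with the highest combined log-probability."""
    candidates = TRANSLATION_PROBS[source_phrase]

    def score(candidate):
        translation_prob = candidates[candidate]
        # Unseen (context, candidate) pairs get a small floor probability.
        context_prob = CONTEXT_PROBS.get((previous_word, candidate), floor)
        return math.log(translation_prob) + math.log(context_prob)

    return max(candidates, key=score)


print(choose_translation("banco", "the"))   # bank
print(choose_translation("banco", "park"))  # bench
```

With these made-up numbers, the same source word is rendered as "bank" after "the" but as "bench" after "park", because the context probability outweighs the slightly higher translation probability of "bank". A hybrid system of the kind proposed would additionally apply syntactic rules, which this sketch does not attempt.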

 

For project information:
Beth Walsh, (858) 724-2500
Beth@clearpointagency.com

ATP Project Manager
Omid Omidvar, 301-975-4401
omid.omidvar@nist.gov

 

This is the fact sheet for this project as it was announced on September 28, 2004.

Date created: 9/28/2004
Last updated: 9/28/2004
Contact: inquiries@nist.gov