
5.3.37. United States of America

FLaReNet Summary

The US pioneered the fields of Human Language Technology (HLT), Language Technology (LT) evaluation and Language Resources (LR). Large-scale LT evaluation activities are conducted at NIST, while the LDC (Linguistic Data Consortium) makes Language Resources available. Funding is provided by the Department of Defense (DARPA), the Office of the Director of National Intelligence (IARPA), the NSF and the Department of Education. Many universities and non-profit organizations conduct research in this area. Most of this work addresses (American) English.

Contact Point Input

National/Regional contact: Christopher Cieri, Linguistic Data Consortium, University of Pennsylvania.


Here are links to programs that are or were recently related to language resources and technologies in the US.

   * NIST - National Institute of Standards and Technology

NIST Machine Translation Evaluation for GALE (Global Autonomous Language Exploitation)

Automatic Content Extraction (ACE) Evaluation

CLEAR evaluation

Language Recognition Evaluation (LRE)
LVDID (Language Variations and Dialect Identification) is LDC's project to support language variety/dialect identification, especially in the NIST LRE campaigns.

NIST Open Machine Translation (OpenMT) Evaluation

NIST Metrics for Machine Translation Challenge

Rich Transcription Evaluation

Speaker Recognition Evaluation (SRE)
Mixer is LDC's project to support speaker identification campaigns (languages: Arabic, English, Mandarin, Russian, Spanish). Mixer Greybeard was a specific LDC project to support speaker identification R&D that is robust to aging, especially within the NIST SRE campaigns.

TRECVid 2009 Evaluation for Event Detection
HAVIC is LDC's project to support web video collection for NIST TRECVid campaigns.

NIST MADCAT Evaluation
MADCAT = Multilingual Automatic Document Classification Analysis and Translation

AVSS 2009 Multi-Camera Tracking Challenge

Spoken Term Detection

Broadcast News Recognition Evaluation

Conversational Telephone Recognition Evaluation

Spoken Document Retrieval Evaluation

Topic Detection and Tracking Evaluation

   * NSF - National Science Foundation

Computer & Information Science & Engineering
NSF/CISE/IIS/RI (Robust Intelligence Program):

Computer Research Infrastructure
Production of the MASC (Manually Annotated Sub-Corpus).


Social, Behavioral & Economic Sciences

   * DARPA

Global Autonomous Language Exploitation (GALE)
GALE aims at multilingual transcription, translation into English, and distillation of text into structured information. It includes text (news, newsgroups, blogs) and transcribed speech (broadcast news and conversation), translated and aligned at sentence and sub-sentence level, annotations for syntactic structure and propositional content, and distillation into structured information, for English, Mandarin and Arabic.

Machine Reading (MR)

Multilingual Automatic Document Classification Analysis and Translation (MADCAT)
MADCAT supports systems that perform OCR and MT of handwritten, printed and hybrid text, with variation in scribe, text type, writing instrument, time, writing speed and paper quality. The first language addressed is Arabic.

Spoken Language Communication and Translation System for Tactical Use (TRANSTAC)
TRANSTAC aims at speech-to-speech (STS) translation in a limited domain, on a portable platform, for Arabic and Persian.

Robust Automatic Transcription of Speech (RATS)
RATS concerns algorithm development and signal processing for Speech Activity Detection, Language Identification, Speaker Identification and Key Word Spotting. It also includes data collection and evaluation.

   * IARPA

IARPA, the “Intelligence Advanced Research Projects Activity”, invests in high-risk/high-payoff research. Its activities include:
   • Smart Collection
      - BEST (Biometrics Exploitation Science & Technology): multiple biometric modalities (face, ocular, voice) under challenging collection conditions.
   • Incisive Analysis
      - ALADDIN (Automated Low-Level Analysis and Description of Diverse Intelligence Video)
      - SCIL (Socio-cultural Content in Language)

   * Department of Education

International Education Programs Service
IRS: Reading assistance and assessment tools for morphologically complex languages; digital dictionaries of colloquial Arabic varieties; survey of DOE-funded dictionary projects.

   * JHU Center for Language and Speech Processing Summer Workshops