The following papers, presented or published by LDC staff, are
listed by year and then alphabetically by the last name of the first
author.
Mona Diab, Aous Mansouri, Martha Palmer, Olga Babko-Malaya, Wajdi Zaghouani, Ann Bies, Mohamed Maamouri
A Pilot Arabic Propbank;
LREC 2008, Marrakech, Morocco, May 28-30, 2008
Available: Paper in PDF
Ryan Gabbard and Seth Kulick
Construct State Modification in the Arabic Treebank;
ACL 2008, Columbus, Ohio, June 16-18, 2008
Available: Paper in PDF
Mohamed Maamouri, Seth Kulick, Ann Bies
Diacritic Annotation in the Arabic Treebank and Its Impact on Parser
Evaluation;
LREC 2008, Marrakech, Morocco, May 28-30, 2008
Available: Paper in PDF,
Poster
Mohamed Maamouri, Ann Bies, Seth Kulick
Enhanced Annotation and Parsing of the Arabic Treebank;
INFOS 2008, Cairo, Egypt, March 27-29, 2008
Available: Paper in PDF
Mohamed Maamouri, Ann Bies, Seth Kulick
Enhancing the Arabic Treebank: A Collaborative Effort toward New Annotation Guidelines;
LREC 2008, Marrakech, Morocco, May 28-30, 2008
Available: Paper in PDF,
Poster
Marian Reed, Denise DiPersio and Christopher Cieri
The Linguistic Data Consortium Member Survey: Purpose, Execution and Results;
LREC 2008, Marrakech, Morocco, May 28-30, 2008
Available: Paper in PDF,
Presentation Slides
Christopher Cieri
Phonological Variation in Multi-Dialectal Italy: distinguishing e from ε
NWAV 2007, Philadelphia, October 11-14, 2007
Available: Presentation Slides
Christopher Cieri, Stephanie Strassel, Meghan Lammie Glenn, Lauren Friedman
Linguistic Resources in Support of Various Evaluation Metrics
MT Summit XI, Workshop on Automatic Procedures in MT Evaluation, Copenhagen, September 9-14, 2007
Available: Presentation Slides
Christopher Cieri, Linda Corson, David Graff, Kevin Walker
Resources for New Research Directions in Speaker Recognition: The Mixer 3, 4 and 5 Corpora
Interspeech 2007, Antwerp, August 2007.
Available: Paper in PDF,
Presentation Slides
K. Ganchev, K. Crammer, F. Pereira, G. Mann, K. Bellare, A. McCallum, S. Carroll, Y. Jin, P. White.
Penn/UMass/CHOP Biocreative II Systems
Biocreative 2. [In Press]
Available: Paper in PDF
Kuzman Ganchev, Fernando Pereira, Mark Mandel, Steven Carroll, Peter White
Semi-automated Named Entity Annotation
Linguistic Annotation Workshop 2007 [In Press]
Available: PDF
Olga Babko-Malaya, Ann Bies, Ann Taylor, Szuting Yi, Martha Palmer,
Mitch Marcus, Seth Kulick, Libin Shen
Issues in Synchronizing the English Treebank and PropBank
Frontiers in Linguistically Annotated Corpora, A Merged Workshop with
7th International Workshop on Linguistically Interpreted Corpora (LINC-2006)
and Frontiers in Corpus Annotation III, Coling/ACL 2006
Available: Paper in PDF
Ann Bies, Stephanie Strassel, Haejoong Lee, Kazuaki Maeda, Seth Kulick, Yang Liu, Mary
Harper, Matthew Lease
Linguistic Resources for Speech Parsing
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF
Christopher Cieri
Linguistic Resources, Development and Evaluation
Chapter 8 in Laila Dybkjær, Holmer Hemsen and Wolfgang Minker (eds.),
Evaluation of Text and Speech Systems, Kluwer Academic Publishers
Available: Forthcoming
Christopher Cieri, Mark Liberman, Victoria Arranz and Khalid Choukri
Linguistic Data Resources
Chapter 3 in Tanja Schultz and Katrin Kirchhoff (eds.) Multilingual Speech Processing, Elsevier, Academic Press, ISBN 13: 978-0-12-088501-5. April 2006.
Available:
Elsevier's Page
Christopher Cieri
What is Quality? Invited Talk at the Workshop on Quality Assurance and Quality Measurement for Language and Speech Resources
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available:
Presentation Slides
Christopher Cieri, Mark Liberman
More Data and Tools for More Languages and Research Areas: A Progress Report on LDC Activities
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available:
Paper in PDF,
Presentation Slides
Christopher Cieri, Walt Andrews, Joseph P. Campbell, George Doddington,
Jack Godfrey, Shudong Huang, Mark Liberman, Alvin Martin, Hirotaka Nakasone, Mark Przybocki, Kevin Walker
The Mixer and Transcript Reading
Corpora: Resources for Multilingual, Crosschannel Speaker Recognition Research
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available:
Paper in PDF,
Presentation Slides
Ryan Gabbard, Seth Kulick, Mitchell Marcus
Fully Parsing the Penn Treebank
HLT-NAACL, 2006
Available: Paper in PDF
David Graff, Tim Buckwalter, Hubert Jin, Mohamed Maamouri
Lexicon Development for Varieties of Spoken Colloquial Arabic
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper in PDF
Yang Jin, Ryan McDonald, Kevin Lerman, Mark Mandel, Steven Carroll, Mark Y Liberman, Fernando Pereira, Raymond Winters, Peter White
Automated recognition of malignancy mentions in biomedical literature
Open Access: BMC Bioinformatics 7:492
Available: Paper in PDF
Xiaoyi Ma
Champollion: A Robust Parallel Text Sentence Aligner
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper in PDF
Xiaoyi Ma, Christopher Cieri
Corpus Support for Machine Translation at LDC
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper in PDF
Mohamed Maamouri, Ann Bies and Seth Kulick
Diacritization: A Challenge to Arabic Treebank Annotation and Parsing
Machine Translation SIG of the British Computer Society Conference
Available: Paper in PDF
Mohamed Maamouri, Ann Bies, Tim Buckwalter, Mona Diab, Nizar Habash, Owen Rambow, Dalila Tabessi
Developing and Using a Pilot Dialectal Arabic Treebank
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF
Kazuaki Maeda, Christopher Cieri, Kevin Walker
Low-cost Customized Speech Corpus Creation for Speech Technology Applications
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Kazuaki Maeda, Haejoong Lee, Julie Medero, Stephanie Strassel
A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Mark Mandel
Integrated Annotation of Biomedical Text: Creating the PennBioIE Corpus
Presented at Text Mining, Ontologies and Natural Language Processing in Biomedicine, Manchester, UK, March 20 - 21, 2006
Available: Abstract,
Presentation Slides in PDF
Ryan McDonald, Kevin Lerman, and Fernando Pereira
Multilingual Dependency Parsing with a Two-Stage Discriminative Parser
Computational Natural Language Learning (CoNLL-X), 2006
Available: Paper as PDF
Julie Medero, Kazuaki Maeda, Stephanie Strassel, Christopher Walker
An Efficient Approach for Gold-Standard Annotation: Decision Points for Complex Tasks
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper in PDF
Stephanie Strassel, Andrew W. Cole
Corpus Development and Publication
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF and
Poster in PPT
Stephanie Strassel, Christopher Cieri, Andy Cole, Denise DiPersio, Mark Liberman, Xiaoyi Ma, Mohamed Maamouri, Kazuaki Maeda
Integrated Linguistic Resources for Language Exploitation Technologies
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available:
Paper in PDF,
Presentation Slides
Jiahong Yuan, Mark Liberman, Christopher Cieri
Towards an Integrated Understanding of Speaking Rate in Conversation
The Ninth International Conference on Spoken Language Processing
(Interspeech 2006 - ICSLP), Pittsburgh, Pennsylvania
Available:
Paper in PDF,
Presentation Slides
Ann Bies, Seth Kulick, Mark Mandel
Parallel Entity and Treebank Annotation
Presented at
Frontiers in Corpus Annotation II: Pie in the Sky, ACL 2005 workshop, Ann Arbor, June 29, 2005
Available: Paper in PDF
Violetta Cavalli-Sforza, Mohamed Maamouri
Extensions to Histogram-Based Student Modeling Approach to Facilitate Reading in Morphologically Complex Languages
AIED: International Conference on Artificial Intelligence in Education
Available: Paper in PDF
Christopher Cieri
HLT Evaluation: The Role of Data Centers
ELRA HLT Evaluation Workshop, Malta, December 2005
Available:
Presentation Slides
Christopher Cieri
Modeling Phonological Variation in Multidialectal Italy
University of Pennsylvania, Doctoral Dissertation, May 2005
Available: PDF from ProQuest
Yang Jin, Ryan T. McDonald, Kevin Lerman, Mark A. Mandel, Mark Y. Liberman, Fernando Pereira, R. Scott Winters, Peter S. White
Identifying and Extracting Malignancy Types in Cancer Literature
Presented at BioLink 2005: ISMB/ACL, Detroit, June 24, 2005
Available: Paper in PDF
Mohamed Maamouri
Arabic Literacy
Lemma, 11,16 in Encyclopedia of Arabic Language and Linguistics (EALL). Vol 2
Available: Paper in PDF
Ryan McDonald, Fernando Pereira, Seth Kulick, Scott Winters, Yang Jin, and Peter White
Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE
43rd Annual Meeting of the Association for Computational Linguistics, 2005
Available: Paper in PDF
Tim Buckwalter (2004)
Issues in Arabic Orthography and Morphology Analysis
Proceedings of the Workshop on Computational Approaches to Arabic
Script-based Languages, COLING 2004, Geneva, August 28, 2004.
Available: Paper in PDF
Christopher Cieri, Joseph P. Campbell, Hirotaka Nakasone, David Miller, Kevin Walker
The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
LREC 2004: Fourth International Conference on Language Resources and Evaluation, Lisbon
Available:
Paper in PDF,
Poster in PowerPoint Format
Christopher Cieri, Mark Liberman
Progress Report from the Linguistic Data Consortium: recent activities in resource creation and distribution and the development of tools and standards
LREC 2004: Fourth International Conference on Language Resources and Evaluation, Lisbon
Available:
Paper in PDF,
Presentation Slides
Christopher Cieri, David Miller, Kevin Walker
The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text
LREC 2004: Fourth International Conference on Language Resources and Evaluation, Lisbon
Available:
Paper in PDF,
Presentation Slides
George Doddington, Alexis Mitchell, Mark Przybocki, Lance Ramshaw,
Stephanie Strassel, Ralph Weischedel
Automatic Content Extraction (ACE)
program - task definitions and performance measures
LREC 2004: Fourth International Conference on Language Resources and
Evaluation
Available: Paper in PDF
Shudong Huang, Stephanie Strassel, Alexis Mitchell, Zhiyi Song
Shared Resources for Multilingual
Information Extraction and Challenges in Named Entity Annotation
IJCNLP-04 Workshop on Named Entity
Recognition for NLP Applications, Hainan Island, China, March 2004
Available: Paper in PDF
Seth Kulick, Ann Bies, Mark Liberman, Mark Mandel, Ryan McDonald, Martha Palmer, Andrew Schein, Lyle Ungar, Scott Winters, Pete White
Integrated Annotation for Biomedical Information Extraction
Presented at HLT/NAACL Workshop BioLink 2004, Boston, May 2-7, 2004
Available: Paper in PDF, Presentation Slides
Mohamed Maamouri and Ann Bies (2004)
Developing an Arabic Treebank: Methods, Guidelines, Procedures, and
Tools
Proceedings of the Workshop on Computational Approaches to Arabic
Script-based Languages, COLING 2004, Geneva, August 28, 2004.
Available: Paper in PDF
Mohamed Maamouri, Tim Buckwalter, and Christopher Cieri (2004)
Dialectal Arabic Telephone Speech Corpus: Principles, Tool Design,
and Transcription Conventions
Paper presented at the NEMLAR International Conference on Arabic
Language Resources and Tools, Cairo, Sept. 22-23, 2004.
Available: Paper in PDF, Presentation Slides.
Mohamed Maamouri, Ann Bies, Tim Buckwalter, and Wigdan Mekki (2004)
The Penn Arabic Treebank: Building a Large-Scale Annotated Arabic
Corpus
Paper presented at the NEMLAR International Conference on Arabic
Language Resources and Tools, Cairo, Sept. 22-23, 2004.
Available: Paper in PDF
Mohamed Maamouri, David Graff, Hubert Jin, Christopher Cieri, and
Tim Buckwalter (2004)
Dialectal Arabic Orthography-based Transcription and CTS Levantine
Arabic Collection.
Paper presented at the Parallel STT-NA Tracks Session of the EARS
RT-04 Workshop, Palisades IBM Executive Center, New York, Nov. 10,
2004.
Available: Paper in Word format
Ryan McDonald, R. Scott Winters, Mark Mandel, Yang Jin, Peter S. White, Fernando Pereira
An entity tagger for recognizing acquired genomic variations in cancer literature
Bioinformatics 20:3249-3251
Available: Paper in PDF
Kazuaki Maeda and Stephanie Strassel (2004)
Annotation Tools for Large-Scale
Corpus Development: Using AGTK at the Linguistic Data Consortium.
LREC 2004: Fourth International Conference on Language Resources and
Evaluation
Available: Paper in PDF
Mike Maxwell
From Legacy Lexicon to Archivable Resource
First Steps for Language Documentation of Minority Languages: Workshop on Computational Linguistic Tools for Morphology, Lexicon and Corpus Compilation,
LREC 2004
Available: Paper in PDF
Douglas Oard, Dagobert Soergel, G. Craig Murray, David Doermann,
Jianqiang Wang, Bhuvana Ramabhadran, Martin Franz, James Mayfield and
Samuel Gustman, Stephanie Strassel
Building an Information Retrieval
Test Collection for Spontaneous Conversational Speech
27th Annual International ACM SIGIR
Conference (SIGIR2004), Sheffield, England, July 2004
Available: Paper in PDF
Stephanie Strassel
Linguistic Resources for Effective,
Affordable, Reusable Speech-to-Text
LREC 2004: Fourth International Conference on Language Resources and
Evaluation
Available: Paper in PDF
Colin Warner, Ann Bies, Christine Brisson, Justin Mott
Addendum to the Penn Treebank II Style Bracketing Guidelines:
BioMedical Treebank Annotation
November, 2004
Available: Paper in PDF,
Paper as web page,
Paper in plain text
Steven Bird and Gary Simons (2003)
Seven Dimensions of Portability for Language Documentation and
Description
Language 79, 557-582.
Available: Paper in PDF
Steven Bird and Gary Simons (2003)
Extending Dublin Core Metadata to support the description and
discovery of language resources
Computers and the Humanities 37, 375-388.
Available: Paper in PDF
Christopher Cieri, Stephanie Strassel
Robust Sociolinguistic Methodology: Tools, Data and Best Practices
NWAV 32, Philadelphia, 2003
Available:
Presentation Slides
Christopher Cieri, Mike Maxwell, Stephanie Strassel
Core Linguistic Resources for the World's Languages
ELSNET, ENABLER, ICWLR Joint Workshop, Paris, 2003
Available:
Presentation Slides
Baden Hughes and Steven Bird (2003)
Grid-Enabling Natural Language Engineering By Stealth
Proceedings of the Workshop on The Software Engineering and
Architecture of Language Technology Systems (SEALTS)
Available: arXiv.org
Seth Kulick, Mark Liberman, Martha Palmer, and Andrew Schein
Shallow Semantic Annotation of Biomedical Corpora for Information Extraction
ISMB
Special Interest Group Meeting on Text Mining (BioLink). June 2003.
Brisbane, Australia
Available: Paper in PDF,
Presentation slides
Mike Maxwell
Incremental Grammar Development using Finite State Tools
Proceedings of the Workshop on Finite-State Methods in Natural Language Processing,
EACL 10, Budapest, 13-14 April 2003.
Available: Paper in PDF
Gary Simons and Steven Bird (2003)
The Open Language Archives Community:
An infrastructure for distributed archiving of language resources
Literary and Linguistic Computing 18 (in press)
Available: arXiv.org
Gary Simons and Steven Bird (2003)
Building an Open Language Archives Community on the OAI Foundation
Library Hi Tech 21, 210-218,
Special Issue on Open Archives Initiative Metadata Harvesting.
Available: Paper in PDF
Stephanie Strassel, David Miller, Kevin Walker, Christopher Cieri (2003)
Shared Resources for Robust
Speech-to-Text Technology
Eurospeech 2003
Available: Paper in PDF
Stephanie Strassel (2003)
Corpus Creation for Disfluency
Research
Disfluency in Spontaneous Speech
Conference, Gothenburg, Sweden
Available: Abstract in PDF, Presentation Slides in PDF
Stephanie Strassel, Alexis Mitchell, Shudong Huang (2003)
Multilingual Resources for Entity
Extraction
41st Annual Meeting of the Association
for Computational Linguistics Workshop on Multilingual and
Mixed-language Named Entity Recognition:
Combining Statistical and Symbolic Models, Sapporo, Japan
Available: Paper in PDF
Stephanie Strassel, Mike Maxwell, Christopher Cieri (2003)
Linguistic Resource Creation for
Research and Technology Development: A Recent Experiment
Association for Computing Machinery
Transactions on Asian Language Information Processing (TALIP).
Volume 2, Issue 2, 101 - 117
Available: Paper in PDF
Steven Bird, Kazuaki Maeda, Xiaoyi Ma, Haejoong Lee, Beth Randall, and Salim Zayat (2002)
TableTrans, MultiTrans, InterTrans and TreeTrans: Diverse Tools Built on the Annotation Graph Toolkit
Proceedings of the Third International Conference on Language
Resources and Evaluation
Available: arXiv.org
Christopher Cieri, Stephanie Strassel, David Graff, Nii Martey, Kara Rennert and Mark Liberman (2002)
Corpora for Topic Detection and Tracking
James Allan, ed. Topic Detection and Tracking: Event-based Information Organization, Kluwer International Series on Information Retrieval, Bruce Croft, series editor, Boston, Kluwer Academic Publishers.
Christopher Cieri, David Miller, Kevin Walker (2002)
Research Methodologies, Observations and Outcomes in
(Conversational) Speech Data Collection
HLT 2002 The Human Language Technologies Conference, San Diego,
CA, March 2002
Available: Notebook Paper.
Christopher Cieri, Stephanie Strassel, William Labov
Sharable Resources for Sociolinguistic Research
NWAV31, Stanford, 2002
Available:
Presentation Slides
Scott Cotton and Steven Bird (2002)
An Integrated Framework for Treebanks and Multilayer Annotations
Proceedings of the Third International Conference on Language
Resources and Evaluation
Available: arXiv.org
Xiaoyi Ma, Haejoong Lee, Steven Bird and Kazuaki Maeda (2002)
Models and Tools for Collaborative Annotation
Proceedings of the Third International Conference on Language
Resources and Evaluation
Available: arXiv.org
Mohamed Maamouri, Christopher Cieri
Resources for Arabic Natural Language Processing
International Symposium on Processing Arabic, Tunis, April 2002
Available:
Presentation Slides
Kazuaki Maeda, Steven Bird, Xiaoyi Ma, and Haejoong Lee (2002)
Creating Annotation Tools with the Annotation Graph Toolkit
Proceedings of the Third International Conference on Language
Resources and Evaluation
Available: arXiv.org
Mike Maxwell, Gary Simons, and Larry Hayashi (2002)
A Morphological Glossing Assistant
Proceedings of the International LREC Workshop on Resources and
Tools in Field Linguistics
Available: Paper in PDF
Mike Maxwell (2002)
Resources for Morphology Learning and Evaluation
LREC 2002: Third International Conference on Language Resources and
Evaluation vol. III, 967-974
Available: Paper in PDF
Christopher Cieri, David Graff, David Miller, Kevin Walker (2001)
Resources and Infrastructure to Support Robust, Omnipresent ASR
Communicator, SPINE, ROAR Workshop, Orlando, November 2001
Available: Presentation Slides.
Christopher Cieri, Andy Cole, Dave Graff, Nii Martey, Stephanie
Strassel, Cristina Tofan (2001)
SPINE 2001 Data Preparation and Annotation and the SPINE Corpora
Communicator, SPINE, ROAR Workshop, Orlando, November 2001
Available: Presentation Slides.
Christopher Cieri and Steven Bird (2001)
Annotation Graphs, Annotation Servers and Multi-Modal Resources:
Infrastructure for Interdisciplinary Education, Research and Development
Proceedings of the Association for Computational Linguistics:
Workshop on Sharing Tools & Resources, Toulouse, July 2001
Available: Paper in PDF, Presentation Slides.
Lea Christiansen, Christopher Cieri, Kathleen Egan, Anita Kulman,
Milton Paul (2001)
Getting SMART about Authoring
CALICO 2001, University of Central Florida, Orlando, March 2001
Available: Presentation Slides.
David Miller, Christopher Cieri and Kevin Walker (2001)
Switchboard Cellular Resources for Speaker Recognition
Speaker Recognition Workshop, Maritime Institute of Technology
and Graduate Studies, Linthicum MD, March 2001
Available: Presentation Slides.
Stephanie Strassel, Christopher Cieri and Steven Bird (2001)
Shared Resources and Community Building for Corpus Linguistics and
Language Teaching
Corpus Linguistics and Language Teaching Workshop, Boston, MA, March 2001
Available: Presentation Slides.
Stephanie Strassel and Christopher Cieri
Data and Annotations for SocioLinguistics: A Corpus-Based Approach
to Sociolinguistic Research
Penn Linguistic Colloquium, Philadelphia, PA. March 2001
Available: Presentation Slides.
Steven Bird and Mark Liberman (2001)
A formal framework for linguistic annotation
Speech Communication 33(1,2), pp 23-60.
Available: arXiv.org
Steven Bird, Gary Simons and Chu-Ren Huang (2001)
The Open Language Archives Community and Asian Language Resources
Proceedings of the Workshop on Language Resources in Asia, 6th
Natural Language Processing Pacific Rim Symposium (NLPRS),
Tokyo, November 2001.
Available: arXiv.org
Steven Bird and Gary Simons (2001)
The OLAC Metadata Set and Controlled Vocabularies
Proceedings of the ACL Workshop on Sharing Tools and Resources for
Research and Education, Toulouse, July 2001, pp 7-18.
Available: arXiv.org
Kazuaki Maeda, Steven Bird, Xiaoyi Ma and Haejoong Lee (2001)
The Annotation Graph Toolkit: Software Components for Building
Linguistic Annotation Tools
Proceedings of HLT 2001 The Human Language Technologies Conference,
San Diego, CA, March 2001
Available: Paper in PDF
Kazuaki Maeda and Steven Bird (2001)
A Framework for Annotating Animal Bioacoustic Data
The 142nd Meeting of the Acoustical Society of America,
Chicago, June 2001
Available: Presentation Slides (PowerPoint).
Steve Cassidy and Steven Bird (2000)
Querying databases of annotated speech
Proceedings of the Eleventh Australasian Database Conference
Available: Paper in PDF
Christopher Cieri (2000)
Multiple Annotation of Reusable Data Resources: Corpora for Topic
Detection and Tracking
In Rajman, M. and J. C. Chappelier, eds. (2000) Actes des 5es
Journees internationales d'analyse statistique des donnees textuelles,
volume 1, Ecole Polytechnique Federale de Lausanne
Available: Paper in PDF
Christopher Cieri (2000)
Issues and Tools for Annotating a Corpus of Sociolinguistic Field
Data
Linguistic Exploration Workshop in conjunction with
Linguistic Society of America Annual Meeting, Chicago, January 2000
Available: Presentation Slides
Christopher Cieri, David Graff, Nii Martey, Stephanie Strassel (2000)
The TDT-3 Text and Speech Corpus
Presented at the Topic Detection and Tracking Workshop, Vienna,
Virginia, February 28 - March 1, 2000.
Available: Paper in PostScript
Christopher Cieri, Dave Graff, Mark Liberman, Nii Martey and
Stephanie Strassel (2000)
Large Multilingual Broadcast News Corpora for Cooperative Research
in Topic Detection and Tracking: The TDT2 and TDT3 Corpus Efforts
In Proceedings of the Second International Language Resources and
Evaluation Conference, Athens, Greece, May 2000.
Available: Paper in PDF
Christopher Cieri and Mark Liberman (2000)
Issues in Corpus Creation and Distribution: the Evolution of the
Linguistic Data Consortium
In Proceedings of the Second International Language Resources and
Evaluation Conference, Athens, Greece, May 2000.
Available: Paper in PDF
David Graff and Steven Bird (2000)
Many uses, many annotations for large speech corpora: Switchboard
and TDT as case studies
2nd Language Resources and Evaluation Conference (LREC 2000)
Athens, Greece, May 2000
Available: Paper in PDF,
Paper in PostScript
Dave Graff, Stephanie Strassel and Christopher Cieri (2000)
Resources, New and Forthcoming, from LDC
Presented at the 2000 Speech Transcription Workshop, University
of Maryland, May 16-19, 2000.
Available: Presentation Slides
Stephanie Strassel, Dave Graff, Nii Martey and Christopher Cieri (2000)
Quality Control in Large Annotation Projects Involving Multiple
Judges: The Case of the TDT Corpora.
In Proceedings of the Second International Language Resources and
Evaluation Conference, Athens, Greece, May 2000.
Available: Paper in PDF
Steven Bird and Mark Liberman (1999)
A Formal Framework for Linguistic Annotation
Technical Report MS-CIS-99-01 - Department of Computer and
Information Science, University of Pennsylvania
(expanded from version presented at ICSLP-98, Sydney)
Available: Paper in PDF
Steven Bird and Mark Liberman (1999)
Annotation graphs as a framework for multidimensional linguistic
data analysis
Towards Standards and Tools for Discourse Tagging -- Proceedings of
the Workshop, Somerset, NJ: Association for Computational
Linguistics
Available: Paper in PDF
Steven Bird (1999)
Multidimensional exploration of online linguistic field data
Proceedings of the 29th Annual Meeting of the Northeast Linguistics
Society, University of Massachusetts at Amherst.
Available: Paper in PDF
Steven Bird and Stephanie Strassel (1999)
Annotated Corpora in Linguistic Research
North American Symposium on Corpora in Linguistics and Language
Teaching, University of Michigan, May 21, 1999.
Available: Presentation Slides
Alexandra Canavan, Kevin Walker, David Graff and Christopher Cieri (1999)
Telephone Speech Corpora: New Needs, Languages, Methods and
Technology
Presented at the Hub-5 Conversational Speech Understanding (LVCSR)
Workshop, Maritime Institute of Technology and Graduate Studies,
Linthicum Heights, Maryland, June 1999.
Available: Presentation Slides
Christopher Cieri, David Graff, Mark Liberman, Nii Martey, Stephanie
Strassel (1999)
The TDT-2 Text and Speech Corpus
Presented at the DARPA Broadcast News Workshop, Washington, DC, February 1999.
Available: Paper in PDF
Christopher Cieri (1999)
This Ain't Your Father's Digital Data: Another Perspective on Legal
Information
Presented at the CALI 1999 - The Conference for Law School Computing.
Eugene, Oregon, June 1999.
Available: Presentation Slides,
Video in RealMedia
Xiaoyi Ma and Mark Liberman (1999)
BITS: A Method for Bilingual Text Search over the Web
Machine Translation Summit VII, September 13th, 1999, Kent Ridge
Digital Labs, National University of Singapore
Available: Paper in Postscript,
Paper in PDF
Xiaoyi Ma (1999)
Parallel Text Collections at the Linguistic Data Consortium
Machine Translation Summit VII, September 13th, 1999, Kent Ridge
Digital Labs, National University of Singapore
Available: Paper in Postscript
Stephanie Strassel (1999)
Corpus Creation and Quality Control at the LDC
Presented at the Corpus of Spoken Dutch Workshop; Tilburg,
Netherlands; November 12, 1999.
Available: Presentation Slides
Stephanie Strassel and Christopher Cieri (1999)
Corpus Sociolinguistics: Issues, Data and Tools
Presented at NWAVE-28, York University, Toronto, Ontario
October, 1999.
Available: Presentation Slides
Steven Bird and Mark Liberman (1998)
Towards a Formal Framework for Linguistic Annotations
Proceedings of the 5th International Conference on Spoken Language
Processing.
Available: Paper in PDF
Christopher Cieri and David Graff (1998)
Topic Detection and Tracking Corpora
Presented at TREC/SDR Conference, Gaithersburg, Maryland,
November 1998.
Available:
David Graff and Christopher Cieri (1998)
Update on Lexical Resources and Projects at the Linguistic Data
Consortium
Presented at the Ninth Hub-5 Conversational Speech Recognition
(LVCSR) Workshop, Maritime Institute of Technology and Graduate
Studies, Linthicum Heights, Maryland, September 1998.
Available:
Mark Liberman and Christopher Cieri (1998)
The Creation, Distribution and Use of Linguistic Data
Proceedings of the First International Conference on Language
Resources and Evaluation, Granada, Spain, May 1998.
Available: Paper in PDF














