LDC Papers
The following papers, presented or published by LDC staff, are listed by year and
then alphabetically by the last name of the first author.
Steven Bird, Ewan Klein, and
Edward Loper
Natural Language Processing with Python;
O'Reilly Media Inc, 2009
Available: Book in HTML
Christopher Cieri
Models of Phonological Variation for Multi-dialectal Communities: the case
of L'Aquila
presented at NWAV 38: New Ways of Analyzing Variation, University of
Ottawa, Ottawa, Canada, October 22-25, 2009
Available: Presentation
Slides
Christopher Cieri, Stephanie
Strassel
Closer Still to a Robust, All Digital, Empirical, Reproducible
Sociolinguistic Methodology
presented at NWAV 38: New Ways of Analyzing Variation, University of
Ottawa, Ottawa, Canada, October 22-25, 2009
Available: Presentation
Slides
Catherine Lai and Steven Bird
Querying Linguistic Trees;
Journal of Logic, Language, and Information, Volume 18, 2009
Available: Paper in
PDF
Mohamed Maamouri, Ann Bies and
Seth Kulick
Creating a Methodology for Large-Scale Correction of Treebank Annotation:
The Case of the Arabic Treebank;
MEDAR Second International Conference on Arabic Language Resources and
Tools, Cairo, Egypt, April 22-23, 2009
Available: Paper
in PDF, Presentation
Slides
Niklas Paulsson, Khalid Choukri,
Djamel Mostefa, Denise DiPersio, Meghan Glenn and Stephanie Strassel
A Large Arabic Broadcast News Speech Data Collection;
MEDAR Second International Conference on Arabic Language Resources and
Tools, Cairo, Egypt, April 22-23, 2009
Available: Paper
in PDF, Poster
Linda Brandschain, Christopher
Cieri, David Graff, Abby Neely, Kevin Walker
Speaker Recognition: Building the Mixer 4 and 5 Corpora;
LREC 2008, Marrakech, Morocco, May 28-30, 2008
Available: Paper
in PDF, Poster
Mona Diab, Aous Mansouri, Martha
Palmer, Olga Babko-Malaya, Wajdi Zaghouani, Ann Bies, Mohamed Maamouri
A Pilot Arabic Propbank;
LREC 2008, Marrakech, Morocco, May 28-30, 2008
Available: Paper in
PDF
Ryan Gabbard and Seth Kulick
Construct State Modification in the Arabic Treebank;
ACL 2008, Columbus, Ohio, June 16-18, 2008
Available: Paper
in PDF
Mohamed Maamouri, Seth Kulick,
Ann Bies
Diacritic Annotation in the Arabic Treebank and Its Impact on Parser
Evaluation;
LREC 2008, Marrakech, Morocco, May 28-30, 2008
Available: Paper
in PDF, Poster
Mohamed Maamouri, Ann Bies, Seth
Kulick
Enhanced Annotation and Parsing of the Arabic Treebank;
INFOS 2008, Cairo, Egypt, March 27-29, 2008
Available: Paper
in PDF
Mohamed Maamouri, Ann Bies, Seth
Kulick
Enhancing the Arabic Treebank: A Collaborative Effort toward New Annotation
Guidelines;
LREC 2008, Marrakech, Morocco, May 28-30, 2008
Available: Paper
in PDF, Poster
Kazuaki Maeda, Haejoong Lee,
Shawn Medero, Julie Medero, Robert Parker, Stephanie Strassel
Annotation Tool Development for Large-Scale Corpus Creation Projects at the
Linguistic Data Consortium
LREC 2008, Marrakech, Morocco, May 28-30, 2008
Available: Paper
in PDF
Kazuaki Maeda, Xiaoyi Ma,
Stephanie Strassel
Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of
Potential Parallel Text using
LREC 2008, Marrakech, Morocco, May 28-30, 2008
Available: Paper
in PDF
Marian Reed, Denise DiPersio and
Christopher Cieri
The Linguistic Data Consortium Member Survey: Purpose, Execution and
Results;
LREC 2008, Marrakech, Morocco, May 28-30, 2008
Available: Paper
in PDF, Presentation
Slides
Gary Simons and Steven Bird
Toward a Global Infrastructure for the Sustainability of Language Resources;
22nd Pacific Asia Conference on Language, Information and Computation,
Cebu City, Philippines, 2008
Available: Paper
in PDF
Steven Bird and Haejoong Lee
Graphical Query for Linguistic Treebanks
Tenth Conference of the Pacific Association for Computational Linguistics,
Melbourne 2007
Available: Paper
in PDF
Christopher Cieri
Phonological Variation in Multi-Dialectal Italy: distinguishing e from ε
NWAV 2007, Philadelphia, October 11-14, 2007
Available: Presentation
Slides
Christopher Cieri, Stephanie
Strassel, Meghan Lammie Glenn, Lauren Friedman
Linguistic Resources in Support of Various Evaluation Metrics
MT Summit XI, Workshop on Automatic Procedures in MT Evaluation,
Copenhagen, September 9-14, 2007
Available: Presentation
Slides
Christopher Cieri, Linda Corson,
David Graff, Kevin Walker
Resources for New Research Directions in Speaker Recognition: The Mixer 3, 4
and 5 Corpora
Interspeech 2007, Antwerp, August 2007.
Available: Paper
in PDF, Presentation
Slides
K. Ganchev, K. Crammer, F.
Pereira, G. Mann, K. Bellare, A. McCallum, S. Carroll, Y. Jin, P. White.
Penn/UMass/CHOP Biocreative II Systems
Biocreative 2. [In Press]
Available: Paper
in PDF
Kuzman Ganchev, Fernando Pereira,
Mark Mandel, Steven Carroll, Peter White
Semi-automated Named Entity Annotation
Linguistic Annotation Workshop 2007 [In Press]
Available: Paper in
PDF
Olga Babko-Malaya, Ann Bies, Ann
Taylor, Szuting Yi, Martha Palmer, Mitch Marcus, Seth Kulick, Libin Shen
Issues in Synchronizing the English Treebank and PropBank
Frontiers in Linguistically Annotated Corpora, A Merged Workshop with 7th
International Workshop on Linguistically Interpreted Corpora (LINC-2006) and
Frontiers in Corpus Annotation III, Coling/ACL 2006
Available: Paper
in PDF
Ann Bies, Stephanie Strassel,
Haejoong Lee, Kazuaki Maeda, Seth Kulick, Yang Liu, Mary Harper, Matthew Lease
Linguistic Resources for Speech Parsing
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper
in PDF
Steven
Bird, Yi Chen, Susan Davidson, Haejoong Lee, and Yifeng Zheng
Designing and Evaluating an XPath Dialect for Linguistic Queries
22nd International Conference on Data Engineering (ICDE), Atlanta
Available: Paper in
PDF
Christopher
Cieri
Linguistic Resources, Development and Evaluation
Chapter 8 in Laila Dybkjær, Holmer Hemsen and Wolfgang Minker (eds.),
Evaluation of Text and Speech Systems, Kluwer Academic Publishers
Available: Forthcoming
Christopher Cieri, Mark Liberman,
Victoria Arranz and Khalid Choukri
Linguistic Data Resources
Chapter 3 in Tanja Schultz and Katrin Kirchhoff (eds.) Multilingual
Speech Processing, Elsevier, Academic Press, ISBN 13: 978-0-12-088501-5. April
2006.
Available: Elsevier's Page
Christopher Cieri
What is Quality? Invited Talk at the Workshop on Quality Assurance and
Quality Measurement for Language and Speech Resources
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Presentation
Slides
Christopher Cieri, Mark Liberman
More Data and Tools for More Languages and Research Areas: A Progress Report
on LDC Activities
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper
in PDF, Presentation
Slides
Christopher Cieri, Walt Andrews,
Joseph P. Campbell, George Doddington, Jack Godfrey, Shudong Huang, Mark
Liberman, Alvin Martin, Hirotaka Nakasone, Mark Przybocki, Kevin Walker
The Mixer and Transcript Reading Corpora: Resources for Multilingual,
Crosschannel Speaker Recognition Research
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper
in PDF, Presentation
Slides
Ryan Gabbard, Seth Kulick,
Mitchell Marcus
Fully Parsing the Penn Treebank
HLT-NAACL, 2006
Available: Paper in
PDF
David Graff, Tim Buckwalter,
Hubert Jin, Mohamed Maamouri
Lexicon Development for Varieties of Spoken Colloquial Arabic
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper in
PDF
Yang
Jin, Ryan McDonald, Kevin Lerman, Mark Mandel, Steven Carroll, Mark Y Liberman,
Fernando Pereira, Raymond Winters, Peter White
Automated recognition of malignancy mentions in biomedical literature
Open Access: BMC Bioinformatics 7:492
Available: Paper
in PDF
Xiaoyi Ma
Champollion: A Robust Parallel Text Sentence Aligner
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper in PDF
Xiaoyi Ma, Christopher Cieri
Corpus Support for Machine Translation at LDC
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper in PDF
Mohamed Maamouri, Ann Bies and
Seth Kulick
Diacritization: A Challenge to Arabic Treebank Annotation and Parsing
Machine Translation SIG of the British Computer Society Conference
Available: Paper
in PDF
Mohamed Maamouri, Ann Bies, Tim
Buckwalter, Mona Diab, Nizar Habash, Owen Rambow, Dalila Tabessi
Developing and Using a Pilot Dialectal Arabic Treebank
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper
in PDF
Kazuaki Maeda, Christopher Cieri,
Kevin Walker
Low-cost Customized Speech Corpus Creation for Speech Technology
Applications
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Kazuaki Maeda, Haejoong Lee,
Julie Medero, Stephanie Strassel
A New Phase in Annotation Tool Development at the Linguistic Data
Consortium: The Evolution of the Annotation Graph Toolkit
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Mark Mandel
Integrated Annotation of Biomedical Text: Creating the PennBioIE Corpus
Presented at Text Mining, Ontologies and Natural Language Processing in
Biomedicine, Manchester, UK, March 20 - 21, 2006
Available: Abstract,
Presentation
Slides in PDF
Ryan McDonald, Kevin Lerman, and
Fernando Pereira
Multilingual Dependency Parsing with a Two-Stage Discriminative Parser
Computational Natural Language Learning (CoNLL-X), 2006
Available: Paper
as PDF
Julie Medero, Kazuaki Maeda,
Stephanie Strassel, Christopher Walker
An Efficient Approach for Gold-Standard Annotation: Decision Points for
Complex Tasks
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper in PDF
Stephanie Strassel, Andrew W.
Cole
Corpus Development and Publication
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper
in PDF and Poster
in PPT
Stephanie
Strassel, Christopher Cieri, Andy Cole, Denise DiPersio, Mark Liberman, Xiaoyi
Ma, Mohamed Maamouri, Kazuaki Maeda
Integrated Linguistic Resources for Language Exploitation Technologies
LREC 2006: Fifth International Conference on Language Resources and
Evaluation
Available: Paper in
PDF, Presentation
Slides
Jiahong Yuan, Mark Liberman,
Christopher Cieri
Towards an Integrated Understanding of Speaking Rate in Conversation
The Ninth International Conference on Spoken Language Processing
(Interspeech 2006 - ICSLP), Pittsburgh, Pennsylvania
Available: Paper
in PDF, Presentation
Slides
Ann Bies, Seth Kulick, Mark
Mandel
Parallel Entity and Treebank Annotation
Presented at Frontiers in Corpus Annotation II: Pie in the Sky, ACL 2005
workshop, Ann Arbor, June 29, 2005
Available: Paper
in PDF
Jerry Goldman, Steve Renals,
Steven Bird, Franciska de Jong, Marcello Federico, Carl Fleischhauer, Mark
Kornbluh, Lori Lamel, Douglas Oard, Claire Stewart and Richard Wright
Transforming Access to the Spoken Word
International Journal on Digital Libraries 5, 287-298, 2005
Available: Paper
in PDF
Violetta Cavalli-Sforza, Mohamed
Maamouri
Extensions to Histogram-Based Student Modeling Approach to Facilitate
Reading in Morphologically Complex Languages
AIED: International Conference on Artificial Intelligence in Education
Available: Paper in
PDF
Christopher
Cieri
HLT Evaluation: The Role of Data Centers
ELRA HLT Evaluation Workshop, Malta, December 2005
Available: Presentation
Slides
Christopher Cieri
Modeling Phonological Variation in Multidialectal Italy
University of Pennsylvania, Doctoral Dissertation, May 2005
Available: PDF
from ProQuest
Yang Jin, Ryan T. McDonald, Kevin
Lerman, Mark A. Mandel, Mark Y. Liberman, Fernando Pereira, R. Scott Winters,
Peter S. White
Identifying and Extracting Malignancy Types in Cancer Literature
Presented at BioLink 2005: ISMB/ACL, Detroit, June 24, 2005
Available: Paper
in PDF
Mohamed Maamouri
Arabic Literacy
Lemma, 11,16 in Encyclopedia of Arabic Language and Linguistics (EALL).
Vol 2
Available: Paper in PDF
Ryan
McDonald, Fernando Pereira, Seth Kulick, Scott Winters, Yang Jin, and Peter
White
Simple Algorithms for Complex Relation Extraction with Applications to
Biomedical IE
43rd Annual Meeting of the Association for Computational Linguistics, 2005
Available: Paper
in PDF
Tim Buckwalter (2004)
Issues in Arabic Orthography and Morphology Analysis
Proceedings of the Workshop on Computational Approaches to Arabic
Script-based Languages, COLING 2004, Geneva, August 28, 2004.
Available: Paper
in PDF
Christopher Cieri, Joseph P.
Campbell, Hirotaka Nakasone, David Miller, Kevin Walker
The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
LREC 2004: Fourth International Conference on Language Resources and
Evaluation, Lisbon
Available: Paper in
PDF, Poster in
PowerPoint Format
Christopher Cieri, Mark Liberman
Progress Report from the Linguistic Data Consortium: recent activities in
resource creation and distribution and the development of tools and standards
LREC 2004: Fourth International Conference on Language Resources and
Evaluation, Lisbon
Available: Paper
in PDF, Presentation
Slides
Christopher Cieri, David Miller,
Kevin Walker
The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text
LREC 2004: Fourth International Conference on Language Resources and Evaluation,
Lisbon
Available: Paper in
PDF, Presentation
Slides
George Doddington, Alexis Mitchell,
Mark Przybocki, Lance Ramshaw, Stephanie Strassel, Ralph Weischedel
Automatic Content Extraction (ACE) program - task definitions and
performance measures
LREC 2004: Fourth International Conference on Language Resources and
Evaluation
Available: Paper
in PDF
Shudong Huang, Stephanie
Strassel, Alexis Mitchell, Zhiyi Song
Shared Resources for Multilingual Information Extraction and Challenges in
Named Entity Annotation
IJCNLP-04 Workshop on Named Entity Recognition for NLP Applications, Hainan
Island, China, March 2004
Available: Paper in
PDF
Seth
Kulick, Ann Bies, Mark Liberman, Mark Mandel, Ryan McDonald, Martha Palmer,
Andrew Schein, Lyle Ungar, Scott Winters, Pete White
Integrated Annotation for Biomedical Information Extraction
Presented at HLT/NAACL Workshop BioLink 2004, Boston, May 2-7, 2004
Available: Paper
in PDF, Presentation
Slides
Mohamed Maamouri and Ann Bies
(2004)
Developing an Arabic Treebank: Methods, Guidelines, Procedures, and Tools
Proceedings of the Workshop on Computational Approaches to Arabic Script-based
Languages, COLING 2004, Geneva, August 28, 2004.
Available: Paper
in PDF
Mohamed Maamouri, Tim Buckwalter,
and Christopher Cieri (2004)
Dialectal Arabic Telephone Speech Corpus: Principles, Tool Design, and
Transcription Conventions
Paper presented at the NEMLAR International Conference on Arabic Language
Resources and Tools, Cairo, Sept. 22-23, 2004.
Available: Paper
in PDF, Presentation
Slides.
Mohamed Maamouri, Ann Bies, Tim
Buckwalter, and Wigdan Mekki (2004)
The Penn Arabic Treebank: Building a Large-Scale Annotated Arabic Corpus
Paper presented at the NEMLAR International Conference on Arabic Language
Resources and Tools, Cairo, Sept. 22-23, 2004.
Available: Paper in
PDF
Mohamed Maamouri, David Graff,
Hubert Jin, Christopher Cieri, and Tim Buckwalter (2004)
Dialectal Arabic Orthography-based Transcription and CTS Levantine Arabic
Collection.
Paper presented at the Parallel STT-NA Tracks Session of the EARS RT-04
Workshop, Palisades IBM Executive Center, New York, Nov. 10, 2004.
Available: Paper
in Word format
Ryan
McDonald, R. Scott Winters, Mark Mandel, Yang Jin, Peter S. White, Fernando
Pereira
An entity tagger for recognizing acquired genomic variations in cancer
literature
Bioinformatics 20:3249-3251
Available: Paper
in PDF
Kazuaki Maeda and Stephanie
Strassel (2004)
Annotation Tools for Large-Scale Corpus Development: Using AGTK at the
Linguistic Data Consortium.
LREC 2004: Fourth International Conference on Language Resources and
Evaluation
Available: Paper in PDF
Mike Maxwell
From Legacy Lexicon to Archivable Resource
First Steps for Language Documentation of Minority Languages: Workshop on
Computational Linguistic Tools for Morphology, Lexicon and Corpus Compilation, LREC
2004
Available: Paper
in PDF
Douglas Oard, Dagobert Soergel,
G. Craig Murray, David Doermann, Jianqiang Wang, Bhuvana Ramabhadran, Martin
Franz, James Mayfield and Samuel Gustman, Stephanie Strassel
Building an Information Retrieval Test Collection for Spontaneous
Conversational Speech
27th Annual International ACM SIGIR Conference (SIGIR2004), Sheffield,
England, July 2004
Available: Paper in
PDF
Stephanie Strassel
Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text
LREC 2004: Fourth International Conference on Language Resources and
Evaluation
Available: Paper in
PDF
Colin
Warner, Ann Bies, Christine Brisson, Justin Mott
Addendum to the Penn Treebank II Style Bracketing Guidelines: BioMedical
Treebank Annotation
November, 2004
Available: Paper
in PDF, Paper
as web page, Paper
in plain text
Steven Bird and Gary Simons
(2003)
Seven Dimensions of Portability for Language Documentation and Description
Language 79, 557-582.
Available: Paper in PDF
Steven Bird and Gary Simons
(2003)
Extending Dublin Core Metadata to support the description and discovery of
language resources
Computers and the Humanities 37, 375-388.
Available: Paper in PDF
Christopher Cieri, Stephanie
Strassel
Robust Sociolinguistic Methodology: Tools, Data and Best Practices
NWAV 32, Philadelphia, 2003
Available: Presentation
Slides
Christopher Cieri, Mike Maxwell,
Stephanie Strassel
Core Linguistic Resources for the World's Languages
ELSNET, ENABLER, ICWLR Joint Workshop, Paris, 2003
Available: Presentation
Slides
Baden Hughes and Steven Bird
(2003)
Grid-Enabling Natural Language Engineering By Stealth
Proceedings of the Workshop on The Software Engineering and Architecture of
Language Technology Systems (SEALTS)
Available: arXiv.org
Seth Kulick, Mark Liberman,
Martha Palmer, and Andrew Schein
Shallow Semantic Annotation of Biomedical Corpora for Information Extraction
ISMB Special Interest Group Meeting on Text Mining (BioLink). June 2003.
Brisbane, Australia
Available: Paper
in PDF, Presentation
slides
Mike Maxwell
Incremental Grammar Development using Finite State Tools
Proceedings of the Workshop on Finite-State Methods in Natural Language
Processing, EACL 10, Budapest, 13-14 April 2003.
Available: Paper in PDF
Gary Simons and Steven Bird
(2003)
The Open Language Archives Community: An infrastructure for distributed
archiving of language resources
Literary and Linguistic Computing 18 (in press)
Available: arXiv.org
Gary Simons and Steven Bird
(2003)
Building an Open Language Archives Community on the OAI Foundation
Library Hi Tech 21, 210-218, Special Issue on Open Archives Initiative
Metadata Harvesting.
Available: Paper in PDF
Stephanie Strassel, David Miller,
Kevin Walker, Christopher Cieri (2003)
Shared Resources for Robust Speech-to-Text Technology
Eurospeech 2003
Available: Paper
in PDF
Stephanie Strassel (2003)
Corpus Creation for Disfluency Research
Disfluency in Spontaneous Speech Conference, Gothenburg, Sweden
Available: Abstract
in PDF, Presentation
Slides in PDF
Stephanie Strassel, Alexis
Mitchell, Shudong Huang (2003)
Multilingual Resources for Entity Extraction
41st Annual Meeting of the Association for Computational Linguistics
Workshop on Multilingual and Mixed-language Named Entity Recognition:
Combining Statistical and Symbolic Models, Sapporo Japan
Available: Paper
in PDF
Stephanie Strassel, Mike
Maxwell, Christopher Cieri (2003)
Linguistic Resource Creation for Research and Technology Development: A
Recent Experiment
Association for Computing Machinery Transactions on Asian Language
Information Processing (TALIP). Volume 2, Issue 2, 101 - 117
Available: Paper
in PDF
Steven Bird, Kazuaki Maeda,
Xiaoyi Ma, Haejoong Lee, Beth Randall, and Salim Zayat (2002)
TableTrans, MultiTrans, InterTrans and TreeTrans: Diverse Tools Built on the
Annotation Graph Toolkit
Proceedings of the Third International Conference on Language Resources and
Evaluation
Available: arXiv.org
Christopher Cieri, Stephanie
Strassel, David Graff, Nii Martey, Kara Rennert and Mark Liberman (2002)
Corpora for Topic Detection and Tracking
James Allan, ed. Topic Detection and Tracking: Event-based Information
Organization, Kluwer International Series on Information Retrieval, Bruce
Croft, series editor, Boston, Kluwer Academic Publishers.
Christopher Cieri, David Miller,
Kevin Walker (2002)
Research Methodologies, Observations and Outcomes in (Conversational) Speech
Data Collection
HLT 2002 The Human Language Technologies Conference, San Diego, CA,
March 2002
Available: Notebook
Paper.
Christopher Cieri, Stephanie
Strassel, William Labov
Sharable Resources for Sociolinguistic Research
NWAV31, Stanford, 2002
Available: Presentation
Slides
Scott Cotton and Steven Bird
(2002)
An Integrated Framework for Treebanks and Multilayer Annotations
Proceedings of the Third International Conference on Language Resources and
Evaluation
Available: arXiv.org
Xiaoyi Ma, Haejoong Lee, Steven
Bird and Kazuaki Maeda (2002)
Models and Tools for Collaborative Annotation
Proceedings of the Third International Conference on Language Resources and
Evaluation
Available: arXiv.org
Mohamed Maamouri, Christopher
Cieri
Resources for Arabic Natural Language Processing
International Symposium on Processing Arabic, Tunis, April 2002
Available: Presentation
Slides
Kazuaki Maeda, Steven Bird,
Xiaoyi Ma, and Haejoong Lee (2002)
Creating Annotation Tools with the Annotation Graph Toolkit
Proceedings of the Third International Conference on Language Resources and
Evaluation
Available: arXiv.org
Mike Maxwell, Gary Simons, and
Larry Hayashi (2002)
A Morphological Glossing Assistant
Proceedings of the International LREC Workshop on Resources and Tools in
Field Linguistics
Available: Paper
in PDF
Mike Maxwell (2002)
Resources for Morphology Learning and Evaluation
LREC 2002: Third International Conference on Language Resources and
Evaluation vol. III, 967-974
Available: Paper
in PDF
Christopher Cieri, David Graff,
David Miller, Kevin Walker (2001)
Resources and Infrastructure to Support Robust, Omnipresent
Communicator, SPINE, ROAR Workshop, Orlando, November 2001
Available: Presentation
Slides.
Christopher Cieri, Andy Cole,
Dave Graff, Nii Martey, Stephanie Strassel, Cristina Tofan (2001)
SPINE 2001 Data Preparation and Annotation and the SPINE Corpora
Communicator, SPINE, ROAR Workshop, Orlando, November 2001
Available: Presentation
Slides.
Christopher Cieri and Steven Bird
(2001)
Annotation Graphs, Annotation Servers and Multi-Modal Resources:
Infrastructure for Interdisciplinary Education, Research and Development
Proceedings of the Association for Computational Linguistics: Workshop on
Sharing Tools & Resources, Toulouse, July 2001
Available: Paper in
PDF, Presentation
Slides.
Lea Christiansen, Christopher
Cieri, Kathleen Egan, Anita Kulman, Milton Paul (2001)
Getting SMART about Authoring
CALICO 2001, University of Central Florida, Orlando, March 2001
Available: Presentation
Slides.
David Miller, Christopher Cieri
and Kevin Walker (2001)
Switchboard Cellular Resources for Speaker Recognition
Speaker Recognition Workshop, Maritime Institute of Technology and
Graduate Studies, Linthicum MD, March 2001
Available: Presentation
Slides.
Stephanie Strassel, Christopher
Cieri and Steven Bird (2001)
Shared Resources and Community Building for Corpus Linguistics and Language
Teaching
Corpus Linguistics and Language Teaching Workshop, Boston, MA, March
2001
Available: Presentation
Slides.
Stephanie Strassel and
Christopher Cieri
Data and Annotations for SocioLinguistics: A Corpus-Based Approach to
Sociolinguistic Research
Penn Linguistic Colloquium, Philadelphia, PA. March 2001
Available: Presentation
Slides.
Steven Bird and Mark Liberman
(2001)
A formal framework for linguistic annotation
Speech Communication 33(1,2), pp 23-60.
Available: arXiv.org
Steven Bird, Gary Simons and
Chu-Ren Huang (2001)
The Open Language Archives Community and Asian Language Resources
Proceedings of the Workshop on Language Resources in Asia, 6th Natural
Language Processing Pacific Rim Symposium (NLPRS), Tokyo, November 2001.
Available: arXiv.org
Steven Bird and Gary Simons
(2001)
The OLAC Metadata Set and Controlled Vocabularies
Proceedings of the ACL Workshop on Sharing Tools and Resources for Research
and Education, Toulouse, July 2001, pp 7-18.
Available: arXiv.org
Kazuaki Maeda, Steven Bird,
Xiaoyi Ma and Haejoong Lee (2001)
The Annotation Graph Toolkit: Software Components for Building Linguistic
Annotation Tools
Proceedings of HLT 2001 The Human Language Technologies Conference, San
Diego, CA, March 2001
Available: Paper in PDF
Kazuaki Maeda and Steven Bird
(2001)
A Framework for Annotating Animal Bioacoustic Data
The 142nd Meeting of the Acoustical Society of America, Chicago, June
2001
Available: Presentation
Slides (PowerPoint).
Steve Cassidy and Steven Bird
(2000)
Querying databases of annotated speech
Proceedings of the Eleventh Australasian Database Conference
Available: Paper in PDF
Christopher Cieri (2000)
Multiple Annotation of Reusable Data Resources: Corpora for Topic Detection
and Tracking
In Rajman, M. and J. C. Chappelier, eds. (2000) Actes des 5es Journees
internationales d'analyse statistique des donnees textuelles, volume 1,
Ecole Polytechnique Federale de Lausanne
Available: Paper in
PDF
Christopher Cieri (2000)
Issues and Tools for Annotating a Corpus of Sociolinguistic Field Data
Linguistic Exploration Workshop in conjunction with
Linguistic Society of America Annual Meeting, Chicago, January 2000
Available: Presentation Slides
Christopher Cieri, David Graff,
Nii Martey, Stephanie Strassel (2000)
The TDT-3 Text and Speech Corpus
Presented at the Topic Detection and Tracking Workshop, Vienna,
Virginia, February 28 - March 1, 2000.
Available: Paper in
PostScript
Christopher Cieri, Dave Graff,
Mark Liberman, Nii Martey and Stephanie Strassel (2000)
Large Multilingual Broadcast News Corpora for Cooperative Research in Topic
Detection and Tracking: The TDT2 and TDT3 Corpus Efforts
In Proceedings of the Second International Language Resources and Evaluation
Conference, Athens, Greece, May 2000.
Available: Paper
in PDF
Christopher Cieri and Mark
Liberman (2000)
Issues in Corpus Creation and Distribution: the Evolution of the Linguistic
Data Consortium
In Proceedings of the Second International Language Resources and Evaluation
Conference, Athens, Greece, May 2000.
Available: Paper
in PDF
David Graff and Steven Bird
(2000)
Many uses, many annotations for large speech corpora: Switchboard and TDT as
case studies
2nd Language Resources and Evaluation Conference (LREC 2000) Athens,
Greece, May 2000
Available: Paper in
PDF -- Paper in PostScript
Dave Graff, Stephanie Strassel
and Christopher Cieri (2000)
Resources, New and Forthcoming, from LDC
Presented at the 2000 Speech Transcription Workshop, University of
Maryland, May 16-19, 2000.
Available: Presentation Slides
Stephanie Strassel, Dave Graff,
Nii Martey and Christopher Cieri (2000)
Quality Control in Large Annotation Projects Involving Multiple Judges: The
Case of the TDT Corpora.
In Proceedings of the Second International Language Resources and Evaluation
Conference, Athens, Greece, May 2000.
Available: Paper
in PDF
Steven Bird and Mark Liberman
(1999)
A Formal Framework for Linguistic Annotation
Technical Report MS-CIS-99-01 - Department of Computer and Information
Science, University of Pennsylvania
(expanded from version presented at ICSLP-98, Sydney)
Available: Paper in
PDF
Steven Bird and Mark Liberman
(1999)
Annotation graphs as a framework for multidimensional linguistic data
analysis
Towards Standards and Tools for Discourse Tagging -- Proceedings of the
Workshop, Somerset, NJ: Association for Computational Linguistics
Available: Paper in PDF
Steven Bird (1999)
Multidimensional exploration of online linguistic field data
Proceedings of the 29th Annual Meeting of the Northeast Linguistics Society,
University of Massachusetts at Amherst.
Available: Paper in
PDF
Steven Bird and Stephanie
Strassel (1999)
Annotated Corpora in Linguistic Research
North American Symposium on Corpora in Linguistics and Language Teaching,
University of Michigan, May 21, 1999.
Available: Presentation Slides
Alexandra Canavan, Kevin Walker,
David Graff and Christopher Cieri (1999)
Telephone Speech Corpora: New Needs, Languages, Methods and Technology
Presented at the Hub-5 Conversational Speech Understanding (LVCSR) Workshop,
Maritime Institute of Technology and Graduate Studies, Linthicum Heights,
Maryland, June 1999.
Available: Presentation Slides
Christopher Cieri, David Graff,
Mark Liberman, Nii Martey, Stephanie Strassel (1999)
The TDT-2 Text and Speech Corpus
Presented at the DARPA Broadcast News Workshop, Washington, DC,
February 1999.
Available: Paper
in PDF
Christopher Cieri (1999)
This Ain't Your Father's Digital Data: Another Perspective on Legal
Information
Presented at the CALI 1999 - The Conference for Law School Computing.
Eugene, Oregon, June 1999.
Available: Presentation Slides,
Video in RealMedia
Xiaoyi Ma and Mark Liberman
(1999)
Machine Translation Summit VII, September 13th, 1999, Kent Ridge Digital
Labs, National University of Singapore
Available: Paper in
Postscript, Paper
in PDF
Xiaoyi Ma (1999)
Parallel Text Collections at the Linguistic Data Consortium
Machine Translation Summit
Available: Paper
in Postscript
Stephanie Strassel (1999)
Corpus Creation and Quality Control at the LDC
Presented at the Corpus of Spoken Dutch Workshop; Tilburg, Netherlands;
November 12, 1999.
Available: Presentation Slides
Stephanie Strassel and
Christopher Cieri (1999)
Corpus Sociolinguistics: Issues, Data and Tools
Presented at NWAVE-28, York University, Toronto, Ontario, October 1999.
Available: Presentation Slides
Steven Bird and Mark Liberman
(1998)
Towards a Formal Framework for Linguistic Annotations
Proceedings of the 5th International Conference on Spoken Language
Processing.
Available: Paper in
PDF
Christopher Cieri and David Graff
(1998)
Topic Detection and Tracking Corpora
Presented at TREC/SDR Conference, Gaithersburg, Maryland, November 1998.
Available:
David Graff and Christopher Cieri
(1998)
Update on Lexical Resources and Projects at the Linguistic Data Consortium
Presented at the Ninth Hub-5 Conversational Speech Recognition (LVCSR)
Workshop, Maritime Institute of Technology and Graduate Studies, Linthicum
Heights, Maryland, September 1998.
Available:
Mark Liberman and Christopher
Cieri (1998)
The Creation, Distribution and Use of Linguistic Data
Proceedings of the First International Conference on Language Resources and
Evaluation, Granada, Spain, May 1998.
Available: Paper in
PDF