A Computational Approach to the Discovery and Representation of Lexical Chunks

David Wible; Chin-Hwa Kuo; Meng-Chang Chen; Nai-Lung Tsao; Tsung-Fu Hung

Conference paper, Year: 2006


David Wible: Department of English

Chin-Hwa Kuo: Computer and Network Lab

Meng-Chang Chen: Institute of Information Science

Nai-Lung Tsao: Institute of Information Science

Tsung-Fu Hung: Computer and Network Lab

Abstract

In recent years, lexical chunks have become widely recognized as a crucial aspect of second-language competence. We address two major challenges that chunks pose for lexicography and describe computational approaches to addressing them. The first challenge is lexical knowledge discovery, that is, the need to uncover which strings of words constitute chunks worthy of learners' attention. The second challenge is the problem of representation, that is, how such knowledge can be made accessible to learners. To address the first challenge, we propose a greedy algorithm, run on 20 million words of the BNC (British National Corpus), that iteratively applies word association measures to increasingly longer n-grams. This approach prioritizes high recall and then attempts to isolate false positives by means of sorting mechanisms. To address the challenge of representation, we propose embedding the algorithm in a browser-based agent as an extension of our current browser-based collocation detection tool.
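The abstract names the discovery technique but not its details: the association measure, thresholds, and direction of extension are not given here. The following is a minimal Python sketch under assumptions of our own, not the authors' implementation: pointwise mutual information (PMI) serves as the association measure, n-grams are extended one word to the right only, and `min_count` and `min_pmi` are hypothetical thresholds. All function names are illustrative. A loose `min_pmi` mimics the recall-first strategy, and ranking the survivors by score stands in for the sorting step that isolates false positives.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Ngram = Tuple[str, ...]


def count_ngrams(tokens: List[str], max_n: int) -> Counter:
    """Count every n-gram of length 1..max_n in the token list."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def pmi(counts: Counter, left: Ngram, right: Ngram, total: int) -> float:
    """PMI between two adjacent units, treating the whole `left` n-gram as one unit."""
    joint = counts[left + right]
    return math.log2(joint * total / (counts[left] * counts[right]))


def greedy_chunks(tokens: List[str], max_n: int = 4, min_count: int = 2,
                  min_pmi: float = 1.0) -> List[Tuple[Ngram, float]]:
    """Hypothetical greedy chunk finder: keep associated bigrams, then only
    try to extend n-grams that survived the previous round."""
    counts = count_ngrams(tokens, max_n)
    total = len(tokens)

    # Round 1: score all sufficiently frequent adjacent word pairs.
    frontier: Dict[Ngram, float] = {}
    for gram, c in counts.items():
        if len(gram) == 2 and c >= min_count:
            score = pmi(counts, gram[:1], gram[1:], total)
            if score >= min_pmi:
                frontier[gram] = score
    found = dict(frontier)

    # Later rounds: extend a surviving (n-1)-gram by one word on the right.
    for n in range(3, max_n + 1):
        next_frontier: Dict[Ngram, float] = {}
        for gram, c in counts.items():
            if len(gram) == n and c >= min_count and gram[:-1] in frontier:
                score = pmi(counts, gram[:-1], gram[-1:], total)
                if score >= min_pmi:
                    next_frontier[gram] = score
        found.update(next_frontier)
        frontier = next_frontier
        if not frontier:
            break  # nothing left to extend greedily

    # Sort candidates so the weakest (likely false positives) fall to the bottom.
    return sorted(found.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `greedy_chunks("in spite of the rain we left . in spite of it all we stayed".split())` keeps the bigram *in spite* and then extends it to the trigram *in spite of*, while one-off pairs such as *of the* are dropped by the frequency threshold. A real run over a corpus the size of the BNC would need far more careful thresholds and memory-efficient counting than this toy version.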

Keywords

foreign language learning, lexical chunks, computational lexicography, word associations

Domains

Computer Environments for Human Learning

Main file

Wible-david-2006.pdf (454.7 KB)

Origin: Files produced by the author(s)

Contributor: Jerome Zeiliger

https://telearn.hal.science/hal-00197301

Submitted on: Friday, 14 December 2007, 14:58:42

Last modified on: Saturday, 8 May 2021, 22:04:26

Long-term archiving on: Monday, 12 April 2010, 07:45:44

Dates and versions

hal-00197301 , version 1 (14-12-2007)

Identifiers

HAL Id: hal-00197301, version 1

Cite

David Wible, Chin-Hwa Kuo, Meng-Chang Chen, Nai-Lung Tsao, Tsung-Fu Hung. A Computational Approach to the Discovery and Representation of Lexical Chunks. The 13th Conference on Natural Language Processing (TALN 2006), April 10-13, 2006, Leuven, Belgium, pp. 868-875. ⟨hal-00197301⟩


Collections

TELEARN TICE

211 views

453 downloads
