Carlos-Francisco Méndez-Cruz

README

## Automatic analysis of morphological units: segmentation and clustering of Spanish, Maya and Nahuatl
### Carlos-Francisco Méndez-Cruz and Ignacio Arroyo-Fernández
This repository contains the results of two automatic morphological
analyses for Spanish, Nahuatl and Maya.
... conclude that the word embeddings represented the contextual
information necessary to differentiate them from morphs with
lexical-semantic content.
### Clustering
`/clustering`
Clusters of morphs for each language:
500 groups for Maya and Nahuatl, 1000 groups for Spanish.
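
The group counts above can be illustrated with a minimal sketch. This README does not say which clustering algorithm or vector format was used, so plain k-means is an assumption here, and the toy morphs, their 2-d vectors, and the names `GROUPS_PER_LANGUAGE` and `kmeans` are hypothetical. Only the counts (500, 500, 1000) come from the text.

```python
import random

# Group counts stated in this README; everything else below is illustrative.
GROUPS_PER_LANGUAGE = {"Maya": 500, "Nahuatl": 500, "Spanish": 1000}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means (an assumed algorithm); returns one cluster index per vector."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Made-up 2-d "embeddings" for four hypothetical morphs (not from the corpora).
toy_morphs = ["-s", "-es", "cas", "libr"]
toy_vectors = [[0.0, 0.0], [0.2, 0.2], [5.0, 5.0], [5.2, 5.2]]
toy_labels = kmeans(toy_vectors, k=2)
```

With real data, `k` would be taken from `GROUPS_PER_LANGUAGE` and the vectors would be the morph embeddings. In practice a library implementation would likely be preferable to this pure-Python loop.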
### Corpora
`/corpora`
Only a sample of the documents employed in our study.
Complete versions must be requested by e-mail (see **Contact**).
### Segmentation
`/segmentation`
Segmented corpus for each language.
Maya and Nahuatl were segmented using _Morfessor CatMap_
(http://www.cis.hut.fi/projects/morpho/).
Spanish was segmented by the authors.
### Contact
Carlos Méndez (cmendezc at ccg dot unam dot mx)
Center for Genomic Sciences, UNAM, Mexico