Carlos-Francisco Méndez-Cruz

README

## Automatic analysis of morphological units: segmentation and clustering of Spanish, Maya and Nahuatl
### Carlos-Francisco Méndez-Cruz and Ignacio Arroyo-Fernández
This repository contains the results of two automatic morphological
analyses for Spanish, Nahuatl and Maya.
[...] conclude that the word embeddings represented the contextual
information necessary to differentiate them from morphs with
lexical-semantic content.
# Directory description
### Corpora
`\corpora`
Only a sample of the documents used in our study is included.
Complete versions must be requested by e-mail (see **Contact**).
### Segmentation
Segmented corpus for each language.
Maya and Nahuatl were segmented using _Morfessor CatMap_
(http://www.cis.hut.fi/projects/morpho/).
Spanish was segmented by the authors.
### Clustering
Clusters of morphs for each language:
500 groups for Maya and Nahuatl, 1000 groups for Spanish.
### Contact
Carlos Méndez (cmendezc at ccg dot unam dot mx)
Center for Genomic Sciences, UNAM, Mexico