Carlos-Francisco Méndez-Cruz

README

Showing 1 changed file with 11 additions and 7 deletions
@@ -1,5 +1,6 @@
 ## Automatic analysis of morphological units: segmentation and clustering of Spanish, Maya and Nahuatl
-### Carlos-Francisco Méndez-Cruz and Ignacio Arroyo-Fernández
+
+Carlos-Francisco Méndez-Cruz and Ignacio Arroyo-Fernández
 
 In this repository, results of two automatic morphological
 analyses for Spanish, Nahuatl and Maya are shown.
@@ -18,6 +19,12 @@ conclude that the word embeddings represented the contextual
 information necessary to differentiate them from morphs with
 lexical-semantic content.
 
+### Clustering
+`/clustering`
+
+Clusters of morphs for each language:
+500 groups for Maya and Nahuatl, 1000 groups for Spanish.
+
 ### Corpora
 `\corpora`
 
@@ -25,17 +32,14 @@ Only a sample of documents employed in our study.
 Complete versions must be requested by e-mail (see **Contact**).
 
 ### Segmentation
+`/segmentation`
+
 Segmented corpus for each language.
 Maya and Nahuatl were segmented using _Morfessor CatMap_
 (http://www.cis.hut.fi/projects/morpho/).
 Spanish was segmented by the authors.
 
-### Clustering
-Clusters of morphs for each language:
-500 groups for Maya and Nahuatl, 1000 groups for Spanish.
 
 ### Contact
-Carlos Méndez (cmendezc at ccg dot unam dot mx)
-
-Center for Genomic Sciences, UNAM, Mexico
+Carlos Méndez (cmendezc at ccg dot unam dot mx), Center for Genomic Sciences, UNAM, Mexico
 