Showing 1 changed file with 11 additions and 7 deletions
@@ -1,5 +1,6 @@
 ## Automatic analysis of morphological units: segmentation and clustering of Spanish, Maya and Nahuatl
-### Carlos-Francisco Méndez-Cruz and Ignacio Arroyo-Fernández
+
+Carlos-Francisco Méndez-Cruz and Ignacio Arroyo-Fernández
 
 In this repository, results of two automatic morphological
 analyzes for Spanish, Nahuatl and Maya are shown.
@@ -18,6 +19,12 @@ conclude that the word embeddings represented the contextual
 information necessary to differentiate them from morphs with
 lexical-semantic content.
 
+### Clustering
+`/clustering`
+
+Clusters of morphs for each language:
+500 groups for Maya and Nahuatl, 1000 groups for Spanish.
+
 ### Corpora
 `\corpora`
 
@@ -25,17 +32,14 @@ Only a sample of documents employed in our study.
 Complete versions must be request by e-mail (see **Contact**).
 
 ### Segmentation
+`/segmentation`
+
 Segmented corpus for each language.
 Maya and Nahuatl were segmented using _Morfessor CatMap_
 (http://www.cis.hut.fi/projects/morpho/).
 Spanish was segmented by the authors.
 
-### Clustering
-Clusters of morphs for each language:
-500 groups for Maya and Nahuatl, 1000 groups for Spanish.
 
 ### Contact
-Carlos Méndez (cmendezc at ccg dot unam dot mx)
-
-Center for Genomic Sciences, UNAM, Mexico
+Carlos Méndez (cmendezc at ccg dot unam dot mx), Center for Genomic Sciences, UNAM, Mexico
 ...