Showing 1 changed file with 11 additions and 7 deletions
| 1 | ## Automatic analysis of morphological units: segmentation and clustering of Spanish, Maya and Nahuatl | 1 | ## Automatic analysis of morphological units: segmentation and clustering of Spanish, Maya and Nahuatl |
| 2 | -### Carlos-Francisco Méndez-Cruz and Ignacio Arroyo-Fernández | 2 | + |
| 3 | +Carlos-Francisco Méndez-Cruz and Ignacio Arroyo-Fernández | ||
| 3 | 4 | ||
| 4 | In this repository, results of two automatic morphological | 5 | In this repository, results of two automatic morphological |
| 5 | analyses for Spanish, Nahuatl and Maya are shown. | 6 | analyses for Spanish, Nahuatl and Maya are shown. |
| ... | @@ -18,6 +19,12 @@ conclude that the word embeddings represented the contextual | ... | @@ -18,6 +19,12 @@ conclude that the word embeddings represented the contextual |
| 18 | information necessary to differentiate them from morphs with | 19 | information necessary to differentiate them from morphs with |
| 19 | lexical-semantic content. | 20 | lexical-semantic content. |
| 20 | 21 | ||
| 22 | +### Clustering | ||
| 23 | +`/clustering` | ||
| 24 | + | ||
| 25 | +Clusters of morphs for each language: | ||
| 26 | +500 groups for Maya and Nahuatl, 1000 groups for Spanish. | ||
| 27 | + | ||
| 21 | ### Corpora | 28 | ### Corpora |
| 22 | `\corpora` | 29 | `\corpora` |
| 23 | 30 | ||
| ... | @@ -25,17 +32,14 @@ Only a sample of documents employed in our study. | ... | @@ -25,17 +32,14 @@ Only a sample of documents employed in our study. |
| 25 | Complete versions must be requested by e-mail (see **Contact**). | 32 | Complete versions must be requested by e-mail (see **Contact**). |
| 26 | 33 | ||
| 27 | ### Segmentation | 34 | ### Segmentation |
| 35 | +`/segmentation` | ||
| 36 | + | ||
| 28 | Segmented corpus for each language. | 37 | Segmented corpus for each language. |
| 29 | Maya and Nahuatl were segmented using _Morfessor CatMap_ | 38 | Maya and Nahuatl were segmented using _Morfessor CatMap_ |
| 30 | (http://www.cis.hut.fi/projects/morpho/). | 39 | (http://www.cis.hut.fi/projects/morpho/). |
| 31 | Spanish was segmented by the authors. | 40 | Spanish was segmented by the authors. |
| 32 | 41 | ||
| 33 | -### Clustering | ||
| 34 | -Clusters of morphs for each language: | ||
| 35 | -500 groups for Maya and Nahuatl, 1000 groups for Spanish. | ||
| 36 | 42 | ||
| 37 | ### Contact | 43 | ### Contact |
| 38 | -Carlos Méndez (cmendezc at ccg dot unam dot mx) | 44 | +Carlos Méndez (cmendezc at ccg dot unam dot mx), Center for Genomic Sciences, UNAM, Mexico |
| 39 | - | ||
| 40 | -Center for Genomic Sciences, UNAM, Mexico | ||
| 41 | 45 | ... | ... |