Showing 1 changed file with 7 additions and 7 deletions
| 1 | -# Automatic analysis of morphological units: segmentation and clustering of Spanish, Maya and Nahuatl | 1 | +## Automatic analysis of morphological units: segmentation and clustering of Spanish, Maya and Nahuatl |
| 2 | -## Carlos-Francisco Méndez-Cruz and Ignacio Arroyo-Fernández | 2 | +### Carlos-Francisco Méndez-Cruz and Ignacio Arroyo-Fernández |
| 3 | 3 | ||
| 4 | In this repository, results of two automatic morphological | 4 | In this repository, results of two automatic morphological |
| 5 | analyses for Spanish, Nahuatl and Maya are shown. | 5 | analyses for Spanish, Nahuatl and Maya are shown. |
| ... | @@ -18,23 +18,23 @@ conclude that the word embeddings represented the contextual | ... | @@ -18,23 +18,23 @@ conclude that the word embeddings represented the contextual |
| 18 | information necessary to differentiate them from morphs with | 18 | information necessary to differentiate them from morphs with |
| 19 | lexical-semantic content. | 19 | lexical-semantic content. |
| 20 | 20 | ||
| 21 | -# Directory description | 21 | +### Corpora |
| 22 | +`\corpora` | ||
| 22 | 23 | ||
| 23 | -## Corpora | ||
| 24 | Only a sample of documents employed in our study. | 24 | Only a sample of documents employed in our study. |
| 25 | Complete versions must be requested by e-mail (see **Contact**). | 25 | Complete versions must be requested by e-mail (see **Contact**). |
| 26 | 26 | ||
| 27 | -## Segmentation | 27 | +### Segmentation |
| 28 | Segmented corpus for each language. | 28 | Segmented corpus for each language. |
| 29 | Maya and Nahuatl were segmented using _Morfessor CatMap_ | 29 | Maya and Nahuatl were segmented using _Morfessor CatMap_ |
| 30 | (http://www.cis.hut.fi/projects/morpho/). | 30 | (http://www.cis.hut.fi/projects/morpho/). |
| 31 | Spanish was segmented by the authors. | 31 | Spanish was segmented by the authors. |
| 32 | 32 | ||
| 33 | -## Clustering | 33 | +### Clustering |
| 34 | Clusters of morphs for each language: | 34 | Clusters of morphs for each language: |
| 35 | 500 groups for Maya and Nahuatl, 1000 groups for Spanish. | 35 | 500 groups for Maya and Nahuatl, 1000 groups for Spanish. |
| 36 | 36 | ||
| 37 | -## Contact | 37 | +### Contact |
| 38 | Carlos Méndez (cmendezc at ccg dot unam dot mx) | 38 | Carlos Méndez (cmendezc at ccg dot unam dot mx) |
| 39 | 39 | ||
| 40 | Center for Genomic Sciences, UNAM, Mexico | 40 | Center for Genomic Sciences, UNAM, Mexico |
| ... | ... |