Showing 1 changed file with 7 additions and 7 deletions
1 | -# Automatic analysis of morphological units: segmentation and clustering of Spanish, Maya and Nahuatl | 1 | +## Automatic analysis of morphological units: segmentation and clustering of Spanish, Maya and Nahuatl |
2 | -## Carlos-Francisco Méndez-Cruz and Ignacio Arroyo-Fernández | 2 | +### Carlos-Francisco Méndez-Cruz and Ignacio Arroyo-Fernández |
3 | 3 | ||
4 | In this repository, results of two automatic morphological | 4 | In this repository, results of two automatic morphological |
5 | analyses for Spanish, Nahuatl and Maya are shown. | 5 | analyses for Spanish, Nahuatl and Maya are shown. |
... | @@ -18,23 +18,23 @@ conclude that the word embeddings represented the contextual | ... | @@ -18,23 +18,23 @@ conclude that the word embeddings represented the contextual |
18 | information necessary to differentiate them from morphs with | 18 | information necessary to differentiate them from morphs with |
19 | lexical-semantic content. | 19 | lexical-semantic content. |
20 | 20 | ||
21 | -# Directory description | 21 | +### Corpora |
22 | +`\corpora` | ||
22 | 23 | ||
23 | -## Corpora | ||
24 | Only a sample of documents employed in our study. | 24 | Only a sample of documents employed in our study. |
25 | Complete versions must be requested by e-mail (see **Contact**). | 25 | Complete versions must be requested by e-mail (see **Contact**). |
26 | 26 | ||
27 | -## Segmentation | 27 | +### Segmentation |
28 | Segmented corpus for each language. | 28 | Segmented corpus for each language. |
29 | Maya and Nahuatl were segmented using _Morfessor CatMap_ | 29 | Maya and Nahuatl were segmented using _Morfessor CatMap_ |
30 | (http://www.cis.hut.fi/projects/morpho/). | 30 | (http://www.cis.hut.fi/projects/morpho/). |
31 | Spanish was segmented by the authors. | 31 | Spanish was segmented by the authors. |
32 | 32 | ||
33 | -## Clustering | 33 | +### Clustering |
34 | Clusters of morphs for each language: | 34 | Clusters of morphs for each language: |
35 | 500 groups for Maya and Nahuatl, 1000 groups for Spanish. | 35 | 500 groups for Maya and Nahuatl, 1000 groups for Spanish. |
36 | 36 | ||
37 | -## Contact | 37 | +### Contact |
38 | Carlos Méndez (cmendezc at ccg dot unam dot mx) | 38 | Carlos Méndez (cmendezc at ccg dot unam dot mx) |
39 | 39 | ||
40 | Center for Genomic Sciences, UNAM, Mexico | 40 | Center for Genomic Sciences, UNAM, Mexico | ... | ... |