Merge branch 'master' of http://pakal.ccg.unam.mx/cmendezc/sentence-simplification
Add README
Showing 1 changed file with 32 additions and 23 deletions
````diff
@@ -1,36 +1,45 @@
-# Automatic extraction of gene - disease events
-## Input data sets (original corpus given by Yalbi Balderas)
+# Sentence simplification with iSimp and Daniel Gutiérrez's algorithm
+
+## Directories
+### iSimp
 ```Shell
-\input-data-sets
+/isimp_v2
 ```
-
-## Dictionaries
-### Original given by Yalbi Balderas
+### Temporary iSimp files with constructs
 ```Shell
-\dictionaries-original
+/iSimp_sentences
 ```
-
-## Terminological resources (dictionaries for entity recognition)
+### Final simplified sentences
 ```Shell
-\terminologicalResources
+/algorithm_sentences
 ```
-
-## JSON dictionaries (Gene, disease, effect and GO)
+### Cleaned sentences
 ```Shell
-\dictionaries-json
+format/sanitized_sentences
 ```
-## Example sentences
-### Example sentences with different tags (non redundant)
+### Separated sentences, one per file
 ```Shell
-\example-sentences
+format/split_sentences
 ```
-## Corpus
-### Set of abstracts and full article sentence-splitted
+
+## Scripts
+### Clean sentences for iSimp
+```Shell
+Usage: ./format/regex.py <input_file_path> <output_file_path>
+./format/regex.py ./input-sentences/input-sentences.txt ./format/sanitized_sentences/input-sentences.txt
+```
+### Main shell script for sentence simplification
 ```Shell
-\corpora
+./sentence-simplification-main.sh
+Usage: ./sentence-simplification-main.sh <input_path> <output_file_path>
+./sentence-simplification/sentence-simplification-main.sh ./format/split_sentences ./algorithm_sentences/filename.txt
+<input_path> Path to the cleaned and separated sentences, one per file.
+<output_file_path> Path and filename. The filename is used to name the files of simplified sentences, each with an index added to the name.
+**Requirements**: sentences must be cleaned and separated, one per file.
+It calls simplifier.py
 ```
-## scripts
-### Scripts used to tag and preprocess text
+### Python script for sentence simplification
 ```Shell
-\scripts
-```
\ No newline at end of file
+simplifier.py
+It is called by sentence-simplification-main.sh
+```
````
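The new README requires the input to `sentence-simplification-main.sh` to be cleaned and split into one sentence per file (`format/split_sentences`), but this commit does not show which script produces that split. The following is a minimal sketch of the step, not part of the repository. It assumes that the cleaned file written by `format/regex.py` holds one sentence per line. The function name `split_sentences` and the `sentence_N.txt` naming are hypothetical.

```Shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this commit: writes each non-empty line of a
# cleaned sentence file to its own file, the layout sentence-simplification-main.sh expects.
# Assumption: the file produced by format/regex.py holds one sentence per line.
split_sentences() {
  local in_file=$1 out_dir=$2 n=0 line
  mkdir -p "$out_dir"
  # "|| [ -n "$line" ]" also keeps a last line that has no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue                      # skip blank lines
    n=$((n + 1))
    printf '%s\n' "$line" > "$out_dir/sentence_$n.txt"
  done < "$in_file"
  echo "$n"                                         # report how many files were written
}
```

Under that assumption, the pipeline would run `./format/regex.py` first, then `split_sentences ./format/sanitized_sentences/input-sentences.txt ./format/split_sentences`, and finally `./sentence-simplification-main.sh ./format/split_sentences ./algorithm_sentences/filename.txt`.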