Merge branch 'master' of http://pakal.ccg.unam.mx/cmendezc/sentence-simplification
Add README
Showing 1 changed file with 32 additions and 23 deletions
1 | -# Automatic extraction of gene - disease events | 1 | +# Sentence simplification with iSimp and Daniel Gutiérrez's algorithm |
2 | -## Input data sets (original corpus given by Yalbi Balderas) | 2 | + |
3 | +## Directories | ||
4 | +### iSimp | ||
3 | ```Shell | 5 | ```Shell |
4 | -\input-data-sets | 6 | +/isimp_v2 |
5 | ``` | 7 | ``` |
6 | - | 8 | +### Temporary iSimp files with constructs |
7 | -## Dictionaries | ||
8 | -### Original given by Yalbi Balderas | ||
9 | ```Shell | 9 | ```Shell |
10 | -\dictionaries-original | 10 | +/iSimp_sentences |
11 | ``` | 11 | ``` |
12 | - | 12 | +### Final simplified sentences |
13 | -## Terminological resources (dictionaries for entity recognition) | ||
14 | ```Shell | 13 | ```Shell |
15 | -\terminologicalResources | 14 | +/algorithm_sentences |
16 | ``` | 15 | ``` |
17 | - | 16 | +### Cleaned sentences |
18 | -## JSON dictionaries (Gene, disease, effect and GO) | ||
19 | ```Shell | 17 | ```Shell |
20 | -\dictionaries-json | 18 | +format/sanitized_sentences |
21 | ``` | 19 | ``` |
22 | -## Example sentences | 20 | +### Separated sentences one per file |
23 | -### Example sentences with different tags (non redundant) | ||
24 | ```Shell | 21 | ```Shell |
25 | -\example-sentences | 22 | +format/split_sentences |
26 | ``` | 23 | ``` |
27 | -## Corpus | 24 | + |
28 | -### Set of abstracts and full article sentence-splitted | 25 | +## Scripts |
26 | +### Clean sentences for iSimp | ||
27 | +```Shell | ||
28 | +Usage: ./format/regex.py <input_file_path> <output_file_path> | ||
29 | +./format/regex.py ./input-sentences/input-sentences.txt ./format/sanitized_sentences/input-sentences.txt | ||
30 | +``` | ||
31 | +### Main shell script for sentence simplification | ||
29 | ```Shell | 32 | ```Shell |
30 | -\corpora | 33 | +./sentence-simplification-main.sh |
34 | +Usage: ./sentence-simplification-main.sh <input_path> <output_file_path> | ||
35 | +./sentence-simplification/sentence-simplification-main.sh ./format/split_sentences ./algorithm_sentences/filename.txt | ||
36 | +<input_path> Path for cleaned and separated sentences, one per file. | ||
37 | +<output_file_path> Path and filename. It uses filename to create files with simplified sentences and with an index within the filename. | ||
38 | +**Requirements**: sentences must be separated one per file and they must be cleaned. | ||
39 | +It calls simplifier.py | ||
31 | ``` | 40 | ``` |
32 | -## scripts | 41 | +### Python script for sentence simplification |
33 | -### Scripts used to tag and preprocess text | ||
34 | ```Shell | 42 | ```Shell |
35 | -\scripts | ||
36 | -``` | ||
... | \ No newline at end of file | ... | \ No newline at end of file |
43 | +simplifier.py | ||
44 | +It is called by sentence-simplification-main.sh | ||
45 | +``` | ... | ... |
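The new README requires that sentences be cleaned and separated one per file before `sentence-simplification-main.sh` runs, but the diff does not show how `format/split_sentences` gets populated. A minimal sketch of that split step follows, assuming the sanitized file holds one sentence per line. The function name `split_sentences` and the `<stem>_<index>.txt` naming scheme are hypothetical and are not taken from the repository.

```python
from pathlib import Path
from typing import List


def split_sentences(input_file: Path, output_dir: Path) -> List[Path]:
    """Write each non-empty line of input_file to its own file in output_dir.

    Assumption: the sanitized input contains exactly one sentence per line.
    """
    output_dir.mkdir(parents=True, exist_ok=True)

    # Read all sentences, skipping blank lines.
    with input_file.open(encoding="utf-8") as handle:
        sentences = [line.strip() for line in handle if line.strip()]

    written = []
    for index, sentence in enumerate(sentences, start=1):
        # Hypothetical naming scheme: <input stem>_<index>.txt
        target = output_dir / f"{input_file.stem}_{index}.txt"
        target.write_text(sentence + "\n", encoding="utf-8")
        written.append(target)
    return written
```

With the paths used in the README, a call might look like `split_sentences(Path("./format/sanitized_sentences/input-sentences.txt"), Path("./format/split_sentences"))`, after which the output directory could be passed as `<input_path>` to the main shell script.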