carlosmendeznlp

Merge branch 'master' of http://pakal.ccg.unam.mx/cmendezc/sentence-simplification

Add README
Showing 1 changed file with 31 additions and 22 deletions
1 -# Automatic extraction of gene - disease events 1 +# Sentence simplification with iSimp and Daniel Gutiérrez's algorithm
2 -## Input data sets (original corpus given by Yalbi Balderas) 2 +
3 +## Directories
4 +### iSimp
3 ```Shell 5 ```Shell
4 -\input-data-sets 6 +/isimp_v2
5 ``` 7 ```
6 - 8 +### Temporary iSimp files with constructs
7 -## Dictionaries
8 -### Original given by Yalbi Balderas
9 ```Shell 9 ```Shell
10 -\dictionaries-original 10 +/iSimp_sentences
11 ``` 11 ```
12 - 12 +### Final simplified sentences
13 -## Terminological resources (dictionaries for entity recognition)
14 ```Shell 13 ```Shell
15 -\terminologicalResources 14 +/algorithm_sentences
16 ``` 15 ```
17 - 16 +### Cleaned sentences
18 -## JSON dictionaries (Gene, disease, effect and GO)
19 ```Shell 17 ```Shell
20 -\dictionaries-json 18 +format/sanitized_sentences
21 ``` 19 ```
22 -## Example sentences 20 +### Separated sentences, one per file
23 -### Example sentences with different tags (non redundant) 21 +```Shell
22 +format/split_sentences
23 +```
24 +
25 +## Scripts
26 +### Clean sentences for iSimp
24 ```Shell 27 ```Shell
25 -\example-sentences 28 +Usage: ./format/regex.py <input_file_path> <output_file_path>
29 +./format/regex.py ./input-sentences/input-sentences.txt ./format/sanitized_sentences/input-sentences.txt
26 ``` 30 ```
27 -## Corpus 31 +### Main shell script for sentence simplification
28 -### Set of abstracts and full article sentence-splitted
29 ```Shell 32 ```Shell
30 -\corpora 33 +./sentence-simplification-main.sh
34 +Usage: ./sentence-simplification-main.sh <input_path> <output_file_path>
35 +./sentence-simplification/sentence-simplification-main.sh ./format/split_sentences ./algorithm_sentences/filename.txt
36 +<input_path> Path for cleaned and separated sentences, one per file.
37 +<output_file_path> Output path and filename. The filename is used as a base to create the files with simplified sentences, each with an index in its name.
38 +**Requirements**: sentences must be separated one per file and they must be cleaned.
39 +It calls simplifier.py
31 ``` 40 ```
32 -## scripts 41 +### Python script for sentence simplification
33 -### Scripts used to tag and preprocess text
34 ```Shell 42 ```Shell
35 -\scripts 43 +simplifier.py
44 +It is called by sentence-simplification-main.sh
36 ``` 45 ```
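The README requires that sentences be cleaned and separated one per file before `sentence-simplification-main.sh` runs, but the diff does not show how the split is produced. The following is a minimal sketch of that step, not code from the repository. It assumes the sanitized file holds one sentence per line; the function name `split_sentences` and the `sentence_<index>.txt` naming are hypothetical.

```python
from pathlib import Path


def split_sentences(input_file, output_dir, prefix="sentence"):
    """Write each non-empty line of input_file to its own file in output_dir.

    Assumption: input_file contains one cleaned sentence per line.
    The output naming scheme (prefix plus index) is hypothetical.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Read all sentences, dropping blank lines and surrounding whitespace.
    with open(input_file, encoding="utf-8") as handle:
        sentences = [line.strip() for line in handle if line.strip()]

    # Zero-pad the index so the files sort in sentence order.
    width = max(len(str(len(sentences))), 1)

    written = []
    for index, sentence in enumerate(sentences, start=1):
        target = out / f"{prefix}_{index:0{width}d}.txt"
        target.write_text(sentence + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Under these assumptions, calling `split_sentences("./format/sanitized_sentences/input-sentences.txt", "./format/split_sentences")` would fill the directory that the main shell script takes as `<input_path>`.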