Carlos-Francisco Méndez-Cruz

Update README.md

Showing 1 changed file with 32 additions and 23 deletions
# Automatic extraction of gene - disease events
## Input data sets (original corpus given by Yalbi Balderas)
# Sentence simplification with iSimp and Daniel Gutiérrez's algorithm
## Directories
### iSimp
```Shell
\input-data-sets
/isimp_v2
```
## Dictionaries
### Original given by Yalbi Balderas
### Temporary iSimp files with constructs
```Shell
\dictionaries-original
/iSimp_sentences
```
## Terminological resources (dictionaries for entity recognition)
### Final simplified sentences
```Shell
\terminologicalResources
/algorithm_sentences
```
## JSON dictionaries (Gene, disease, effect and GO)
### Cleaned sentences
```Shell
\dictionaries-json
format/sanitized_sentences
```
## Example sentences
### Example sentences with different tags (non-redundant)
### Separated sentences one per file
```Shell
\example-sentences
format/split_sentences
```
## Corpus
### Set of abstracts and full articles, split into sentences
## Scripts
### Clean sentences for iSimp
```Shell
Usage: ./format/regex.py <input_file_path> <output_file_path>
./format/regex.py ./input-sentences/input-sentences.txt ./format/sanitized_sentences/input-sentences.txt
```
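The usage line above cleans one file per call. To clean a whole directory, a loop along these lines could work. This is a minimal sketch, not part of the repository: `clean_all` and the `CLEANER` variable are hypothetical names, and it assumes the input files end in `.txt` and that `regex.py` is executable and takes exactly the two positional arguments shown above.

```shell
#!/usr/bin/env bash
# Cleaning script from the usage line above; override CLEANER to try another command.
CLEANER="${CLEANER:-./format/regex.py}"

# Hypothetical helper: clean every .txt file in in_dir and write the
# result to a file of the same name in out_dir.
clean_all() {
    local in_dir="$1" out_dir="$2" f
    mkdir -p "$out_dir"
    for f in "$in_dir"/*.txt; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        "$CLEANER" "$f" "$out_dir/$(basename "$f")"
    done
}

# Example call (paths taken from the usage line above):
#   clean_all ./input-sentences ./format/sanitized_sentences
```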
### Main shell script for sentence simplification
```Shell
\corpora
./sentence-simplification-main.sh
Usage: ./sentence-simplification-main.sh <input_path> <output_file_path>
./sentence-simplification/sentence-simplification-main.sh ./format/split_sentences ./algorithm_sentences/filename.txt
<input_path> Path to the cleaned and separated sentences, one per file.
<output_file_path> Output path and filename. The filename is used as the base name for the simplified-sentence files, each of which gets an index added to its name.
Requirements: sentences must be cleaned and separated, one per file.
It calls simplifier.py
```
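The main script requires one sentence per file, but the README does not show how the cleaned file is split. The sketch below is one possible way to do it and is not necessarily how the repository does it: `split_sentences` is a hypothetical helper, and it assumes the cleaned file holds exactly one sentence per line. Empty lines are skipped, and each output file is named after the input file plus a running index.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write each non-empty line of input_file to its own
# file in out_dir, named <input basename>_<index>.txt.
# Assumption: the cleaned input holds exactly one sentence per line.
split_sentences() {
    local input_file="$1" out_dir="$2"
    local base n=0 line
    base="$(basename "$input_file" .txt)"
    mkdir -p "$out_dir"
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        n=$((n + 1))
        printf '%s\n' "$line" > "$out_dir/${base}_${n}.txt"
    done < "$input_file"
    echo "$n"   # number of sentence files written
}

# Example call, followed by the main script as documented above:
#   split_sentences ./format/sanitized_sentences/input-sentences.txt ./format/split_sentences
#   ./sentence-simplification/sentence-simplification-main.sh ./format/split_sentences ./algorithm_sentences/filename.txt
```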
## scripts
### Scripts used to tag and preprocess text
### Python script for sentence simplification
```Shell
\scripts
```
```Shell
simplifier.py
It is called by sentence-simplification-main.sh
```
......