Kevin Meza Landeros

Update README.md

Showing 1 changed file with 17 additions and 7 deletions
# "Automatic Extraction of Growth Conditions (GC) from the Gene Expression Omnibus (GEO)".
Project to extract in an automatic way, the growth conditions of all enterobacteria within the GEO using "Conditional Random Fields " (CRFs).
# "Automatic Extraction of Growth Conditions (GCs) from the Gene Expression Omnibus (GEO)".
Project to automatically extract the growth conditions of all enterobacteria within the GEO using "Conditional Random Fields" (CRFs).
## Prerequisites
### Programming languages
- Python (version 2.7, version 3)
- Python (version 2.7, version 3.7)
- Bash
## Folder content
@@ -26,9 +26,19 @@ Project to extract in an automatic way, the growth conditions of all enterobacte
**CoreNLP**
- bin
1. get-raw-sentences.sh
2. single_run.sh
1. get-raw-sentences.sh
_Script that **extracts the GCs** from the file: "tagged-xml-data" and adds the phrase: "PGCGROWTHCONDITIONS" to all lines._
2. single_run.sh
_Script that **runs** the script: "corenlp.sh" with the desired parameters._
- input
1. raw-metadata-senteneces.txt
1. raw-metadata-senteneces.txt
_Resulting file from "get-raw-sentences.sh". **Contains all the GCs.**_
- output
1. raw-metadata-senteneces.txt.conll
\ No newline at end of file
1. raw-metadata-senteneces.txt.conll
_This file contains **all the words of all the GCs**, each tagged with its **"lemma" & "POS"**._
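
The two data steps described for the CoreNLP folder (marking every GC line with "PGCGROWTHCONDITIONS", then reading back the lemma and POS of each word) can be sketched in Python as follows. This is only an illustration, not the project's actual scripts: the function names `add_marker` and `parse_conll` are hypothetical, the marker is assumed to be appended at the end of each line, and the `.conll` file is assumed to be tab-separated with word, lemma and POS in columns 1, 2 and 3 (0-based), which may differ depending on the parameters passed to "corenlp.sh".

```python
MARKER = "PGCGROWTHCONDITIONS"  # phrase named in the README


def add_marker(lines, marker=MARKER):
    """Append the marker to every non-empty line.

    Assumption: the marker goes at the end of the line, separated
    by a single space. The README only says it is added to all lines.
    """
    return [f"{line.strip()} {marker}" for line in lines if line.strip()]


def parse_conll(text, word_col=1, lemma_col=2, pos_col=3):
    """Return (word, lemma, POS) tuples from tab-separated CoNLL-style text.

    Assumption: the column positions are the defaults given above.
    Blank lines (sentence separators) are skipped.
    """
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        tokens.append((cols[word_col], cols[lemma_col], cols[pos_col]))
    return tokens
```

If the real output uses a different column order, only the three `*_col` arguments need to change.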
**data-sets**
- report-manually-tagged-gcs
_Contains the extracted GCs of all the samples for each series._
- tagged-xml-data
_Contains the **original xml-tagged files** from which the GCs will be extracted._
\ No newline at end of file