Showing 1 changed file with 13 additions and 3 deletions
| Old line | Old content | New line | New content |
|---|---|---|---|
| 1 | -# "Automatic Extraction of Growth Conditions (GC) from the Gene Expression Omnibus (GEO)". | 1 | +# "Automatic Extraction of Growth Conditions (GCs) from the Gene Expression Omnibus (GEO)". |
| 2 | -Project to extract in an automatic way, the growth conditions of all enterobacteria within the GEO using "Conditional Random Fields " (CRFs). | 2 | +Project to extract in an automatic way the growth conditions of all enterobacteria within the GEO using "Conditional Random Fields" (CRFs). |
| 3 |  | 3 |  |
| 4 | ## Prerequisites | 4 | ## Prerequisites |
| 5 | ### Programming languages | 5 | ### Programming languages |
| 6 | - - Python (version 2.7, version 3) | 6 | + - Python (version 2.7, version 3.7) |
| 7 | - Bash | 7 | - Bash |
| 8 |  | 8 |  |
| 9 | ## Folder content | 9 | ## Folder content |
| ... | @@ -27,8 +27,18 @@ Project to extract in an automatic way, the growth conditions of all enterobacte | ... | @@ -27,8 +27,18 @@ Project to extract in an automatic way, the growth conditions of all enterobacte |
| 27 | **CoreNLP** | 27 | **CoreNLP** |
| 28 | - bin | 28 | - bin |
| 29 | 1. get-raw-sentences.sh | 29 | 1. get-raw-sentences.sh |
|  |  | 30 | + _Script that **extracts the GCs** from the file "tagged-xml-data" and adds the phrase "PGCGROWTHCONDITIONS" to all lines._ |
| 30 | 2. single_run.sh | 31 | 2. single_run.sh |
|  |  | 32 | + _Script that **runs** the script "corenlp.sh" with the desired parameters._ |
| 31 | - input | 33 | - input |
| 32 | 1. raw-metadata-senteneces.txt | 34 | 1. raw-metadata-senteneces.txt |
|  |  | 35 | + _Output file of "get-raw-sentences.sh". **Contains all the GCs.**_ |
| 33 | - output | 36 | - output |
| 34 | 1. raw-metadata-senteneces.txt.conll | 37 | 1. raw-metadata-senteneces.txt.conll |
|  |  | 38 | + _This file contains **all the words of all the GCs**, each tagged with its **"lemma" & "POS"**._ |
|  |  | 39 | + |
|  |  | 40 | +**data-sets** |
|  |  | 41 | + - report-manually-tagged-gcs |
|  |  | 42 | + _Contains the extracted GCs of all the samples for each series._ |
|  |  | 43 | + - tagged-xml-data |
|  |  | 44 | + _Contains the **original xml-tagged files** from which the GCs are extracted._ |
| ... | \ No newline at end of file | ... | \ No newline at end of file |
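The README describes `raw-metadata-senteneces.txt.conll` as holding every word of every GC tagged with its lemma and POS, and the project extracts GCs with CRFs. The following is a minimal Python sketch of how such a file could be turned into per-token CRF features. It assumes a tab-separated CoNLL layout with the word, lemma and POS in columns 2 to 4 and a blank line between sentences. That layout is an assumption about the CoreNLP output, not something this diff confirms. The functions `read_conll`, `token_features` and `sentence_features`, and all feature names, are hypothetical illustrations rather than the project's actual feature set.

```python
from typing import Dict, Iterable, List, Tuple

# One token as (word, lemma, POS).
Token = Tuple[str, str, str]


def read_conll(lines: Iterable[str]) -> List[List[Token]]:
    """Group tab-separated CoNLL rows into sentences of (word, lemma, POS).

    Assumption: column 2 = word, column 3 = lemma, column 4 = POS,
    and a blank line separates sentences.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        current.append((cols[1], cols[2], cols[3]))
    if current:
        # The file may not end with a blank line.
        sentences.append(current)
    return sentences


def token_features(sent: List[Token], i: int) -> Dict[str, object]:
    """Hypothetical CRF features for token i: its own word, lemma and POS,
    plus the lemma and POS of its immediate neighbours."""
    word, lemma, pos = sent[i]
    feats: Dict[str, object] = {
        "word.lower": word.lower(),
        "lemma": lemma,
        "pos": pos,
        "word.isdigit": word.isdigit(),
    }
    if i > 0:
        feats["-1:lemma"] = sent[i - 1][1]
        feats["-1:pos"] = sent[i - 1][2]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        feats["+1:lemma"] = sent[i + 1][1]
        feats["+1:pos"] = sent[i + 1][2]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, object]]:
    """Feature dicts for every token of one sentence."""
    return [token_features(sent, i) for i in range(len(sent))]


if __name__ == "__main__":
    sample = [
        "1\tGrown\tgrow\tVBN\tO\t0\troot\n",
        "2\tat\tat\tIN\tO\t3\tcase\n",
        "3\t37\t37\tCD\tNUMBER\t1\tnmod\n",
        "\n",
    ]
    sents = read_conll(sample)
    print(sents[0][0])  # ('Grown', 'grow', 'VBN')
    print(sentence_features(sents[0])[2]["-1:pos"])  # IN
```

Because `read_conll` accepts any iterable of strings, an open file handle works directly, e.g. `with open("output/raw-metadata-senteneces.txt.conll") as fh: sents = read_conll(fh)` (the relative path assumes the working directory is the `CoreNLP` folder). The list-of-dicts-per-sentence shape is the input format used by CRF wrappers such as sklearn-crfsuite.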