Merge branch 'master' of http://pakal.ccg.unam.mx/cmendezc/automatic-extraction-growth-conditions
Showing 1 changed file with 19 additions and 10 deletions
| 1 | -# "Automatic Extraction of Growth Conditions (GC) from the Gene Expression Omnibus (GEO)". | 1 | +# Automatic Extraction of Growth Conditions (GCs) from the Gene Expression Omnibus (GEO) |
| 2 | -Project to extract in an automatic way, the growth conditions of all enterobacteria within the GEO using "Conditional Random Fields " (CRFs). | 2 | +Project to automatically extract the growth conditions of all enterobacteria within the GEO using "Conditional Random Fields" (CRFs). |
| 3 | 3 | ||
| 4 | ## Prerequisites | 4 | ## Prerequisites |
| 5 | ### Programming languages | 5 | ### Programming languages |
| 6 | - - Python (version 2.7, version 3) | 6 | + - Python (version 2.7, version 3.7) |
| 7 | - Bash | 7 | - Bash |
| 8 | 8 | ||
| 9 | ## Folder content | 9 | ## Folder content |
| ... | @@ -12,23 +12,32 @@ Project to extract in an automatic way, the growth conditions of all enterobacte | ... | @@ -12,23 +12,32 @@ Project to extract in an automatic way, the growth conditions of all enterobacte |
| 12 | 1. label-split_training_test_v1.py | 12 | 1. label-split_training_test_v1.py |
| 13 | 2. params.py | 13 | 2. params.py |
| 14 | 3. training_validation_v3.py | 14 | 3. training_validation_v3.py |
| 15 | - - check | ||
| 16 | - 1. sentences-405-order-rep.txt | ||
| 17 | - data-sets | 15 | - data-sets |
| 18 | 1. test-data-set-30.txt | 16 | 1. test-data-set-30.txt |
| 19 | 2. training-data-set-70.txt | 17 | 2. training-data-set-70.txt |
| 20 | - models | 18 | - models |
| 21 | 1. training-data-set-70.fStopWords_False.fSymbols_False.mod | 19 | 1. training-data-set-70.fStopWords_False.fSymbols_False.mod |
| 22 | - - reports | 20 | + - reports |
| 21 | + _Folder containing files with **information on the performance of the CRF when identifying GCs.**_ | ||
| 23 | 1. report_training-data-set-70.fStopWords_False.fSymbols_False.txt | 22 | 1. report_training-data-set-70.fStopWords_False.fSymbols_False.txt |
| 24 | 2. y_pred_training-data-set-70.fStopWords_False.fSymbols_False.txt | 23 | 2. y_pred_training-data-set-70.fStopWords_False.fSymbols_False.txt |
| 25 | 3. y_test_training-data-set-70.fStopWords_False.fSymbols_False.txt | 24 | 3. y_test_training-data-set-70.fStopWords_False.fSymbols_False.txt |
| 26 | 25 | ||
| 27 | **CoreNLP** | 26 | **CoreNLP** |
| 28 | - bin | 27 | - bin |
| 29 | - 1. get-raw-sentences.sh | 28 | + 1. get-raw-sentences.sh |
| 30 | - 2. single_run.sh | 29 | + _Script that **extracts the GCs** from the file: "tagged-xml-data" and adds the phrase: "PGCGROWTHCONDITIONS" to all lines._ |
| 30 | + 2. single_run.sh | ||
| 31 | + _Script that **runs** the script "corenlp.sh" with the desired parameters._ | ||
| 31 | - input | 32 | - input |
| 32 | - 1. raw-metadata-senteneces.txt | 33 | + 1. raw-metadata-senteneces.txt |
| 34 | + _Resulting file from "get-raw-sentences.sh". **Contains all the GCs.**_ | ||
| 33 | - output | 35 | - output |
| 34 | - 1. raw-metadata-senteneces.txt.conll | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
| 36 | + 1. raw-metadata-senteneces.txt.conll | ||
| 37 | + _This file contains **all the words of all the GCs**, each tagged with its **"LEMMA" & "POS"**._ | ||
| 38 | + | ||
| 39 | +**data-sets** | ||
| 40 | + - report-manually-tagged-gcs | ||
| 41 | + _Contains the extracted GCs of all the samples for each series._ | ||
| 42 | + - tagged-xml-data | ||
| 43 | + _Contains the **original xml-tagged files** from which the GCs will be extracted._ | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file | ... | ... |
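
The README describes "raw-metadata-senteneces.txt.conll" as holding every word of every GC tagged with its "LEMMA" and "POS". As a minimal sketch of how such a file might be read, the Python snippet below groups tokens into sentences. It assumes a tab-separated layout with the word, lemma, and POS in columns 1, 2, and 3 (column 0 being a token index) and blank lines between sentences. That layout, the function name `read_conll_sentences`, and the sample rows are illustrative assumptions, not taken from the repository.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str, str]  # (word, lemma, pos)


def read_conll_sentences(
    lines: Iterable[str],
    word_col: int = 1,   # assumed column positions; adjust to the real file
    lemma_col: int = 2,
    pos_col: int = 3,
) -> Iterator[List[Token]]:
    """Yield one list of (word, lemma, pos) tuples per sentence."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append((cols[word_col], cols[lemma_col], cols[pos_col]))
    # Emit the last sentence when the input does not end with a blank line.
    if sentence:
        yield sentence


# Hypothetical sample rows standing in for the real .conll file.
sample = [
    "1\tcells\tcell\tNNS\n",
    "2\tgrown\tgrow\tVBN\n",
    "\n",
    "1\tLB\tLB\tNN\n",
]
sentences = list(read_conll_sentences(sample))
```

To use it on the real output, pass an open file handle instead of `sample`, after checking the column order in the file itself.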