Estefani Gaytan Nunez
Showing 1 changed file with 19 additions and 10 deletions
1 -# "Automatic Extraction of Growth Conditions (GC) from the Gene Expression Omnibus (GEO)". 1 +# Automatic Extraction of Growth Conditions (GCs) from the Gene Expression Omnibus (GEO)
2 -Project to extract in an automatic way, the growth conditions of all enterobacteria within the GEO using "Conditional Random Fields " (CRFs). 2 +Project to extract in an automatic way the growth conditions of all enterobacteria within the GEO using "Conditional Random Fields" (CRFs).
3 3
4 ## Prerequisites 4 ## Prerequisites
5 ### Programming languages 5 ### Programming languages
6 - - Python (version 2.7, version 3) 6 + - Python (version 2.7, version 3.7)
7 - Bash 7 - Bash
8 8
9 ## Folder content 9 ## Folder content
...@@ -12,23 +12,32 @@ Project to extract in an automatic way, the growth conditions of all enterobacte ...@@ -12,23 +12,32 @@ Project to extract in an automatic way, the growth conditions of all enterobacte
12 1. label-split_training_test_v1.py 12 1. label-split_training_test_v1.py
13 2. params.py 13 2. params.py
14 3. training_validation_v3.py 14 3. training_validation_v3.py
15 - - check
16 - 1. sentences-405-order-rep.txt
17 - data-sets 15 - data-sets
18 1. test-data-set-30.txt 16 1. test-data-set-30.txt
19 2. training-data-set-70.txt 17 2. training-data-set-70.txt
20 - models 18 - models
21 1. training-data-set-70.fStopWords_False.fSymbols_False.mod 19 1. training-data-set-70.fStopWords_False.fSymbols_False.mod
22 - - reports 20 + - reports
21 + _Folder that contains files with **information on the performance of the CRF in identifying GCs.**_
23 1. report_training-data-set-70.fStopWords_False.fSymbols_False.txt 22 1. report_training-data-set-70.fStopWords_False.fSymbols_False.txt
24 2. y_pred_training-data-set-70.fStopWords_False.fSymbols_False.txt 23 2. y_pred_training-data-set-70.fStopWords_False.fSymbols_False.txt
25 3. y_test_training-data-set-70.fStopWords_False.fSymbols_False.txt 24 3. y_test_training-data-set-70.fStopWords_False.fSymbols_False.txt
26 25
27 **CoreNLP** 26 **CoreNLP**
28 - bin 27 - bin
29 - 1. get-raw-sentences.sh 28 + 1. get-raw-sentences.sh
30 - 2. single_run.sh 29 + _Script that **extracts the GCs** from the file: "tagged-xml-data" and adds the phrase: "PGCGROWTHCONDITIONS" to all lines._
30 + 2. single_run.sh
31 + _Script that **runs** the script: "corenlp.sh" with the desired parameters._
31 - input 32 - input
32 - 1. raw-metadata-senteneces.txt 33 + 1. raw-metadata-senteneces.txt
34 + _Resulting file from "get-raw-sentences.sh". **Contains all the GCs.**_
33 - output 35 - output
34 - 1. raw-metadata-senteneces.txt.conll
...\ No newline at end of file ...\ No newline at end of file
36 + 1. raw-metadata-senteneces.txt.conll
37 + _This file contains **all the words of all the GCs** tagged with their **"LEMMA" & "POS"**._
38 +
39 +**data-sets**
40 + - report-manually-tagged-gcs
41 + _Contains the extracted GCs of all the samples for each series._
42 + - tagged-xml-data
43 + _Contains the **original xml-tagged files** from which the GCs will be extracted._
...\ No newline at end of file ...\ No newline at end of file
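
The README describes `raw-metadata-senteneces.txt.conll` as holding every word of every GC tagged with its lemma and POS. Below is a minimal sketch of how such a file could be read in Python. It assumes tab-separated rows with the word, lemma and POS in columns 1, 2 and 3 (after an index column) and blank lines between sentences; the actual column layout depends on the parameters passed to "corenlp.sh", which are not shown here. The names `Token` and `read_conll` and the sample rows are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    word: str
    lemma: str
    pos: str


def read_conll(lines: Iterable[str],
               word_col: int = 1,
               lemma_col: int = 2,
               pos_col: int = 3) -> List[List[Token]]:
    """Group tab-separated token rows into sentences split on blank lines.

    The column indices are assumptions; adjust them to the real file layout.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        current.append(Token(cols[word_col], cols[lemma_col], cols[pos_col]))
    if current:
        # Keep the last sentence when the input has no trailing blank line.
        sentences.append(current)
    return sentences


# Hypothetical sample rows, not taken from the repository.
sample = [
    "1\tcells\tcell\tNNS\n",
    "2\tgrown\tgrow\tVBN\n",
    "\n",
    "1\tLB\tLB\tNN\n",
]
parsed = read_conll(sample)
```

To read the real file, the same function could be called on an open file handle, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points to the `.conll` output.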