Carlos-Francisco Méndez-Cruz / automatic-extraction-growth-conditions
Authored by Estefani Gaytan Nunez, 2019-07-12 04:36:30 -0500
Commit 3e95a34263f0599facf58486b3a69a672494a468
2 parents: 5cea8340, e2b4be09
Merge branch 'master' of http://pakal.ccg.unam.mx/cmendezc/automatic-extraction-growth-conditions
Showing 1 changed file with 19 additions and 10 deletions
README.md
# "Automatic Extraction of Growth Conditions (GC) from the Gene Expression Omnibus (GEO)".
Project to extract in an automatic way, the growth conditions of all enterobacteria within the GEO using "Conditional Random Fields " (CRFs).
# Automatic Extraction of Growth Conditions (GCs) from the Gene Expression Omnibus (GEO)
Project to automatically extract the growth conditions of all enterobacteria within the GEO using "Conditional Random Fields" (CRFs).
## Prerequisites
### Programming languages
- Python (version 2.7, version 3)
- Python (version 2.7, version 3.7)
- Bash
## Folder content
...
...
@@ -12,23 +12,32 @@ Project to extract in an automatic way, the growth conditions of all enterobacte
  1. label-split_training_test_v1.py
  2. params.py
  3. training_validation_v3.py
- check
  1. sentences-405-order-rep.txt
- data-sets
  1. test-data-set-30.txt
  2. training-data-set-70.txt
- models
  1. training-data-set-70.fStopWords_False.fSymbols_False.mod
- reports
- reports _Folder containing files with **information on the performance of the CRF in identifying GCs.**_
  1. report_training-data-set-70.fStopWords_False.fSymbols_False.txt
  2. y_pred_training-data-set-70.fStopWords_False.fSymbols_False.txt
  3. y_test_training-data-set-70.fStopWords_False.fSymbols_False.txt
**CoreNLP**
- bin
  1. get-raw-sentences.sh
  2. single_run.sh
  1. get-raw-sentences.sh _Script that **extracts the GCs** from the file: "tagged-xml-data" and adds the phrase: "PGCGROWTHCONDITIONS" to all lines._
  2. single_run.sh _Script that **runs** the script: "corenlp.sh" with the desired parameters._
- input
  1. raw-metadata-senteneces.txt
  1. raw-metadata-senteneces.txt _Resulting file from "get-raw-sentences.sh". **Contains all the GCs.**_
- output
  1. raw-metadata-senteneces.txt.conll
\ No newline at end of file
  1. raw-metadata-senteneces.txt.conll _This file contains **all the words of all the GCs** tagged with their **"LEMMA" & "POS"**_
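
As a rough illustration of how the tagged output described above might be consumed, the following Python sketch reads CoNLL-style rows into (word, lemma, POS) triples. The column positions (word, lemma and POS in the second to fourth tab-separated columns) and the blank-line sentence separator are assumptions, not something this README states; `read_conll` and the sample rows are hypothetical and are not part of the repository.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str, str]  # (word, lemma, pos)


def read_conll(lines: Iterable[str],
               word_col: int = 1,
               lemma_col: int = 2,
               pos_col: int = 3) -> List[List[Token]]:
    """Group CoNLL-style rows into sentences of (word, lemma, POS) triples.

    Assumption: columns are tab-separated and sentences are separated
    by blank lines. Adjust the column indices to the actual file layout.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    needed = max(word_col, lemma_col, pos_col)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        if len(cols) <= needed:
            continue  # skip rows that lack the expected columns
        current.append((cols[word_col], cols[lemma_col], cols[pos_col]))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample rows, invented for illustration only.
sample = [
    "1\tcells\tcell\tNNS\n",
    "2\tgrown\tgrow\tVBN\n",
    "\n",
    "1\tglucose\tglucose\tNN\n",
]
parsed = read_conll(sample)
```

To read a real file, the same function can be given an open file handle, for example `read_conll(open(path, encoding="utf-8"))`, once the column indices have been checked against the file.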
**data-sets**
- report-manually-tagged-gcs _Contains the extracted GCs of all the samples for each series._
- tagged-xml-data _Contains the **original xml-tagged files** from which the GCs will be extracted._
\ No newline at end of file
...
...