Automatic Extraction of Growth Conditions (GC) from the Gene Expression Omnibus (GEO)
A project to automatically extract the growth conditions of all enterobacteria in GEO using Conditional Random Fields (CRFs).
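A CRF labels every token in a sentence jointly, using features of the token itself and of its neighbors. The sketch below shows one common way to turn a tokenized sentence into per-token feature dictionaries, the input shape accepted by CRF wrappers such as sklearn-crfsuite. The function names `word2features` and `sent2features` and the specific features are illustrative assumptions, not the feature set used by training_validation_v3.py.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def word2features(sent: List[str], i: int) -> Dict[str, Feature]:
    """Build a feature dict for token i of a tokenized sentence (hypothetical feature set)."""
    word = sent[i]
    features: Dict[str, Feature] = {
        "bias": True,
        "word.lower": word.lower(),
        "word.suffix3": word[-3:],
        "word.isupper": word.isupper(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
    }
    if i > 0:
        # Context feature taken from the previous token.
        features["prev.lower"] = sent[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        # Context feature taken from the next token.
        features["next.lower"] = sent[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sent2features(sent: List[str]) -> List[Dict[str, Feature]]:
    """Return one feature dict per token; the CRF labels the whole sequence jointly."""
    return [word2features(sent, i) for i in range(len(sent))]


if __name__ == "__main__":
    tokens = ["Cells", "grown", "in", "LB", "at", "37", "C"]
    for tok, feats in zip(tokens, sent2features(tokens)):
        print(tok, feats["word.lower"], feats.get("prev.lower"))
```

Context features such as `prev.lower` are what let a CRF learn that, for example, a token following "in" is likely to be a medium name.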
Prerequisites
Programming languages
- Python (versions 2.7 and 3)
- Bash
Folder contents
CRF
- bin
  - label-split_training_test_v1.py
  - params.py
  - training_validation_v3.py
- check
  - sentences-405-order-rep.txt
- data-sets
  - test-data-set-30.txt
  - training-data-set-70.txt
- models
  - training-data-set-70.fStopWords_False.fSymbols_False.mod
- reports
  - report_training-data-set-70.fStopWords_False.fSymbols_False.txt
  - y_pred_training-data-set-70.fStopWords_False.fSymbols_False.txt
  - y_test_training-data-set-70.fStopWords_False.fSymbols_False.txt
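The data-sets folder holds a 70/30 split of the labeled sentences (training-data-set-70.txt and test-data-set-30.txt), presumably produced by label-split_training_test_v1.py. The following is a minimal sketch of a reproducible 70/30 split, assuming the labeled sentences are available as a list of items; the function name `split_train_test`, the seed, and the shuffle-then-cut strategy are assumptions, not the script's actual behavior.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    items: Sequence[T], train_fraction: float = 0.7, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of items and split it into (train, test) by train_fraction."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(items)  # copy so the caller's sequence is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    sentences = [f"sentence {i}" for i in range(10)]
    train, test = split_train_test(sentences)
    print(len(train), len(test))  # 7 3
```

Using a private `random.Random(seed)` instance instead of the global generator keeps the split stable even if other code also draws random numbers.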
 
CoreNLP
- bin
  - get-raw-sentences.sh
  - single_run.sh
- input
  - raw-metadata-senteneces.txt
- output
  - raw-metadata-senteneces.txt.conll
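The CoreNLP output file is in CoNLL format, which typically lists one token per tab-separated line and separates sentences with a blank line. Below is a minimal sketch of reading such a file back into tokenized sentences, assuming that layout and assuming the word form sits in the second column; the function name `read_conll`, the `word_col` parameter, and the sample lines are hypothetical, so check the actual column layout of raw-metadata-senteneces.txt.conll before relying on it.

```python
from typing import Iterable, List


def read_conll(lines: Iterable[str], word_col: int = 1) -> List[List[str]]:
    """Group CoNLL-style token lines into sentences of word forms.

    Assumes one token per tab-separated line, sentences separated by blank
    lines, and the word form in column word_col (0-based).
    """
    sentences: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence (repeated blanks are ignored).
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split("\t")
        current.append(columns[word_col])
    if current:
        # The file may not end with a trailing blank line.
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = [
        "1\tCells\tcell\tNNS\tO\t2\tnsubj\n",
        "2\tgrown\tgrow\tVBN\tO\t0\troot\n",
        "\n",
        "1\tAerobic\taerobic\tJJ\tO\t0\troot\n",
    ]
    print(read_conll(sample))  # [['Cells', 'grown'], ['Aerobic']]
```

Because `read_conll` accepts any iterable of lines, an open file handle for output/raw-metadata-senteneces.txt.conll can be passed to it directly.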