
Automatic Extraction of Growth Conditions (GC) from the Gene Expression Omnibus (GEO)

This project automatically extracts the growth conditions of all enterobacteria in GEO using Conditional Random Fields (CRFs).

Prerequisites

Programming languages

  • Python (versions 2.7 and 3)
  • Bash

Folder content

CRF

  • bin
    1. label-split_training_test_v1.py
    2. params.py
    3. training_validation_v3.py
  • check
    1. sentences-405-order-rep.txt
  • data-sets
    1. test-data-set-30.txt
    2. training-data-set-70.txt
  • models
    1. training-data-set-70.fStopWords_False.fSymbols_False.mod
  • reports
    1. report_training-data-set-70.fStopWords_False.fSymbols_False.txt
    2. y_pred_training-data-set-70.fStopWords_False.fSymbols_False.txt
    3. y_test_training-data-set-70.fStopWords_False.fSymbols_False.txt
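
The data-set file names above suggest a 70/30 split between training and test sentences. The following is a minimal sketch of such a split, not the logic of label-split_training_test_v1.py, which is not shown here. The function name `split_sentences`, the fixed seed, and the placeholder sentences are assumptions made for illustration.

```python
import random


def split_sentences(sentences, train_fraction=0.7, seed=0):
    """Shuffle a copy of the sentences and split it into train and test parts.

    Hypothetical helper: the name, the seed, and the shuffling strategy are
    assumptions, not taken from the project scripts.
    """
    shuffled = list(sentences)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Placeholder sentences standing in for the tagged GEO metadata sentences.
sentences = ["sentence-%d" % i for i in range(10)]
train, test = split_sentences(sentences)
```

With ten input sentences and the default fraction, seven go to the training part and three to the test part. Every sentence lands in exactly one of the two.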

CoreNLP

  • bin
    1. get-raw-sentences.sh
    2. single_run.sh
  • input
    1. raw-metadata-senteneces.txt
  • output
    1. raw-metadata-senteneces.txt.conll
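
The .conll extension of the output file suggests a token-per-line layout with tab-separated columns and a blank line between sentences. Under that assumption, a file of this kind could be read back as sketched below. The column set (index, word, lemma, part of speech) in the sample is hypothetical, because the real columns depend on how CoreNLP was configured in single_run.sh.

```python
def read_conll(lines):
    """Group tab-separated token rows into sentences.

    Assumes one token per line and a blank line after each sentence.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split("\t"))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample rows: index, word, lemma, part-of-speech tag.
sample = [
    "1\tcells\tcell\tNNS\n",
    "2\tgrown\tgrow\tVBN\n",
    "\n",
    "1\tLB\tLB\tNNP\n",
    "2\tmedium\tmedium\tNN\n",
]
parsed = read_conll(sample)
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of the sample list.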