sub_srr_IV_v2.out
/usr/local/lib/python3.6/dist-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
mapping2MCO_v2.py:266: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
str_matches_odf["SOURCE"] = mco_ifile
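The SettingWithCopyWarning above is usually raised when a column is assigned on a frame that was itself produced by filtering another frame. A minimal sketch of one common way to avoid it follows, assuming `str_matches_odf` comes from a boolean filter; the `terms` frame, its `SCORE` column, and the filter condition are hypothetical stand-ins, not the actual code in mapping2MCO_v2.py.

```python
import pandas as pd

# Hypothetical stand-in for the frame that mapping2MCO_v2.py filters.
terms = pd.DataFrame({
    "TERM_NAME": ["radiation", "culture medium", "continuant"],
    "SCORE": [95, 60, 88],
})
mco_ifile = "gc_ontology_terms_v2.txt"

# Taking an explicit copy makes the filtered result an independent frame,
# so the column assignment below is unambiguous and raises no warning.
str_matches_odf = terms[terms["SCORE"] >= 80].copy()
str_matches_odf["SOURCE"] = mco_ifile
```

The original `terms` frame is left untouched; only the copy gains the `SOURCE` column.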
-------------------------------- PARAMETERS --------------------------------
--inputPath Path of npl tagged file (crf output): /home/egaytan/automatic-extraction-growth-conditions/mapping_MCO/input/
--iAnnotatedFile Input file of npl tagged file (crf output): sub_srr_IV.tsv
--iOntoFile Input file with the ontology entities (MCO-terms): gc_ontology_terms_v2.txt
--iLinksFile Input file with links and id for the ontology (MCO-type-links): None
--iSynFile Input file for the additional ontology of synonyms (MCO-syn-json): mco_terms_v0.2.json
--outputPath Output path to place output files: /home/egaytan/automatic-extraction-growth-conditions/mapping_MCO/output/
--outputFile Output of the mapping process: sub_srr_IV_7prob_80per_v2.tsv
--minPerMatch Minimal string matching percentage: 80
--minCRFProbs Minimal crf probabilities allowed: 0.9
-------------------------------- INPUTS --------------------------------
npl tagged file
GSE ... REPOFILE
0 GSE100373 ... http://pakal.ccg.unam.mx/cmendezc/automatic-ex...
2 GSE100373 ... http://pakal.ccg.unam.mx/cmendezc/automatic-ex...
6 GSE100373 ... http://pakal.ccg.unam.mx/cmendezc/automatic-ex...
[3 rows x 10 columns]
ontology entities
TERM_ID TERM_NAME
0 MCO000000014 generically dependent continuant
1 MCO000000015 radiation
2 MCO000000016 electromagnetic radiation
additional ontology of synonyms (MCO-syn-json)
ENTITY_NAME TERM_ID TERM_NAME
MCO000000019 continuant MCO000000019
MCO000002475 culture medium MCO000002475
MCO000002467_0 Organism MCO000002467 biologicentity
-------------------------------- RESULTS --------------------------------
Mapping 5706 terms to MCO based on exact strings...
Mapping 5350 terms to MCO - synonyms based on exact strings...
BANGLINE ... TERM_TYPE
16 characteristics_ch1 ... Genetic background
26 characteristics_ch1 ... Genetic background
34 characteristics_ch1 ... Genetic background
[3 rows x 13 columns]
Total of terms mapped by exact strings: 860
Saving filtered terms from raw mapping...
10644 unmapped terms based on exact strings
Dropping duplicated unmapped term names...
242 unmapped unique terms based on exact strings
Mapping to MCO 242 terms based on string similarity...
Mapping to MCO - synonyms 242 terms based on string similarity...
Unique terms mapped by string similarity: 16
Total of terms mapped by string similarity: 2118
Saving filtered terms from str mapping...
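
The string-similarity step reported above can be illustrated with a small sketch. It assumes a plain ratio score from the standard-library `SequenceMatcher` (the pure-python matcher named in the fuzzywuzzy warning) and the `--minPerMatch` cutoff of 80. The helper names `similarity` and `best_match` and the misspelled query are hypothetical, and the actual scorer used by mapping2MCO_v2.py may differ.

```python
from difflib import SequenceMatcher

MIN_PER_MATCH = 80  # mirrors --minPerMatch from the PARAMETERS block


def similarity(a, b):
    """Return a 0-100 similarity score for two strings, ignoring case."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_match(term, ontology):
    """Return (term_id, term_name, score) for the closest ontology entry,
    or None when no entry reaches MIN_PER_MATCH."""
    scored = [(similarity(term, name), term_id, name)
              for term_id, name in ontology.items()]
    score, term_id, name = max(scored)
    return (term_id, name, score) if score >= MIN_PER_MATCH else None


# Two ontology entries taken from the INPUTS block above.
ontology = {
    "MCO000000015": "radiation",
    "MCO000000016": "electromagnetic radiation",
}

hit = best_match("radiaton", ontology)   # hypothetical misspelled input term
miss = best_match("xyz", ontology)       # nothing similar enough
```

Here `hit` points at the `radiation` entry, while `miss` is `None` because no candidate reaches the cutoff.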