srr_htregulondb_mapping_report.out
/usr/local/lib/python3.6/dist-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
/home/egaytan/automatic-extraction-growth-conditions/mapping_MCO/bin/mapping2MCO_v5.py:312: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  str_matches_odf["SOURCE"] = mco_ifile
-------------------------------- PARAMETERS --------------------------------
--inputPath Path of NLP-tagged file: /home/egaytan/automatic-extraction-growth-conditions/mapping_MCO/input/
--iAnnotatedFile Input NLP-tagged file: srr_htregulondb_model_Run3_v10_S1_False_S2_True_S3_False_S4_False_Run3_v10.tsv
--iOntoFile Input file with the ontology entities (MCO-terms): gc_ontology_terms_v2.txt
--iLinksFile Input file with links and id for the ontology (MCO-type-links): None
--iSynFile Input file for the additional ontology of synonyms (MCO-syn-json): mco_terms_v0.2.json
--outputPath Output path to place output files: /home/egaytan/automatic-extraction-growth-conditions/mapping_MCO/output/v2/
--outputFile Output of the mapping process: srr_htregulondb.tsv
--minPerMatch Minimal string matching percentage: 80
--minCRFProbs Minimal crf probabilities allowed: 0.9
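
The two thresholds above (`--minPerMatch 80` and `--minCRFProbs 0.9`) could be applied roughly as in the sketch below. This is not the code of `mapping2MCO_v5.py`; it is a minimal illustration that assumes terms are first filtered by CRF probability and then matched by a 0-100 similarity score. It uses the standard-library `difflib.SequenceMatcher` (the pure-Python fallback named in the warning at the top of this report) in place of fuzzywuzzy. The sample terms, probabilities, and the helper names `similarity_percent` and `best_match` are hypothetical.

```python
from difflib import SequenceMatcher

MIN_PER_MATCH = 80   # --minPerMatch value from this report
MIN_CRF_PROB = 0.9   # --minCRFProbs value from this report


def similarity_percent(a, b):
    """Return a 0-100 similarity score between two strings (case-insensitive)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(term, candidates, min_percent=MIN_PER_MATCH):
    """Return (candidate, score) for the closest candidate, or None if below threshold."""
    if not candidates:
        return None
    scored = [(c, similarity_percent(term, c)) for c in candidates]
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= min_percent else None


# Hypothetical tagged terms with their CRF probabilities (not taken from the run).
tagged = [
    ("electromagnetic radiaton", 0.97),  # misspelled on purpose
    ("radiation", 0.55),                 # dropped: probability below 0.9
    ("glucose", 0.95),                   # kept, but no close ontology term
]

# Term names shown in the "ontology entities" preview of this report.
ontology = [
    "generically dependent continuant",
    "radiation",
    "electromagnetic radiation",
]

# Step 1: keep only terms tagged with sufficient CRF probability.
kept = [(term, prob) for term, prob in tagged if prob >= MIN_CRF_PROB]

# Step 2: map each kept term to its closest ontology term, if similar enough.
mapped = {term: best_match(term, ontology) for term, _ in kept}

for term, match in mapped.items():
    print(term, "->", match)
```

In this sketch a term below the CRF threshold never reaches the matching step, and a term whose best score is under 80 is reported as unmapped (`None`) rather than forced onto the nearest ontology entry.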
-------------------------------- INPUTS --------------------------------
NLP-tagged file
          SRR  ...                                          REPO_FILE
0  SRR5742248  ...  http://pakal.ccg.unam.mx/cmendezc/automatic-ex...
5  SRR5742250  ...  http://pakal.ccg.unam.mx/cmendezc/automatic-ex...
7  SRR5742250  ...  http://pakal.ccg.unam.mx/cmendezc/automatic-ex...

[3 rows x 15 columns]
ontology entities
        TERM_ID                         TERM_NAME
0  MCO000000014  generically dependent continuant
1  MCO000000015                         radiation
2  MCO000000016         electromagnetic radiation
additional ontology of synonyms (MCO-syn-json)
                   ENTITY_NAME       TERM_ID       TERM_NAME
MCO000000019        continuant  MCO000000019
MCO000002475    culture medium  MCO000002475
MCO000002467_0        Organism  MCO000002467  biologicentity
-------------------------------- RESULTS --------------------------------
Tracking exact terms to MCO...
Mapping 4099 terms to MCO based on exact strings...
Mapping 3770 terms to MCO - synonyms based on exact strings...
Total of terms mapped by exact strings: 387
Saving filtered terms from raw mapping...
3712 unmapped terms based on exact strings
Dropping duplicated unmapped term names...
206 unmapped unique terms based on exact strings
Computing string similarity...
Mapping to MCO 206 terms based on string similarity...
Mapping to MCO - synonyms 152 terms based on string similarity...
Unique terms mapped by string similarity: 73
Total of terms mapped by string similarity: 1992
Saving filtered terms from str mapping...
--------------------END----------------------
Total of terms mapped: 2379
Total of terms unmapped: 1720
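
The totals reported above can be cross-checked with a few lines of arithmetic. The sketch below copies the counts from this report and tests three relations between them. That the counts are meant to relate this way is an assumption drawn from the numbers themselves, not something the script states.

```python
# Counts copied from the RESULTS section of this report.
total_terms = 4099            # "Mapping 4099 terms to MCO based on exact strings"
exact_mapped = 387            # "Total of terms mapped by exact strings"
unmapped_after_exact = 3712   # "3712 unmapped terms based on exact strings"
similarity_mapped = 1992      # "Total of terms mapped by string similarity"
total_mapped = 2379           # "Total of terms mapped"
total_unmapped = 1720         # "Total of terms unmapped"

# Assumed relations between the counts (inferred, not documented by the script).
checks = {
    "exact + unmapped-after-exact == total": exact_mapped + unmapped_after_exact == total_terms,
    "exact + similarity == total mapped": exact_mapped + similarity_mapped == total_mapped,
    "mapped + unmapped == total": total_mapped + total_unmapped == total_terms,
}

for name, ok in checks.items():
    print(f"{name}: {'OK' if ok else 'MISMATCH'}")

# Share of input terms that ended up mapped to MCO.
coverage = 100.0 * total_mapped / total_terms
print(f"coverage: {coverage:.1f}%")
```

All three relations hold for this run, and a little over half of the input terms were mapped.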