Showing 1 changed file with 5 additions and 4 deletions
... | @@ -14,6 +14,7 @@ The main method follows the next pipeline: | ... | @@ -14,6 +14,7 @@ The main method follows the next pipeline: |
14 | ### Prediction mode | 14 | ### Prediction mode |
15 | - Parse abstracts from a unique input file | 15 | - Parse abstracts from a unique input file |
16 | - Transform abstracts into their TFIDF sparse representations | 16 | - Transform abstracts into their TFIDF sparse representations |
17 | +- Transform TFIDF representations into their 200-dimensional SVD approximation | ||
17 | - Predict useless/useful papers by means of their abstracts using pretrained Support Vector Machines | 18 | - Predict useless/useful papers by means of their abstracts using pretrained Support Vector Machines |
18 | 19 | ||
19 | # Usage | 20 | # Usage |
... | @@ -21,7 +22,7 @@ The main method follows the next pipeline: | ... | @@ -21,7 +22,7 @@ The main method follows the next pipeline: |
21 | For filtering unknown abstracts run | 22 | For filtering unknown abstracts run |
22 | 23 | ||
23 | ```bash | 24 | ```bash |
24 | -$ python filter_abstracts.py --input data/test_abstracts.txt | 25 | +$ python filter_abstracts_binClass.py --input data/test_abstracts.txt |
25 | ``` | 26 | ``` |
26 | The predictions will be stored by default at `filter_output/`, unless a different directory is specified by means of the `--out` option. The default names containing the predictions are | 27 | The predictions will be stored by default at `filter_output/`, unless a different directory is specified by means of the `--out` option. The default names containing the predictions are |
27 | 28 | ||
... | @@ -36,10 +37,10 @@ The format of each file is: | ... | @@ -36,10 +37,10 @@ The format of each file is: |
36 | <PMID> \t <text of the abstract> | 37 | <PMID> \t <text of the abstract> |
37 | ``` | 38 | ``` |
38 | 39 | ||
39 | -For training a new model set the list of parameters at `model_params.conf` and then run | 40 | +For training a new model set the list of parameters at `model_params_binClass.conf` and then run |
40 | 41 | ||
41 | ```bash | 42 | ```bash |
42 | -$ python filter_abstracts.py --classA data/ecoli_abstracts/not_useful_abstracts.txt --classB data/ecoli_abstracts/useful_abstracts.txt | 43 | +$ python filter_abstracts_binClass.py --classA data/ecoli_abstracts/not_useful_abstracts.txt --classB data/ecoli_abstracts/useful_abstracts.txt |
43 | ``` | 44 | ``` |
44 | 45 | ||
45 | -where `--classA` and `--classA` are used to specify input training files. In this example `data/ecoli_abstracts/useful_abstracts.txt` is the training files containing abstracts of papers reporting experimental data (the desired or useful class for us). | 46 | +where `--classA` and `--classB` (the useful papers) are used to specify input training files. In this example `data/ecoli_abstracts/useful_abstracts.txt` is the training files containing abstracts of papers reporting experimental data (the desired or useful class for us). | ... | ... |
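
For anyone preparing input for either mode, the `<PMID> \t <text of the abstract>` format shown in the diff context could be read with a sketch like the one below. `parse_abstracts` is a hypothetical helper, not a function from this repository, and it assumes one record per line with a single tab between the PMID and the abstract text.

```python
import io
from typing import Dict, TextIO


def parse_abstracts(handle: TextIO) -> Dict[str, str]:
    """Map each PMID to its abstract text.

    Hypothetical helper for illustration; assumes one tab-separated
    '<PMID>\t<abstract>' record per line.
    """
    abstracts: Dict[str, str] = {}
    for line_number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first tab only, so tabs inside the abstract survive.
        pmid, sep, text = line.partition("\t")
        if not sep:
            raise ValueError(
                f"line {line_number}: expected '<PMID>\\t<abstract>'"
            )
        abstracts[pmid.strip()] = text.strip()
    return abstracts


# In-memory sample standing in for a file such as data/test_abstracts.txt.
sample = io.StringIO("12345\tFirst abstract.\n67890\tSecond abstract.\n")
print(parse_abstracts(sample))
```

Raising on a line without a tab is a design choice in this sketch, so that malformed input is caught before it reaches the TFIDF step; the actual script may handle such lines differently.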