Ignacio Arroyo Fernández

Update README.md

@@ -14,6 +14,7 @@ The main method follows the next pipeline:
 ### Prediction mode
 - Parse abstracts from a unique input file
 - Transform abstracts into their TFIDF sparse representations
+- Transform TFIDF representations into their 200-dimensional SVD approximation
 - Predict useless/useful papers by means of their abstracts using pretrained Support Vector Machines

 # Usage
@@ -21,7 +22,7 @@ The main method follows the next pipeline:
 For filtering unknown abstracts run

 ```bash
-$ python filter_abstracts.py --input data/test_abstracts.txt
+$ python filter_abstracts_binClass.py --input data/test_abstracts.txt
 ```
 The predictions will be stored by default at `filter_output/`, unless a different directory is specified by means of the `--out` option. The default names containing the predicitons are

@@ -36,10 +37,10 @@ The format of each file is:
 <PMID> \t <text of the abstract>
 ```

-For training a new model set the list of parameters at `model_params.conf` and then run
+For training a new model set the list of parameters at `model_params_binClass.conf` and then run

 ```bash
-$ python filter_abstracts.py --classA data/ecoli_abstracts/not_useful_abstracts.txt --classB data/ecoli_abstracts/useful_abstracts.txt
+$ python filter_abstracts_binClass.py --classA data/ecoli_abstracts/not_useful_abstracts.txt --classB data/ecoli_abstracts/useful_abstracts.txt
 ```

-where `--classA` and `--classA` are used to specify input training files. In this example `data/ecoli_abstracts/useful_abstracts.txt` is the training files containing abstracts of papers reporting experimental data (the desired or useful class for us).
+where `--classA` and `--classB` (the useful papers) are used to specify input training files. In this example `data/ecoli_abstracts/useful_abstracts.txt` is the training files containing abstracts of papers reporting experimental data (the desired or useful class for us).
......
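The prediction pipeline this commit updates (TF-IDF, then the new 200-dimensional SVD step, then an SVM) can be sketched with scikit-learn roughly as follows. This is a hedged illustration and is not the code of `filter_abstracts_binClass.py`. The functions `parse_abstracts`, `read_abstracts`, `build_pipeline`, `train` and `predict` are hypothetical names, and so is the 0/1 label encoding. The linear kernel and the default TF-IDF settings are likewise assumptions. The sketch only reuses what the README states: input lines of the form `<PMID> \t <text of the abstract>`, a 200-dimensional SVD, and classB as the useful class. `TruncatedSVD` is used because it works directly on sparse TF-IDF matrices, which is the usual way to reduce them (latent semantic analysis).

```python
# Hedged sketch of the README's pipeline, NOT the real filter_abstracts_binClass.py.
# Assumptions: scikit-learn is installed; function names, the linear SVM kernel,
# default TF-IDF settings and the 0/1 label encoding are hypothetical.
from typing import Iterable, List, Tuple

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def parse_abstracts(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split tab-separated '<PMID>' / abstract-text lines into two lists."""
    pmids, texts = [], []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        pmid, _, text = line.rstrip("\n").partition("\t")
        pmids.append(pmid.strip())
        texts.append(text.strip())
    return pmids, texts


def read_abstracts(path: str) -> Tuple[List[str], List[str]]:
    """Read one abstracts file (same tab-separated format) from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_abstracts(handle)


def build_pipeline(n_components: int = 200) -> Pipeline:
    """TF-IDF sparse vectors -> truncated SVD (default 200 dims) -> SVM."""
    return Pipeline([
        ("tfidf", TfidfVectorizer()),
        # TruncatedSVD accepts the sparse TF-IDF matrix without densifying it.
        ("svd", TruncatedSVD(n_components=n_components, random_state=0)),
        ("svm", SVC(kernel="linear")),
    ])


def train(texts_a: List[str], texts_b: List[str], n_components: int = 200) -> Pipeline:
    """Fit on classA (label 0, not useful) and classB (label 1, useful) abstracts."""
    labels = [0] * len(texts_a) + [1] * len(texts_b)
    model = build_pipeline(n_components)
    model.fit(texts_a + texts_b, labels)
    return model


def predict(model: Pipeline, pmids: List[str], texts: List[str]) -> List[Tuple[str, int]]:
    """Return (PMID, predicted label) pairs; 1 means 'useful' under this encoding."""
    return list(zip(pmids, model.predict(texts).tolist()))
```

In this sketch, you would mirror the `--classA`/`--classB` training command by reading both files with `read_abstracts` and passing their texts to `train`. `predict` then plays the role of the `--input` run. With a real corpus the default of 200 components applies. Tiny toy corpora need a smaller `n_components`, because the SVD cannot keep more dimensions than the vocabulary has terms.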