Ignacio Arroyo Fernández

Update README.md

@@ -14,6 +14,7 @@ The main method follows the next pipeline:
### Prediction mode
- Parse abstracts from a unique input file
- Transform abstracts into their TFIDF sparse representations
- Transform TFIDF representations into their 200-dimensional SVD approximation
- Predict useless/useful papers by means of their abstracts using pretrained Support Vector Machines
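
The prediction steps above can be sketched with scikit-learn. This is a minimal illustration, not the code in `filter_abstracts_binClass.py`: the function name `build_pipeline`, the linear kernel, and the toy abstracts are assumptions, and the sketch fits an SVM on the spot instead of loading a pretrained one so that it stays self-contained.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_pipeline(n_components=200):
    """Chain TF-IDF -> truncated SVD -> SVM (hypothetical helper)."""
    return make_pipeline(
        TfidfVectorizer(),  # abstracts -> sparse TF-IDF matrix
        TruncatedSVD(n_components=n_components, random_state=0),  # dense SVD approximation
        SVC(kernel="linear"),  # assumed kernel; the README does not name one
    )


# Toy data (made up for illustration only).
train_texts = [
    "we measured gene expression in knockout strains",
    "growth rates were measured experimentally for each mutant",
    "this review summarizes previous literature on regulation",
    "we review and discuss published opinions on the topic",
]
train_labels = ["useful", "useful", "not_useful", "not_useful"]

# The toy vocabulary is tiny, so 2 components are used here instead of 200.
model = build_pipeline(n_components=2)
model.fit(train_texts, train_labels)
predictions = model.predict(["expression was measured in mutant strains"])
```

With a real corpus the default of 200 components matches the dimensionality stated in the pipeline; the number of components should stay below the vocabulary size.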
# Usage
@@ -21,7 +22,7 @@ The main method follows the next pipeline:
For filtering unknown abstracts run
```bash
-$ python filter_abstracts.py --input data/test_abstracts.txt
+$ python filter_abstracts_binClass.py --input data/test_abstracts.txt
```
The predictions will be stored by default in `filter_output/`, unless a different directory is specified with the `--out` option. The default names of the files containing the predictions are
@@ -36,10 +37,10 @@ The format of each file is:
```
<PMID> \t <text of the abstract>
```
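
A file in this format can be read with a few lines of Python. The helper below is a hypothetical sketch (the name `parse_abstracts` is not from the repository) and assumes one record per line, with the PMID and the abstract separated by the first tab.

```python
def parse_abstracts(lines):
    """Yield (pmid, text) pairs from lines shaped like '<PMID> \\t <text>'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        pmid, sep, text = line.partition("\t")
        if not sep:
            raise ValueError(f"missing tab separator: {line!r}")
        # Strip the optional spaces around the tab.
        yield pmid.strip(), text.strip()


# Usage with an in-memory sample; an open file object works the same way.
sample = ["12345 \t First abstract text.\n", "\n", "67890\tSecond abstract.\n"]
records = list(parse_abstracts(sample))
```

Splitting on the first tab only means that any later tabs remain part of the abstract text.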
-For training a new model set the list of parameters at `model_params.conf` and then run
+For training a new model set the list of parameters at `model_params_binClass.conf` and then run
```bash
-$ python filter_abstracts.py --classA data/ecoli_abstracts/not_useful_abstracts.txt --classB data/ecoli_abstracts/useful_abstracts.txt
+$ python filter_abstracts_binClass.py --classA data/ecoli_abstracts/not_useful_abstracts.txt --classB data/ecoli_abstracts/useful_abstracts.txt
```
-where `--classA` and `--classA` are used to specify input training files. In this example `data/ecoli_abstracts/useful_abstracts.txt` is the training files containing abstracts of papers reporting experimental data (the desired or useful class for us).
+where `--classA` and `--classB` (the useful papers) are used to specify the input training files. In this example, `data/ecoli_abstracts/useful_abstracts.txt` is the training file containing abstracts of papers reporting experimental data (the desired or useful class for us).