Showing 9 changed files with 0 additions and 1168 deletions
| 1 | -The test set consists of 634 data points, each of which represents | ||
| 2 | -a molecule that is either active (A) or inactive (I). The test set | ||
| 3 | -has the same format as the training set, with the exception that the | ||
| 4 | -activity value (A or I) for each data point is missing, that is, has | ||
| 5 | -been replaced by a question mark (?). Please submit one prediction, | ||
| 6 | -A or I, for each data point. Your submission should be in the form | ||
| 7 | -of a file that starts with your contact information, followed by a | ||
| 8 | -line with 5 asterisks, followed immediately by your predictions, with | ||
| 9 | -one line per data point. The predictions should be in the same order | ||
| 10 | -as the test set data points. So your prediction for the first example | ||
| 11 | -should appear on the first line after the asterisks, your prediction | ||
| 12 | -for the second example should appear on the second line after the | ||
| 13 | -asterisks, etc. Hence, after your contact information, the prediction | ||
| 14 | -file will consist of 635 lines and have the form: | ||
| 15 | - | ||
| 16 | -***** | ||
| 17 | -I | ||
| 18 | -I | ||
| 19 | -A | ||
| 20 | -I | ||
| 21 | -A | ||
| 22 | -I | ||
| 23 | - | ||
| 24 | -etc. | ||
| 25 | - | ||
| 26 | -You may submit your prediction by email to page@biostat.wisc.edu | ||
| 27 | -or by anonymous ftp to ftp.biostat.wisc.edu, placing the file | ||
| 28 | -into the directory dropboxes/page/. If using email, please use | ||
| 29 | -the subject line "KDDcup <name> thrombin" where <name> is your | ||
| 30 | -name. If using ftp, please name the file KDDcup.<name>.thrombin | ||
| 31 | -where <name> is your name. For example, my submission would be | ||
| 32 | -named KDDcup.DavidPage.thrombin | ||
| 33 | - | ||
| 34 | -Only one submission per person per task is permitted. If you do not | ||
| 35 | -receive email confirmation of your submission within 24 hours, please | ||
| 36 | -email page@biostat.wisc.edu with subject "KDDcup no confirmation". | ||
| 37 | - | ||
| 38 | -For group entries, the contact information should include the names | ||
| 39 | -of everyone to be credited as a member of the group should your entry | ||
| 40 | -achieve the highest score. But no person is to be listed on more than | ||
| 41 | -one entry per task. |
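
The submission layout described in the deleted instructions above (contact information, a line of 5 asterisks, then one prediction per line in test-set order) can be sketched as follows. This is only an illustration: the function name `build_submission`, its parameters, and the sample contact lines are hypothetical and do not appear in the original files.

```python
def build_submission(contact_lines, predictions, expected_count=634):
    """Return the text of a prediction file in the layout described above.

    contact_lines: free-form lines with the submitter's contact information.
    predictions:   one label per test data point, "A" (active) or "I" (inactive),
                   in the same order as the test set.
    """
    if len(predictions) != expected_count:
        raise ValueError(
            "expected %d predictions, got %d" % (expected_count, len(predictions))
        )
    invalid = sorted({p for p in predictions if p not in ("A", "I")})
    if invalid:
        raise ValueError("predictions must be 'A' or 'I', found: %r" % invalid)
    # Contact block, then the separator, then the predictions with no gap.
    lines = list(contact_lines) + ["*****"] + list(predictions)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder predictions, only to show the shape of the file.
    text = build_submission(["Jane Doe", "jane@example.org"], ["I"] * 634)
    print(text.splitlines()[:5])
```

After the contact lines, the separator plus the 634 predictions give the 635 lines that the instructions mention.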
| 1 | -Prediction of Molecular Bioactivity for Drug Design -- Binding to Thrombin | ||
| 2 | --------------------------------------------------------------------------- | ||
| 3 | - | ||
| 4 | -Drugs are typically small organic molecules that achieve their desired | ||
| 5 | -activity by binding to a target site on a receptor. The first step in | ||
| 6 | -the discovery of a new drug is usually to identify and isolate the | ||
| 7 | -receptor to which it should bind, followed by testing many small | ||
| 8 | -molecules for their ability to bind to the target site. This leaves | ||
| 9 | -researchers with the task of determining what separates the active | ||
| 10 | -(binding) compounds from the inactive (non-binding) ones. Such a | ||
| 11 | -determination can then be used in the design of new compounds that not | ||
| 12 | -only bind, but also have all the other properties required for a drug | ||
| 13 | -(solubility, oral absorption, lack of side effects, appropriate duration | ||
| 14 | -of action, low toxicity, etc.). | ||
| 15 | - | ||
| 16 | -The present training data set consists of 1909 compounds tested for | ||
| 17 | -their ability to bind to a target site on thrombin, a key receptor in | ||
| 18 | -blood clotting. The chemical structures of these compounds are not | ||
| 19 | -necessary for our analysis and are not included. Of these compounds, 42 | ||
| 20 | -are active (bind well) and the others are inactive. Each compound is | ||
| 21 | -described by a single feature vector comprised of a class value (A for | ||
| 22 | -active, I for inactive) and 139,351 binary features, which describe | ||
| 23 | -three-dimensional properties of the molecule. The definitions of the | ||
| 24 | -individual bits are not included - we don't know what each individual | ||
| 25 | -bit means, only that they are generated in an internally consistent | ||
| 26 | -manner for all 1909 compounds. Biological activity in general, and | ||
| 27 | -receptor binding affinity in particular, correlate with various | ||
| 28 | -structural and physical properties of small organic molecules. The task | ||
| 29 | -is to determine which of these properties are critical in this case and | ||
| 30 | -to learn to accurately predict the class value. To simulate the | ||
| 31 | -real-world drug design environment, the test set contains 634 additional | ||
| 32 | -compounds that were in fact generated based on the assay results | ||
| 33 | -recorded for the training set. In evaluating the accuracy, a | ||
| 34 | -differential cost model will be used, so that the sum of the costs of | ||
| 35 | -the actives will be equal to the sum of the costs of the inactives. | ||
| 36 | - | ||
| 37 | -We thank DuPont Pharmaceuticals for graciously providing this data set | ||
| 38 | -for the KDD Cup 2001 competition. All publications referring to | ||
| 39 | -analysis of this data set should acknowledge DuPont Pharmaceuticals | ||
| 40 | -Research Laboratories and KDD Cup 2001. |
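
The README only says that the differential cost model makes the summed cost of the actives equal to the summed cost of the inactives. One possible reading, and it is an assumption here rather than the organizers' stated formula, is that each class contributes equally to the score, which gives the mean of the per-class accuracies. A minimal sketch under that assumption (the name `balanced_accuracy` is hypothetical):

```python
def balanced_accuracy(y_true, y_pred, classes=("A", "I")):
    """Mean of per-class accuracies, so each class carries equal total weight.

    Assumption: this is one interpretation of the README's differential
    cost model, not a formula taken from the original files.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    per_class = []
    for label in classes:
        positions = [i for i, truth in enumerate(y_true) if truth == label]
        if not positions:
            raise ValueError("no examples of class %r in y_true" % label)
        correct = sum(1 for i in positions if y_pred[i] == label)
        per_class.append(correct / len(positions))
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    truth = ["A", "I", "I", "I"]
    print(balanced_accuracy(truth, ["A", "I", "I", "A"]))
    print(balanced_accuracy(truth, ["I", "I", "I", "I"]))
```

Under this reading, a classifier that predicts every compound as inactive scores 0.5 regardless of how rare the actives are, which is why plain accuracy is a weak measure on a training set with 42 actives out of 1909 compounds.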
binding-thrombin-dataset/README.txt
deleted
100644 → 0
File mode changed
This diff could not be displayed because it is too large.
| 1 | -I | ||
| 2 | -A | ||
| 3 | -I | ||
| 4 | -I | ||
| 5 | -I | ||
| 6 | -A | ||
| 7 | -I | ||
| 8 | -I | ||
| 9 | -I | ||
| 10 | -A | ||
| 11 | -I | ||
| 12 | -I | ||
| 13 | -I | ||
| 14 | -A | ||
| 15 | -I | ||
| 16 | -A | ||
| 17 | -I | ||
| 18 | -I | ||
| 19 | -I | ||
| 20 | -I | ||
| 21 | -I | ||
| 22 | -I | ||
| 23 | -I | ||
| 24 | -I | ||
| 25 | -I | ||
| 26 | -I | ||
| 27 | -I | ||
| 28 | -A | ||
| 29 | -I | ||
| 30 | -A | ||
| 31 | -I | ||
| 32 | -I | ||
| 33 | -I | ||
| 34 | -I | ||
| 35 | -I | ||
| 36 | -A | ||
| 37 | -I | ||
| 38 | -I | ||
| 39 | -I | ||
| 40 | -A | ||
| 41 | -I | ||
| 42 | -I | ||
| 43 | -I | ||
| 44 | -I | ||
| 45 | -I | ||
| 46 | -I | ||
| 47 | -I | ||
| 48 | -I | ||
| 49 | -A | ||
| 50 | -A | ||
| 51 | -I | ||
| 52 | -I | ||
| 53 | -I | ||
| 54 | -I | ||
| 55 | -I | ||
| 56 | -A | ||
| 57 | -I | ||
| 58 | -A | ||
| 59 | -A | ||
| 60 | -I | ||
| 61 | -I | ||
| 62 | -I | ||
| 63 | -A | ||
| 64 | -I | ||
| 65 | -I | ||
| 66 | -I | ||
| 67 | -I | ||
| 68 | -A | ||
| 69 | -A | ||
| 70 | -I | ||
| 71 | -A | ||
| 72 | -I | ||
| 73 | -I | ||
| 74 | -A | ||
| 75 | -A | ||
| 76 | -I | ||
| 77 | -I | ||
| 78 | -I | ||
| 79 | -I | ||
| 80 | -I | ||
| 81 | -I | ||
| 82 | -I | ||
| 83 | -I | ||
| 84 | -I | ||
| 85 | -I | ||
| 86 | -I | ||
| 87 | -I | ||
| 88 | -I | ||
| 89 | -I | ||
| 90 | -I | ||
| 91 | -A | ||
| 92 | -I | ||
| 93 | -I | ||
| 94 | -A | ||
| 95 | -I | ||
| 96 | -A | ||
| 97 | -I | ||
| 98 | -I | ||
| 99 | -I | ||
| 100 | -A | ||
| 101 | -A | ||
| 102 | -I | ||
| 103 | -I | ||
| 104 | -I | ||
| 105 | -I | ||
| 106 | -I | ||
| 107 | -I | ||
| 108 | -A | ||
| 109 | -I | ||
| 110 | -I | ||
| 111 | -I | ||
| 112 | -I | ||
| 113 | -A | ||
| 114 | -A | ||
| 115 | -I | ||
| 116 | -I | ||
| 117 | -I | ||
| 118 | -I | ||
| 119 | -I | ||
| 120 | -I | ||
| 121 | -I | ||
| 122 | -I | ||
| 123 | -A | ||
| 124 | -A | ||
| 125 | -I | ||
| 126 | -A | ||
| 127 | -A | ||
| 128 | -I | ||
| 129 | -I | ||
| 130 | -I | ||
| 131 | -I | ||
| 132 | -I | ||
| 133 | -I | ||
| 134 | -I | ||
| 135 | -I | ||
| 136 | -I | ||
| 137 | -I | ||
| 138 | -A | ||
| 139 | -I | ||
| 140 | -I | ||
| 141 | -I | ||
| 142 | -I | ||
| 143 | -I | ||
| 144 | -I | ||
| 145 | -I | ||
| 146 | -I | ||
| 147 | -I | ||
| 148 | -A | ||
| 149 | -I | ||
| 150 | -I | ||
| 151 | -I | ||
| 152 | -I | ||
| 153 | -A | ||
| 154 | -I | ||
| 155 | -I | ||
| 156 | -I | ||
| 157 | -I | ||
| 158 | -I | ||
| 159 | -I | ||
| 160 | -I | ||
| 161 | -A | ||
| 162 | -I | ||
| 163 | -I | ||
| 164 | -A | ||
| 165 | -I | ||
| 166 | -A | ||
| 167 | -I | ||
| 168 | -I | ||
| 169 | -A | ||
| 170 | -I | ||
| 171 | -A | ||
| 172 | -I | ||
| 173 | -A | ||
| 174 | -I | ||
| 175 | -A | ||
| 176 | -I | ||
| 177 | -I | ||
| 178 | -I | ||
| 179 | -I | ||
| 180 | -I | ||
| 181 | -A | ||
| 182 | -I | ||
| 183 | -I | ||
| 184 | -A | ||
| 185 | -I | ||
| 186 | -I | ||
| 187 | -A | ||
| 188 | -I | ||
| 189 | -I | ||
| 190 | -I | ||
| 191 | -A | ||
| 192 | -I | ||
| 193 | -A | ||
| 194 | -I | ||
| 195 | -I | ||
| 196 | -A | ||
| 197 | -I | ||
| 198 | -I | ||
| 199 | -I | ||
| 200 | -I | ||
| 201 | -A | ||
| 202 | -I | ||
| 203 | -A | ||
| 204 | -I | ||
| 205 | -I | ||
| 206 | -I | ||
| 207 | -I | ||
| 208 | -I | ||
| 209 | -I | ||
| 210 | -I | ||
| 211 | -I | ||
| 212 | -I | ||
| 213 | -I | ||
| 214 | -I | ||
| 215 | -A | ||
| 216 | -I | ||
| 217 | -A | ||
| 218 | -I | ||
| 219 | -I | ||
| 220 | -I | ||
| 221 | -I | ||
| 222 | -I | ||
| 223 | -I | ||
| 224 | -A | ||
| 225 | -I | ||
| 226 | -I | ||
| 227 | -A | ||
| 228 | -A | ||
| 229 | -A | ||
| 230 | -I | ||
| 231 | -I | ||
| 232 | -A | ||
| 233 | -A | ||
| 234 | -I | ||
| 235 | -I | ||
| 236 | -I | ||
| 237 | -I | ||
| 238 | -A | ||
| 239 | -I | ||
| 240 | -I | ||
| 241 | -I | ||
| 242 | -I | ||
| 243 | -A | ||
| 244 | -I | ||
| 245 | -A | ||
| 246 | -I | ||
| 247 | -I | ||
| 248 | -I | ||
| 249 | -I | ||
| 250 | -I | ||
| 251 | -I | ||
| 252 | -I | ||
| 253 | -A | ||
| 254 | -A | ||
| 255 | -I | ||
| 256 | -I | ||
| 257 | -I | ||
| 258 | -I | ||
| 259 | -I | ||
| 260 | -I | ||
| 261 | -I | ||
| 262 | -I | ||
| 263 | -A | ||
| 264 | -A | ||
| 265 | -I | ||
| 266 | -I | ||
| 267 | -I | ||
| 268 | -I | ||
| 269 | -I | ||
| 270 | -I | ||
| 271 | -A | ||
| 272 | -A | ||
| 273 | -I | ||
| 274 | -I | ||
| 275 | -I | ||
| 276 | -I | ||
| 277 | -I | ||
| 278 | -I | ||
| 279 | -A | ||
| 280 | -I | ||
| 281 | -A | ||
| 282 | -I | ||
| 283 | -I | ||
| 284 | -I | ||
| 285 | -I | ||
| 286 | -I | ||
| 287 | -I | ||
| 288 | -I | ||
| 289 | -I | ||
| 290 | -I | ||
| 291 | -A | ||
| 292 | -I | ||
| 293 | -I | ||
| 294 | -A | ||
| 295 | -I | ||
| 296 | -I | ||
| 297 | -I | ||
| 298 | -I | ||
| 299 | -I | ||
| 300 | -I | ||
| 301 | -A | ||
| 302 | -A | ||
| 303 | -I | ||
| 304 | -I | ||
| 305 | -I | ||
| 306 | -I | ||
| 307 | -I | ||
| 308 | -A | ||
| 309 | -I | ||
| 310 | -I | ||
| 311 | -I | ||
| 312 | -I | ||
| 313 | -I | ||
| 314 | -A | ||
| 315 | -A | ||
| 316 | -A | ||
| 317 | -I | ||
| 318 | -A | ||
| 319 | -I | ||
| 320 | -I | ||
| 321 | -I | ||
| 322 | -I | ||
| 323 | -A | ||
| 324 | -A | ||
| 325 | -I | ||
| 326 | -A | ||
| 327 | -A | ||
| 328 | -I | ||
| 329 | -I | ||
| 330 | -I | ||
| 331 | -I | ||
| 332 | -I | ||
| 333 | -I | ||
| 334 | -I | ||
| 335 | -I | ||
| 336 | -I | ||
| 337 | -I | ||
| 338 | -I | ||
| 339 | -I | ||
| 340 | -A | ||
| 341 | -I | ||
| 342 | -I | ||
| 343 | -I | ||
| 344 | -I | ||
| 345 | -A | ||
| 346 | -A | ||
| 347 | -I | ||
| 348 | -I | ||
| 349 | -A | ||
| 350 | -I | ||
| 351 | -I | ||
| 352 | -I | ||
| 353 | -I | ||
| 354 | -I | ||
| 355 | -A | ||
| 356 | -A | ||
| 357 | -I | ||
| 358 | -A | ||
| 359 | -I | ||
| 360 | -I | ||
| 361 | -I | ||
| 362 | -I | ||
| 363 | -I | ||
| 364 | -I | ||
| 365 | -A | ||
| 366 | -A | ||
| 367 | -I | ||
| 368 | -I | ||
| 369 | -A | ||
| 370 | -I | ||
| 371 | -I | ||
| 372 | -I | ||
| 373 | -I | ||
| 374 | -I | ||
| 375 | -I | ||
| 376 | -I | ||
| 377 | -I | ||
| 378 | -I | ||
| 379 | -I | ||
| 380 | -I | ||
| 381 | -I | ||
| 382 | -A | ||
| 383 | -I | ||
| 384 | -I | ||
| 385 | -A | ||
| 386 | -I | ||
| 387 | -I | ||
| 388 | -A | ||
| 389 | -I | ||
| 390 | -I | ||
| 391 | -I | ||
| 392 | -I | ||
| 393 | -A | ||
| 394 | -A | ||
| 395 | -I | ||
| 396 | -A | ||
| 397 | -A | ||
| 398 | -I | ||
| 399 | -I | ||
| 400 | -A | ||
| 401 | -I | ||
| 402 | -I | ||
| 403 | -I | ||
| 404 | -I | ||
| 405 | -A | ||
| 406 | -I | ||
| 407 | -I | ||
| 408 | -I | ||
| 409 | -I | ||
| 410 | -I | ||
| 411 | -I | ||
| 412 | -I | ||
| 413 | -I | ||
| 414 | -I | ||
| 415 | -I | ||
| 416 | -A | ||
| 417 | -I | ||
| 418 | -I | ||
| 419 | -A | ||
| 420 | -A | ||
| 421 | -I | ||
| 422 | -I | ||
| 423 | -I | ||
| 424 | -A | ||
| 425 | -I | ||
| 426 | -I | ||
| 427 | -I | ||
| 428 | -I | ||
| 429 | -A | ||
| 430 | -I | ||
| 431 | -A | ||
| 432 | -I | ||
| 433 | -I | ||
| 434 | -I | ||
| 435 | -I | ||
| 436 | -I | ||
| 437 | -A | ||
| 438 | -I | ||
| 439 | -I | ||
| 440 | -I | ||
| 441 | -I | ||
| 442 | -I | ||
| 443 | -I | ||
| 444 | -I | ||
| 445 | -I | ||
| 446 | -I | ||
| 447 | -I | ||
| 448 | -I | ||
| 449 | -I | ||
| 450 | -I | ||
| 451 | -I | ||
| 452 | -I | ||
| 453 | -I | ||
| 454 | -I | ||
| 455 | -I | ||
| 456 | -A | ||
| 457 | -A | ||
| 458 | -A | ||
| 459 | -A | ||
| 460 | -I | ||
| 461 | -I | ||
| 462 | -I | ||
| 463 | -A | ||
| 464 | -A | ||
| 465 | -I | ||
| 466 | -I | ||
| 467 | -I | ||
| 468 | -I | ||
| 469 | -I | ||
| 470 | -A | ||
| 471 | -I | ||
| 472 | -A | ||
| 473 | -I | ||
| 474 | -I | ||
| 475 | -I | ||
| 476 | -I | ||
| 477 | -I | ||
| 478 | -A | ||
| 479 | -I | ||
| 480 | -I | ||
| 481 | -I | ||
| 482 | -I | ||
| 483 | -A | ||
| 484 | -A | ||
| 485 | -I | ||
| 486 | -I | ||
| 487 | -I | ||
| 488 | -I | ||
| 489 | -I | ||
| 490 | -I | ||
| 491 | -I | ||
| 492 | -I | ||
| 493 | -I | ||
| 494 | -I | ||
| 495 | -I | ||
| 496 | -I | ||
| 497 | -I | ||
| 498 | -I | ||
| 499 | -I | ||
| 500 | -I | ||
| 501 | -I | ||
| 502 | -A | ||
| 503 | -I | ||
| 504 | -A | ||
| 505 | -I | ||
| 506 | -I | ||
| 507 | -A | ||
| 508 | -I | ||
| 509 | -I | ||
| 510 | -I | ||
| 511 | -I | ||
| 512 | -A | ||
| 513 | -I | ||
| 514 | -I | ||
| 515 | -A | ||
| 516 | -A | ||
| 517 | -I | ||
| 518 | -I | ||
| 519 | -I | ||
| 520 | -A | ||
| 521 | -I | ||
| 522 | -A | ||
| 523 | -I | ||
| 524 | -I | ||
| 525 | -I | ||
| 526 | -I | ||
| 527 | -I | ||
| 528 | -I | ||
| 529 | -I | ||
| 530 | -A | ||
| 531 | -A | ||
| 532 | -I | ||
| 533 | -I | ||
| 534 | -I | ||
| 535 | -A | ||
| 536 | -I | ||
| 537 | -I | ||
| 538 | -I | ||
| 539 | -A | ||
| 540 | -I | ||
| 541 | -I | ||
| 542 | -I | ||
| 543 | -I | ||
| 544 | -I | ||
| 545 | -I | ||
| 546 | -A | ||
| 547 | -I | ||
| 548 | -I | ||
| 549 | -I | ||
| 550 | -I | ||
| 551 | -I | ||
| 552 | -A | ||
| 553 | -I | ||
| 554 | -I | ||
| 555 | -I | ||
| 556 | -I | ||
| 557 | -I | ||
| 558 | -A | ||
| 559 | -I | ||
| 560 | -I | ||
| 561 | -A | ||
| 562 | -I | ||
| 563 | -I | ||
| 564 | -I | ||
| 565 | -I | ||
| 566 | -I | ||
| 567 | -I | ||
| 568 | -A | ||
| 569 | -I | ||
| 570 | -I | ||
| 571 | -I | ||
| 572 | -I | ||
| 573 | -I | ||
| 574 | -I | ||
| 575 | -I | ||
| 576 | -I | ||
| 577 | -I | ||
| 578 | -I | ||
| 579 | -I | ||
| 580 | -I | ||
| 581 | -I | ||
| 582 | -I | ||
| 583 | -I | ||
| 584 | -I | ||
| 585 | -A | ||
| 586 | -I | ||
| 587 | -I | ||
| 588 | -A | ||
| 589 | -I | ||
| 590 | -I | ||
| 591 | -I | ||
| 592 | -I | ||
| 593 | -A | ||
| 594 | -I | ||
| 595 | -I | ||
| 596 | -I | ||
| 597 | -I | ||
| 598 | -I | ||
| 599 | -A | ||
| 600 | -I | ||
| 601 | -I | ||
| 602 | -I | ||
| 603 | -A | ||
| 604 | -I | ||
| 605 | -I | ||
| 606 | -I | ||
| 607 | -A | ||
| 608 | -I | ||
| 609 | -A | ||
| 610 | -A | ||
| 611 | -I | ||
| 612 | -A | ||
| 613 | -I | ||
| 614 | -I | ||
| 615 | -I | ||
| 616 | -I | ||
| 617 | -I | ||
| 618 | -I | ||
| 619 | -I | ||
| 620 | -A | ||
| 621 | -I | ||
| 622 | -I | ||
| 623 | -I | ||
| 624 | -A | ||
| 625 | -I | ||
| 626 | -A | ||
| 627 | -I | ||
| 628 | -I | ||
| 629 | -I | ||
| 630 | -I | ||
| 631 | -A | ||
| 632 | -I | ||
| 633 | -A | ||
| 634 | -I |
This diff could not be displayed because it is too large.
This diff could not be displayed because it is too large.
| 1 | -# -*- encoding: utf-8 -*- | ||
| 2 | - | ||
| 3 | -import os | ||
| 4 | -from time import time | ||
| 5 | -import argparse | ||
| 6 | -from sklearn.naive_bayes import BernoulliNB | ||
| 7 | -from sklearn.svm import SVC | ||
| 8 | -from sklearn.neighbors import KNeighborsClassifier | ||
| 9 | -from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, \ | ||
| 10 | - classification_report | ||
| 11 | -import joblib | ||
| 12 | -from sklearn import model_selection | ||
| 13 | -from sklearn.feature_selection import SelectKBest, chi2 | ||
| 14 | -from sklearn.decomposition import TruncatedSVD | ||
| 15 | -from scipy.sparse import csr_matrix | ||
| 16 | -import scipy.stats | ||
| 17 | - | ||
| 18 | -__author__ = 'CMendezC' | ||
| 19 | - | ||
| 20 | -# Goal: training, crossvalidation and testing binding thrombin data set | ||
| 21 | - | ||
| 22 | -# Parameters: | ||
| 23 | -# 1) --inputPath Path to read input files. | ||
| 24 | -# 2) --inputTrainingData File to read training data. | ||
| 25 | -# 3) --inputTestingData File to read testing data. | ||
| 26 | -# 4) --inputTestingClasses File to read testing classes. | ||
| 27 | -# 5) --outputModelPath Path to place output model. | ||
| 28 | -# 6) --outputModelFile File to place output model. | ||
| 29 | -# 7) --outputReportPath Path to place evaluation report. | ||
| 30 | -# 8) --outputReportFile File to place evaluation report. | ||
| 31 | -# 9) --classifier Classifier: BernoulliNB, SVM, kNN. | ||
| 32 | -# 10) --saveData Save matrices | ||
| 33 | -# 11) --kernel Kernel | ||
| 34 | -# 12) --reduction Feature selection or dimensionality reduction | ||
| 35 | - | ||
| 36 | -# Output: | ||
| 37 | -# 1) Classification model and evaluation report. | ||
| 38 | - | ||
| 39 | -# Execution: | ||
| 40 | - | ||
| 41 | -# python training-crossvalidation-testing-binding-thrombin.py | ||
| 42 | -# --inputPath /home/compu2/bionlp/lcg-bioinfoI-bionlp/clasificacion-automatica/binding-thrombin-dataset | ||
| 43 | -# --inputTrainingData thrombin.data | ||
| 44 | -# --inputTestingData Thrombin.testset | ||
| 45 | -# --inputTestingClasses Thrombin.testset.class | ||
| 46 | -# --outputModelPath /home/compu2/bionlp/lcg-bioinfoI-bionlp/clasificacion-automatica/binding-thrombin-dataset/models | ||
| 47 | -# --outputModelFile SVM-model.mod | ||
| 48 | -# --outputReportPath /home/compu2/bionlp/lcg-bioinfoI-bionlp/clasificacion-automatica/binding-thrombin-dataset/reports | ||
| 49 | -# --outputReportFile SVM.txt | ||
| 50 | -# --classifier SVM | ||
| 51 | -# --saveData | ||
| 52 | -# --kernel linear | ||
| 53 | -# --reduction SVD200 | ||
| 54 | - | ||
| 55 | -# source activate python3 | ||
| 56 | -# python training-crossvalidation-testing-binding-thrombin.py --inputPath /home/compu2/bionlp/lcg-bioinfoI-bionlp/clasificacion-automatica/binding-thrombin-dataset --inputTrainingData thrombin.data --inputTestingData Thrombin.testset --inputTestingClasses Thrombin.testset.class --outputModelPath /home/compu2/bionlp/lcg-bioinfoI-bionlp/clasificacion-automatica/binding-thrombin-dataset/models --outputModelFile SVM-linear-model.mod --outputReportPath /home/compu2/bionlp/lcg-bioinfoI-bionlp/clasificacion-automatica/binding-thrombin-dataset/reports --outputReportFile SVM-linear.txt --classifier SVM --kernel rbf | ||
| 57 | - | ||
| 58 | -########################################################### | ||
| 59 | -# MAIN PROGRAM # | ||
| 60 | -########################################################### | ||
| 61 | - | ||
| 62 | -if __name__ == "__main__": | ||
| 63 | - # Parameter definition | ||
| 64 | - parser = argparse.ArgumentParser(description='Training validation Binding Thrombin Dataset.') | ||
| 65 | - parser.add_argument("--inputPath", dest="inputPath", | ||
| 66 | - help="Path to read input files", metavar="PATH") | ||
| 67 | - parser.add_argument("--inputTrainingData", dest="inputTrainingData", | ||
| 68 | - help="File to read training data", metavar="FILE") | ||
| 69 | - parser.add_argument("--inputTestingData", dest="inputTestingData", | ||
| 70 | - help="File to read testing data", metavar="FILE") | ||
| 71 | - parser.add_argument("--inputTestingClasses", dest="inputTestingClasses", | ||
| 72 | - help="File to read testing classes", metavar="FILE") | ||
| 73 | - parser.add_argument("--outputModelPath", dest="outputModelPath", | ||
| 74 | - help="Path to place output model", metavar="PATH") | ||
| 75 | - parser.add_argument("--outputModelFile", dest="outputModelFile", | ||
| 76 | - help="File to place output model", metavar="FILE") | ||
| 77 | - parser.add_argument("--outputReportPath", dest="outputReportPath", | ||
| 78 | - help="Path to place evaluation report", metavar="PATH") | ||
| 79 | - parser.add_argument("--outputReportFile", dest="outputReportFile", | ||
| 80 | - help="File to place evaluation report", metavar="FILE") | ||
| 81 | - parser.add_argument("--classifier", dest="classifier", | ||
| 82 | - help="Classifier", metavar="NAME", | ||
| 83 | - choices=('BernoulliNB', 'SVM', 'kNN'), default='SVM') | ||
| 84 | - parser.add_argument("--saveData", dest="saveData", action='store_true', | ||
| 85 | - help="Save matrices") | ||
| 86 | - parser.add_argument("--kernel", dest="kernel", | ||
| 87 | - help="Kernel SVM", metavar="NAME", | ||
| 88 | - choices=('linear', 'rbf', 'poly'), default='linear') | ||
| 89 | - parser.add_argument("--reduction", dest="reduction", | ||
| 90 | - help="Feature selection or dimensionality reduction", metavar="NAME", | ||
| 91 | - choices=('SVD200', 'SVD300', 'CHI250', 'CHI2100'), default=None) | ||
| 92 | - | ||
| 93 | - args = parser.parse_args() | ||
| 94 | - | ||
| 95 | - # Printing parameter values | ||
| 96 | - print('-------------------------------- PARAMETERS --------------------------------') | ||
| 97 | - print("Path to read input files: " + str(args.inputPath)) | ||
| 98 | - print("File to read training data: " + str(args.inputTrainingData)) | ||
| 99 | - print("File to read testing data: " + str(args.inputTestingData)) | ||
| 100 | - print("File to read testing classes: " + str(args.inputTestingClasses)) | ||
| 101 | - print("Path to place output model: " + str(args.outputModelPath)) | ||
| 102 | - print("File to place output model: " + str(args.outputModelFile)) | ||
| 103 | - print("Path to place evaluation report: " + str(args.outputReportPath)) | ||
| 104 | - print("File to place evaluation report: " + str(args.outputReportFile)) | ||
| 105 | - print("Classifier: " + str(args.classifier)) | ||
| 106 | - print("Save matrices: " + str(args.saveData)) | ||
| 107 | - print("Kernel: " + str(args.kernel)) | ||
| 108 | - print("Reduction: " + str(args.reduction)) | ||
| 109 | - | ||
| 110 | - # Start time | ||
| 111 | - t0 = time() | ||
| 112 | - | ||
| 113 | - print("Reading training data and true classes...") | ||
| 114 | - X_train = None | ||
| 115 | - if args.saveData: | ||
| 116 | - y_train = [] | ||
| 117 | - trainingData = [] | ||
| 118 | - with open(os.path.join(args.inputPath, args.inputTrainingData), encoding='utf8', mode='r') \ | ||
| 119 | - as iFile: | ||
| 120 | - for line in iFile: | ||
| 121 | - line = line.strip('\r\n') | ||
| 122 | - listLine = line.split(',') | ||
| 123 | - y_train.append(listLine[0]) | ||
| 124 | - trainingData.append(listLine[1:]) | ||
| 125 | - # X_train = np.matrix(trainingData) | ||
| 126 | - X_train = csr_matrix(trainingData, dtype='double') | ||
| 127 | - print(" Saving matrix and classes...") | ||
| 128 | - joblib.dump(X_train, os.path.join(args.outputModelPath, args.inputTrainingData + '.jlb')) | ||
| 129 | - joblib.dump(y_train, os.path.join(args.outputModelPath, args.inputTrainingData + '.class.jlb')) | ||
| 130 | - print(" Done!") | ||
| 131 | - else: | ||
| 132 | - print(" Loading matrix and classes...") | ||
| 133 | - X_train = joblib.load(os.path.join(args.outputModelPath, args.inputTrainingData + '.jlb')) | ||
| 134 | - y_train = joblib.load(os.path.join(args.outputModelPath, args.inputTrainingData + '.class.jlb')) | ||
| 135 | - print(" Done!") | ||
| 136 | - | ||
| 137 | - print(" Number of training classes: {}".format(len(y_train))) | ||
| 138 | - print(" Number of training class A: {}".format(y_train.count('A'))) | ||
| 139 | - print(" Number of training class I: {}".format(y_train.count('I'))) | ||
| 140 | - print(" Shape of training matrix: {}".format(X_train.shape)) | ||
| 141 | - | ||
| 142 | - print("Reading testing data and true classes...") | ||
| 143 | - X_test = None | ||
| 144 | - if args.saveData: | ||
| 145 | - y_test = [] | ||
| 146 | - testingData = [] | ||
| 147 | - with open(os.path.join(args.inputPath, args.inputTestingData), encoding='utf8', mode='r') \ | ||
| 148 | - as iFile: | ||
| 149 | - for line in iFile: | ||
| 150 | - line = line.strip('\r\n') | ||
| 151 | - listLine = line.split(',') | ||
| 152 | - testingData.append(listLine[1:]) | ||
| 153 | - X_test = csr_matrix(testingData, dtype='double') | ||
| 154 | - with open(os.path.join(args.inputPath, args.inputTestingClasses), encoding='utf8', mode='r') \ | ||
| 155 | - as iFile: | ||
| 156 | - for line in iFile: | ||
| 157 | - line = line.strip('\r\n') | ||
| 158 | - y_test.append(line) | ||
| 159 | - print(" Saving matrix and classes...") | ||
| 160 | - joblib.dump(X_test, os.path.join(args.outputModelPath, args.inputTestingData + '.jlb')) | ||
| 161 | - joblib.dump(y_test, os.path.join(args.outputModelPath, args.inputTestingClasses + '.class.jlb')) | ||
| 162 | - print(" Done!") | ||
| 163 | - else: | ||
| 164 | - print(" Loading matrix and classes...") | ||
| 165 | - X_test = joblib.load(os.path.join(args.outputModelPath, args.inputTestingData + '.jlb')) | ||
| 166 | - y_test = joblib.load(os.path.join(args.outputModelPath, args.inputTestingClasses + '.class.jlb')) | ||
| 167 | - print(" Done!") | ||
| 168 | - | ||
| 169 | - print(" Number of testing classes: {}".format(len(y_test))) | ||
| 170 | - print(" Number of testing class A: {}".format(y_test.count('A'))) | ||
| 171 | - print(" Number of testing class I: {}".format(y_test.count('I'))) | ||
| 172 | - print(" Shape of testing matrix: {}".format(X_test.shape)) | ||
| 173 | - | ||
| 174 | - # Feature selection and dimensional reduction | ||
| 175 | - if args.reduction is not None: | ||
| 176 | - print('Performing dimensionality reduction or feature selection...', args.reduction) | ||
| 177 | - if args.reduction == 'SVD200': | ||
| 178 | - reduc = TruncatedSVD(n_components=200, random_state=42) | ||
| 179 | - X_train = reduc.fit_transform(X_train) | ||
| 180 | - elif args.reduction == 'SVD300': | ||
| 181 | - reduc = TruncatedSVD(n_components=300, random_state=42) | ||
| 182 | - X_train = reduc.fit_transform(X_train) | ||
| 183 | - elif args.reduction == 'CHI250': | ||
| 184 | - reduc = SelectKBest(chi2, k=50) | ||
| 185 | - X_train = reduc.fit_transform(X_train, y_train) | ||
| 186 | - elif args.reduction == 'CHI2100': | ||
| 187 | - reduc = SelectKBest(chi2, k=100) | ||
| 188 | - X_train = reduc.fit_transform(X_train, y_train) | ||
| 189 | - print(" Done!") | ||
| 190 | - print(' New shape of training matrix: ', X_train.shape) | ||
| 191 | - | ||
| 192 | - jobs = -1 | ||
| 193 | - paramGrid = [] | ||
| 194 | - nIter = 20 | ||
| 195 | - crossV = 10 | ||
| 196 | - print("Defining randomized grid search...") | ||
| 197 | - if args.classifier == 'SVM': | ||
| 198 | - # SVM | ||
| 199 | - classifier = SVC() | ||
| 200 | - if args.kernel == 'rbf': | ||
| 201 | - paramGrid = {'C': scipy.stats.expon(scale=100), | ||
| 202 | - 'gamma': scipy.stats.expon(scale=.1), | ||
| 203 | - 'kernel': ['rbf'], 'class_weight': ['balanced', None]} | ||
| 204 | - elif args.kernel == 'linear': | ||
| 205 | - paramGrid = {'C': scipy.stats.expon(scale=100), | ||
| 206 | - 'kernel': ['linear'], | ||
| 207 | - 'class_weight': ['balanced', None]} | ||
| 208 | - elif args.kernel == 'poly': | ||
| 209 | - paramGrid = {'C': scipy.stats.expon(scale=100), | ||
| 210 | - 'gamma': scipy.stats.expon(scale=.1), 'degree': [2, 3], | ||
| 211 | - 'kernel': ['poly'], 'class_weight': ['balanced', None]} | ||
| 212 | - myClassifier = model_selection.RandomizedSearchCV(classifier, | ||
| 213 | - paramGrid, n_iter=nIter, | ||
| 214 | - cv=crossV, n_jobs=jobs, verbose=3) | ||
| 215 | - elif args.classifier == 'BernoulliNB': | ||
| 216 | - # BernoulliNB | ||
| 217 | - classifier = BernoulliNB() | ||
| 218 | - paramGrid = {'alpha': scipy.stats.expon(scale=1.0)} | ||
| 219 | - myClassifier = model_selection.RandomizedSearchCV(classifier, paramGrid, n_iter=nIter, | ||
| 220 | - cv=crossV, n_jobs=jobs, verbose=3) | ||
| 221 | - # elif args.classifier == 'kNN': | ||
| 222 | - # # kNN | ||
| 223 | - # k_range = list(range(1, 7, 2)) | ||
| 224 | - # classifier = KNeighborsClassifier() | ||
| 225 | - # paramGrid = {'n_neighbors ': k_range} | ||
| 226 | - # myClassifier = model_selection.RandomizedSearchCV(classifier, paramGrid, n_iter=3, | ||
| 227 | - # cv=crossV, n_jobs=jobs, verbose=3) | ||
| 228 | - else: | ||
| 229 | - print("Bad classifier") | ||
| 230 | - exit() | ||
| 231 | - print(" Done!") | ||
| 232 | - | ||
| 233 | - print("Training...") | ||
| 234 | - myClassifier.fit(X_train, y_train) | ||
| 235 | - print(" Done!") | ||
| 236 | - | ||
| 237 | - print("Testing (prediction in new data)...") | ||
| 238 | - if args.reduction is not None: | ||
| 239 | - X_test = reduc.transform(X_test) | ||
| 240 | - y_pred = myClassifier.predict(X_test) | ||
| 241 | - best_parameters = myClassifier.best_estimator_.get_params() | ||
| 242 | - print(" Done!") | ||
| 243 | - | ||
| 244 | - print("Saving report...") | ||
| 245 | - with open(os.path.join(args.outputReportPath, args.outputReportFile), mode='w', encoding='utf8') as oFile: | ||
| 246 | - oFile.write('********** EVALUATION REPORT **********\n') | ||
| 247 | - oFile.write('Reduction: {}\n'.format(args.reduction)) | ||
| 248 | - oFile.write('Classifier: {}\n'.format(args.classifier)) | ||
| 249 | - oFile.write('Kernel: {}\n'.format(args.kernel)) | ||
| 250 | - oFile.write('Accuracy: {}\n'.format(accuracy_score(y_test, y_pred))) | ||
| 251 | - oFile.write('Precision: {}\n'.format(precision_score(y_test, y_pred, average='weighted'))) | ||
| 252 | - oFile.write('Recall: {}\n'.format(recall_score(y_test, y_pred, average='weighted'))) | ||
| 253 | - oFile.write('F-score: {}\n'.format(f1_score(y_test, y_pred, average='weighted'))) | ||
| 254 | - oFile.write('Confusion matrix: \n') | ||
| 255 | - oFile.write(str(confusion_matrix(y_test, y_pred)) + '\n') | ||
| 256 | - oFile.write('Classification report: \n') | ||
| 257 | - oFile.write(classification_report(y_test, y_pred) + '\n') | ||
| 258 | - oFile.write('Best parameters: \n') | ||
| 259 | - for param in sorted(best_parameters.keys()): | ||
| 260 | - oFile.write("\t%s: %r\n" % (param, best_parameters[param])) | ||
| 261 | - print(" Done!") | ||
| 262 | - | ||
| 263 | - print("Training and testing done in: %fs" % (time() - t0)) |
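
The deleted script above reads each comma-separated line into a full list of strings before building the sparse matrix. As a hedged alternative, the sketch below keeps only the positions of the non-zero features, assuming the layout the script parses (class label first, then 0/1 feature values). The names `parse_line` and `to_csr_parts` are hypothetical and are not part of the original code.

```python
def parse_line(line):
    """Split one data line into (label, indices of non-zero features)."""
    fields = line.strip("\r\n").split(",")
    label = fields[0]  # "A", "I", or "?" in the unlabeled test set
    active = [j for j, value in enumerate(fields[1:]) if value != "0"]
    return label, active


def to_csr_parts(lines):
    """Collect labels plus the indices/indptr arrays of a CSR layout.

    Row r owns indices[indptr[r]:indptr[r + 1]]; every stored value is 1,
    so no separate data list is kept here.
    """
    labels, indices, indptr = [], [], [0]
    for line in lines:
        label, active = parse_line(line)
        labels.append(label)
        indices.extend(active)
        indptr.append(len(indices))
    return labels, indices, indptr


if __name__ == "__main__":
    sample = ["A,0,1,1,0\n", "I,0,0,0,0\n", "?,1,0,0,1\n"]
    print(to_csr_parts(sample))
```

With SciPy available, these parts could be passed as `csr_matrix(([1.0] * len(indices), indices, indptr), shape=(n_rows, n_features))`, where `n_rows` and `n_features` are supplied by the caller. This avoids holding about 139,351 strings per row in memory.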
| 1 | -# -*- encoding: utf-8 -*- | ||
| 2 | - | ||
| 3 | -import os | ||
| 4 | -from time import time | ||
| 5 | -import argparse | ||
| 6 | -from sklearn.naive_bayes import BernoulliNB | ||
| 7 | -from sklearn.svm import SVC | ||
| 8 | -from sklearn.neighbors import KNeighborsClassifier | ||
| 9 | -from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, \ | ||
| 10 | - classification_report | ||
| 11 | -import joblib | ||
| 12 | -from scipy.sparse import csr_matrix | ||
| 13 | - | ||
| 14 | -__author__ = 'CMendezC' | ||
| 15 | - | ||
| 16 | -# Goal: training and testing binding thrombin data set | ||
| 17 | - | ||
| 18 | -# Parameters: | ||
| 19 | -# 1) --inputPath Path to read input files. | ||
| 20 | -# 2) --inputTrainingData File to read training data. | ||
| 21 | -# 3) --inputTestingData File to read testing data. | ||
| 22 | -# 4) --inputTestingClasses File to read testing classes. | ||
| 23 | -# 5) --outputModelPath Path to place output model. | ||
| 24 | -# 6) --outputModelFile File to place output model. | ||
| 25 | -# 7) --outputReportPath Path to place evaluation report. | ||
| 26 | -# 8) --outputReportFile File to place evaluation report. | ||
| 27 | -# 9) --classifier Classifier: BernoulliNB, SVM, kNN. | ||
| 28 | -# 10) --saveData Save matrices | ||
| 29 | - | ||
| 30 | -# Output: | ||
| 31 | -# 1) Classification model and evaluation report. | ||
| 32 | - | ||
| 33 | -# Execution: | ||
| 34 | - | ||
| 35 | -# python training-testing-binding-thrombin.py | ||
| 36 | -# --inputPath /home/binding-thrombin-dataset | ||
| 37 | -# --inputTrainingData thrombin.data | ||
| 38 | -# --inputTestingData Thrombin.testset | ||
| 39 | -# --inputTestingClasses Thrombin.testset.class | ||
| 40 | -# --outputModelPath /home/binding-thrombin-dataset/models | ||
| 41 | -# --outputModelFile SVM-model.mod | ||
| 42 | -# --outputReportPath /home/binding-thrombin-dataset/reports | ||
| 43 | -# --outputReportFile SVM.txt | ||
| 44 | -# --classifier SVM | ||
| 45 | -# --saveData | ||
| 46 | - | ||
| 47 | -# source activate python3 | ||
| 48 | -# python training-testing-binding-thrombin.py --inputPath /home/compu2/bionlp/lcg-bioinfoI-bionlp/clasificacion-automatica/binding-thrombin-dataset --inputTrainingData thrombin.data --inputTestingData Thrombin.testset --inputTestingClasses Thrombin.testset.class --outputModelPath /home/compu2/bionlp/lcg-bioinfoI-bionlp/clasificacion-automatica/binding-thrombin-dataset/models --outputModelFile SVM-model.mod --outputReportPath /home/compu2/bionlp/lcg-bioinfoI-bionlp/clasificacion-automatica/binding-thrombin-dataset/reports --outputReportFile SVM.txt --classifier SVM --saveData | ||
| 49 | - | ||
| 50 | -########################################################### | ||
| 51 | -# MAIN PROGRAM # | ||
| 52 | -########################################################### | ||
| 53 | - | ||
| 54 | -if __name__ == "__main__": | ||
| 55 | - # Parameter definition | ||
| 56 | - parser = argparse.ArgumentParser(description='Training and testing Binding Thrombin Dataset.') | ||
| 57 | - parser.add_argument("--inputPath", dest="inputPath", | ||
| 58 | - help="Path to read input files", metavar="PATH") | ||
| 59 | - parser.add_argument("--inputTrainingData", dest="inputTrainingData", | ||
| 60 | - help="File to read training data", metavar="FILE") | ||
| 61 | - parser.add_argument("--inputTestingData", dest="inputTestingData", | ||
| 62 | - help="File to read testing data", metavar="FILE") | ||
| 63 | - parser.add_argument("--inputTestingClasses", dest="inputTestingClasses", | ||
| 64 | - help="File to read testing classes", metavar="FILE") | ||
| 65 | - parser.add_argument("--outputModelPath", dest="outputModelPath", | ||
| 66 | - help="Path to place output model", metavar="PATH") | ||
| 67 | - parser.add_argument("--outputModelFile", dest="outputModelFile", | ||
| 68 | - help="File to place output model", metavar="FILE") | ||
| 69 | - parser.add_argument("--outputReportPath", dest="outputReportPath", | ||
| 70 | - help="Path to place evaluation report", metavar="PATH") | ||
| 71 | - parser.add_argument("--outputReportFile", dest="outputReportFile", | ||
| 72 | - help="File to place evaluation report", metavar="FILE") | ||
| 73 | - parser.add_argument("--classifier", dest="classifier", | ||
| 74 | - help="Classifier", metavar="NAME", | ||
| 75 | - choices=('BernoulliNB', 'SVM', 'kNN'), default='SVM') | ||
| 76 | - parser.add_argument("--saveData", dest="saveData", action='store_true', | ||
| 77 | - help="Save matrices") | ||
| 78 | - | ||
| 79 | - args = parser.parse_args() | ||
| 80 | - | ||
| 81 | - # Printing parameter values | ||
| 82 | - print('-------------------------------- PARAMETERS --------------------------------') | ||
| 83 | - print("Path to read input files: " + str(args.inputPath)) | ||
| 84 | - print("File to read training data: " + str(args.inputTrainingData)) | ||
| 85 | - print("File to read testing data: " + str(args.inputTestingData)) | ||
| 86 | - print("File to read testing classes: " + str(args.inputTestingClasses)) | ||
| 87 | - print("Path to place output model: " + str(args.outputModelPath)) | ||
| 88 | - print("File to place output model: " + str(args.outputModelFile)) | ||
| 89 | - print("Path to place evaluation report: " + str(args.outputReportPath)) | ||
| 90 | - print("File to place evaluation report: " + str(args.outputReportFile)) | ||
| 91 | - print("Classifier: " + str(args.classifier)) | ||
| 92 | - print("Save matrices: " + str(args.saveData)) | ||
| 93 | - | ||
| 94 | - # Start time | ||
| 95 | - t0 = time() | ||
| 96 | - | ||
| 97 | - print("Reading training data and true classes...") | ||
| 98 | - X_train = None | ||
| 99 | - if args.saveData: | ||
| 100 | - y_train = [] | ||
| 101 | - trainingData = [] | ||
| 102 | - with open(os.path.join(args.inputPath, args.inputTrainingData), encoding='utf8', mode='r') \ | ||
| 103 | - as iFile: | ||
| 104 | - for line in iFile: | ||
| 105 | - line = line.strip('\r\n') | ||
| 106 | - listLine = line.split(',') | ||
| 107 | - y_train.append(listLine[0]) | ||
| 108 | - trainingData.append(listLine[1:]) | ||
| 109 | - # X_train = np.matrix(trainingData) | ||
| 110 | - X_train = csr_matrix(trainingData, dtype='double') | ||
| 111 | - print(" Saving matrix and classes...") | ||
| 112 | - joblib.dump(X_train, os.path.join(args.outputModelPath, args.inputTrainingData + '.jlb')) | ||
| 113 | - joblib.dump(y_train, os.path.join(args.outputModelPath, args.inputTrainingData + '.class.jlb')) | ||
| 114 | - print(" Done!") | ||
| 115 | - else: | ||
| 116 | - print(" Loading matrix and classes...") | ||
| 117 | - X_train = joblib.load(os.path.join(args.outputModelPath, args.inputTrainingData + '.jlb')) | ||
| 118 | - y_train = joblib.load(os.path.join(args.outputModelPath, args.inputTrainingData + '.class.jlb')) | ||
| 119 | - print(" Done!") | ||
| 120 | - | ||
| 121 | - print(" Number of training classes: {}".format(len(y_train))) | ||
| 122 | - print(" Number of training class A: {}".format(y_train.count('A'))) | ||
| 123 | - print(" Number of training class I: {}".format(y_train.count('I'))) | ||
| 124 | - print(" Shape of training matrix: {}".format(X_train.shape)) | ||
| 125 | - | ||
| 126 | - print("Reading testing data and true classes...") | ||
| 127 | - X_test = None | ||
| 128 | - if args.saveData: | ||
| 129 | - y_test = [] | ||
| 130 | - testingData = [] | ||
| 131 | - with open(os.path.join(args.inputPath, args.inputTestingData), encoding='utf8', mode='r') \ | ||
| 132 | - as iFile: | ||
| 133 | - for line in iFile: | ||
| 134 | - line = line.strip('\r\n') | ||
| 135 | - listLine = line.split(',') | ||
| 136 | - testingData.append(listLine[1:]) | ||
| 137 | - X_test = csr_matrix(testingData, dtype='double') | ||
| 138 | - with open(os.path.join(args.inputPath, args.inputTestingClasses), encoding='utf8', mode='r') \ | ||
| 139 | - as iFile: | ||
| 140 | - for line in iFile: | ||
| 141 | - line = line.strip('\r\n') | ||
| 142 | - y_test.append(line) | ||
| 143 | - print(" Saving matrix and classes...") | ||
| 144 | - joblib.dump(X_test, os.path.join(args.outputModelPath, args.inputTestingData + '.jlb')) | ||
| 145 | - joblib.dump(y_test, os.path.join(args.outputModelPath, args.inputTestingClasses + '.class.jlb')) | ||
| 146 | - print(" Done!") | ||
| 147 | - else: | ||
| 148 | - print(" Loading matrix and classes...") | ||
| 149 | - X_test = joblib.load(os.path.join(args.outputModelPath, args.inputTestingData + '.jlb')) | ||
| 150 | - y_test = joblib.load(os.path.join(args.outputModelPath, args.inputTestingClasses + '.class.jlb')) | ||
| 151 | - print(" Done!") | ||
| 152 | - | ||
| 153 | - print(" Number of testing classes: {}".format(len(y_test))) | ||
| 154 | - print(" Number of testing class A: {}".format(y_test.count('A'))) | ||
| 155 | - print(" Number of testing class I: {}".format(y_test.count('I'))) | ||
| 156 | - print(" Shape of testing matrix: {}".format(X_test.shape)) | ||
| 157 | - | ||
| 158 | - if args.classifier == "BernoulliNB": | ||
| 159 | - classifier = BernoulliNB() | ||
| 160 | - elif args.classifier == "SVM": | ||
| 161 | - classifier = SVC() | ||
| 162 | - elif args.classifier == "kNN": | ||
| 163 | - classifier = KNeighborsClassifier() | ||
| 164 | - else: | ||
| 165 | - print("Bad classifier") | ||
| 166 | - exit() | ||
| 167 | - | ||
| 168 | - print("Training...") | ||
| 169 | - classifier.fit(X_train, y_train) | ||
| 170 | - print(" Done!") | ||
| 171 | - | ||
| 172 | - print("Testing (prediction in new data)...") | ||
| 173 | - y_pred = classifier.predict(X_test) | ||
| 174 | - print(" Done!") | ||
| 175 | - | ||
| 176 | - print("Saving report...") | ||
| 177 | - with open(os.path.join(args.outputReportPath, args.outputReportFile), mode='w', encoding='utf8') as oFile: | ||
| 178 | - oFile.write('********** EVALUATION REPORT **********\n') | ||
| 179 | - oFile.write('Classifier: {}\n'.format(args.classifier)) | ||
| 180 | - oFile.write('Accuracy: {}\n'.format(accuracy_score(y_test, y_pred))) | ||
| 181 | - oFile.write('Precision: {}\n'.format(precision_score(y_test, y_pred, average='weighted'))) | ||
| 182 | - oFile.write('Recall: {}\n'.format(recall_score(y_test, y_pred, average='weighted'))) | ||
| 183 | - oFile.write('F-score: {}\n'.format(f1_score(y_test, y_pred, average='weighted'))) | ||
| 184 | - oFile.write('Confusion matrix: \n') | ||
| 185 | - oFile.write(str(confusion_matrix(y_test, y_pred)) + '\n') | ||
| 186 | - oFile.write('Classification report: \n') | ||
| 187 | - oFile.write(classification_report(y_test, y_pred) + '\n') | ||
| 188 | - print(" Done!") | ||
| 189 | - | ||
| 190 | - print("Training and testing done in: %fs" % (time() - t0)) |
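
Note on the reading step in the deleted script: each input line is split on commas, the first field is taken as the class label (A or I), and the remaining fields become the feature row. A minimal sketch of that step in isolation, assuming the same comma-separated layout; `parse_line` and `parse_lines` are hypothetical helper names, not part of the script, and skipping blank lines is an added assumption.

```python
from typing import Iterable, List, Tuple


def parse_line(line: str) -> Tuple[str, List[float]]:
    # Hypothetical helper: strip the line ending, split on commas,
    # and return (label, features) as the script's reading loop does.
    fields = line.strip("\r\n").split(",")
    return fields[0], [float(value) for value in fields[1:]]


def parse_lines(lines: Iterable[str]) -> Tuple[List[str], List[List[float]]]:
    # Collect labels and feature rows from an iterable of text lines.
    labels: List[str] = []
    rows: List[List[float]] = []
    for line in lines:
        if not line.strip():
            continue  # assumption: ignore blank lines (the script does not)
        label, features = parse_line(line)
        labels.append(label)
        rows.append(features)
    return labels, rows
```

The resulting `rows` list could then be handed to `csr_matrix(rows, dtype='double')`, as the script does with its own list of rows.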