Breast cancer prediction using machine learning algorithm

Date

2017-06-30

Authors

Yu, Mengjie

Abstract

Breast cancer, which occurs mostly in women, is the most frequently diagnosed cancer. Early detection based on phenotype and genotype features can greatly increase the chances of successful treatment. In this report, four machine learning algorithms were tested for breast cancer prediction. Principal component analysis (PCA) was used to reduce the dimensionality of the original, correlated dataset. The results show that KNN, SVM with a linear kernel, and logistic regression achieve very similar accuracy, and all three outperform Naive Bayes. KNN achieved the highest average accuracy, 0.9756, under 10-fold cross-validation with k = 7. SVM with a linear kernel achieved the highest AUC, 0.9944. The results also show that retaining more of the top eigenvectors increases prediction accuracy; however, once the number of eigenvectors exceeds a certain threshold, the additional eigenvectors add more noise than signal.
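The abstract does not describe the implementation, so the following is only a minimal NumPy sketch, under stated assumptions, of the kind of pipeline it summarizes: standardize the features, project them onto the top m eigenvectors of the covariance matrix (PCA), classify with KNN (k = 7), and score with 10-fold cross-validation. Sweeping `n_components` mirrors the experiment of varying the number of top eigenvectors. The function names, the per-fold standardization, and the synthetic data are hypothetical illustration choices, not the author's code or dataset, and the sketch does not reproduce the reported results.

```python
import numpy as np

def pca_fit(X_train, n_components):
    """Return the training mean and the top `n_components` eigenvectors of the
    training-set covariance matrix, ordered by decreasing eigenvalue."""
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    return mean, eigvecs[:, order[:n_components]]

def pca_transform(X, mean, components):
    """Project centered data onto the retained eigenvectors."""
    return (X - mean) @ components

def knn_predict(X_train, y_train, X_test, k=7):
    """Majority vote among the k nearest training points (binary 0/1 labels).
    With an odd k and two classes, a vote can never tie."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def cross_val_accuracy(X, y, n_components, k=7, n_folds=10, seed=0):
    """Mean accuracy over n_folds folds. Standardization and PCA are fit on the
    training folds only (an assumed design choice that avoids test-set leakage)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        mu = X[train_idx].mean(axis=0)
        sd = X[train_idx].std(axis=0) + 1e-12    # guard against zero variance
        Xtr = (X[train_idx] - mu) / sd
        Xte = (X[test_idx] - mu) / sd
        mean, comps = pca_fit(Xtr, n_components)
        pred = knn_predict(pca_transform(Xtr, mean, comps), y[train_idx],
                           pca_transform(Xte, mean, comps), k=k)
        scores.append(float((pred == y[test_idx]).mean()))
    return float(np.mean(scores))

if __name__ == "__main__":
    # Hypothetical synthetic data: 30 correlated features driven by 3 latent
    # factors, with the class label shifting the latent factors.
    rng = np.random.default_rng(42)
    n, p = 300, 30
    y = rng.integers(0, 2, size=n)
    latent = rng.normal(size=(n, 3)) + 1.5 * y[:, None]
    X = latent @ rng.normal(size=(3, p)) + 0.5 * rng.normal(size=(n, p))
    for m in (1, 2, 3, 5, 10, 30):
        print(m, round(cross_val_accuracy(X, y, n_components=m), 4))
```

Printing the accuracy for each `n_components` value shows how the number of retained eigenvectors can be compared; the AUC reported in the abstract would require a scoring classifier (for example, the SVM's decision values), which this sketch does not include.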
