An improved technique for risk prediction of Polycystic Ovary Syndrome (PCOS) using feature selection and machine learning

Polycystic Ovary Syndrome (PCOS) is an endocrine disorder that affects more than five million women of childbearing age worldwide. This study suggests that accurate and specific machine learning models, combined with relevant feature selection methods, can play an essential role in detecting PCOS. Statistical feature selection algorithms such as Chi-Square, ANOVA, and Mutual Information are used to identify insignificant features in the data. The Random Forest classifier achieved 93.52% accuracy on the feature set selected by the ANOVA test. With the reduced feature set, accuracy showed no significant decline, other metrics such as F1-score and specificity improved, and computational time decreased substantially. Model effectiveness is measured by the area under the ROC curve (AUC), which ranges from 0.82 to 0.98; the higher the value, the better the model's classification ability. The paper also reports improved model performance by suggesting methods to increase AUC and to improve recall and specificity. The improved performance of the proposed machine learning model should help optimize and scale data-driven diagnosis of PCOS and enable better decision-making.
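To make the ANOVA-based selection step concrete, the following is a minimal sketch in plain Python. It scores each numeric feature with the one-way ANOVA F statistic (between-class variance divided by within-class variance) and keeps the top-k features. The function names `anova_f_score` and `select_top_k`, the feature names `f1` to `f3`, and the toy values are hypothetical illustrations, not the paper's dataset or code. It also assumes numeric features and class labels with at least two distinct values.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic for one numeric feature grouped by class label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2:
        raise ValueError("ANOVA needs at least two classes")
    grand_mean = mean(values)
    class_means = {label: mean(g) for label, g in groups.items()}
    # Between-class sum of squares: how far each class mean sits from the grand mean.
    ss_between = sum(len(g) * (class_means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    # Within-class sum of squares: scatter of samples around their own class mean.
    ss_within = sum((v - class_means[label]) ** 2
                    for label, g in groups.items() for v in g)
    if ss_within == 0:
        # No within-class scatter: treat a separating feature as maximally informative.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(rows, labels, feature_names, k):
    """Rank features by ANOVA F score and keep the k highest-scoring names."""
    scores = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        scores[name] = anova_f_score(column, labels)
    ranked = sorted(feature_names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Toy example with hypothetical features (rows = samples, columns = features).
feature_names = ["f1", "f2", "f3"]
rows = [
    [1, 5, 2],
    [2, 1, 4],
    [3, 3, 3],
    [7, 4, 4],
    [8, 2, 6],
    [9, 3, 5],
]
labels = [0, 0, 0, 1, 1, 1]  # toy labels: 0 = no PCOS, 1 = PCOS

selected, scores = select_top_k(rows, labels, feature_names, k=2)
print(selected)  # ['f1', 'f3']
print(scores)    # {'f1': 54.0, 'f2': 0.0, 'f3': 6.0}
```

The selected columns would then be passed to a classifier such as Random Forest. In practice, scikit-learn provides the same statistic through `SelectKBest(score_func=f_classif, k=...)`, with `chi2` and `mutual_info_classif` as alternative score functions, and `RandomForestClassifier` for the model. This from-scratch version is only meant to show what the ANOVA score measures.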