|
This paper proposes a semi-supervised machine learning method for osteoporosis risk assessment. Existing osteoporosis risk assessment models suffer from low accuracy and cannot exploit large amounts of unlabeled data. To improve diagnostic accuracy, the method jointly considers osteoporosis-related questionnaire data and bone image data, and fuses the multi-modal features extracted from both. Feature engineering and Word2vec are used to extract numerical and text features, respectively, from the questionnaires, and a CNN is used to extract image features from BMD images. Because labeled medical data are difficult to obtain, this paper builds a self-training semi-supervised model based on XGBoost to classify and assess osteoporosis risk; the model uses both labeled and unlabeled data to achieve better generalization. In addition, because the questionnaire data contain many outliers and missing values, this paper removes outliers with a DBSCAN-based algorithm and proposes an improved PKNN algorithm to impute the missing data. Experimental results show that the proposed semi-supervised method achieves an accuracy of 0.78 in osteoporosis risk assessment and shows clear advantages over the other methods compared. |
|
Keywords: Machine learning; Osteoporosis; Semi-supervised; Feature fusion |
|
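
To make the self-training scheme summarized in the abstract concrete, the following minimal Python sketch shows the general loop: fit a base classifier on the labeled data, pseudo-label the unlabeled samples that it predicts with high confidence, add them to the training set, and refit. It is an illustration under stated assumptions, not the authors' implementation. A toy nearest-centroid classifier stands in for XGBoost so that the example needs only the standard library, and the names `CentroidClassifier` and `self_train`, the confidence threshold of 0.8, and the limit of 10 rounds are all hypothetical choices made for this sketch.

```python
import math


class CentroidClassifier:
    """Toy base learner (assumption: a stand-in for the paper's XGBoost model)."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # One centroid (mean feature vector) per class.
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_proba(self, x):
        # Softmax over negative distances; shifting by the minimum distance
        # keeps at least one weight equal to 1 and avoids a zero total.
        dists = {c: math.dist(x, m) for c, m in self.centroids.items()}
        d_min = min(dists.values())
        weights = {c: math.exp(-(d - d_min)) for c, d in dists.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Generic self-training loop; threshold and max_rounds are assumed values."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    model = CentroidClassifier().fit(X_train, y_train)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            proba = model.predict_proba(x)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                confident.append((x, label))  # accept the pseudo-label
            else:
                remaining.append(x)  # keep for a later round
        if not confident:
            break  # nothing new to learn from
        for x, label in confident:
            X_train.append(x)
            y_train.append(label)
        pool = remaining
        model = CentroidClassifier().fit(X_train, y_train)
    return model, len(X_train)


if __name__ == "__main__":
    labeled_X = [(0.0, 0.0), (10.0, 10.0)]
    labeled_y = [0, 1]
    unlabeled_X = [(1.0, 1.0), (9.0, 9.0), (0.0, 1.0), (10.0, 9.0)]
    model, n_train = self_train(labeled_X, labeled_y, unlabeled_X)
    print(n_train, model.predict_proba((2.0, 2.0)))
```

Any classifier that exposes class probabilities could replace the toy learner. The threshold controls the trade-off between how many pseudo-labels are added and how much label noise they introduce.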