TY - JOUR
T1 - Shapley-Additive-Explanations-Based Factor Analysis for Dengue Severity Prediction using Machine Learning
AU - Chowdhury, Shihab Uddin
AU - Sayeed, Sanjana
AU - Rashid, Iktisad
AU - Alam, Md Golam Rabiul
AU - Masum, Abdul Kadar Muhammad
AU - Dewan, M. Ali Akber
N1 - Publisher Copyright: © 2022 by the authors.
PY - 2022/9
Y1 - 2022/9
N2 - Dengue is a viral disease that primarily affects tropical and subtropical regions and is especially prevalent in South-East Asia. This mosquito-borne disease sometimes triggers nationwide epidemics, which result in a large number of fatalities. Most cases occur with the development of Dengue Haemorrhagic Fever (DHF), and a large portion of them are detected among children under the age of ten, with severe conditions often progressing to a critical state known as Dengue Shock Syndrome (DSS). In this study, we analysed two separate datasets from two different countries, Vietnam and Bangladesh, which we refer to as VDengu and BDengue, respectively. Because the VDengu dataset was structured, supervised learning models were effective for predictive analysis; among them, the decision-tree-based classifier XGBoost produced the best outcome. Furthermore, Shapley Additive Explanation (SHAP) was applied to the XGBoost model to assess the significance of individual attributes of the dataset. For the significant attributes, we used SHAP dependence plots to identify the range of each attribute against the number of DHF or DSS cases. In parallel, the dataset from Bangladesh was unstructured; therefore, we applied an unsupervised learning technique, i.e., hierarchical clustering, to find clusters of vital blood components of the patients according to their complete blood count reports. The clusters were further analysed to find the attributes in the dataset that led to DSS or DHF.
AB - Dengue is a viral disease that primarily affects tropical and subtropical regions and is especially prevalent in South-East Asia. This mosquito-borne disease sometimes triggers nationwide epidemics, which result in a large number of fatalities. Most cases occur with the development of Dengue Haemorrhagic Fever (DHF), and a large portion of them are detected among children under the age of ten, with severe conditions often progressing to a critical state known as Dengue Shock Syndrome (DSS). In this study, we analysed two separate datasets from two different countries, Vietnam and Bangladesh, which we refer to as VDengu and BDengue, respectively. Because the VDengu dataset was structured, supervised learning models were effective for predictive analysis; among them, the decision-tree-based classifier XGBoost produced the best outcome. Furthermore, Shapley Additive Explanation (SHAP) was applied to the XGBoost model to assess the significance of individual attributes of the dataset. For the significant attributes, we used SHAP dependence plots to identify the range of each attribute against the number of DHF or DSS cases. In parallel, the dataset from Bangladesh was unstructured; therefore, we applied an unsupervised learning technique, i.e., hierarchical clustering, to find clusters of vital blood components of the patients according to their complete blood count reports. The clusters were further analysed to find the attributes in the dataset that led to DSS or DHF.
KW - Dengue Haemorrhagic Fever
KW - Dengue Shock Syndrome
KW - Shapley Additive Explanation
KW - XGBoosting
KW - clinical data
KW - dengue
KW - hierarchical clustering
KW - supervised
KW - unsupervised
UR - http://www.scopus.com/inward/record.url?scp=85138733464&partnerID=8YFLogxK
U2 - 10.3390/jimaging8090229
DO - 10.3390/jimaging8090229
M3 - Journal Article
AN - SCOPUS:85138733464
VL - 8
JO - Journal of Imaging
JF - Journal of Imaging
IS - 9
M1 - 229
ER -