Maternal Health Risk Classification
Background
Maternal health risk refers to the likelihood that a mother or child experiences harmful outcomes during pregnancy, labor, or the postpartum period (Togunwa et al., 2023).
Maternal health risk remains a global concern. According to the World Health Organization, an average of almost 800 women died every day in 2020 from preventable causes related to pregnancy and childbirth (WHO, 2024).
Machine learning models can recognize complex patterns within medical data that could be overlooked by humans or traditional statistical analyses (Sathasivam and Abdullahi, 2024).
While previous studies have explored machine learning techniques for maternal health risk classification, they remain limited in the range of models compared and in their analysis of feature importance.
Introduction
Given these gaps, this study examined multiple machine learning methods for assessing maternal health risk, using a dataset with more maternal health features, in order to identify the most important ones.
In this study, I aimed to improve maternal health risk classification by addressing two research goals:
- First, I used three machine learning models (an ordered probit regression, a decision tree, and a random forest) to identify the most important factors for classifying maternal health risk.
- Second, I evaluated and compared overall model accuracy as well as within-group performance using confusion matrices.
By identifying key risk factors and improving classification accuracy, this study contributes to efforts to use machine learning to improve maternal healthcare outcomes.
Methods
Dataset
- All data came from the Maternal Health Risk dataset from the University of California, Irvine (UCI) Machine Learning Repository (Ahmed, 2020).
- Data was collected from hospitals, community clinics, and other healthcare facilities in Bangladesh.
- It contained six features: age, systolic blood pressure, diastolic blood pressure, blood glucose level, body temperature, and heart rate.
- The target variable, risk level during pregnancy, had three classes – low, mid, and high risk.
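Because the three risk classes are ordered (low < mid < high), one natural preprocessing step is to encode the target as increasing integers so that an ordinal model such as ordered probit can use it. The sketch below is a minimal loader under stated assumptions: the column names (`Age`, `SystolicBP`, `DiastolicBP`, `BS`, `BodyTemp`, `HeartRate`, `RiskLevel`) and the label strings (`"low risk"`, `"mid risk"`, `"high risk"`) are assumed to match the UCI CSV, and `load_maternal_risk` is a hypothetical helper, not the study's actual code.

```python
import csv
from typing import IO

# Assumed column names; taken to match the header of the UCI
# Maternal Health Risk CSV. Adjust if your copy differs.
FEATURES = ["Age", "SystolicBP", "DiastolicBP", "BS", "BodyTemp", "HeartRate"]
TARGET = "RiskLevel"

# The risk classes are ordered, so they map to increasing integers.
# The exact label strings are an assumption about the file's contents.
RISK_ORDER = {"low risk": 0, "mid risk": 1, "high risk": 2}


def load_maternal_risk(f: IO[str]) -> tuple[list[list[float]], list[int]]:
    """Hypothetical helper: read rows into a numeric feature matrix X and an ordinal target y."""
    X: list[list[float]] = []
    y: list[int] = []
    for row in csv.DictReader(f):
        X.append([float(row[name]) for name in FEATURES])
        y.append(RISK_ORDER[row[TARGET].strip().lower()])
    return X, y
```

Usage would look like `with open(path, newline="") as f: X, y = load_maternal_risk(f)`, where `path` points to a local copy of the dataset.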
Predictors of Maternal Health Risk
- Fit three optimized models: an ordered probit regression, a decision tree, and a random forest.
- Identified the most important predictors for classifying risk in each model.
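For the two tree-based models, one common way to rank predictors is scikit-learn's impurity-based `feature_importances_` attribute, which is normalized to sum to 1. The sketch below assumes scikit-learn is used; the helper `ranked_importances` and the hyperparameter values are hypothetical illustrations, not the study's tuned settings. For the ordered probit model, importance is usually judged from the size and significance of (standardized) coefficients instead, which is not shown here; `sklearn.inspection.permutation_importance` is a model-agnostic alternative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["Age", "SystolicBP", "DiastolicBP", "BS", "BodyTemp", "HeartRate"]


def ranked_importances(model, X, y, names=FEATURES):
    """Fit a tree-based classifier and return (feature, importance) pairs, largest first.

    feature_importances_ is scikit-learn's impurity-based importance
    (mean decrease in impurity), normalized to sum to 1.
    """
    model.fit(X, y)
    pairs = zip(names, model.feature_importances_)
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Illustrative hyperparameters only (hypothetical, not the study's tuned values).
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
```

Calling `ranked_importances(forest, X, y)` on a numeric feature matrix `X` (columns in the order of `FEATURES`) and integer labels `y` returns the predictors sorted from most to least important.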
Model Comparison
- Compared the overall accuracy of the three models.
- Accuracy = (number of correct classifications) / (total number of classifications)
- Analyzed within-group (per-class) classification accuracy using confusion matrices.
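The accuracy formula and the within-group analysis can both be read off a confusion matrix: the diagonal holds correct classifications, and each row holds all cases of one true class. The minimal sketch below assumes labels are encoded as 0, 1, 2 for low, mid, and high risk; the function names and the toy labels are hypothetical and are not study results. scikit-learn's `sklearn.metrics.confusion_matrix` and `accuracy_score` compute the same quantities.

```python
def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes (0 = low, 1 = mid, 2 = high)."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Correct classifications (the diagonal) divided by all classifications."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def within_class_accuracy(cm):
    """Share of each true class classified correctly (per-class recall); NaN if a class is absent."""
    return [row[i] / sum(row) if sum(row) else float("nan") for i, row in enumerate(cm)]


# Hypothetical toy labels, for illustration only.
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(cm)                         # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(overall_accuracy(cm))       # 0.6666666666666666
print(within_class_accuracy(cm))  # [0.5, 1.0, 0.5]
```

The toy example shows why within-group results matter: an overall accuracy of about 0.67 hides that the mid class is always classified correctly while half of the low- and high-risk cases are misclassified.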