Data Cleaning & Feature Engineering
To clean our data, we removed the columns that would not add value to the application’s decision making, such as the Residence Type and ID columns. We also removed outliers that could reduce the accuracy of our machine learning model. For example, only one row in the gender column had the value ‘Other’, so we removed that row. Our features are gender, age, hypertension (high blood pressure), heart disease, marital status, work type, average glucose level, and BMI (body mass index). The target variable is whether the person has had a stroke.
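
The cleaning steps described above could be sketched in pandas roughly as follows. This is a minimal illustration, not our exact code: the column names (id, Residence_type, gender, stroke, and so on) and the three sample rows are assumptions made up for the example, and the helper clean_stroke_data is hypothetical.

```python
import pandas as pd


def clean_stroke_data(df: pd.DataFrame) -> pd.DataFrame:
    """Drop unused columns and the rare 'Other' gender rows.

    Column names here are assumed for illustration only.
    """
    # Remove columns that do not inform the prediction.
    cleaned = df.drop(columns=["id", "Residence_type"], errors="ignore")
    # Remove rows where gender is 'Other' (a single row in our data).
    cleaned = cleaned[cleaned["gender"] != "Other"]
    return cleaned.reset_index(drop=True)


# Made-up sample rows, used only to show the shape of the data.
sample = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "gender": ["Male", "Female", "Other"],
        "age": [67, 54, 26],
        "hypertension": [0, 1, 0],
        "heart_disease": [1, 0, 0],
        "ever_married": ["Yes", "Yes", "No"],
        "work_type": ["Private", "Self-employed", "Private"],
        "Residence_type": ["Urban", "Rural", "Rural"],
        "avg_glucose_level": [228.7, 105.9, 143.3],
        "bmi": [36.6, 28.1, 22.4],
        "stroke": [1, 0, 0],
    }
)

cleaned = clean_stroke_data(sample)

# Split into the feature matrix and the target variable.
X = cleaned.drop(columns=["stroke"])
y = cleaned["stroke"]
```

Passing errors="ignore" to drop keeps the helper from failing if a column has already been removed, which is convenient when the cleaning step is rerun in a notebook.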