Heart Disease Demo: How Much Risk Does a Single Patient Have?

The previous post explained, at an overall level, which factors affect heart health; that is the global interpretability of a machine learning model. A better exercise for a subject matter expert is to probe the boundary conditions: enter values that would confuse even an expert and see how well the model behaves. This is called local interpretability. This post is an example of local interpretability, showing where the model behaves well and where it behaves erratically, depending on the input conditions.

The method used here is SHAP, which is based on Shapley values. To get an idea of how this works, think of a game where each team member contributes to the final score, and we want to credit each member fairly for their share.
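
To make the analogy concrete, below is a minimal, self-contained sketch of how a Shapley value is computed for a toy three-player game. The team scores are made up purely for illustration and have nothing to do with the heart disease model; SHAP applies the same idea with features as the players and the model prediction as the score.

from itertools import permutations

# Made-up "team score" for every possible subset of players (illustrative only)
team_value = {
    frozenset(): 0,
    frozenset({'A'}): 10,
    frozenset({'B'}): 20,
    frozenset({'C'}): 5,
    frozenset({'A', 'B'}): 40,
    frozenset({'A', 'C'}): 20,
    frozenset({'B', 'C'}): 30,
    frozenset({'A', 'B', 'C'}): 60,
}

players = ['A', 'B', 'C']
shapley = {p: 0.0 for p in players}

# A player's Shapley value is their marginal contribution to the score,
# averaged over every order in which the team could have been assembled
orders = list(permutations(players))
for order in orders:
    coalition = set()
    for player in order:
        before = team_value[frozenset(coalition)]
        coalition.add(player)
        after = team_value[frozenset(coalition)]
        shapley[player] += (after - before) / len(orders)

print(shapley)                # approximately {'A': 19.17, 'B': 29.17, 'C': 11.67}
print(sum(shapley.values()))  # the contributions add up to the full team score of 60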

A note on the parameters used in the demo

  • Age: Age completed in years
  • Resting blood pressure: Systolic blood pressure at rest, in mm Hg
  • Cholesterol: Serum cholesterol in mg/dl
  • Maximum heart rate achieved: Heart rate achieved during a treadmill test or exercise
  • ST depression/oldpeak: Exercise-induced ST depression compared with the state of rest
  • Sex: Sex of the patient (the data contains only male and female)
  • Chest pain type: Type of chest pain experienced by the patient
  • Fasting blood sugar: 1 if fasting blood sugar > 120 mg/dl, 0 otherwise
  • Resting ECG: Result of the electrocardiogram taken at rest
  • Exercise angina: Angina induced by exercise (0 = no, 1 = yes)
  • ST slope: Slope of the ST segment during peak exercise

    Try it out

    (If you are loading this for the first time, click on "show widgets" below to load the application. Best viewed on a bigger screen.)

#HIDDEN
import pandas
import shap
import joblib
import ipywidgets as widgets
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.patches as mpatches
from sklearn.base import TransformerMixin, BaseEstimator
from pandas import Categorical, get_dummies
import time
from ipywidgets import interact, interact_manual, interactive
from ipywidgets import Layout, Button, Box, VBox, HBox, FloatText, Textarea, Dropdown, Label, IntSlider
import warnings
warnings.filterwarnings("ignore")
#HIDDEN
class CategoricalPreprocessing(BaseEstimator, TransformerMixin):
    """Detects likely categorical columns and one-hot encodes them."""

    def __get_categorical_variables(self, data_frame, threshold=0.70, top_n_values=10):
        # Treat a column as categorical when its top N values cover
        # more than `threshold` of all rows
        likely_categorical = []
        for column in data_frame.columns:
            if data_frame[column].value_counts(normalize=True).head(top_n_values).sum() > threshold:
                likely_categorical.append(column)
        # st_depression is numeric even though it has few distinct values
        try:
            likely_categorical.remove('st_depression')
        except ValueError:
            pass
        return likely_categorical

    def fit(self, X, y=None):
        # Remember the categorical columns and the categories seen during training
        self.attribute_names = self.__get_categorical_variables(X)
        cats = {}
        for column in self.attribute_names:
            cats[column] = X[column].unique().tolist()
        self.categoricals = cats
        return self

    def transform(self, X, y=None):
        # Cast the remembered columns to the training categories and one-hot encode,
        # so new data always produces the same dummy columns
        df = X.copy()
        for column in self.attribute_names:
            df[column] = Categorical(df[column], categories=self.categoricals[column])
        new_df = get_dummies(df, drop_first=False)
        return new_df
#HIDDEN
# Load the trained random forest, the fitted preprocessing transform,
# the SHAP explainer and the widget option dictionaries
feature_model = joblib.load('random_forest_heart_model_v2')
categorical_transform = joblib.load('categorical_transform_v2')
explainer_random_forest = joblib.load('shap_random_forest_explainer_v2')
numerical_options = joblib.load('numerical_options_dictionary_v2')
categorical_options = joblib.load('categorical_options_dictionary_v2')

# Build a slider for each numerical feature and a dropdown for each categorical one
ui_elements = []
style = {'description_width': 'initial'}
for i in numerical_options.keys():
    minimum = numerical_options[i]['minimum']
    maximum = numerical_options[i]['maximum']
    default = numerical_options[i]['default']
    if i != 'st_depression':
        ui_elements.append(widgets.IntSlider(
            value=default,
            min=minimum,
            max=maximum,
            step=1,
            description=i, style=style)
        )
    else:
        # st_depression is the only continuous-valued input, so it gets a float slider
        ui_elements.append(widgets.FloatSlider(
            value=default,
            min=minimum,
            max=maximum,
            step=.5,
            description=i, style=style)
        )
for i in categorical_options.keys():
    ui_elements.append(widgets.Dropdown(
        options=categorical_options[i]['options'],
        value=categorical_options[i]['default'],
        description=i, style=style
    ))
# "Calculate Risk" button so the prediction runs only on demand
interact_calc = interact.options(manual=True, manual_name="Calculate Risk")

#HIDDEN
def get_risk_string(prediction_probability):
    # Turn the predicted probability into a percentage string and a coarse risk group
    y_val = prediction_probability * 100
    text_val = str(np.round(y_val, 1)) + "% | "

    # assign a risk group
    if y_val / 100 <= 0.275685:
        risk_grp = ' low risk '
    elif y_val / 100 <= 0.795583:
        risk_grp = ' medium risk '
    else:
        risk_grp = ' high risk '

    return text_val + risk_grp

def get_current_prediction():
    # Collect the current widget values into a single-row data frame
    current_values = dict()
    for element in ui_elements:
        current_values[element.description] = element.value
    feature_row = categorical_transform.transform(pandas.DataFrame.from_dict([current_values]))
    # Keep the columns in the order the model was trained on
    feature_row = feature_row[['age', 'resting_blood_pressure', 'cholesterol',
       'max_heart_rate_achieved', 'st_depression', 'sex_female', 'sex_male',
       'chest_pain_type_non-anginal pain', 'chest_pain_type_asymptomatic',
       'chest_pain_type_atypical angina', 'chest_pain_type_typical angina',
       'fasting_blood_sugar_0', 'fasting_blood_sugar_1', 'rest_ecg_normal',
       'rest_ecg_ST-T wave abnormality',
       'rest_ecg_left ventricular hypertrophy', 'exercise_induced_angina_0',
       'exercise_induced_angina_1', 'st_slope_flat', 'st_slope_upsloping',
       'st_slope_downsloping']].copy()

    # SHAP values for this single row explain how each feature moves the prediction
    shap_values = explainer_random_forest.shap_values(feature_row)

    updated_fnames = feature_row.T.reset_index()
    updated_fnames.columns = ['feature', 'value']

    # Update the risk level text widget with the predicted probability
    risk_prefix = '<h2> Risk Level :'
    risk_suffix = '</h2>'
    risk_probability = feature_model.predict_proba(feature_row)[0][1]
    risk_string = get_risk_string(risk_probability)
    risk_widget.value = risk_prefix + risk_string + risk_suffix

    # Drop features whose value is zero (mainly the inactive dummy columns)
    updated_fnames['shap_original'] = pandas.Series(shap_values[1][0])
    updated_fnames['shap_abs'] = updated_fnames['shap_original'].abs()
    updated_fnames = updated_fnames[updated_fnames['value'] != 0]

    # Horizontal bar chart of the SHAP contributions, sorted by effect
    plt.rcParams.update({'font.size': 30})
    df1 = pandas.DataFrame(updated_fnames["shap_original"])
    df1.index = updated_fnames.feature
    df1.sort_values(by='shap_original', inplace=True, ascending=True)
    df1['positive'] = df1['shap_original'] > 0
    df1.plot(kind='barh', figsize=(15, 7), legend=False)
#HIDDEN
form_item_layout = Layout(
    display='flex',
    flex_flow='row',
    justify_content='space-between'
)

form_items = ui_elements

form = Box(form_items, layout=Layout(
    display='flex',
    flex_flow='column',
    align_items='stretch'
))


box_layout = Layout(display='flex',
                    flex_flow='column')

left_box = VBox(ui_elements[0:5])
right_box = VBox(ui_elements[5:])
control_layout = VBox([left_box, right_box], layout=box_layout)

risk_string = "<h2>Risk Level</h2>"

risk_widget = widgets.HTML(value=risk_string)
#HIDDEN
display(control_layout)
display(Box(children=[risk_widget]))
risk_plot=interact_calc(get_current_prediction)

risk_plot.widget.children[0].style.button_color = 'lightblue'

Please note: This demo runs on free servers, so you may need to wait for it to load correctly.

The image below shows how the interactive widget is supposed to render.

Local interpretation of heart disease

Features on the positive axis contribute towards heart risk, while features on the negative axis contribute towards good heart health.
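
As a rough illustration of reading the same information programmatically: the sketch below reuses the explainer_random_forest object from the hidden cells above, assumes a one-row feature_row prepared exactly as in get_current_prediction, and assumes the same two-class SHAP output indexed with [1][0].

# Per-feature SHAP contributions for the current patient row
shap_values = explainer_random_forest.shap_values(feature_row)
contributions = pandas.Series(shap_values[1][0], index=feature_row.columns)

# Positive SHAP values push the prediction towards heart disease,
# negative ones push it towards a healthy heart
print("Pushing risk up:")
print(contributions[contributions > 0].sort_values(ascending=False).head(5))
print("Pushing risk down:")
print(contributions[contributions < 0].sort_values().head(5))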

The image below shows what happens when the risk is reduced.

Local interpretation of heart disease

Here we can see, for a healthy 57-year-old, what the lab results and the corresponding risk would look like.

An anomaly with cholesterol levels

Local interpretation of heart disease

**Here the model thinks that high cholesterol is good for health.**

This is why it is always important to give subject matter experts (here, a doctor) an interactive widget to try out first, rather than a set of charts. The next iteration of model building would be to look at the data and see what pattern causes this behaviour, train a different model, or tune the model parameters to look for specific patterns.
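
Before retraining, a quick sanity check along these lines can make the anomaly explicit. The sketch below is illustrative only: it reuses feature_model, categorical_transform and ui_elements from the hidden cells above, assumes the same column ordering the demo uses, and the cholesterol range is an arbitrary choice. It holds every other input at its current widget value, sweeps cholesterol, and prints the predicted risk.

# Column order the model expects, taken from the demo code above
model_columns = ['age', 'resting_blood_pressure', 'cholesterol',
    'max_heart_rate_achieved', 'st_depression', 'sex_female', 'sex_male',
    'chest_pain_type_non-anginal pain', 'chest_pain_type_asymptomatic',
    'chest_pain_type_atypical angina', 'chest_pain_type_typical angina',
    'fasting_blood_sugar_0', 'fasting_blood_sugar_1', 'rest_ecg_normal',
    'rest_ecg_ST-T wave abnormality',
    'rest_ecg_left ventricular hypertrophy', 'exercise_induced_angina_0',
    'exercise_induced_angina_1', 'st_slope_flat', 'st_slope_upsloping',
    'st_slope_downsloping']

def risk_for_cholesterol(cholesterol_value):
    # Start from the current widget values and change only cholesterol
    values = {element.description: element.value for element in ui_elements}
    values['cholesterol'] = cholesterol_value
    row = categorical_transform.transform(pandas.DataFrame.from_dict([values]))
    return feature_model.predict_proba(row[model_columns])[0][1]

# If the predicted risk keeps falling as cholesterol rises, the data or the model needs another look
for chol in range(150, 451, 50):
    print(chol, round(risk_for_cholesterol(chol), 3))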

Disclaimer:

This is a demonstration of how machine learning models can be trained on available patient data, and a study of how such a model behaves on new data. Please consult a doctor for actual interpretations and risk factors.

Acknowledgements

The dataset is combined from three other research datasets used in different research papers. The Nature article listing the heart disease databases and the popular datasets used in various heart disease research is available here: https://www.nature.com/articles/s41597-019-0206-3

The consolidated dataset is made available on Kaggle.

Thanks to this wonderful post on Kaggle, which I used for the data clean-up.
