Continuing our previous example, we will use the dashboard to build a narrative around explaining a model.
1. Global Level Explanation
1.1 Global Feature Importance
Global feature importance identifies the variables that, according to the model, have the highest impact values on predictions overall.
1.2 Global Feature Impact
Global feature impact shows whether each variable had, according to the model, an overall positive or an overall negative impact on predictions.
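As a rough illustration of how the two global views relate, the sketch below assumes the impact values are available as a matrix with one row per instance and one column per feature (for example, SHAP-style attributions). The numbers, the feature names other than AGE, and the use of a simple mean are all assumptions made for the example; the dashboard's own computation is not shown in this guide.

```python
import numpy as np

# Hypothetical impact values: one row per instance, one column per feature.
# The values and the feature names RM and LSTAT are made up for illustration.
feature_names = ["AGE", "RM", "LSTAT"]
impact_values = np.array([
    [0.5, 1.0, -2.0],
    [-0.5, 2.0, -1.0],
    [1.5, 0.0, -3.0],
])

# Global importance (assumed definition): mean absolute impact per feature.
global_importance = np.abs(impact_values).mean(axis=0)

# Global impact (assumed definition): mean signed impact per feature,
# so the sign shows the overall direction of the effect.
global_impact = impact_values.mean(axis=0)

# Features ordered from most to least important.
ranking = [feature_names[i] for i in np.argsort(-global_importance)]

for name, importance, impact in zip(feature_names, global_importance, global_impact):
    print(f"{name}: importance={importance:.2f}, impact={impact:+.2f}")
```

Taking the absolute value before averaging keeps positive and negative contributions from cancelling out in the importance ranking, while the signed mean preserves the direction.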
2. Local Level Explanation for a single prediction point
You can zoom in on a single instance, examine what the model predicts for that input, and explain why. At the level of an individual prediction, the behaviour of an otherwise complex model is often easier to follow.
For a local explanation, we need to pass in the number of the row we want to explain. By default, the row number is set to 1, but you can change it in the input box below.
The next graphs therefore show feature importance, feature impact and prototypes for row number 1.
2.1 Feature Importance for a single prediction
Local feature importance identifies how strongly each variable affected a single prediction. We take the absolute value of each impact value to give the overall importance of each variable, regardless of direction.
2.2 Feature Impact for a single prediction
Local feature impact identifies which variables had a positive impact and which had a negative impact, according to the model, on a single prediction. This helps you gauge the logic of the model and the weight it assigned to each variable to reach the prediction.
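The relationship between local importance and local impact can be sketched in a few lines. The example below assumes a hypothetical matrix of per-row impact values and treats the row number as a positional index into that matrix; both are assumptions for illustration, not a description of the dashboard's internals.

```python
import numpy as np

# Hypothetical impact values (rows = instances, columns = features).
feature_names = ["AGE", "RM", "LSTAT"]
impact_values = np.array([
    [0.5, 1.0, -2.0],
    [-0.5, 2.0, -1.0],
    [1.5, 0.0, -3.0],
])

row_number = 1  # assumed here to be a positional index into the array

# Local impact: the signed impact values for the chosen row.
local_impact = impact_values[row_number]

# Local importance: the absolute value of each impact value.
local_importance = np.abs(local_impact)

# Split features by the direction in which they pushed this prediction.
positive = [n for n, v in zip(feature_names, local_impact) if v > 0]
negative = [n for n, v in zip(feature_names, local_impact) if v < 0]

print("pushed the prediction up:", positive)
print("pushed the prediction down:", negative)
```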
2.3 Prototypical Analysis
With prototypical analysis, you can identify prototypical, or similar, profiles for your single prediction. Looking at the top five similar profiles and the variables they share with the row being explained helps you understand why the model came to a specific decision. The prototypical analysis also makes it easier for business users to relate to the results. Additionally, you can highlight the important features that made each prototype similar to the user or prediction in question.
You can see how similar the prototypes are to the row we are explaining by looking at the Weight column below, which gives the similarity score as a percentage.
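To make the idea of similar profiles with percentage weights concrete, here is a minimal sketch that uses a plain nearest-neighbour search on standardised features. This is a stand-in technique chosen for illustration and is not claimed to be the prototype selection method the dashboard uses; the function name `similar_profiles`, the distance-to-similarity conversion and the data are all hypothetical.

```python
import numpy as np

def similar_profiles(data, row_index, k=5):
    """Return (indices, weights_percent) of the k rows closest to data[row_index].

    Illustrative stand-in: Euclidean distance on standardised columns,
    converted to similarity weights that sum to 100.
    """
    # Standardise columns so no single feature dominates the distance.
    std = data.std(axis=0)
    std[std == 0] = 1.0
    scaled = (data - data.mean(axis=0)) / std

    distances = np.linalg.norm(scaled - scaled[row_index], axis=1)
    distances[row_index] = np.inf  # exclude the row being explained

    nearest = np.argsort(distances)[:k]
    similarity = 1.0 / (1.0 + distances[nearest])
    weights_percent = 100.0 * similarity / similarity.sum()
    return nearest, weights_percent

# Made-up data: two columns, five rows.
data = np.array([
    [65.0, 6.5],
    [66.0, 6.4],
    [30.0, 5.0],
    [63.0, 6.7],
    [90.0, 4.0],
])

indices, weights = similar_profiles(data, row_index=0, k=2)
for i, w in zip(indices, weights):
    print(f"row {i}: weight {w:.1f}%")
```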
3. Scenario Analysis
We have added a filtering function to the data table that you can use to slice your data and extract a subset of it. Using that subset, you can explain multiple instances and scenarios, identify patterns within that specific sub-group, and find similar profiles within it. This is extremely useful when you are trying to understand the model's behaviour on a specific cluster or group of data.
Instead of writing SQL queries, you can interact directly with the data table and add filters. In this example, I wanted to slice the data where AGE > 40. I can then use the tabs below to explore how predictions vary for that subset.
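For readers who want to reproduce the same slice outside the dashboard, the filter corresponds to an ordinary boolean mask in pandas. The data frame below is made up for illustration, and the `prediction` column name is an assumption.

```python
import pandas as pd

# Made-up data; only the AGE column name comes from the example above.
data = pd.DataFrame({
    "AGE": [25.0, 45.5, 62.1, 38.0, 80.3],
    "prediction": [30.1, 22.4, 19.8, 27.5, 15.2],
})

# Keep only the rows where AGE is greater than 40.
subset = data[data["AGE"] > 40]

# Summarise how predictions are distributed within the subset.
summary = subset["prediction"].describe()
print(subset)
print(summary)
```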
4. Feature Interactions
If a machine learning model makes a prediction based on two features, we can decompose the prediction into four terms: a constant term, a term for the first feature, a term for the second feature and a term for the interaction between the two features.
The interaction between two features is the change in the prediction that occurs by varying the features after considering the individual feature effects.
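A small worked example may help make the four-term decomposition concrete. It assumes a hypothetical model with two binary features and takes the prediction at (0, 0) as the constant term; the prediction values are invented for illustration.

```python
# Hypothetical predictions for every combination of two binary features,
# keyed by (first_feature, second_feature).
predictions = {
    (0, 0): 150.0,
    (1, 0): 250.0,
    (0, 1): 200.0,
    (1, 1): 400.0,
}

# Constant term: the prediction when both features are at their baseline.
constant = predictions[(0, 0)]

# Individual effects: change from the baseline when only one feature varies.
effect_first = predictions[(1, 0)] - constant
effect_second = predictions[(0, 1)] - constant

# Interaction: what remains of the joint prediction after the constant
# and the two individual effects have been accounted for.
interaction = predictions[(1, 1)] - (constant + effect_first + effect_second)

print(constant, effect_first, effect_second, interaction)
```

If the interaction term were zero, the joint prediction would simply be the sum of the constant and the two individual effects.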
4.1 Partial Dependence Plot (PDP)
The partial dependence plot (PDP or PD plot for short) shows the marginal effect that one or two features have on the predicted outcome of a machine learning model. In this example, we can see that as AGE increases, its impact on the predicted outcome also increases, which results in a higher prediction (the mean value of the house), represented by the increasing colour contrast.
We have added input bars that let you further explore how features interact with each other and affect the prediction value.
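The marginal effect behind a one-feature partial dependence plot can be computed by hand: fix the feature at each grid value for every row, predict, and average. The sketch below does this with NumPy for a toy model; the function name `partial_dependence_1d` and the toy model are hypothetical and are not taken from the dashboard.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_index] = value  # overwrite the feature for all rows
        averages.append(predict(X_mod).mean())
    return np.array(averages)

def toy_model(X):
    # Stand-in for a fitted model's predict function.
    return 2.0 * X[:, 0] + X[:, 1]

X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
])
grid = np.array([0.0, 1.0, 2.0])

pdp = partial_dependence_1d(toy_model, X, feature_index=0, grid=grid)
print(pdp)
```

Because the toy model is linear, the resulting curve is a straight line whose slope equals the coefficient of the first feature.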
4.2 Summary plot
The summary plot combines feature importance with feature effects. Each point on the summary plot is an impact value for a feature and an instance. The position on the y-axis is determined by the feature and on the x-axis by the impact value. The colour represents the value of the feature from low to high.
5. Feature Distributions
5.1 Histograms & Violin Plots
Use the histogram to see the distribution of your numerical or categorical variables. Use joint violin plots to get basic summary statistics such as mean, median, mode, Q1, Q3 and Q4 for your variables. You can also overlay the distribution of the predicted variable on your other variables to find a joint distribution.
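The statistics these plots summarise can also be computed directly. The sketch below uses made-up values and explicit bin edges, both chosen only for illustration, and covers the histogram counts, mean, median, mode and the first and third quartiles.

```python
from statistics import mode

import numpy as np

# Made-up values for a single numerical variable.
values = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0])

# Histogram counts using explicit bin edges [0, 4), [4, 8), [8, 12].
counts, bin_edges = np.histogram(values, bins=[0.0, 4.0, 8.0, 12.0])

# Basic summary statistics.
mean_value = values.mean()
most_common = mode(values.tolist())
q1, median, q3 = np.percentile(values, [25, 50, 75])

print("counts per bin:", counts.tolist())
print("mean:", mean_value, "median:", median, "mode:", most_common)
print("Q1:", q1, "Q3:", q3)
```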