Additive Feature Attribution Methods
This set of methods builds a linear approximation $g$ to the original model $f$. Mathematically:

$$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i$$

The $\phi_i$ are the effect of each binary feature $z'_i$ on the output. Clarifications:
- The $\phi_i$ do not belong to $f$, but to the approximation $g$,
- Two complex models $f_1$ and $f_2$ trained with the same data will likely have different $\phi_i$,
- Methods don't protect from a biased model.
Note: these methods could equally be called linear combinations of binary features.
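As a minimal sketch of the definition above (all numbers are made up for illustration), an additive explanation model just sums the effects of the binary features that are "present" in the simplified input:

```python
# Evaluate an additive explanation model g(z') = phi_0 + sum_i phi_i * z'_i.
def additive_explanation(phi_0, phis, z):
    """phi_0: base value; phis: per-feature effects; z: binary feature mask."""
    return phi_0 + sum(p * zi for p, zi in zip(phis, z))

base = 0.5                   # output with no features present (illustrative)
effects = [0.2, -0.1, 0.3]   # phi_i for three binary features (illustrative)
print(additive_explanation(base, effects, [1, 0, 1]))  # ≈ 1.0
```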
Best coefficients?
Existing additive feature attribution methods (e.g. SHAP, LIME) calculate the $\phi_i$ differently, in turn yielding different coefficients. But... which one obtains the best coefficients $\phi_i$? A definition of best is needed.
A Unified Approach to Interpreting Model Predictions proposes that explanation models should satisfy local accuracy, missingness, and consistency. With these requirements, the authors show that Shapley values are the best coefficients; other methods violate at least one of the three properties.
The authors argue these properties lead to coefficients that are more intuitive for humans.
Method: SHAP
SHAP stands for SHapley Additive exPlanations. It is considered a feature attribution method rather than a simplification method. Principles and practice of explainable ML states:
The objective in this case is to build a linear model around the instance to be explained, and then interpret the coefficients as the feature’s importance. This idea is similar to LIME, in fact LIME and SHAP are closely related, but SHAP comes with a set of nice theoretical properties.
The exact Shapley values result from an expensive combinatorial computation (see sources at the end). Approximations to the exact formula can be made, at the cost of extra assumptions, which may not hold:
- Assumption 1, feature independence (implies no multicollinearity):
  - Shapley sampling values method,
  - Quantitative Input Influence,
  - plus assumption 2, model linearity: Kernel SHAP (LIME + Shapley values),
- Assumption 2, model linearity: Shapley regression values.
SHAP provides both global explanations (averaged across inputs) and local explanations (for a given input).
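To see why the exact computation is expensive, here is a brute-force sketch of the classic Shapley formula over a toy value function (the function and feature count are my own illustrations): every feature requires a sum over all $2^{n-1}$ coalitions of the other features.

```python
from itertools import combinations
from math import factorial

# Brute-force Shapley values: phi_i is a weighted sum of marginal contributions
# v(S ∪ {i}) - v(S) over every coalition S not containing feature i.
def shapley_values(v, n):
    """v: maps a frozenset of feature indices to the model output; n: #features."""
    players = set(range(n))
    phis = []
    for i in range(n):
        phi = 0.0
        for size in range(n):
            for S in combinations(players - {i}, size):
                S = frozenset(S)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi += weight * (v(S | {i}) - v(S))
        phis.append(phi)
    return phis

# Toy "model": output is 10 if feature 0 is present, plus 5 if feature 1 is.
v = lambda S: 10.0 * (0 in S) + 5.0 * (1 in S)
print(shapley_values(v, 2))  # [10.0, 5.0]
```

For this additive toy model each feature's Shapley value is exactly its own contribution; with interactions between features, the coalition weighting is what splits the credit fairly.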
Method: LIME
Local Interpretable Model-agnostic Explanations (LIME) fits Generalised Linear Models (GLMs) as local surrogates.1 Principles and practice of explainable ML describes LIME as:
LIME approximates an opaque model locally, in the surrounding area of the prediction we are interested in explaining, (...) using the resulting model as a surrogate in order to explain the more complex one. Furthermore, this approach requires a transformation of the input data to an "interpretable representation," so the resulting features are understandable to humans, regardless of the actual features used by the model (...)
It is considered a simplification method rather than a feature attribution method.
For LIME, the coefficients are found by minimising an objective function. The coefficients resulting from the optimisation do not necessarily obey the three desired properties listed earlier.
Assuming feature independence and model linearity, the objective function can be modified so that the SHAP values are obtained through a weighted linear regression (no slow combinatorics). This is called Kernel SHAP, and it obeys the three properties listed earlier.
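A minimal sketch of that weighted-regression idea (the toy model and helper names are my own, not the `shap` library's API): enumerate binary coalition vectors, weight each by the Shapley kernel, and solve a weighted least-squares problem whose coefficients are the SHAP values.

```python
import numpy as np
from itertools import product
from math import comb

# Shapley kernel weight for a coalition of size s out of M features. The empty
# and full coalitions get a huge weight, approximating the infinite-weight
# constraints that pin down phi_0 and the sum of the phis.
def shapley_kernel(M, s):
    if s == 0 or s == M:
        return 1e6
    return (M - 1) / (comb(M, s) * s * (M - s))

def kernel_shap(f, M):
    """f: model evaluated on a binary mask of length M. Returns [phi_0, phi_1, ...]."""
    Z = np.array(list(product([0, 1], repeat=M)), dtype=float)
    y = np.array([f(z) for z in Z])
    w = np.array([shapley_kernel(M, int(z.sum())) for z in Z])
    X = np.hstack([np.ones((len(Z), 1)), Z])   # intercept column carries phi_0
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Toy model on masks: masking a feature out zeroes its contribution.
f = lambda z: 1.0 + 3.0 * z[0] - 2.0 * z[1]
print(kernel_shap(f, 2))  # approx [1.0, 3.0, -2.0]
```

Real implementations sample coalitions instead of enumerating all $2^M$ of them; this sketch only shows why a weighted regression can stand in for the combinatorial formula.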
Fixes
- Normalised Moving Rate (NMR): tests the stability of the feature ranking against collinearity. A smaller NMR means a more stable ordering.
- Modified Index Position, in the paper's words:
[MIP] works similarly to NMR by iteratively removing the top feature and retraining and testing the model. Thereafter, it examines how the features are reordered in the model which implies the effect of collinearity.
These two methods (MIP, NMR) can be useful both for obtaining a reliable ordering of features and for selecting the most stable among several methods.
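The paper's exact scoring formulas for NMR and MIP are not reproduced here; the sketch below only illustrates the remove-the-top-feature-and-retrain loop both methods share. The `rank_features` argument is a hypothetical stand-in for "retrain the model and sort the features by importance".

```python
# Iteratively drop the current top feature, re-rank the rest, and record how
# the remaining features reorder at each step -- the raw material from which
# MIP/NMR-style stability scores are computed.
def stability_trace(features, rank_features):
    remaining = list(features)
    trace = []
    while remaining:
        ranking = rank_features(remaining)   # hypothetical: retrain + sort
        trace.append(ranking)
        remaining = [f for f in remaining if f != ranking[0]]
    return trace

# Hypothetical ranker: alphabetical order stands in for importance order.
print(stability_trace(["c", "a", "b"], sorted))
# [['a', 'b', 'c'], ['b', 'c'], ['c']]
```

With collinear features, a real ranker would reshuffle the survivors between steps; a stable trace is one where the relative order barely changes.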
Definition of a few concepts
Aside: Collinearity and Non-linearity
Multicollinearity: one feature is a linear combination of one or more other features, e.g. $x_3 = 2x_1 + x_2$; assuming linear independence would then be an error. In the paper's words:
Indeed, some features might be assigned a low score despite being significantly associated with the outcome. This is because they do not improve the model performance due to their collinearity with other features whose impact has already been accounted for.
Non-linearity: output changes are not proportional to input changes. For example, $y = x^2$ is non-linear, and fitting a line to it would be inaccurate. Some SHAP models can model non-linearity correctly.
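Both pitfalls can be shown in a few lines on synthetic data (all numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)

# Non-linearity: a straight line is a poor fit to y = x^2.
slope, intercept = np.polyfit(x, x ** 2, deg=1)
mse = np.mean((x ** 2 - (slope * x + intercept)) ** 2)
print(f"linear fit to x^2: mean squared error {mse:.2f}")  # far from 0

# Collinearity: duplicate a feature and the least-squares attribution between
# the two copies is arbitrary -- any split summing to 3 fits equally well
# (lstsq picks the minimum-norm split; the design matrix is rank-deficient).
X = np.column_stack([x, x])                 # second feature is an exact copy
coef, *_ = np.linalg.lstsq(X, 3 * x, rcond=None)
print("coefficients for two identical features:", coef)
```

This is exactly the failure mode the quote above describes: the second copy adds no performance, so an attribution method can assign it a low score even though it is strongly associated with the outcome.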
Let's now look at other methods.
Sources
- A Value for n-Person Games (1953)
- A Unified Approach to Interpreting Model Predictions (2017)
- [Principles and practice of explainable machine learning][principles_and_practices] (2021, 25 pages): an overview of many aspects of XAI,
- A Perspective on Explainable Artificial Intelligence Methods: SHAP and LIME (2025): conceptual aspects (weaknesses, strengths, assumptions) of the popular XAI methods SHAP and LIME.
"Local" in the name refers to the explanation being for a particular input, as opposed to "global", which would apply to the model in general. ↩