Monday, May 27, 2019

Bank of England: "Opening the machine learning black box"

From Bank Underground:
Machine learning models are at the forefront of current advances in artificial intelligence (AI) and automation. However, they are routinely, and rightly, criticised for being black boxes. In this post, I present a novel approach to evaluate machine learning models similar to a linear regression – one of the most transparent and widely used modelling techniques. The framework rests on an analogy between game theory and statistical models. A machine learning model is rewritten as a regression model using its Shapley values, a payoff concept for cooperative games. The model output can then be conveniently communicated, eg using a standard regression table. This strengthens the case for the use of machine learning to inform decisions where accuracy and transparency are crucial.


Why do we need interpretable models?
Statistical models are often used to inform decisions, eg a bank deciding whether to grant a mortgage to a customer. Let Alice be that customer. The bank could check her income and previous loan history to estimate how likely Alice is to repay a mortgage and to set the terms of the loan. A standard approach for this type of analysis is to use a logistic regression. This returns a ‘probability of default’, ie the chance that Alice won’t be able to repay the loan, spelling trouble for the bank and for Alice herself.
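For concreteness, a minimal sketch of such a scoring model in Python; the features, the simulated data and the figures for ‘Alice’ are hypothetical illustrations, not taken from the post:

```python
# Minimal sketch of a probability-of-default model on hypothetical data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical features: income (GBP thousands) and number of past missed payments.
X = np.column_stack([
    rng.normal(45, 15, size=1000),       # income
    rng.poisson(0.5, size=1000),         # missed payments
])
# Hypothetical default indicator, loosely driven by the two features.
logits = -2.0 - 0.03 * (X[:, 0] - 45) + 1.2 * X[:, 1]
y = rng.random(1000) < 1 / (1 + np.exp(-logits))

model = LogisticRegression().fit(X, y)
print(model.coef_)                        # estimated risk weights on income and missed payments

# Probability of default for a hypothetical applicant like "Alice".
alice = np.array([[38.0, 1]])             # income 38k, one missed payment
print(model.predict_proba(alice)[0, 1])   # estimated probability of default
```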

A logistic regression is a very transparent model. It attributes well-defined risk weights to each of its inputs. This transparency comes at a cost though: a logistic regression assumes a particular functional form for the relationship between the explanatory factors and the outcome. If that assumption does not hold, the model’s predictions may be misleading. Machine learning models are more flexible and are therefore capable of detecting finer nuances, provided there is enough data to ‘train’ the model. This is also the reason for their current success and increasing popularity, with applications ranging from personalised movie recommendations to credit scoring and medical applications.

However, the cost of this flexibility is opaqueness, which gives rise to the black box critique of machine learning models. It is often not clear which inputs are driving a model’s predictions, and a well-grounded statistical analysis of the model is generally not possible. This can lead to ethical and legal challenges when models are used to inform decisions affecting the lives of individuals. It is particularly true of deep learning models, which are driving many AI developments.

There are also concerns regarding the interpretability of machine learning models that are more specific to policy makers, eg the Bank of England’s decision-making committees. First, when using machine learning alongside more traditional approaches, it is important to understand where the two differ. Second, a challenge in decision making is understanding how relationships between variables might change in the light of policy actions. In both cases, interpretable and transparent models are likely to be very helpful.

Opening the black box
It would therefore be of great use to bring machine learning onto the same playing field as currently used models. This would promote transparency and likely speed up model development, while helping to avoid harmful bias. Such an approach is laid out here. The idea is to separate statistical inference into two steps. First, the contribution each variable makes to the model’s output is measured. Second, these contributions are taken as the input to a standard regression analysis....MORE
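A rough sketch of what these two steps could look like in code, using the shap package for the Shapley-value attributions and an ordinary least squares fit for the second step. The data, the choice of machine learning model and the exact form of the regression are illustrative assumptions, not the procedure used in the post:

```python
# Illustrative two-step sketch: (1) attribute each prediction to its inputs
# via Shapley values, (2) run a standard regression on those attributions.
import numpy as np
import shap                                # Shapley-value attributions
import statsmodels.api as sm
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))              # hypothetical explanatory variables
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

# Step 0: fit the (black-box) machine learning model.
ml_model = GradientBoostingRegressor().fit(X, y)

# Step 1: measure the contribution each variable makes to every prediction.
explainer = shap.TreeExplainer(ml_model)
phi = explainer.shap_values(X)             # one Shapley value per observation and variable

# Step 2: use these contributions as inputs to a standard regression of the outcome,
# which can then be reported in a familiar regression table.
surrogate = sm.OLS(y, sm.add_constant(phi)).fit()
print(surrogate.summary())                 # coefficients, standard errors, etc.
```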