3.4 Model in production
Appropriate monitoring of the model’s performance over time depends on how the model is implemented. Automatic real-time scoring requires continuous monitoring with automated tests to ensure stable model performance, while a manual setup with cyclical retraining or re-optimisation naturally includes performance checks in each cycle. In either case, a mechanism to flag possible changes in performance over time should be in place.
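A mechanism to flag changes in performance can be as simple as comparing a rolling average of the production metric with the value expected from development. The sketch below assumes one metric value per monitoring period; the function name `flag_performance_changes`, the tolerance of 0.05 and the three-period window are hypothetical choices for illustration, not values taken from this guidance.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PerformanceFlag:
    """One monitoring period whose rolling metric deviates from the baseline."""
    period: str
    rolling_value: float
    baseline: float


def flag_performance_changes(baseline, observations, tolerance=0.05, window=3):
    """Return a flag for every period whose rolling mean deviates from baseline.

    baseline     -- metric value expected from development (e.g. validation recall)
    observations -- iterable of (period_label, metric_value) in time order
    tolerance    -- hypothetical maximum accepted absolute deviation
    window       -- number of most recent periods averaged, to damp noise
    """
    flags = []
    history = []
    for period, value in observations:
        history.append(value)
        rolling = mean(history[-window:])
        # Flag changes in either direction: a sudden improvement can also
        # point to a shift in the input data that deserves a closer look.
        if abs(rolling - baseline) > tolerance:
            flags.append(PerformanceFlag(period, rolling, baseline))
    return flags


# Hypothetical monthly recall measured in production.
monthly_recall = [
    ("2023-01", 0.81), ("2023-02", 0.79), ("2023-03", 0.80),
    ("2023-04", 0.74), ("2023-05", 0.70), ("2023-06", 0.69),
]
flags = flag_performance_changes(baseline=0.80, observations=monthly_recall)
print([f.period for f in flags])  # ['2023-05', '2023-06']
```

The rolling window trades reaction speed for robustness: a single weak month (April here) does not raise a flag, but a sustained decline does.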
If the model is retrained or redeveloped based on the outcome of previous predictions, this feedback loop needs to be designed such that no additional bias is introduced.
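The risk in such a feedback loop is that outcomes are often observed only for the cases the model selected, so the data used for retraining no longer reflects the population. The simulation below is a minimal sketch under stated assumptions: a hypothetical noisy model score, a model that flags the top 20% of cases, and a small random audit sample drawn independently of the model as one possible mitigation (other designs exist).

```python
import random


def simulate_feedback_loop(n=100_000, base_rate=0.10, flag_share=0.20,
                           audit_share=0.02, seed=0):
    """Compare outcome rates seen through a model-driven feedback loop.

    Returns (true_rate, flagged_rate, audit_rate):
    - true_rate:    share of positives in the whole population
    - flagged_rate: share of positives among cases the model flagged,
                    i.e. the only cases whose outcome is observed
    - audit_rate:   share of positives in a random audit sample drawn
                    independently of the model
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        positive = rng.random() < base_rate
        # Hypothetical model score: higher on average for positives, but noisy.
        score = rng.gauss(1.0 if positive else 0.0, 1.0)
        population.append((score, positive))

    # The model flags the highest-scoring cases for follow-up.
    ranked = sorted(population, key=lambda case: case[0], reverse=True)
    flagged = ranked[: int(n * flag_share)]

    # A random audit sample is selected without looking at the score.
    audit = rng.sample(population, int(n * audit_share))

    def positive_rate(cases):
        return sum(positive for _, positive in cases) / len(cases)

    return positive_rate(population), positive_rate(flagged), positive_rate(audit)


true_rate, flagged_rate, audit_rate = simulate_feedback_loop()
print(f"true {true_rate:.3f}, flagged {flagged_rate:.3f}, audit {audit_rate:.3f}")
```

A model retrained only on the flagged cases would see a considerably higher positive rate than exists in the population, whereas the random audit sample stays close to the true rate and can serve as an unbiased reference.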
The performance metric used when optimising the model reflects policy decisions (for example, prioritising sensitivity over precision). Because such policies can change, the choice of performance metric must be reviewed, and adjusted if necessary, whenever the model is updated.
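How a metric encodes a policy choice can be illustrated with the F-beta score, which weights recall (sensitivity) beta times as heavily as precision. In the sketch below, the two candidate models and their confusion counts are hypothetical; both have the same F1 score, yet the preferred model flips depending on whether the policy favours sensitivity (beta = 2) or precision (beta = 0.5).

```python
def precision_recall(tp, fp, fn):
    """Precision and recall (sensitivity) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(tp, fp, fn, beta):
    """F-beta score: beta > 1 favours recall, beta < 1 favours precision."""
    precision, recall = precision_recall(tp, fp, fn)
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical (tp, fp, fn) counts for two candidate models, 80 true positives each.
candidates = {
    "model_a": (40, 10, 40),  # precision 0.8, recall 0.5
    "model_b": (64, 64, 16),  # precision 0.5, recall 0.8
}


def best_model(candidates, beta):
    """Name of the candidate with the highest F-beta score."""
    return max(candidates, key=lambda name: f_beta(*candidates[name], beta))


print(best_model(candidates, beta=2.0))  # model_b: sensitivity prioritised
print(best_model(candidates, beta=0.5))  # model_a: precision prioritised
```

If the underlying policy changes, keeping the old beta would silently keep selecting the model that suits the outdated priority.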
3.4.1 Risk assessment: Model in production
- Performance degrading over time (for example, due to changes in demographics)
- Increased model bias
- Obsolete choices embedded in the model
- Model is ‘repurposed’ over time, and its predictions begin to be used out of context
3.4.2 Possible audit tests: Model in production
- Verify that the population for which the model is used in production is (still) sufficiently represented in the training data.
- Obtain the code of the production version of the model.
- Compare performance in production with the performance expected from development.
- Review how performance and input data distributions are monitored over time.
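Checking whether the production population is still represented in the training data, and whether input distributions have shifted, can start with a per-feature comparison. The sketch below uses the population stability index (PSI) for a categorical feature plus a check for categories never seen in training; the feature (age bands), the data and the small floor `eps` for empty categories are hypothetical, and any alert threshold for the PSI would be an assumption to agree with the auditee.

```python
import math
from collections import Counter


def population_stability_index(train_values, prod_values, eps=1e-4):
    """PSI between training and production distributions of a categorical feature.

    PSI = sum over categories of (p_prod - p_train) * ln(p_prod / p_train).
    Shares are floored at eps so that categories missing on one side do
    not cause division by zero or log(0).
    """
    train_counts = Counter(train_values)
    prod_counts = Counter(prod_values)
    psi = 0.0
    for category in sorted(set(train_counts) | set(prod_counts)):
        p_train = max(train_counts[category] / len(train_values), eps)
        p_prod = max(prod_counts[category] / len(prod_values), eps)
        psi += (p_prod - p_train) * math.log(p_prod / p_train)
    return psi


def unseen_categories(train_values, prod_values):
    """Categories that occur in production but never in the training data."""
    return set(prod_values) - set(train_values)


# Hypothetical age bands in the training data and in current production data.
train_ages = ["18-34"] * 40 + ["35-64"] * 50 + ["65-79"] * 10
prod_ages = ["18-34"] * 20 + ["35-64"] * 50 + ["65-79"] * 25 + ["80+"] * 5

print(round(population_stability_index(train_ages, prod_ages), 3))
print(unseen_categories(train_ages, prod_ages))  # {'80+'}
```

A PSI of zero means identical shares; larger values indicate a larger shift. Unseen categories are worth reporting separately, because the model has no training evidence at all for those cases.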