3.1 Project management & governance
As discussed above, an audit of an ML project should start with a baseline review of project documentation. Regardless of the depth of audit being undertaken, reviewing documents and understanding the context surrounding the project is at least as important as undertaking more detailed investigative work into the model itself.
Many of the issues here are common to the review of other (software development) projects, and more broadly to any project where a statistical model is used to support decision making. The following considerations are loosely based on the UK National Audit Office’s framework for reviewing models, extended to meet the challenges of AI applications.
3.1.1 Misalignment/diversion from project objectives
Low technical understanding within management, and limited awareness of practical business issues among technical staff, can lead to miscommunication and wrong expectations. Auditors should look for indications that the model is tailored to the project’s objective(s):
- Has the model been developed in collaboration with the project owner? For example:
- Are requirements captured and documented into a specification?
- Is consideration given to the relative importance of different types of error (false positives/false negatives)?
- Are assumptions listed and agreed?
- Is there an agreed quality assurance plan throughout the model development process?
- Is there evidence that the model’s project owner has influenced the development of the model to meet expectations?
- Is the level of transparency required well defined in planning? Explainable ML is an active research area, so new techniques may emerge to ‘explain’ models previously deemed black boxes; the question auditors should address is to what extent the delivered model meets the transparency requirements of the project.
- Is there a forum within the auditee organisation for people with relevant technical expertise, outside of the development process, to challenge the development and use of model outputs? This means both an independent internal control unit, and a forum with clear accountability for complaints from external users or data subjects.
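Where the relative importance of error types is being agreed (as in the false positive/false negative consideration above), it can help to see how an agreed cost weighting changes the evaluation. The following is a minimal sketch in plain Python; all labels and cost values are invented for illustration.

```python
# Hypothetical cost-weighted evaluation sketch: illustrates why the relative
# importance of false positives vs. false negatives should be agreed up front.
# Labels and costs below are illustrative assumptions, not from any real audit.

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def expected_cost(y_true, y_pred, cost_fp, cost_fn):
    """Total misclassification cost under the agreed per-error costs."""
    _, fp, fn, _ = confusion_counts(y_true, y_pred)
    return fp * cost_fp + fn * cost_fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]   # contains one FP and one FN

# If a missed case (FN) is agreed to be five times worse than a false alarm (FP):
print(expected_cost(y_true, y_pred, cost_fp=1.0, cost_fn=5.0))  # 6.0
```

An auditor would expect the specification to record who agreed such a cost ratio and why, since it directly determines which candidate model is judged ‘best’.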
Risks:
- Project fails to deliver on stated objectives.
- Misunderstanding between project owners and developers resulting in wasted or poorly focused effort.
- Project meets documented requirements, but stakeholders are dissatisfied and, in practice, their desired outcomes are not realised.
3.1.2 Lack of business readiness/inability to support the project
Knowledge about a particular model is often concentrated in a few staff members with high ML competence. Miscommunication between these ML developers and either the users of the model (such as case workers) or the IT staff responsible for maintenance in production can lead to inefficient implementation of the ML algorithm and failure of the project. To mitigate these risks, auditors should consider:
- Are roles and responsibilities documented?
- Training of end users: has the potentially probabilistic nature of model predictions been properly explained to end users (such as case workers)? Has a policy or guidance for the interaction between the AI system and human workers been communicated, such as the authority and accountability regime for arbitrating disagreements between human end users and the decisions or recommendations of the AI system?
- What processes are in place for succession planning/handover when a key person leaves the project? Similarly, what processes are in place for the handover from the development project to operation and maintenance in production?
- Transition from development project into the business as usual process is dysfunctional.
- Inability to support project on an ongoing basis.
- End users are unable to understand/challenge model outcomes leading to non-transparent or unfair decisions.
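One way guidance for end users can make the probabilistic nature of predictions concrete is to define, in policy, which scores may be handled automatically (where that is permitted) and which must go to a human. A minimal sketch, with purely illustrative threshold values that in practice would be set and documented by the project owner:

```python
# Illustrative routing of a probabilistic model output. The thresholds are
# invented for this sketch; real values would come from agreed policy, and
# automated handling is only lawful where regulation permits it.

def route_case(probability, auto_reject_below=0.2, auto_accept_above=0.9):
    """Route a scored case: clear-cut cases are decided, borderline ones go to a human."""
    if probability < auto_reject_below:
        return "auto-reject"
    if probability > auto_accept_above:
        return "auto-accept"
    return "human review"   # end users must be able to understand and challenge the score

for p in (0.05, 0.55, 0.95):
    print(p, "->", route_case(p))
```

Documented routing rules of this kind also give case workers a concrete basis for challenging model outputs, rather than treating a score as a verdict.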
3.1.3 Legal and ethical issues
ML algorithms are subject to additional laws and regulations beyond those applying to standard IT systems. The possible issues depend strongly on the type of model and the application context.
- What laws and regulations have to be considered? This includes:
- Normal operation. For example, what type and level of transparency/explainability is needed: is a global understanding of tendencies enough (global explainability), or do single decisions need to be justifiable to the extent that advice can be given about how citizens can change the outcome (such as how to gain approval for support)? Are data subjects informed about the processing of their data by an AI system and/or an automated process (where that is the case)?
- Possible side effects of a perfectly well-operating system: for example, reinforcement of existing structures, or under- or overexposure of certain groups. Details depend on the type of AI application; for example, a recommender system used to suggest relevant job advertisements to unemployed citizens might concentrate on certain career paths, missing the potential for non-standard retraining.
- Possible side effects due to model imperfections: For example, a biased model that discriminates on protected characteristics.
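The distinction between global and local explainability drawn above can be illustrated with a deliberately simple, hypothetical linear scoring model: the weights describe overall tendencies (global), while one applicant’s per-feature contributions justify a single decision (local). All weights and feature values below are invented:

```python
# Hypothetical linear scoring model for a support-approval decision.
# Weights and bias are invented for illustration only.

WEIGHTS = {"income": -0.8, "months_unemployed": 1.5, "dependants": 0.6}
BIAS = -0.2

def score(features):
    """Linear score: higher means more likely to qualify for support."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())

def local_explanation(features):
    """Per-feature contribution to this individual's score (local explainability)."""
    return {name: WEIGHTS[name] * value for name, value in features.items()}

applicant = {"income": 1.2, "months_unemployed": 2.0, "dependants": 1.0}
print(score(applicant))                # the single decision value
print(local_explanation(applicant))    # which features drove it, and by how much
print(WEIGHTS)                         # global tendencies across all decisions
```

For a linear model, the local decomposition also supports the kind of advice mentioned above: it shows which feature changes would most affect the applicant’s outcome. More complex models need dedicated techniques (for example, surrogate models or feature attribution methods) to produce comparable explanations.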
3.1.4 Inappropriate use of ML
Another risk, not unique to ML projects but notable given the current ‘hype’ around the technology, is that some auditee organisations might apply ML techniques not because they add value but from a desire to be seen to be using cutting-edge technology. While we are positive about the potential of ML to add value in a broad spectrum of applications, we must also be clear in our assessment of these applications when there is a risk of negative outcomes for the general public. Instances of this risk should be identifiable from an understanding of the project objectives.
In evaluating this, auditors should ensure:
- the component of the solution that ML is applied to is clearly identifiable, justifiable and separable from the other surrounding business logic. This avoids the tendency for optimistic project planning to treat ML as a ‘black-box’ that can solve any and all business problems;
- in planning the project, the problem statement is well defined and gives experts scope to experiment. More specifically, avoid statements like ‘we will use deep learning to do X’; instead, focus on the outcome of the project and how its success will be measured;
- there is clear evidence that management has identified that their chosen ML model is a necessary improvement over current methodologies.
Risks:
- Project objectives are not realised.
- Overly expensive or complicated solutions to otherwise simple problems.
3.1.5 Transparency, explainability and fairness
Public administration has to be transparent in the sense that the decision-making process should be justifiable and, to some extent, understandable by the general public. Further, citizens usually have the right to an explanation of decisions that affect them. The concepts of ‘transparency by design’ and ‘fairness by design’ incorporate such considerations in every step of the development of the AI system (and rightfully so); the audit of these aspects is explored in Section 3.5 Evaluation.
3.1.6 Privacy
If personal data or proxy variables for personal attributes are used, the EU’s General Data Protection Regulation (GDPR) and/or national privacy laws apply. Auditors should consider whether the ML application is the least intrusive option for satisfying the objective10 and whether all features related to personal data contribute enough to performance to justify their use.
Additionally, depending on the ML application, it might be necessary to consider the disclosure risk: when personal data has been used to train the model, that personal information can be encoded in the model, and it may be possible to reconstruct parts of the training dataset.11
Risks:
- Violation of data protection regulations
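The disclosure risk described above can be probed with a simple membership-inference style check: compare the model’s confidence on records known to be in the training data with its confidence on unseen records. The toy below uses a deliberately memorising 1-nearest-neighbour ‘model’ and invented numbers, purely to illustrate the gap an attacker could exploit:

```python
# Toy membership-inference sketch. The "model" is a memorising
# 1-nearest-neighbour scorer; data values are invented for illustration.

def nn_confidence(record, train_set):
    """Confidence = closeness to the nearest training record (1.0 = exact match)."""
    nearest = min(abs(record - t) for t in train_set)
    return 1.0 / (1.0 + nearest)

train = [10.0, 20.0, 30.0]     # records used to "train" the model
unseen = [14.0, 26.0, 33.0]    # records the model never saw

train_conf = sum(nn_confidence(r, train) for r in train) / len(train)
unseen_conf = sum(nn_confidence(r, train) for r in unseen) / len(unseen)

# A large confidence gap between the two groups suggests memorisation:
# an attacker could use it to infer who was in the training dataset.
print(train_conf, unseen_conf)   # training records score 1.0; unseen records score lower
```

Real membership-inference attacks are more sophisticated, but the principle an auditor should look for is the same: does the model behave measurably differently on the records it was trained on?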
3.1.7 Autonomy and accountability
Decisions with legal or similarly significant effects made by ML using personal data are not allowed to be fully automated - citizens have the right to human involvement (with some exceptions: see Article 22 of the GDPR). Hence the auditor must evaluate the method of human involvement and ensure the system makes it possible to exercise this right, including communicating sufficient information to the affected person.
In ML-assisted decisions, where a human is responsible for the decision but uses ML as one (possibly the main) source of information, the discretion of that person should be evaluated, examining their ability to decide against the algorithm’s advice. Additionally, auditors should examine the possible consequences if that decision turns out to be wrong.
In particular, it must be clarified which real person or legal entity bears responsibility for AI-autonomous or AI-assisted decisions. Two separate questions need to be answered in this context: (1) Who is responsible for harm caused by the ML algorithm performing as expected? (2) Who is responsible in the case of failure?
This can become particularly challenging if a third party has developed the ML system.
Risks:
- Automated processing of personal data without the knowledge or consent of the affected persons
- The condition of a ‘human in the loop’ is not realised, or only formally
- Unclear roles and responsibilities
3.1.8 Risk assessment: Project management and governance
UK National Audit Office (2016): Framework to review models, https://www.nao.org.uk/report/framework-to-review-models/
 M. Wieringa (2020): What to account for when accounting for algorithms, doi: https://doi.org/10.1145/3351095.3372833.
10. See Appendix One, Personal data and GDPR in the context of ML and AI, for a summary of relevant GDPR rules.
11. For example, in the context of diagnosis codes or criminal convictions, reconstructing which persons were part of the training dataset can itself reveal personal information.