State of AI applied to Quality Engineering 2021-22
Section 9: Trust AI

Chapter 2 by Sogeti

Testing the ethics of AI

Business ●●●●○
Technical ●○○○○


In this chapter, we give guidelines to assist quality engineers in testing artificial intelligences (and related systems) fairly, ensuring that they make reasonable, explainable, and intelligible decisions and are not biased in their design, training, or operation. In other words, ethical testing should be used in conjunction with traditional testing. Utilize the sections that make sense for you and the work at hand. Inquire and do not assume.

If the previous chapter offered a framework for testing AI, we now examine the sensitive topic of the ethics of AI systems. Ethics is a vast and complex field of study that has occupied human minds for a very long time and far exceeds the scope of this chapter. The Toronto Declaration, released in May 2018, is the first widely adopted reference statement that provides guidelines to preserve human rights in the age of artificial intelligence. AI should not create new barriers to equality, representation, or diversity. When discrimination happens and prevention is not sufficient, a system should be questioned and the harm remedied promptly.

It specifically points out the crucial responsibility of all Governments and private sector players in preventing and mitigating discriminatory risks in the design, development, and application of machine learning. They must also ensure that proper remedies are available before, during, and after system deployment.

  • Can the AI system under test demonstrate that it is equitable and free of discrimination?
  • What can we do to mitigate risk and improve the fairness of artificial intelligence systems?


A quick word about Ethics

What we want is a framework that enables people and machines to make decisions that are not biased in favor of one set of people over another. If errors occur, we must be able to recognize and remedy them, allowing for human oversight of the process.

As Google used to say, “Don’t be evil.”

Ethical testing can entail more than ensuring that the system's assumptions and usage are free of prejudice. Should we be concerned about an ethical AI being employed in an unethical context? Consider a payday loan risk engine that has been shown to be non-discriminatory but is part of a larger system that deliberately targets people who cannot obtain credit at standard rates.

What about the environmental impact of the system under test? AI systems can consume a significant amount of energy, adding to greenhouse gas emissions and leaving a significant carbon footprint. We need to be able to assess sustainability as part of the system's testing.


It may appear that testing an AI is unlike anything you've done previously. There is much new terminology, and there are new ways of working. Yet depending on the system and the task to be evaluated, testing an AI may be similar to testing other systems: we know what the system should do, and we must ensure it does it fairly and ethically.

That doesn't mean there won't be something new or different from previous testing. Until you get into the nitty-gritty, you can assume that testing an AI ethically will be a black-box exercise, especially as pre-trained or vendor-supplied AI systems become more common. After all, system users rarely have access to the source code or the network diagrams that drive it. So we begin by evaluating the users of the system, and there may be many.

AI systems can be extremely powerful for those who rely on them for decision-making, both as businesses and as end-users. Fairness must be shown. Affirmative evidence of fairness will be acquired as part of our testing efforts and those of data scientists, product owners and others who contribute to the creation, testing, deployment, and support of AI.

Testing is crucial to proving that a system's decisions are fair and free of bias. Because AI continues to learn in production, the principles of fairness and decision-making need to be clear and reportable. Unexplainable or biased systems should fail testing. These new frameworks work alongside existing and emerging legislation such as the Equality Act and GDPR/UK GDPR.
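One way to gather affirmative evidence of fairness is a statistical check such as demographic parity. The sketch below is illustrative only (the group names, decision data, and the 0.8 "four-fifths rule" threshold are assumptions for the example, not a prescribed method):

```python
# Illustrative fairness check: demographic parity via the disparate
# impact ratio. Groups and decisions below are invented example data.

def selection_rates(outcomes):
    """outcomes: dict mapping group -> list of decisions (1 = favourable)."""
    return {g: sum(d) / len(d) for g, d in outcomes.items()}

def disparate_impact_ratio(outcomes):
    """Ratio of the lowest group selection rate to the highest.
    A ratio below 0.8 (the 'four-fifths' rule of thumb) is a common red flag."""
    rates = selection_rates(outcomes)
    return min(rates.values()) / max(rates.values())

decisions = {
    "group_a": [1, 1, 0, 1, 1, 0, 1, 1],  # 6/8 = 0.75 approval rate
    "group_b": [1, 0, 0, 1, 0, 0, 1, 0],  # 3/8 = 0.375 approval rate
}

ratio = disparate_impact_ratio(decisions)
print(f"Disparate impact ratio: {ratio:.2f}")  # 0.375 / 0.75 = 0.50
if ratio < 0.8:
    print("Potential bias: ratio fails the four-fifths rule")
```

A real test would use production-scale data and agreed fairness metrics, but even a simple check like this makes the evidence of fairness reportable rather than assumed.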

Suggested assessment framework for ethical testing

Is the system ethical, devoid of bias or inequity? Is it fair?

The objective of the assessment is to gather information from relevant stakeholders about their level of knowledge and awareness of the ethical implications of artificial intelligence. Conducting this assessment early in the creation of an AI system, or near the start, allows a clear understanding of the remaining issues. It is not a means of determining an absolute ethical score. Rather, it ensures that the conversation has occurred; it builds understanding of some of the system's ethical implications and educates people about them; and, where items need to be improved, it documents them, presents them back (socializing them), and agrees a course of action to address them within a "reasonable" time frame.

Figure: Suggested framework



  • Each interviewee will be asked ten questions.
  • Each interviewee will have one interviewer and one note taker. At the conclusion, they will agree on the score for each question.
  • While a sample size of five or fewer is not ideal, the evaluation must also take into account the fact that time and the number of accessible individuals may be limited.
  • Interviews should last no more than 30 minutes and should be conducted in a casual setting, preferably away from other people.
  • While we record that we interviewed someone, we do not share their responses with anyone outside of those conducting the assessment.
  • Each question is assigned an ideal score based on industry best practices or the organization's ideal.
  • Responses are scored on a scale of 1 (gap) to 5 (complete compliance and best practice) in increments of 0.5.
    • 1- not known, not considered, not compliant, gap
    • 2- scheduled to be worked on but work is not started or at an early stage
    • 3- in progress but work still needs to be carried out
    • 4- mostly compliant
    • 5- fully compliant with industry best practice
  • After each set of sessions, the average response to each question will be used to calculate the score for that section.
  • Each question includes explanatory notes to assist in eliciting information from interviewees.
  • A spider diagram will be created to illustrate the difference between the ideal and actual score for each question.
  • If the evaluation is conducted early in the lifetime, it may be desirable to re-run it – maybe with a subset of the initial interviewees – to determine how successfully the issues have been handled and if any gaps remain before deploying the AI operationally.
  • While the names of interviewees will be made public, who actually said what should not be disclosed. Only the average scores for each question should be included in the published outputs. This is to encourage individuals to speak freely and candidly with interviewers.
  • Recommendations will be made highlighting both important successes and areas for improvement that must be addressed. There is no pass or fail mark; rather, it is a technique of determining where things stand at the moment and what needs to be done to reach a "later" stage.

Possible interviewees include:

  • Data protection, GDPR, or Information protection authority
  • Project sponsor or owner
  • CIO / CTO (if this is a high-profile application)
  • Vendor lead
  • Project manager
  • Delivery lead
  • Data scientist
  • User SME
  • Test architect or solution architect

Examples of questions

  1. Is the algorithm fair? Can the team provide proof that it is fair?
  2. Is it possible to convey the decisions in straightforward, easy-to-understand language?

Example of Results

Example assessment sheet
Category        Target  Avg.  P1  P2  P3  P4  P5
1. Fairness        3    1.8    2   4   1   1   1
2. Explainable     3    1.6    1   1   4   1   1
3. Data Bias       3    2.2    2   4   1   3   1
4. Bias Review     3    1.0    1   1   1   1   1
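The averages and gaps in the example assessment sheet can be reproduced with a short script. The category names, target, and interviewee scores below are taken directly from the example table; the output format is an illustrative choice:

```python
# Reproduce the example assessment sheet: average the five interviewee
# scores (P1..P5) per category and report the gap against the target.

TARGET = 3
scores = {
    "Fairness":    [2, 4, 1, 1, 1],
    "Explainable": [1, 1, 4, 1, 1],
    "Data Bias":   [2, 4, 1, 3, 1],
    "Bias Review": [1, 1, 1, 1, 1],
}

for category, responses in scores.items():
    avg = sum(responses) / len(responses)
    gap = TARGET - avg
    print(f"{category:<12} avg={avg:.1f}  gap to target={gap:.1f}")
```

The per-category gaps (1.2, 1.4, 0.8, and 2.0 here) are what the spider/radar diagram visualizes, and the largest gaps are where recommendations should concentrate.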


Figure: Example radar diagram


Several recommendations will be made based on the scores.

To ensure fairness and traceability, all materials used in this evaluation should be archived, apart from the interviews, which should not identify individuals, so that people can speak candidly to the interviewers without fear of repercussions.

Bear in mind that this is an assessment, and if some sections do not perform as well as anticipated, your duty is to help expedite the completion of critical changes. There are complexities to everything, and it is the assessment's job to understand the customer's needs, the reality of where they are, and their desired destination. The recommendations we generate will be critical in helping them achieve their optimal state.

Optional assessment #1

This optional supplementary assessment is designed to confirm that the testing was conducted ethically and was complete and suitable.

  • Is the system currently being tested?
  • And has it been ethically tested?
  • And does it behave ethically?

This checklist can be used to examine the ethical aspects of testing prior to concluding testing.

If you've ever conducted a gating review, phase review, or readiness review, you'll recognize the process.

If the testing recommendations outlined before were followed, this should be a review of what has been accomplished, the status (success, fail, etc.) captured, and then included in the test exit report.

Test checklist
Example activity                                  Status (Pass, Conditional Pass, Fail, N/A)
1. Has the system been tested?  
2. Does the system meet the test exit criteria?  
3. Future actions, risks, issues documented?  

Optional assessment #2

This examination will enable the resolution of a few critical questions.

  • Is the system ready for production, including whether it has been tested thoroughly enough?
  • Are the procedures and methods of operation required for a successful live operation in place?

Examples of Questions

  1. Can we justify the decisions?
  2. Can erroneous decisions be overridden?
  3. Is the deployment strategy complete?
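Question 2 above, whether erroneous decisions can be overridden, implies a human-in-the-loop mechanism with an audit trail. A minimal sketch of such a mechanism might look like the following (the class and field names are illustrative assumptions, not from any specific product):

```python
# Minimal sketch of a human-override wrapper with an audit trail.
# Names (ReviewableDecision, apply_override, ...) are illustrative.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ReviewableDecision:
    case_id: str
    model_decision: str
    override: Optional[str] = None  # set by a human reviewer, if needed
    audit_log: list = field(default_factory=list)

    def apply_override(self, reviewer: str, decision: str, reason: str):
        """Record who changed the decision, to what, and why."""
        self.override = decision
        self.audit_log.append(
            {"reviewer": reviewer, "decision": decision, "reason": reason}
        )

    @property
    def effective_decision(self) -> str:
        """The human override, if present, takes precedence over the model."""
        return self.override or self.model_decision

loan = ReviewableDecision(case_id="A-102", model_decision="reject")
loan.apply_override("j.smith", "approve", "Model ignored recent income data")
print(loan.effective_decision)  # "approve"
```

The key design point is that the override does not erase the model's original output: both the model decision and the reviewer's reason survive in the record, which supports the traceability and reporting requirements discussed earlier.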

Further reading

Here are a few readings for those interested in learning more about some of the themes discussed.

About the author

Andrew Fullen


Andrew Fullen is Sogeti UK’s Head of Technology and Innovation, with more than 20 years’ experience in testing across the healthcare, government, engineering, and telecoms sectors. He leads the UK architects and is one of the authors of the World Quality Report. He is recognized as one of the key thought leaders and problem solvers. He is the UK head of SogetiLabs and works globally to improve testing, engineering practices, and the effective use of technology, calling on his wider technical experience in testing and his earlier experience as a developer, DBA, and build manager. He is also one of the global thought leaders for testing, AI/machine learning, and tooling, having helped design one of Sogeti's internal AI platforms. He has extensive contacts with many tool providers and other thought leaders.

About Sogeti

Part of the Capgemini Group, Sogeti operates in more than 100 locations globally. Working closely with clients and partners to take full advantage of the opportunities of technology, Sogeti combines agility and speed of implementation to tailor innovative future-focused solutions in Digital Assurance and Testing, Cloud and Cybersecurity, all fueled by AI and automation. With its hands-on ‘value in the making’ approach and passion for technology, Sogeti helps organizations implement their digital journeys at speed.

Visit us at

Capgemini is a global leader in partnering with companies to transform and manage their business by harnessing the power of technology. The Group is guided every day by its purpose of unleashing human energy through technology for an inclusive and sustainable future. It is a responsible and diverse organization of 325,000 team members in nearly 50 countries. With its strong 55-year heritage and deep industry expertise, Capgemini is trusted by its clients to address the entire breadth of their business needs, from strategy and design to operations, fueled by the fast-evolving and innovative world of cloud, data, AI, connectivity, software, digital engineering and platforms. The Group reported 2021 global revenues of €18 billion.
Get the Future You Want!