State of AI applied to Quality Engineering 2021-22
Section 1: Get Started

Chapter 3 by Sogeti & Capgemini Engineering

Demystifying Machine Learning for QE

Business ●○○○○
Technical ●●●●○


To stay ahead of the accelerating business race, IT executives must make quick, informed decisions about where and how to implement AI in their Quality Engineering activities.

In this chapter, we examine Machine Learning (ML) to gain a high-level understanding of what it is, when to use it, and how it works. While we do provide some real-world examples from Quality Engineering, the actual implementation of ML-based tools will be discussed in subsequent sections.

Most used terminology

S1C3 picture 1.png[1]


While we acknowledge that there is no universally accepted or legally defined term for artificial intelligence[2], allow us to make an attempt. AI refers to a system's ability to perform cognitive functions normally associated with human minds, such as perceiving, reasoning, learning, interacting with the environment, problem solving, and even exercising creativity. The incorporation of AI into quality engineering enables automation processes to handle the bulk of test management, freeing up professionals to explore novel methods for improving end quality. AI and machine learning-driven quality engineering can result in the optimization and acceleration of application quality and delivery speed, while also keeping track of the key performance indicators and metrics that must be measured.


[1] Deep learning is covered in the next chapter.
[2] Section 1, chapter 2

Machine learning

Machine Learning (ML) has been around since the 1980s, but only in the 2010s did it make headway into mainstream software development. Today, ML powers everything from autonomous cars to search to stock trading. ML is the subset of AI that "learns" how to perform a specified task more efficiently and effectively over time: programs analyze historical performance data in order to predict and improve future performance. By detecting patterns in sample data, referred to as "training data," machine learning algorithms build a model and learn to make predictions and recommendations without being explicitly programmed to do so. The algorithms then adapt in response to new data and experiences to continuously improve their efficacy. Machine learning is used in a wide variety of applications where developing conventional algorithms to perform the required tasks is difficult or impossible.
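
As an illustration, the following sketch shows the core idea of learning from training data: a model is fitted to historical observations and then makes predictions on inputs it has never seen. The code-size and defect figures are invented for the example.

```python
import numpy as np

# Toy "training data": hypothetical module sizes (KLOC) vs. observed defects.
sizes = np.array([10, 20, 30, 40, 50], dtype=float)
defects = np.array([4, 9, 13, 19, 24], dtype=float)

# "Learning" here is fitting a model to historical data: a least-squares line.
slope, intercept = np.polyfit(sizes, defects, deg=1)

# The fitted model can now predict for an input it has never seen (25 KLOC).
predicted = slope * 25 + intercept
print(round(predicted, 1))  # → 11.3
```

No rules about defect density were programmed explicitly; the relationship was extracted from the sample data, which is the essence of the definition above.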

S1C3 picture 2.png

Figure: various sub-types of Machine Learning

Why Machine Learning for Quality Engineering now?

Since 2010, significant advances in computing and data availability have resulted in even more breakthroughs in machine learning algorithms and infrastructure. In other words, we now have the computing power, the data, and the know-how to implement machine learning in many more contexts. Today, we rarely need to start from scratch when it comes to machine learning, as existing libraries and tools greatly simplify the process. All we need to know is how to deploy a machine learning library such as TensorFlow.

Quality Engineering possesses all of the necessary characteristics for machine learning application. It has the potential for big data, analytical questions, and/or gamification. When combined with organizations' growing need to analyze and make decisions more quickly in order to release more quickly, the environment is ripe for disruption and innovation. It is not a matter of whether machine learning will impact how we develop, test, and release software; rather, it is a matter of when and how. This can only be achieved through the adoption of a learning culture.

The vast majority of discussion so far has centered on how machine learning will eliminate tester jobs or why machine learning cannot eliminate any tester job. The reality lies somewhere in between. As with automation, machine learning enables machines to perform routine, low-level tasks. Organizations must transition away from best effort quality assurance and toward continuous quality improvement. Without machines, we will be unable to compete, and it is up to executives and managers to educate their employees about how machine learning can help them become not only better testers, but ultimately quality owners.

What to consider, where to start?

The following parts of this chapter will assist you in identifying the appropriate Machine Learning opportunities by providing multiple use cases. We recommend that, as you read on, you keep the following questions in mind:

  • Step 1: Ask yourself what challenge is worth addressing?
  • Step 2: Is the relevant data available?
  • Step 3: How could we measure our success?
  • Step 4: Build or Buy?

After identifying Machine Learning opportunities for your business and ensuring data availability, you will need a system to track and manage efficiency and effectiveness. QE managers must make decisions based on the analytical data exposed by machine learning, which requires a high-quality analytics dashboard or dashboards that aggregate data from all environments, applications, tools, and languages and convert it to actionable insights. It will help you decide which parts of your code and tests to focus on and prioritize, as well as which development and testing techniques to use.

Supervised learning

In supervised learning, we are given a data set and already know what our correct output should look like, with the idea that there is a relationship between the input and the output. It involves using a model to learn a mapping between input features and the target variable. Supervised learning is used in predictive models (regression, association, and classification) in which the relationships between the inputs and outputs are well understood. It is applicable to a wide variety of analytic and predictive scenarios encountered during software testing.

S1C3 picture 3.png

Figure: Supervised Learning

The fundamentals

What it is

There are two primary types of problems associated with supervised learning:

  1. Classification: the purpose is to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories. Example: Given a patient with Covid-19, we have to predict the patient's disease course in terms of clinical states: moderate, severe or critical.
  2. Regression: we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. Example: Given the experience of a person, we have to predict their salary based on their years and type of experience.

Both classification and regression problems can have one or more input variables, which can be of any data type, including numerical or categorical data.
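
A minimal sketch of the two problem types; the patient features, salary figures, and the nearest-centroid rule are invented purely for illustration:

```python
import numpy as np

# --- Classification: map inputs to discrete categories. ---
# Hypothetical patient features (temperature, oxygen saturation) with
# discrete labels: 0 = moderate, 1 = severe.
X_cls = np.array([[37.2, 97], [37.5, 96], [39.4, 88], [39.9, 85]], dtype=float)
y_cls = np.array([0, 0, 1, 1])

# A nearest-centroid rule: predict the class whose mean example is closest.
centroids = np.array([X_cls[y_cls == c].mean(axis=0) for c in (0, 1)])
new_patient = np.array([39.1, 89.0])
pred_class = int(np.argmin(np.linalg.norm(centroids - new_patient, axis=1)))

# --- Regression: map inputs to a continuous value. ---
years = np.array([1, 3, 5, 7], dtype=float)       # years of experience
salary = np.array([40, 50, 60, 70], dtype=float)  # salary in $k
slope, intercept = np.polyfit(years, salary, deg=1)
pred_salary = slope * 4 + intercept  # continuous output for 4 years
```

The classifier returns one of a finite set of labels, while the regression returns a point on a continuous scale; this is the essential difference between the two.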

When to use it

When you need to create a straightforward predictive model using a well-structured dataset, and when the classification of the input data and the target values are known.

How it works

  1. It infers a function from labeled training data, which consists of a collection of training examples. Each example in supervised learning consists of a pair of an input object (typically a vector) and a desired output value (also called the supervisory signal).
  2. A supervised learning algorithm examines training data and generates an inferred function that can be used to map new examples.
  3. In an ideal scenario, the algorithm will be able to correctly determine the class labels for unseen instances.
  4. This requires the learning algorithm to make "reasonable" generalizations from the training data to previously unseen situations.
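
The four steps above can be sketched with the simplest possible learner; the build-failure data and the midpoint-threshold rule here are hypothetical:

```python
import numpy as np

# Step 1: labeled training pairs (input -> desired output).
# Hypothetical data: size of a code change vs. whether the build failed.
train_x = np.array([5, 8, 12, 30, 45, 60], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])  # 0 = passed, 1 = failed

# Step 2: infer a function from the training data. Here, the simplest
# possible learner: a threshold midway between the two class means.
threshold = (train_x[train_y == 0].mean() + train_x[train_y == 1].mean()) / 2
predict = lambda x: int(x > threshold)

# Steps 3-4: the inferred function generalizes to unseen inputs.
unseen = [7, 50]
print([predict(x) for x in unseen])  # expected labels: [0, 1]
```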

The most widely used supervised machine learning algorithms include linear regression, logistic regression, decision trees, random forests, support-vector machines, and naive Bayes.

Example 1: Risk prediction in QE

R&D managers must be aware of the risks associated with each release. They will be able to determine whether or not to proceed with production and will be able to estimate mitigation resource allocation.

  • Predictable outcome: risk score per release
  • Data: The number of commits, the time of the commit, the number of tests and their results, the percentage of code covered by tests (unit and system), the number of builds, the number of releases, the day of the week of the release, the number of user stories and features covered in a release, the number of solved bugs in a release, and so on.
  • Algorithm: Linear Regression.
    It attempts to fit the dataset to a straight hyperplane. It works well when the variables in your dataset have linear relationships. In this case, machine learning-based linear regression can accurately predict the release risk score for each and every release, regardless of the amount of code, tests, user stories, or features contained within.
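
A sketch of such a risk model, assuming invented per-release features and historical risk scores; plain least squares stands in for a production pipeline:

```python
import numpy as np

# Hypothetical per-release features: [commits, failed tests, % untested code].
X = np.array([
    [120,  2,  5.0],
    [300,  9, 12.0],
    [ 80,  1,  3.0],
    [450, 15, 20.0],
    [200,  5,  8.0],
], dtype=float)
# Hypothetical historical risk scores (0-100) assigned to those releases.
risk = np.array([20.0, 55.0, 12.0, 85.0, 35.0])

# Fit risk ≈ X·w + b by ordinary least squares (bias via an extra column).
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, risk, rcond=None)

# Score a new release: 250 commits, 6 failed tests, 10% untested code.
new_release = np.array([250.0, 6.0, 10.0, 1.0])
predicted_risk = float(new_release @ coef)
print(round(predicted_risk, 1))
```

Once trained on real release history, such a model scores every candidate release on the same 0-100 scale, which is what allows managers to compare releases directly.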

Example 2: Intelligent quality gates

R&D managers must automate the failure of high-risk builds in a CI/CD pipeline. However, quality gates and thresholds are deployed and configured manually, and their historical behavior and stability can vary significantly. Static thresholds only guarantee that builds clear a fixed minimum bar. Therefore, how do you ensure that the quality status quo is maintained and that the overall quality trend is positive?

  • Predictable outcome: Is the build acceptable or not?
  • Data: Aggregate code coverage trends, code coverage by test (unit, API, regression), untested code additions and changes, code mutations tested, and so on
  • Algorithm: Random Forest algorithms combine predictions from numerous individual decision trees that split the dataset repeatedly into distinct branches with the highest information gain from each split. This branching structure enables them to learn non-linear relationships on their own. The best way to think about Random Forest is in terms of crowd wisdom; random forests will frequently make more accurate predictions than any single decision tree. Random forests can be used to create intelligent quality gates that continuously learn how (and at what thresholds), when, and where to deploy. In comparison to static threshold gates, intelligent quality gates are dynamic and will enable the detection of hidden quality and risk trends much more quickly.
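
A sketch of such a gate using scikit-learn's RandomForestClassifier; the build metrics and the labeling rule below are invented for the example:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-build features: [code coverage %, untested changed lines,
# mutation score %]. Labels: 1 = build historically accepted, 0 = rejected.
rng = np.random.default_rng(0)
n = 200
coverage = rng.uniform(40, 95, n)
untested = rng.integers(0, 50, n).astype(float)
mutation = rng.uniform(30, 90, n)
X = np.column_stack([coverage, untested, mutation])
# Assumed labeling rule for the sketch: good coverage and few untested
# changes historically correlated with accepted builds.
y = ((coverage > 70) & (untested < 25)).astype(int)

# An ensemble of decision trees votes on each new build ("crowd wisdom").
gate = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Score incoming builds instead of comparing them to static thresholds.
good_build = [[88.0, 3.0, 75.0]]
bad_build = [[45.0, 40.0, 35.0]]
print(gate.predict(good_build), gate.predict(bad_build))
```

Because the gate is retrained on each project's own history, the effective thresholds move with the quality trend rather than staying fixed.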



Additional use cases

  1. Riskier code components prediction:
    Identify riskier code components of a repository based on extracted change patterns from historical data from repository metadata.
    Useful algorithms for this:
    • Support-vector machines
    • Random Forest
    • Logistic regression
    • Decision trees

  2. Change Based Testing:
    Select the ‘specific’ tests to execute after every ‘commit’ or associated ‘requirement change’ for incoming builds.
    Algorithms useful for this:
    • Support-vector machines
    • Decision trees
    • Random Forest

  3. Defect Count Prediction:
    Time series prediction for number of incoming defects based on historical count of daily incoming defects.
    Algorithms useful for this:
    • Logistic regression

  4. Release Readiness:
    Predict readiness of a product to be released, based on data from various sources in production environment.
    Algorithms useful for this:
    • Support-vector machines
    • Decision trees
    • Random Forest
    • Logistic regression

  5. Automated Defect Triage:
    Automatically triage defects based on log analytics done over test execution records and the defect repository. The triage model investigates the nature of each failure and classifies it as a configuration or environment issue, a product issue, a test program issue, or a test framework issue.
    Algorithms useful for this:
    • Support-vector machines
    • Logistic regression
    • Naive Bayes
    • Decision trees
    • Random Forest

  6. Usage Pattern Analysis:
    Log analytics over application logs and test execution workflows to find frequent workflows, usage patterns, and the sequence of the most common actions performed.
    Algorithms useful for this:
    • Association Rule Learning
    • Apriori Algorithm

  7. Commit Classification:
    Identify Bug Fix commits by analyzing commit comments using Natural Language Processing. Classify Commits into various categories like Feature Enhancement, Bug Fix, Functional Change, Refactoring Task etc.
    Algorithms useful for this:
    • Random Forest
    • Support-vector machines
    • Logistic regression
    • Naive Bayes
    • Decision trees
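
As an illustration of use case 7, the sketch below classifies commit messages with naive Bayes over bag-of-words features; the commit messages and categories are invented, and this is a baseline rather than a production NLP pipeline:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical commit messages with hand-assigned categories.
messages = [
    "fix null pointer crash in login handler",
    "fix off-by-one error in pagination",
    "add dark mode toggle to settings page",
    "add export to csv feature",
    "refactor user service into smaller modules",
    "refactor duplicated validation logic",
]
labels = ["Bug Fix", "Bug Fix", "Feature Enhancement",
          "Feature Enhancement", "Refactoring Task", "Refactoring Task"]

# Bag-of-words features + naive Bayes: a common baseline for short text.
model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(messages, labels)

print(model.predict(["fix crash in request handler"]))
```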

Unsupervised learning

Unsupervised learning enables us to approach a class of problems with little or no prior knowledge of how the outcomes should look. We can derive structure from data even when the effect of the variables is unknown. This structure can be derived by clustering the data according to the relationships between the variables in the data. Unlike supervised learning, unsupervised learning operates solely on untagged input data and does not require a teacher to correct the model. There is no feedback associated with unsupervised learning.

S1C3 picture 4.png

The fundamentals

What it is

The hope is that by mimicry, the machine will be compelled to construct a compact internal representation of its environment.

Although there are numerous types of unsupervised learning, there are two primary issues that practitioners frequently face:

  • Clustering: An unsupervised learning problem that entails discovering natural groupings in the data.
  • Density estimation: An unsupervised learning problem that involves summarizing the distribution of data.

When to use it

When we are unsure how to classify the data and wish for the algorithm to discover patterns and classify it for us.

How it works

  1. The algorithm receives unlabeled data
  2. It infers a structure from the data
  3. The algorithm identifies groups of data that exhibit similar behavior
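
The three steps can be sketched with K-Means clustering on made-up per-test metrics; note that no labels are ever given to the algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: hypothetical per-test features
# [avg runtime (s), failure rate %]. No labels are provided.
rng = np.random.default_rng(42)
fast_stable = rng.normal([1.0, 2.0], 0.3, size=(20, 2))
slow_flaky = rng.normal([30.0, 40.0], 2.0, size=(20, 2))
X = np.vstack([fast_stable, slow_flaky])

# The algorithm infers structure: it groups points that behave similarly.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_

# Fast/stable tests land in one cluster, slow/flaky tests in the other.
print(set(labels[:20]), set(labels[20:]))
```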

Some of the most common algorithms used in unsupervised learning include K-Means clustering, mixture models, principal component analysis, isolation forests, and association rule learning (e.g., the Apriori algorithm).

QE use cases

  1. Associated test failures:
    Cluster test cases based on test execution records in order to identify test cases that fail together. K-Means Clustering is a useful algorithm for this.

  2. Detect Anomalies in a system under test:
    Identify anomalies by extracting features from system logs. Useful algorithms for this are:
    • Principal component analysis
    • Local outlier factor
    • Isolation forest

  3. Test Coverage Analysis for Product Features:
    Perform test coverage analysis & traceability for desired product features using Natural Language Processing (NLP) and summarization.
    Algorithms useful for this are:
    • K-Means Clustering
    • Mixture Models

  4. Rectify noise in images, e.g. anatomical images produced by MRI.
    Isolation Forest is a useful algorithm for this.
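
Use case 2 above can be sketched with scikit-learn's IsolationForest; the log-derived features are invented, with one obvious anomaly injected by hand:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical features extracted from system logs:
# [requests/min, avg response time (ms)].
rng = np.random.default_rng(7)
normal_traffic = rng.normal([100.0, 50.0], [10.0, 5.0], size=(200, 2))
X = np.vstack([normal_traffic, [[100.0, 500.0]]])  # one injected anomaly

# Isolation forests flag points that are easy to isolate from the rest.
detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = detector.predict(X)  # +1 = normal, -1 = anomaly

print(flags[-1])  # the injected outlier
```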

Reinforcement Learning

The term "reinforcement learning" refers to the process of teaching machine learning models to make a series of decisions. The agent develops the ability to accomplish a task in an uncertain and potentially complex environment. In reinforcement learning, an artificial intelligence is presented with a situation resembling a game. The computer solves the problem through trial and error, with the associated rewards and penalties.

S1C3 picture 5.png

The fundamentals

What it is

Reinforcement learning is a class of problems in which an agent must learn to operate in a given environment through feedback.

Reinforcement learning is the process of determining what to do — how to map situations to actions — in order to maximize the magnitude of a numerical reward signal. The learner is not told which actions to take but must determine which actions yield the greatest reward through trial and error.

Due to the interaction of reinforcement learning algorithms with their environment, a feedback loop exists between the learning system and its experiences.

When to use it

  • when we lack sufficient training data,
  • when we are unable to clearly define the desired end state,
  • or when the only way to learn about the environment is through interaction.

How it works

  1. The algorithm takes an action on the environment.
  2. It is rewarded if the action brings the machine closer to maximizing total rewards.
  3. The algorithm optimizes for the ideal sequence of actions over time by self-correcting.

Examples of algorithms include Q-learning and direct policy search.

Example 1: Gamified bug hunting

Bug hunting can be gamified by applying a point system to bugs. It is simply a matter of defining which forms of exploitation will be rewarded and/or penalized, with the ultimate reward being an app crash and smaller rewards for errors found along the way.

  • Predictable outcome: faults, application crashes and design flaws
  • Data: Not available and irrelevant since this is reinforcement learning. Without pre-existing training data, the agent will learn through trial and error. All that is required is the ability to assign an end goal and points (both positive and negative) along the path.
  • Algorithm: Q-learning
    It determines which action to take by weighing the value of being in a particular state versus the value of performing a particular action in that state. Q-learning is about balancing exploration and exploitation. It is entirely up to you whether to instruct your algorithm to exploit along your desired path or to conduct exploratory testing.
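
A minimal tabular Q-learning sketch of such a gamified hunt, modeling the application as a hypothetical five-state corridor with a crash at one end and a minor error along the way:

```python
import random

# A toy "bug hunt": state 4 crashes the app (reward 10), state 2 exposes a
# minor error (reward 1). Actions: 0 = move left, 1 = move right.
N_STATES, ACTIONS = 5, (0, 1)
REWARDS = {2: 1.0, 4: 10.0}

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, REWARDS.get(nxt, 0.0)

# Q-table: estimated value of taking each action in each state.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration
random.seed(0)

for _ in range(500):          # episodes
    s = 0
    for _ in range(20):       # steps per episode
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda x: Q[s][x])
        s2, r = step(s, a)
        # Q-learning update: blend observed reward with best future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned policy should head right, toward the crash.
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy)
```

Lowering epsilon biases the agent toward exploiting a known path; raising it produces more exploratory testing, which mirrors the exploration/exploitation trade-off described above.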

Example 2: Intelligent test execution

Today, the most effective test execution technique is Test Impact Analysis (TIA), which identifies which tests should be run in response to code changes. You can use policy learning to determine which tests to run based on risk and resource constraints. Is there a history of regression bugs in the microservice? Is this route frequently used by your users? Should these tests be run concurrently?

  • Predictable outcome: How to save resources and accelerate the release without jeopardizing quality?
  • Data: Number tests and their result, code coverage by test (unit and system), number of bugs in a release, test runtimes, user behavior paths, tests marked ship etc.
  • Algorithm: Direct Policy Search
    Policy learning establishes a direct link between each state and the most appropriate action in that state. Policy learning is the process by which your algorithm learns how to conduct tests on its own.

Additional QE use cases

  • Automated gestures: a robot learns functional workflows and automates manual testing by performing actions such as touch, scroll, spin, and double tap on the System Under Test.

  • Testing games: reinforcement learners can be trained to perform actions in gaming environments through brute force techniques and test various features of a game.

  • Sorting large data sets: robots can learn to recognize objects through reinforcement learning. E-commerce sites and supermarkets use such intelligent robots to sort millions of products every day.



Explainable AI

In simple terms, Explainable AI is the ability to explain or present, in a format understandable by humans, why a certain decision or outcome was reached. The aim of explainability is the same across all types of ML models, but the way the models explain themselves can differ.

Explainable AI (XAI), Interpretable AI, and Transparent AI all refer to artificial intelligence (AI) techniques that are trustworthy and easily understood by humans. This is in contrast to the concept of the 'black box' in machine learning, where even the designers of the AI are unable to explain why the AI made a particular decision.

S1C3 picture 6.png

The fundamentals

What it is

Explainable AI (XAI) helps explain and comprehend the useful parts of an algorithm's workings, improving human trust. This includes the following:

  • Auditing the data used to train machine learning (ML) models – to ensure that the 'bias' is understood.
  • Understanding the decision paths for the edge cases (like false-positives and false-negatives)
  • Understanding the robustness of models; specifically, in relation to adversarial examples.

When to use it

  • When we need to understand, explain and justify the outcome of an algorithm,
  • When obtaining the reasoning and justification for a particular decision is critical.
    Example: Customers who have been denied a bank loan inquire about the reason for the rejection.

How it works

S1C3 picture 7.png

After the model is built, these techniques examine all the data features to ascertain their relative importance in determining the model's output. Each XAI method takes its own approach to examining the model and deriving its underlying logic. On this basis, outcomes that humans can trust and easily understand are derived.
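
As a simple illustration of post-hoc feature importance (one common XAI technique), the sketch below trains a model on invented code-component metrics in which, by construction, only churn drives defects, then asks the model which feature mattered most:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-component features: [churn (changed lines), author count,
# cyclomatic complexity]. Label: 1 = component later had a defect.
rng = np.random.default_rng(1)
n = 300
churn = rng.uniform(0, 500, n)
authors = rng.integers(1, 10, n).astype(float)
complexity = rng.uniform(1, 50, n)
X = np.column_stack([churn, authors, complexity])
# Assumed ground truth for the sketch: only churn actually drives defects.
y = (churn > 250).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# A simple post-hoc explanation: rank features by their importance.
names = ["churn", "authors", "complexity"]
ranked = sorted(zip(names, model.feature_importances_), key=lambda t: -t[1])
print(ranked[0][0])  # the feature the model relied on most
```

Presenting such a ranking in human-readable form is exactly the kind of justification the riskier-code use case below relies on.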

QE use cases

Explainable AI for riskier code identification use case: Explainable AI identifies the most influential parameters, in a human-readable format, that contributed to a Code Component being classified as risky/defective. This justifies the model function and establishes trust in it.

S1C3 picture 8.png

About the authors

Vivek Jaykrishnan

Vivek Jaykrishnan is an enterprise test consultant and architect with extensive experience. He has over 22 years of experience leading Verification and Validation functions, including functional, system, integration, and performance testing, in leadership positions with reputable organizations. Vivek has demonstrated success working across a variety of engagement models, including outsourced product verification and validation in service organizations, independent product verification and validation in captive units, and globally distributed development in a product company. Additionally, Vivek has extensive experience developing and implementing test strategies and driving testing in accordance with a variety of development methodologies, including continuous delivery, agile, iterative development, and the waterfall model. Vivek is passionate about incorporating cognitive intelligence into testing and is also interested in exploring the frontiers of IoT testing.

Antoine Aymer (chief editor)

Antoine Aymer is a passionate technologist with a structured passion for innovation. He is currently the Chief Technology Officer for Sogeti's quality engineering business. Antoine is accountable for bringing solutions and services to the global market, which includes analyzing market trends, evaluating innovation, defining the scope of services and tools, and advising customers and delivery teams. Apart from numerous industry reports, such as the Continuous Testing Reports, the 2020 state of Performance Engineering, Antoine co-authored the "Mobile Analytics Playbook," which aims to assist practitioners in improving the quality, velocity, and efficiency of their mobile applications through the integration of analytics and testing.