State of AI applied to Quality Engineering 2021-22
Section 2: Design

Chapter 4 by Broadcom

Dynamic Optimization of Tests

Business ●●○○○
Technical ●●●○○


AI and machine learning techniques enable dynamic optimization of test suites at scale as part of the continuous integration and delivery lifecycle.

Introduction

In an age of Agile and DevOps, agility of software delivery is of paramount importance. In Section 1, Chapter 1, "Quality Engineering and recent trends", we discussed how testing, as the final activity in the workflow, is unquestionably the biggest bottleneck in delivering high-quality code. Sub-optimal testing not only incurs more effort and cost, but also results in lower productivity and release delays, with potential negative business impact.
Currently, model-based testing techniques are used to perform static optimization of tests via requirement modeling. However, static optimization ignores dynamic conditions such as changes to specific bodies of code, previous test run history, test flakiness, and other runtime criteria (code churn rate, code quality, number of authors who modified the code, and more).

In fact, except for certain circumstances, tests optimized using model-based testing may be suboptimal – implying that we are not doing everything possible to reduce testing effort and cycle time. On the other hand, trying to make test optimization decisions manually based on a plethora of such data would be prohibitively time consuming and non-scalable.
This chapter discusses how AI/ML-based solutions to this problem can be implemented, using customer case studies as examples. We demonstrate how we leverage AI/ML to generate dynamically optimized test sets through the use of the following techniques, among others:

  • Establishing an association between tests run and code executed. Each time code is modified, we can flag the tests that need to be run. This significantly reduces the number of tests, often by up to 90%, compared with the number run without this technique.
  • Analyzing the pattern of previous test results to identify flaky tests, which are flagged as requiring re-run.
  • Establishing a correlation between past defects (discovered during testing) and code components in order to identify code that requires rigorous testing.


Broadcom's Continuous Delivery Director (CDD) Test Advisor product incorporates these approaches.

The challenge of testing in agile development

A recent Google case study revealed an astounding rate of development among 25,000 software engineers:

  • 40,000 code commits/day
  • 50,000 builds/day

We also see comparable data from other organizations. According to a recent case study conducted at Microsoft, the Windows development team made an average of 8,500 code pushes per day, resulting in approximately 1,760 builds. And Facebook's mobile application receives approximately 1,500 code pushes per day.

Now consider what happens in terms of testing as a result of this type of frenetic development activity.

  1. Prior to committing code, a developer performs some local unit testing using a local build. While there are no standards for the number of unit tests that should be run as part of a code commit, data from Facebook indicates that an average code commit contains approximately 200 lines of code in steady state. This entails performing at least 3-5 pre-commit unit tests.
  2. After code is committed, builds are triggered at predefined intervals (for example, one every two to five commits on average), resulting in the execution of build verification/sanity tests, unit tests, component tests, and integration tests, among others. This entails the execution of approximately (or at a minimum) 15 unit tests and 5 integration tests per build. Additionally, we typically run a suite of regression tests, which may number in excess of 500.
  3. Assuming the build passes the tests, it is promoted to downstream environments, such as system testing and pre-prod environments, where we execute end-to-end tests, UAT, etc. The test count keeps going up.
  4. Finally, testing is carried out in production using a variety of different testing techniques, including synthetic monitoring, chaos testing, and so on.

To put the volume of testing into context, let's return to the Google example. Their corresponding testing numbers are as follows:

  • 120,000 automated test suites
  • 75 million test cases executed daily

Given that the majority of enterprise software engineering organizations are not as large as Google's, we normalize this data on a per-build basis for each developer. This equates to approximately two code builds per developer per day, with an average of 1,500 automated tests per build.

Given that the majority of other enterprises are unlikely to be as mature as Google, let us assume an average of one build per developer per day, with approximately 750 tests per build. Given that the majority of organizations do not use significant test automation, let us assume that 50% of the 750 tests are automated and that the remaining manual tests take an average of 5 minutes each to execute. This equates to approximately 31 hours of manual testing per build (375 tests at 5 minutes each), not to mention the additional effort (and time) required to prepare test data and environments for these tests.

Thus, for a typical agile team of approximately eight people (say, five developers, each producing one build per day), this adds up to roughly 156 hours of manual testing per day! This is a significant overhead that could result in release delays or, more likely, testers failing to test adequately, increasing the risk of failure.
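The normalization above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the variable names are illustrative. The exact results come out at or slightly below the rounded values given in the prose.

```python
# Figures quoted in the text for the Google case study.
ENGINEERS = 25_000
BUILDS_PER_DAY = 50_000
TEST_CASES_PER_DAY = 75_000_000

builds_per_dev_per_day = BUILDS_PER_DAY / ENGINEERS        # 2.0
tests_per_build = TEST_CASES_PER_DAY / BUILDS_PER_DAY      # 1500.0

# Assumptions the text makes for a typical enterprise team.
ENTERPRISE_TESTS_PER_BUILD = 750
MANUAL_SHARE = 0.5            # half of the tests are not automated
MINUTES_PER_MANUAL_TEST = 5
DEVELOPERS_PER_TEAM = 5       # one build per developer per day

manual_tests = ENTERPRISE_TESTS_PER_BUILD * MANUAL_SHARE   # 375.0
manual_hours_per_build = manual_tests * MINUTES_PER_MANUAL_TEST / 60
team_manual_hours_per_day = manual_hours_per_build * DEVELOPERS_PER_TEAM

print(f"{builds_per_dev_per_day:.0f} builds per developer per day")
print(f"{tests_per_build:.0f} automated tests per build")
print(f"{manual_hours_per_build:.2f} manual hours per build")
print(f"{team_manual_hours_per_day:.2f} manual hours per team per day")
```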

This clearly demonstrates how difficult it is for testing to keep up with the development pace in an agile environment. While techniques such as model-based testing would likely help reduce this burden by about 50%, they would require additional modeling effort and may still be insufficient to keep up without fully automating all tests. As a result, we must consider novel ways to address this issue through the use of AI/ML.

AI/ML solutions to the rescue

There are numerous ways in which AI/ML can assist in resolving this issue, including correlating multiple data sets associated with the testing process. All of these contribute to the reduction of test volume to an essential subset based on the various approaches described below.

Figure 1: Reduction of test volume

Let’s look at three approaches used by Broadcom’s Continuous Delivery Director (CDD) Test Advisor solution.

Associating Tests with Code changes

Given the frequency with which code is changed, a critical approach is to determine which tests are impacted when specific code is changed. By instrumenting the code under test, we can establish a link between the tests being run and the code being tested. The AI system is trained (and continues to learn as more tests are executed) by running existing tests against the code base. 

Figure 2

 

Thereafter, whenever code is changed (via a commit to the source code management system), Test Advisor flags the tests that are impacted by the change, prioritizing their execution.

 

Figure 3

 

Naturally, all new (or updated) tests would be qualified automatically as tests that must be run as part of a build.
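To make the idea concrete, here is a minimal sketch of change-based test selection. It is not Test Advisor's implementation: it assumes that instrumentation has already produced, for each test, the set of source files it executes, and it replaces the learned association with a simple inverted index. All function names, test names, and file paths are hypothetical.

```python
from collections import defaultdict


def build_coverage_index(coverage_runs):
    """Invert per-test coverage into a mapping of file -> tests that execute it."""
    index = defaultdict(set)
    for test_name, files in coverage_runs.items():
        for path in files:
            index[path].add(test_name)
    return index


def select_impacted_tests(index, changed_files, new_or_updated_tests=()):
    """Return the tests to run for a commit.

    New or updated tests always qualify; the rest are the tests known to
    execute at least one of the changed files.
    """
    impacted = set(new_or_updated_tests)
    for path in changed_files:
        impacted |= index.get(path, set())
    return sorted(impacted)


# Hypothetical coverage data gathered by running the existing tests.
coverage_runs = {
    "test_invoice_totals": {"billing/invoice.py", "billing/tax.py"},
    "test_checkout_flow": {"billing/invoice.py", "cart/cart.py"},
    "test_login": {"auth/session.py"},
}
index = build_coverage_index(coverage_runs)

selected = select_impacted_tests(
    index, ["billing/tax.py"], new_or_updated_tests=["test_refund"]
)
print(selected)  # ['test_invoice_totals', 'test_refund']
```

In this toy example, a change to one file selects two of four tests. A real system would also need to refresh the index as tests and code evolve, which the sketch leaves out.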

Understanding patterns of past test results

An AI system can deduce patterns from previous results to determine which test sets should be prioritized for execution. One of these patterns is what we refer to as flaky tests. These are test suites that exhibit one or more of the characteristics listed below:

  1. Test suites that have failed numerous times during previous executions
  2. Test suites that are erratic and alternate between success and failure during subsequent test cycles
  3. Test suites that have failed during the previous execution

If the test suite's results show a discernible trend and the last several test runs are all successful, the test is not identified as flaky. This type of test result indicates that an issue existed but was resolved.
A test can be flaky as a result of a bug in the code or as a result of a flaw in the test itself. A flaky test suite is automatically selected for the next test cycle. 
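The rules above can be expressed as a simple heuristic over a suite's pass/fail history. This is a rule-based sketch rather than the learned model the chapter describes, and the thresholds (`fail_threshold`, `flip_threshold`, `stable_tail`) are assumptions chosen purely for illustration.

```python
def is_flaky(history, fail_threshold=3, flip_threshold=2, stable_tail=3):
    """Classify a test suite from its run history.

    history: list of booleans, oldest run first; True means the run passed.
    All thresholds are illustrative assumptions, not values from the source.
    """
    if not history:
        return False

    # A clean recent streak suggests the issue existed but was resolved.
    if len(history) >= stable_tail and all(history[-stable_tail:]):
        return False

    failures = history.count(False)
    # Count transitions between pass and fail in consecutive runs.
    flips = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    failed_last_run = not history[-1]

    return (
        failures >= fail_threshold      # failed numerous times
        or flips >= flip_threshold      # alternates between success and failure
        or failed_last_run              # failed during the previous execution
    )


print(is_flaky([False, False, False, True, True, True]))  # False: resolved
print(is_flaky([True, False, True, False, True]))         # True: erratic
print(is_flaky([True, True, True, False]))                # True: failed last run
```

Suites for which `is_flaky` returns `True` would be added to the next test cycle.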

Mapping defects found to code components

By associating defects discovered during tests with the application components from which they originate, an AI system can identify which builds require additional testing based on the code components included in the build. Naturally, we must weight defects appropriately based on their severity, the stage of the CI/CD lifecycle in which they were found, and their type. This enables the AI system to proactively identify high-risk or defect-prone components and flag the tests associated with them (as described under "Associating Tests with Code changes" above) whenever they are included in a build.
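As an illustration, defect-proneness can be approximated with a weighted count of past defects per component. The sketch below is an assumption-laden simplification: the weight tables, the threshold, and the record fields are all hypothetical, and defect type is omitted for brevity.

```python
from collections import Counter

# Hypothetical weights: later-stage and more severe defects count for more.
SEVERITY_WEIGHT = {"critical": 5, "major": 3, "minor": 1}
STAGE_WEIGHT = {"unit": 1, "integration": 2, "production": 4}


def component_risk(defects):
    """Sum a weighted score per component from past defect records."""
    scores = Counter()
    for defect in defects:
        weight = SEVERITY_WEIGHT[defect["severity"]] * STAGE_WEIGHT[defect["stage"]]
        scores[defect["component"]] += weight
    return scores


def high_risk_in_build(scores, build_components, threshold=10):
    """Return the components in this build whose score meets the threshold."""
    return sorted(c for c in build_components if scores[c] >= threshold)


# Hypothetical defect history.
defects = [
    {"component": "billing", "severity": "critical", "stage": "production"},
    {"component": "billing", "severity": "minor", "stage": "unit"},
    {"component": "auth", "severity": "major", "stage": "integration"},
    {"component": "search", "severity": "minor", "stage": "unit"},
]
scores = component_risk(defects)
print(high_risk_in_build(scores, {"billing", "auth"}))  # ['billing']
```

The tests associated with each flagged component would then be added to the build's test set, using whatever test-to-code association is available.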

In summary, Test Advisor assists in reducing the amount of testing required by flagging tests that must be run as part of a build. This frequently results in a reduction of more than 60% in the number of tests run (in comparison to those run using traditional approaches).

Figure 4

 

Case in point

A large telecom service provider has been implementing DevOps practices to transition to an agile delivery lifecycle. Naturally, they were concerned about the volume of tests they needed to run in conjunction with the frequent code changes and builds associated with their bi-weekly release cycles. They were typically required to run over 500 tests per build of an application (including unit, component, and BDD tests, API and integration tests, and system tests), the majority of which were manual. Carrying out all of these tests would significantly slow their release cycle. By implementing CDD and Test Advisor, they were able to reduce the volume of tests by 70% to 90%, depending on the application. This in turn significantly reduced elapsed time and testing-related delays.

Looking forward

We have only begun to scratch the surface of AI/ML applications in testing and quality – and a vast realm of possibilities awaits. Several key use cases that we are considering for future enhancements include defect prediction and source identification, quantification of release/deployment risk prior to going live, and automated test (and test data) generation based on production usage scenarios.

About the author

Shamim Ahmed


Shamim Ahmed is CTO for DevOps Solution Engineering at Broadcom Technologies, where he is responsible for innovating intelligent DevOps and BizOps solutions using Broadcom's industry leading technologies.

His key focus areas include applications of AI/machine learning to DevOps, "model-based everything" in Continuous Delivery (specifically model-based requirements, testing, deployment, release, test data and services), site reliability engineering (SRE), and modern application architectures.

About Broadcom

Broadcom Inc. is a global technology leader that designs, develops and supplies a broad range of semiconductor and infrastructure software solutions. Broadcom’s category-leading product portfolio serves critical markets including data center, networking, enterprise software, broadband, wireless, storage and industrial. Our solutions include data center networking and storage, enterprise, mainframe and cyber security software focused on automation, monitoring and security, smartphone components, telecoms and factory automation. For more information, go to www.broadcom.com. The term “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.

 

 
