State of Artificial Intelligence applied to Quality Engineering 2021-2022
Section 8: Operate

Chapter 3 by Smartesting

From Logs to Tests: Usage-driven Regression Testing

Business ●●○○○
Technical ●●●○○


Using machine learning-driven usage analysis, you can ensure long-term quality in production by automating the maintenance of your regression tests.

The innovation presented in this chapter focuses on a critical yet challenging activity for any DevOps team: continuous regression test update.

Our customers frequently summarize the issue as follows:

  1. Automated regression test coverage is a "black box": we have no control over it.
  2. As time passes and changes are made, the impact of automated regression tests diminishes while maintenance costs increase.

As an illustration, consider the situation in a large BFSI company's digital factory. Considerable effort has gone into automated tests at the level of the exposed APIs and microservice call flows. A team of five test automation engineers automates and maintains over 650 functional regression tests. Recently, an increasing number of anomalies have evaded these tests. Coverage, which was initially determined by acceptance criteria, is no longer kept under control. Analyzing usage in operation turned out to be the best strategy for significantly increasing the impact of this regression test suite.

The pain points with functional regression testing

Pain point 1: loss of relevance

Automated regression tests tend to lose relevance over time. They are initially derived from the analysis of business cases, but the devil is in the details: sequences of business actions that were worth testing six months ago have lost relevance in light of our customers' actual use of our product. There is also the relevance loss caused by the well-known "pesticide paradox" (basic knowledge in software testing; see ISTQB Certified Tester at foundation level): because the tests run along the same routes every time, their ability to detect anomalies steadily declines. This gradual decline in relevance results in a significant loss of value for the Agile team, as automated regression tests provide ever-poorer feedback.

Pain point 2: obscure coverage

How comprehensive is our functional regression test suite's coverage? Is there anyone in the team who is aware of the level of coverage provided by our regression test suite? Sadly, the answer to both these questions is frequently No. As one Test Manager from a large insurance company put it: "Automated functional regression tests are something of a black box, and we rarely investigate what they do and cover. It has developed into a significant issue in our team, with doubts about the true value of these tests".

Pain point 3: a disproportionate maintenance effort

This directly results from pain point number two. Due to the team's lack of visibility into test coverage, it's quite difficult to determine where to focus efforts to increase the relevance of automated functional regression tests. Why isn't the team optimizing these tests on a continuous basis to increase their relevance and value? Frequently, the response is (ask your squad): “we don't have the time, and we're unsure where to focus our efforts because we lack visibility into the coverage of critical customer journeys”. The team fixes failed tests (often for technical reasons), but this is typically where the effort ends.

Analyze usage coverage to optimize regression testing

All improvement and problem solving begin with knowledge. Being able to quantify test coverage in relation to our product's most frequent usage is critical for optimizing tests and focusing the team's efforts on the value delivered in operation.

The figure below shows usage and test traces analyzed from the logs in the Gravity tool: logs from operation are shown in purple, and logs of test sessions on the test environment in pink. Overlaying the two on the graph of user actions is highly informative: it helps identify missing tests and update the test suite to improve usage coverage.

Figure: Visualization of usage coverage by a regression test suite



Thus, analyzing production and test logs enables us to optimize our regression tests by:

  • exploring usage traces and analyzing situations (for example, anomalies in production)
  • identifying usage traces to be covered
  • generating the necessary automated scripts

Test automation engineers/developers benefit from coverage analysis, while product owners and business analysts benefit from visualizing customer journeys and associated data.

From logs to tests: main challenges

This chapter discusses the various challenges and solutions associated with the creation and maintenance of regression tests leveraging production logs.

Monitoring tools such as Datadog, Splunk, Dynatrace, Application Insights, or Kibana are typically used to manage logs. These tools are the data source, so we must first ensure that the logs contain sufficient detail regarding user actions (for UI tests) or API or microservice calls (for API tests). Since the artefacts we target are tests (functional regression tests), the logs must capture all actions/calls along with their associated data, allowing for the analysis of complete usage traces.
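
One way to check this requirement early is to audit a sample of exported log lines for the fields a usage trace needs. The following is a minimal sketch in Python, not part of Gravity: the field names ("timestamp", "session_id", "action", "data") are assumptions chosen for illustration, and real exports will use whatever names the monitoring tool produces.

```python
import json

# Hypothetical field names; adapt them to the actual log export.
REQUIRED_FIELDS = ("timestamp", "session_id", "action", "data")


def missing_fields(event: dict) -> list:
    """Return the required fields that are absent or empty in one log event."""
    return [f for f in REQUIRED_FIELDS if event.get(f) in (None, "", {})]


def audit_log(lines: list) -> dict:
    """Count, per required field, how many JSON log lines lack it."""
    gaps = {f: 0 for f in REQUIRED_FIELDS}
    for line in lines:
        for field in missing_fields(json.loads(line)):
            gaps[field] += 1
    return gaps


sample = [
    '{"timestamp": "2022-01-10T09:00:00", "session_id": "s1", '
    '"action": "POST /login", "data": {"user": "alice"}}',
    # This event has no "data" field, so its call parameters cannot be replayed.
    '{"timestamp": "2022-01-10T09:00:05", "session_id": "s1", '
    '"action": "GET /accounts"}',
]
print(audit_log(sample))
```

A non-zero count for a field suggests that the logging configuration needs to be adjusted before complete usage traces can be computed.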

The following paragraphs detail the 4 main challenges, ranging from data preparation to the generation of automated test scripts for a dedicated framework. Postman is a popular test automation framework for API testing. For UI testing, numerous frameworks such as Robot Framework, Cucumber, and others are currently used.

Challenge 1 - From log events to traces

A little terminology is required at this point. A log file is a chronological record of events that occurred during an execution. A "usage trace" is a series of events that represent a functionally consistent usage session.

In this example, the left-hand part of the figure shows an extract of a log file in JSON format, and the right-hand part describes the processing scheme used to separate the logged events into usage traces. This is the first automated processing step in which machine learning assists us, by determining the coherent chaining patterns necessary to compute the traces.

Figure: Log example
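
To make the distinction between events and traces concrete, here is a minimal Python sketch. It deliberately replaces the machine-learning step described above with a much simpler rule: events are grouped by a session identifier, and a session is split whenever the user is idle for longer than a threshold. The keys "timestamp", "session_id" and "action" and the 30-minute gap are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_traces(events, max_gap=timedelta(minutes=30)):
    """Group events by session and split a session after a long idle period.

    `events` is an iterable of dicts with the hypothetical keys
    "timestamp" (ISO 8601 string), "session_id" and "action".
    Returns a list of traces, each a list of action names in time order.
    """
    by_session = defaultdict(list)
    for event in events:
        by_session[event["session_id"]].append(
            (datetime.fromisoformat(event["timestamp"]), event["action"])
        )

    traces = []
    for session_events in by_session.values():
        session_events.sort(key=lambda pair: pair[0])
        current, last_time = [], None
        for ts, action in session_events:
            # A long pause starts a new trace within the same session.
            if last_time is not None and ts - last_time > max_gap:
                traces.append(current)
                current = []
            current.append(action)
            last_time = ts
        traces.append(current)
    return traces


events = [
    {"timestamp": "2022-01-10T09:00:00", "session_id": "s1", "action": "login"},
    {"timestamp": "2022-01-10T09:00:20", "session_id": "s2", "action": "login"},
    {"timestamp": "2022-01-10T09:01:00", "session_id": "s1", "action": "view_account"},
    {"timestamp": "2022-01-10T11:00:00", "session_id": "s1", "action": "login"},
    {"timestamp": "2022-01-10T09:02:00", "session_id": "s2", "action": "transfer"},
]
traces = build_traces(events)
for trace in traces:
    print(" -> ".join(trace))
```

A fixed rule like this only works when logs carry a reliable session identifier; learning the chaining patterns, as the chapter describes, is what covers the cases where they do not.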


Challenge 2 - Explore and analyze customer traces

The second challenge is how to assist the Agile team in exploring and analyzing the traces. Numerous analyses of this data may be of value:

  • Clustering the traces generates coherent groupings that aid in trace analysis. The left side of the figure below illustrates the application of hierarchical clustering to organize the traces.
  • The workflow visualization of the traces provides an overview that can be zoomed in to focus on a specific subset of the call graph. The workflow for a selection of usage traces is shown at the top right of the figure below.
  • Each event in a trace should have its associated data available for browsing and analysis. A tree view of the data is shown at the bottom right of the figure below.

Figure: Visualization of traces and related data



The exploration and analysis of the traces will enable the team, particularly the testers, to dig into an anomalous situation and identify previously unknown user flows (that would be valuable to test in regression because they correspond to customer behavior).
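
As an illustration of the hierarchical clustering mentioned above, the following Python sketch groups traces bottom-up using only the standard library. The Jaccard distance on sets of actions, the average linkage and the 0.5 threshold are assumptions made for this example; they are not a description of how Gravity computes its clusters.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two traces: 1 minus the overlap of their action sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def cluster_traces(traces, threshold=0.5):
    """Agglomerative (bottom-up) clustering with average linkage.

    Starts with one cluster per trace and repeatedly merges the two closest
    clusters until the closest pair is farther apart than `threshold`.
    Returns a list of clusters, each a list of trace indices.
    """
    clusters = [[i] for i in range(len(traces))]

    def linkage(c1, c2):
        dists = [jaccard_distance(traces[i], traces[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        (i, j), best = min(
            (
                ((i, j), linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


sample_traces = [
    ["login", "view_account", "logout"],
    ["login", "view_account", "transfer", "logout"],
    ["search", "add_to_cart", "checkout"],
    ["search", "add_to_cart"],
]
groups = cluster_traces(sample_traces)
print(groups)
```

On this sample, the two banking-style traces end up in one group and the two shopping-style traces in another, which is the kind of grouping that makes a large set of traces easier to browse.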

Finally, visualizing the frequency of use of specific user flows enables you to prioritize the execution of tests and also evaluate the impact of an anomaly on these flows.
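
The frequency view can be approximated with a simple count of identical action sequences. The Python sketch below is illustrative only; treating two traces as the same flow when their action sequences are identical is an assumption, and a real tool would likely work with more abstract flows.

```python
from collections import Counter


def flow_frequencies(traces):
    """Count identical action sequences.

    Returns (flow, count, share) rows, most frequent first.
    """
    counts = Counter(tuple(trace) for trace in traces)
    total = sum(counts.values())
    return [(flow, n, n / total) for flow, n in counts.most_common()]


observed = [
    ["login", "view_account"],
    ["login", "transfer"],
    ["login", "view_account"],
    ["search", "checkout"],
]
rows = flow_frequencies(observed)
for flow, count, share in rows:
    print(f"{share:.0%}  {count}  {' -> '.join(flow)}")
```

Tests covering the flows at the top of this list are natural candidates to run first, and an anomaly on such a flow affects the largest share of observed sessions.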

Challenge 3 - Identify missing/unnecessary tests and optimize coverage

We can calculate test traces from the logs generated during test execution in a testing environment. This enables the determination and visualization of the coverage provided by the current test runs. This coverage is depicted in the figure below, with purple indicating user flows that are covered by one or more tests and pink indicating flows that are not covered.

Testers can then decide whether to complement (or not) existing test coverage, identify unnecessary tests, and optimize coverage through regression test suite updates. Iterative processes are used to achieve the desired level of usage coverage.

The figure below shows the results of the coverage analysis in two formats (list and graphical workflow). The analysis compares operational usage traces with test traces generated during test execution. The user action sequences highlighted in pink are not covered by existing tests.

Figure: Visualization of usage coverage by the tests.
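
The comparison between usage traces and test traces can be sketched in a few lines of Python. The coverage criterion used here, where a usage trace counts as covered when each of its consecutive action pairs appears in at least one test trace, is an assumption chosen for simplicity and may differ from the criterion used in Gravity.

```python
def transitions(trace):
    """Consecutive (action, next_action) pairs of one trace."""
    return set(zip(trace, trace[1:]))


def usage_coverage(usage_traces, test_traces):
    """Compare production traces with test traces.

    Returns (covered_ratio, uncovered_traces, unused_tests):
    - covered_ratio: share of usage traces whose every transition is
      exercised by at least one test trace;
    - uncovered_traces: usage traces with at least one untested transition;
    - unused_tests: test traces sharing no transition with any usage trace.
    """
    tested = set().union(*(transitions(t) for t in test_traces))
    used = set().union(*(transitions(t) for t in usage_traces))
    uncovered = [t for t in usage_traces if not transitions(t) <= tested]
    unused = [t for t in test_traces if not transitions(t) & used]
    ratio = 1 - len(uncovered) / len(usage_traces)
    return ratio, uncovered, unused


usage = [
    ["login", "view_account", "logout"],
    ["login", "transfer", "logout"],
    ["login", "view_account", "logout"],
    ["login", "view_account", "transfer", "logout"],
]
tests = [
    ["login", "view_account", "logout"],
    ["login", "reset_password"],
]
ratio, uncovered, unused = usage_coverage(usage, tests)
print(f"usage coverage: {ratio:.0%}")
print("missing tests for:", uncovered)
print("candidate unnecessary tests:", unused)
```

The uncovered traces are candidates for new tests, while test traces that match no observed usage are candidates for review; whether to remove them remains a decision for the testers.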


Challenge 4 - Produce and maintain test scripts in your favorite framework

The final challenge is to generate executable automated test scripts for the usage traces that must be covered. The scripts are generated automatically in the format required by a particular test framework, which involves:

  • A mapping between the steps of the traces and the REST API calls. This step is facilitated by importing an OpenAPI specification file (in Swagger for example), if it is available.
  • Automated generation of scripts, based on keywords or API calls.
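
The two steps above can be sketched as follows for API tests. This Python example is a simplified illustration under stated assumptions: trace steps are dicts with the hypothetical keys "method", "path" and "body", the OpenAPI document is reduced to its "paths" section, and the output is intended to follow the general layout of a Postman Collection v2.1 file, which should be validated against the official schema before import.

```python
import json
import re


def path_regex(template):
    """Turn an OpenAPI path template such as /accounts/{id} into a regex."""
    parts = re.split(r"(\{[^}]+\})", template)
    pattern = "".join(
        "[^/]+" if part.startswith("{") else re.escape(part) for part in parts
    )
    return re.compile("^" + pattern + "$")


def map_step(step, spec):
    """Find the OpenAPI operation matching one logged call, or None."""
    method = step["method"].lower()
    for template, operations in spec["paths"].items():
        if method in operations and path_regex(template).match(step["path"]):
            return operations[method].get("operationId", template)
    return None


def trace_to_collection(name, trace, spec, base_url="{{baseUrl}}"):
    """Build a collection-like dict with one request per trace step."""
    items = []
    for step in trace:
        operation = map_step(step, spec)
        request = {
            "method": step["method"],
            "header": [],
            "url": base_url + step["path"],
        }
        if step.get("body") is not None:
            request["body"] = {"mode": "raw", "raw": json.dumps(step["body"])}
        # Unmapped steps keep a generic name so they can be reviewed by hand.
        label = operation or f'{step["method"]} {step["path"]}'
        items.append({"name": label, "request": request})
    return {
        "info": {
            "name": name,
            "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json",
        },
        "item": items,
    }


spec = {
    "paths": {
        "/login": {"post": {"operationId": "login"}},
        "/accounts/{id}": {"get": {"operationId": "getAccount"}},
    }
}
trace = [
    {"method": "POST", "path": "/login", "body": {"user": "alice"}},
    {"method": "GET", "path": "/accounts/42"},
    {"method": "GET", "path": "/unknown"},
]
collection = trace_to_collection("Usage trace 1", trace, spec)
print(json.dumps(collection, indent=2))
```

The sketch replays the logged values as they are; a usable regression test would also need assertions on the responses and handling of data that changes between runs, such as tokens and identifiers.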

The role of Machine Learning in the “logs to tests” process, and input data requirements

The process can be summarized in four main groups of activities, illustrated in the figure below.

Figure: The process in 4 activity groups "from logs to tests"



The Gravity tool's support and automation of these activities are based on two types of technologies, namely Machine Learning and Model-Based Testing, which together comprise Smartesting's core competencies. Let's take a closer look at these technologies, paying particular attention to the quality of input data required by machine learning and automated test generation algorithms.

  1. Data acquisition & usage analysis - The input data consists of production logs. By and large, this data is of high quality; the main issue that occasionally arises is a lack of precision in the logged calls (regarding data values, for example). The data comes from commercially available application monitoring tools such as Datadog, Splunk, Dynatrace, or Kibana. Integration with such products for the purpose of extracting logs is relatively simple. Clustering algorithms such as k-means and hierarchical clustering are primarily used, as they are unsupervised algorithms capable of detecting usage patterns and traces within large sequences of log events.

  2. Visualize & complete test coverage - The algorithms for visualizing traces as graphical workflows (sequences of actions, API calls, and microservices) are derived from Model-Based Testing and are based on abstraction calculations and action/call factorization. These techniques are scalable, meaning they can process thousands of traces simultaneously. Tests can be completed by selecting the traces to be covered. Additionally, we are experimenting with Deep Learning algorithms derived from NLP (Natural Language Processing), specifically attention network algorithms (such as "transformers").

  3. Test generation (scripts, keywords, data) - The technologies used here are also based on Model-Based Testing. They perform filtering and mapping operations on existing scripts, keywords (or Postman collections for API testing), and trace steps. This allows more than 80% of the production of test automation artifacts to be automated. In the case of API tests, Swagger specifications provide valuable input for further automating the process (if these specs are detailed enough).

  4. Test execution prioritization - Prioritization is performed using machine learning algorithms that combine reinforcement learning and decision tree learning. The learning phase and prioritization calculation are informed by (1) the history of execution results, (2) test characteristics (complexity, date of last update, execution time, etc.), and (3) characteristics of the changes made to the software during a build (modified package, granularity of change).
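
To give an idea of how the three kinds of input in the last item can feed a ranking, here is a deliberately simple Python sketch. It is a hand-weighted scoring heuristic, not the reinforcement learning and decision tree approach described above: in the learned version, the weights and rules would come from the execution history rather than being fixed by hand. The class name, field names and weights are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class CandidateTest:
    name: str
    recent_results: list   # most recent last; True = passed, False = failed
    duration_s: float      # average execution time in seconds
    packages: set          # packages exercised by the test


def priority(test, changed_packages, w_fail=0.6, w_change=0.3, w_fast=0.1):
    """Hand-weighted score in [0, 1]; higher means run earlier."""
    runs = test.recent_results
    fail_rate = runs.count(False) / len(runs) if runs else 0.0
    touches_change = 1.0 if test.packages & changed_packages else 0.0
    # 1.0 for an instantaneous test, 0.5 for a test lasting one minute.
    speed = 1.0 / (1.0 + test.duration_s / 60.0)
    return w_fail * fail_rate + w_change * touches_change + w_fast * speed


def prioritize(tests, changed_packages):
    """Order tests from highest to lowest priority for the current build."""
    return sorted(tests, key=lambda t: priority(t, changed_packages), reverse=True)


suite = [
    CandidateTest("checkout", [True, True, True, True], 120.0, {"cart"}),
    CandidateTest("login", [True, False, True, False], 10.0, {"auth"}),
    CandidateTest("transfer", [True, True, True, False], 30.0, {"payments"}),
]
ordered = prioritize(suite, changed_packages={"payments"})
print([t.name for t in ordered])
```

In this example the "transfer" test comes first because it both failed recently and exercises the package modified in the build.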

“From logs to tests” – Experience report

We've experimented with Gravity's "from logs to tests" approach in a variety of contexts (SaaS software vendors, large company digital factories, and eCommerce companies), both for regression API testing and UI testing.

The lessons learned focused first on the roles involved and their use cases:

  • POs / BAs and IT Architects explore traces and usage data, analyze anomalies, and provide API Specs files if required.
  • Testers and test managers analyze and complete the test coverage.
  • Test automation engineers and developers use the produced automation artifacts and check the context of anomalies for correction.

The benefits cited are primarily related to the impact of visualizing the traces and associated data for anomaly analysis. Numerous Gravity users have reported saving more than 70% of the time required to analyze a call flow. The second benefit concerns the measurement of usage coverage by the tests: it becomes the permanent guide and focal point for regression test maintenance. Having previously suffered from a significant lack of visibility regarding this coverage, the team has made regression test coverage a key performance indicator (for example, 100 percent coverage of key customer journeys). Finally, the generation of various test artifacts (scripts, data, keywords/queries) eliminates more than half of the effort required to maintain these tests.

The only requirement for implementing Gravity's "from logs to tests" approach is access to production and test logs. Ideally, logs are managed in one of the COTS monitoring tools mentioned previously, for which we already have connectors. In some other cases, it is possible to develop a custom connector. More importantly, as mentioned previously, the level of detail in the logs must be sufficient; otherwise, the logging configuration will need to be adjusted.

From logs to tests: on the way to autonomous testing

Testing from usage has two benefits for Agile and DevOps teams: it ensures that regression tests adequately cover usage and simplifies/accelerates test maintenance.

Additionally, the various activities of usage trace analysis and coverage completion help to strengthen collaboration between the team's various roles, including the operations engineers. This expedites the analysis of production-related anomalies and optimizes the correction cycle.

The use of machine learning algorithms in conjunction with techniques for automatic test generation from Model-Based Testing enables the production and maintenance of regression tests to become increasingly automated.

This paves the way for the development of an intelligent and autonomous regression testing robot that will simplify life for digital teams in the future.

About the author

Bruno Legeard


For over 20 years, Bruno’s brain has been intensely working on software testing: how it is practiced, its challenges, and the methods and technologies that foster its progress (especially by associating graphic representations with text).

Bruno’s brain is fascinating: a PhD in Information Technology, a professor and researcher at the University of Franche-Comté, world’s leading expert in Model-Based Testing, and heavily involved in implementing AI in testing. 

Bruno shares his expertise with the testing community thanks to his roles with entities such as CFTL (French Software Testing Board) and ISTQB. From these control towers, he also observes market trends and identifies promising innovations.

As a consulting scientist at Smartesting, he advises the company on its choice of technology and guides the product vision.

About Smartesting

Situated in the heart of Franche-Comté region (known for cancoillotte cheese and also birthplace of Victor Hugo), Smartesting enjoys a reputation of being a leading expert in Model-Based Testing worldwide. This innovative method for software test design uses smart algorithms to automatically generate tests from graphical representations of the application to be validated. 

Smartesting launched Yest® in 2017, a visual and Agile test design tool for professionals who carry out functional validation of IT systems. Yest® is the fruit of the Smartesting team’s unique know-how and experience, as well as their unrelenting determination to offer the IT testing community a simple tool with which testers can be more efficient, Agile and... happy!

With the help of a European research laboratory, the FEMTO-ST Institute, which prepares the ground on the latest innovative technologies, and by listening attentively to its customers and partners, Smartesting creates innovative tools to help testers in their work and to industrialize test processes.

Are you looking to have higher levels of automation, industrialization and efficiency in testing?  

Visit us at www.smartesting.com