State of AI applied to Quality Engineering 2021-22
Section 4.1: Automate & See

Chapter 3 by Micro Focus

Lower test automation barriers with computer vision

Business ●●●○○
Technical ●●○○○


AI-based test automation significantly decreases the time required to design, build and maintain tests because objects are identified simply by looking at them. Additionally, teams can utilize the same test without change across multiple devices and platforms.

The surge in digital transformation efforts is pushing organizations to perform tests on an increasing number of platforms and get the results as quickly as possible. Manual testing is not fast enough, so teams turn to test automation. But traditional software test automation techniques are only effective up to a point. They rely on identifying objects on the screen through their internal representation, such as coordinates, class name, type, specific position within a web page’s HTML, or many other possible options. This method of identifying objects can be very fragile. Even a small change in the object’s description might result in the test execution engine failing to find the object. The drawbacks of these techniques prevent teams from scaling their test automation up to the levels they require.

The most common test automation challenges include:

  • Test creation barriers—It takes time to build and design effective tests, with much effort required to uniquely identify on-screen objects that are part of the test.
  • Acquiring the right skills—Not everyone on the team has the skills needed to create and maintain effective and reusable automated tests.
  • Insufficient test coverage—Teams must support an ever-expanding range of platforms, devices, and operating systems, requiring testers to customize the tests for each environment.
  • Relentless test maintenance—A minor change in the UI can cause tests to fail, even if the test’s flow isn’t affected. Tests that rely on unique object properties can be susceptible to this type of change, which happens frequently. Therefore, testers must update the tests and ensure that they still run on each environment to be supported.
  • Test execution time is too long—Even if a test set runs without interruption, it can take a significant amount of time to run all of the tests to completion.
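
The fragility of property-based identification can be made concrete with a minimal Python sketch (the object description and property names here are hypothetical, not any specific tool's format): a locator that requires an exact match on recorded properties fails as soon as any one property drifts, even though the button looks exactly the same on screen.

```python
# Hypothetical recorded description, as a property-based engine might store it.
recorded = {"tag": "button", "id": "btn-cart-17",
            "xpath": "/html/body/div[2]/button[1]"}

def find_by_properties(screen_objects, locator):
    """Return the first object whose properties all match the locator exactly."""
    for obj in screen_objects:
        if all(obj.get(k) == v for k, v in locator.items()):
            return obj
    return None   # any drifted property means the object is "lost"

# The same button after a minor refactor: only the auto-generated id changed.
current_screen = [{"tag": "button", "id": "btn-cart-18",
                   "xpath": "/html/body/div[2]/button[1]"}]

assert find_by_properties(current_screen, recorded) is None   # the test breaks
```

A visual approach sidesteps this entirely: the rendered button is unchanged, so an engine that looks only at pixels still finds it.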

Our research at Micro Focus revealed that automated object detection is key to reducing these barriers—and even removing some of them completely. If a human doesn’t need to know how a UI object is represented internally, developing a similarly agnostic machine-driven approach to object identification should be possible.

AI Techniques for Automated Object Detection

Our objective was to develop an AI engine that recognizes objects without knowledge of their internal representation. This goal was accomplished by combining AI-based algorithms that accurately and consistently recognize objects regardless of device or environment.


Figure: UFT One using computer vision to identify objects while recording a test




For example, a test step might require clicking the shopping cart icon on a mobile app. The AI engine should be able to locate the shopping cart icon on the current screen without knowing:

  • If the screen is on a mobile device.
  • Whether the device is running Android or iOS.
  • If the screen is a desktop browser.
  • Whether it’s Chrome, Firefox, Edge, or another browser.

In short, the “Click the shopping cart” step should work under any circumstance with the AI engine.
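
As an illustration of what such a platform-agnostic step could look like, here is a minimal Python sketch (all names are hypothetical; this is not UFT One's actual scripting interface). The step declares only the action and the object; a computer-vision locator supplies coordinates for whatever screen happens to be in front of it:

```python
def run_step(screenshot, step, locate, actions):
    """Execute one declarative step against any screen image.
    `locate` is a computer-vision search: (image, object_class, hint) -> (x, y)."""
    x, y = locate(screenshot, step["object"], step.get("hint"))
    actions[step["action"]](x, y)

# The same step runs unchanged against an Android, iOS, or desktop-browser
# screenshot, because only pixels are inspected, never platform internals.
step = {"action": "click", "object": "shopping_cart"}

clicked = []
fake_locate = lambda img, cls, hint: (120, 40)   # stand-in for the CV engine
run_step(b"<any-screenshot-bytes>", step, fake_locate,
         {"click": lambda x, y: clicked.append((x, y))})
assert clicked == [(120, 40)]
```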

This capability has been successfully implemented in the Micro Focus UFT Family of software test automation products. The AI techniques used in the engine include computer vision through an artificial neural network and optical character recognition.

Computer vision

Computer vision (CV) is the ability to extract information from images. Our AI engine understands a screen’s composition and breaks it down into the unique objects that it contains.

In terms of architecture, we implemented the AI engine as a separate module rather than restricting it to a specific product, so any of our testing products can theoretically use it. The first product that integrates with the AI engine is Micro Focus UFT One, our flagship test automation tool.

A UFT One GUI test consists of a test script that contains several test steps. A step is a statement that performs some action on the application under test (AUT). A pseudo-code example might be “Click the shopping cart” or “Enter 2 in the ‘How many items’ textbox.” Note that while a typical test script includes other types of steps, such as calculations or application logic, this discussion pertains to steps involving objects on the screen: manipulating them, or performing checkpoints or validations on them.

When UFT One’s test execution engine encounters a step with an object, it presents the object to the AI engine, which invokes its CV algorithm to look for the object on the screen. If the AI engine finds the object, it returns metadata about the object’s location to UFT One’s test execution engine, which then performs the required action on the object: clicking it, entering text, selecting it, and so on.
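
This handoff can be sketched as follows (a simplified Python illustration, not the actual engine interface): the AI engine returns a bounding box as location metadata, and the execution engine derives a click point from it.

```python
def center(box):
    """Derive a click point from location metadata (x, y, width, height)."""
    x, y, w, h = box
    return (x + w // 2, y + h // 2)

def execute_step(screen, description, ai_locate, perform):
    """Execution-engine side: hand the object description to the AI engine,
    then act on the returned location metadata."""
    box = ai_locate(screen, description)   # AI engine runs its CV search
    if box is None:
        raise LookupError("object not found: " + description)
    perform(center(box))                   # act on the metadata, e.g. click

hits = []
execute_step(b"screen-pixels", "shopping cart",
             lambda s, d: (100, 200, 40, 20), hits.append)
assert hits == [(120, 210)]
```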

Most importantly, the AI engine knows nothing about the implementation of the object. It treats the object as an image, regardless of the device or platform it comes from.

The CV capability is supported by an artificial neural network (ANN), a layered structure of algorithms that classify objects. We train our ANN with a large number of visual objects, resulting in a model that identifies objects it is likely to encounter in applications under test. Thus, when the AI engine is tasked with locating a specific object, it uses the model to identify a match in the AUT.
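
Very loosely, the role of the trained model can be illustrated with a toy nearest-neighbor classifier in Python (the feature vectors and class names are invented for illustration; a real ANN learns far richer representations from its training images):

```python
import math

# Toy stand-in for the trained model: one feature vector per known object class.
MODEL = {
    "shopping_cart":  [0.9, 0.1, 0.3],
    "magnifier":      [0.2, 0.8, 0.5],
    "hamburger_menu": [0.1, 0.2, 0.9],
}

def classify(features):
    """Return the known class whose learned vector is nearest to the input."""
    return min(MODEL, key=lambda label: math.dist(MODEL[label], features))

# A slightly different rendering of a cart icon still maps to the same class.
assert classify([0.85, 0.15, 0.25]) == "shopping_cart"
```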

Optical character recognition

Optical character recognition (OCR) identifies text stored as an image—or as part of an image—and converts it to computer-encoded text. Our AI engine leverages OCR to identify text-based objects. These objects may themselves be part of the test, or they could function as a hint to identify the object’s relative location. This capability is useful if an object appears multiple times on a screen. For example, a login screen might have two text boxes, one for the username and one for the password. OCR helps identify which of the edit boxes is which. OCR can also identify a button by its textual caption.
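
A simplified Python sketch of this hint-based disambiguation (with invented coordinates; not the actual OCR pipeline) matches each label recognized by OCR to the nearest detected input field:

```python
def field_for_label(label, ocr_texts, fields):
    """Disambiguate identical input fields by proximity to an OCR'd label.
    `ocr_texts`: {text: (x, y)} from OCR; `fields`: [(x, y), ...] edit boxes."""
    lx, ly = ocr_texts[label]
    return min(fields, key=lambda f: (f[0] - lx) ** 2 + (f[1] - ly) ** 2)

ocr_texts = {"Username": (50, 100), "Password": (50, 160)}
edit_boxes = [(150, 100), (150, 160)]   # two visually identical edit boxes

assert field_for_label("Username", ocr_texts, edit_boxes) == (150, 100)
assert field_for_label("Password", ocr_texts, edit_boxes) == (150, 160)
```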

Training Data

An AI engine depends on a finely tuned model to accurately determine what an object represents. However, by its nature, AI is not perfectly accurate. While we continuously work to increase its precision, it may nevertheless occasionally misidentify an object. Therefore, we provide a mechanism whereby the user can augment our training data set by sending us anonymized information about an unrecognized object. We use this information to retrain the model, increasing the probability of recognizing the object when encountered.
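
Purely as an illustration of the principle (this is not Micro Focus's actual reporting format), an anonymized feedback record might carry derived data only: extracted features and a content hash for de-duplication, never the raw screenshot.

```python
import hashlib
import json

def feedback_payload(image_bytes, expected_class, features):
    """Build a hypothetical anonymized report for an unrecognized object."""
    return json.dumps({
        "expected": expected_class,
        "features": features,                                     # derived data
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),  # dedupe key
    })

payload = feedback_payload(b"\x89PNG...raw-pixels...", "shopping_cart",
                           [0.9, 0.1, 0.3])
assert b"raw-pixels" not in payload.encode()  # the image never leaves the machine
```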

This mechanism allows us to continuously learn from production scenarios and improve the model that underpins our AI engine.

The Importance of Decoupling

We designed our AI engine independently from UFT One because it is a service that multiple clients can potentially use. Decoupling the AI engine also enables us to:

  • Issue updates of the model independently of the product that uses it.
  • Offload computation to a dedicated high-powered machine to increase calculation speed.

Updating the model

We continuously refine the model by adding new objects to it and fine-tuning how it classifies the identified objects. By decoupling the model from the AI engine, we can provide our users with updates to the model alone. Users can apply these updates at their convenience without having to wait for a new product release.

Offloading computation

It’s no secret that AI can involve heavy, time-consuming computation, slowing down testing. This issue is anathema to what we are solving with our AI engine. To overcome slowdowns, we have applied several techniques to increase the execution speed of AI-based tests. For example, caching mechanisms reuse the results of previous calculations (without storing the actual image), avoiding repeated computation.
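
A minimal Python sketch of such a cache (hypothetical function names; not the actual implementation) keys results on a hash of the screenshot plus the object sought, so earlier results are reused without ever storing the image itself:

```python
import hashlib

_cache = {}

def locate_cached(screenshot, description, locate):
    """Reuse earlier CV results, keyed on a pixel hash, not the image."""
    key = (hashlib.sha256(screenshot).hexdigest(), description)
    if key not in _cache:
        _cache[key] = locate(screenshot, description)
    return _cache[key]

calls = []
def slow_locate(image, description):   # stand-in for the expensive CV pass
    calls.append(description)
    return (10, 20, 30, 40)

shot = b"pixels-of-some-screen"
assert locate_cached(shot, "cart", slow_locate) == (10, 20, 30, 40)
assert locate_cached(shot, "cart", slow_locate) == (10, 20, 30, 40)
assert calls == ["cart"]   # the CV computation ran only once
```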

But we also decoupled the computation mechanism itself. As a result, users can install the engine on a powerful, dedicated computer. Computers with graphics processing units (GPUs) are particularly efficient for AI-based calculations. Using the engine, a user can take advantage of the GPU’s dedicated power as a remote AI service.

A single high-powered computation machine can serve multiple test execution machines. It is not necessary to couple each test execution machine to a dedicated computation machine.

Using Mockups for Testing

Traditional test automation methods that rely on object properties require the AUT to be present and at least partially functioning. Consequently, the team must wait for the AUT before they can start writing scripts.

Because our AI engine is agnostic about the AUT's implementation, the AUT is not required to be present. The AI engine can identify objects based on a screenshot of an application. It can even identify objects based on a hand-drawn mockup of a screen if the objects are drawn reasonably accurately. A more conventional application of this capability is to identify objects based on a graphic designer’s mockup.


Figure: UFT One identifying objects from a mockup screenshot




With this capability, test designers get a head start by using a graphical representation of the UI in a common format, such as JPG or PNG. They can begin scripting before the developers even start working on the implementation. The objects the AI engine identifies are highlighted on the image. The tester can simply drag and drop these highlighted objects into the test script. The objects are converted to their appropriate AI-based representation, and their default actions are triggered.

Using mockups enables testers to create a working script they can execute as soon as the AUT becomes available. Ultimately, all teams save time by developing the test scripts and the AUT independently and in parallel.

Measurable Benefits of AI-Based Testing

Since we unveiled UFT One’s AI-based testing capabilities in 2019, customers have enthusiastically adopted them and seen encouraging results.

A large health maintenance organization in the United States introduced a mobile app for customers to access and manage their services. The testing team had to support Android and iOS implementations while keeping up with the fast pace of change during development. At that point, the testing philosophy involved identifying objects through object locators. But minor changes, such as changing the XPath of an object, required significant and constant test upkeep. The team was not able to meet their deadlines.

The testing team was already using UFT One and decided to adopt its AI-based testing capabilities. Creating automated tests with AI helped team members focus on testing the flow and business requirements instead of on reliably identifying objects.

As a result of the AI-based testing, team members could use a single set of scripts that would run unmodified on iOS and Android. They were also able to reduce mobile test maintenance by at least 35%.

Furthermore, the team took advantage of creating tests based on mockups before the app’s layout was complete. The tests were ready to execute as soon as the code became available. Testers avoided failures that they would have otherwise encountered had the tests been written using fragile object properties and locators. Thus, testers also avoided the significant maintenance required when failures occur.

As well as significantly reducing maintenance and increasing test reliability, the team also found that tests were quicker to write. Faster test creation saved hours of work that testers would have spent on understanding the underlying implementation of an object. Interacting with the object purely from its visual representation provided significant benefits.

Maturity of Test Automation Based on Computer Vision

When we rolled out our new computer vision-based testing capabilities, we worked closely with our early adopter customers. We wanted to learn how they were using the technology and to ensure that clients were receiving the greatest value from it. This collaboration helped us discover new use cases and increase the technology’s reach and coverage.

As with any nascent technology, customers reported multiple edge-case scenarios that helped us improve our implementation and refine our model. Over time, we resolved more scenarios and grew the data set we use to train our model. Our algorithms became more capable and reliable. Due to these improvements, these customers now position AI as the preferred test automation mechanism when possible.

Lowering and Removing Test Automation Barriers

AI-based test automation reduces the time it takes to design and build tests because objects are identified simply by looking at them. AI algorithms lower skill barriers because they identify most objects automatically, hiding the complexity from the user. Teams can also use the same test without modification on different devices and platforms: they simply procure an appropriate device and run the test on it as-is. And because the algorithm doesn’t rely on an object’s underlying implementation and properties, the test keeps running even if the implementation changes. As long as the test’s flow stays the same, the test will continue to run.

The final barrier yet to be removed completely is test execution time. Tests will always take some time to run, so there is a lower limit on how quickly results can arrive. However, AI-based testing helps teams test earlier and provides robust mechanisms that parallelize and optimize test execution, reducing the wait time for results.

About the author

Malcolm Isaacs


Malcolm Isaacs is Senior Product Manager of Functional Testing in Micro Focus’ Application Delivery Management (ADM) product group. During the course of his career, Malcolm has held various positions, including software engineer, team leader, and architect. He has been with the Micro Focus family since 2003, when he joined Mercury, where he worked on a number of products specializing in supporting traditional and agile software development life cycles, from planning through deployment.

About Micro Focus

Micro Focus delivers a broad portfolio of enterprise software that helps bridge the gap between existing and emerging technologies so our 40,000 customers worldwide can run and transform at the same time. To build and deliver better software faster, you need a “Quality everywhere” culture. Our continuous quality and security solutions help you make this cultural shift—testing web, mobile, and enterprise applications to deliver high-quality experiences to keep, grow, and expand your business.

Visit us at