This case study is similar to hundreds that have been published in recent years. The corporation handles several billion dollars in annual sales in the automotive space. It operates a mature consumer-facing website with catalog, search, and purchasing functionality, along with other information.
They have hundreds of automated Selenium scripts as well as manual testers, and they deployed AIQ's autonomous testing to augment those existing test activities.
Setup (AI training) for this web application took several hours. Training is accomplished by the autonomous engine walking through the application a few pages/states at a time and surfacing the actions a user might take: clicks, form fills, dropdowns, and so on. As these are surfaced in the AI workbench, the trainer chooses how the system will handle each action. In most cases, we leave them to be acted upon as a human would. For certain actions, however, we may limit interaction to never, once per application, or once per page. We mapped form fields to CSV data or to generated data. We did not want to test all 4 million inventoried items in this ecommerce application, so we specifically limited anything that appeared to lead to a new inventory item to once per application. This allowed us to focus on the distinct pages and states rather than repeatedly exercising the same inventory presentation page.
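To make those training choices concrete, here is a minimal sketch of what such a policy might look like. The names (ActionPolicy, Frequency, the data_source strings) are hypothetical assumptions for illustration, not AIQ's actual workbench API.

```python
# Illustrative sketch only -- these names are hypothetical, not AIQ's API.
from dataclasses import dataclass
from enum import Enum

class Frequency(Enum):
    ALWAYS = "always"                 # act on it as a human would
    NEVER = "never"                   # never interact with this action
    ONCE_PER_APP = "once_per_app"     # e.g., links into the 4M-item inventory
    ONCE_PER_PAGE = "once_per_page"

@dataclass
class ActionPolicy:
    selector: str                     # matches an action surfaced in training
    frequency: Frequency = Frequency.ALWAYS
    data_source: str | None = None    # CSV column or generator for form fields

policies = [
    ActionPolicy("a.product-detail", Frequency.ONCE_PER_APP),
    ActionPolicy("input#zip", data_source="customers.csv:zip"),
    ActionPolicy("input#email", data_source="generated:email"),
    ActionPolicy("button#delete-account", Frequency.NEVER),
]
```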
Additionally, AI training considers validations: rules stating that certain flows or input values should produce a particular outcome. "Validation" is a more abstract term than "assert" because a validation applies throughout the application, across dozens if not hundreds of user flows. This runs directly counter to the way we were taught to keep the number of user flows we test to a minimum and to test each assertion only once. Here we have an extremely fast machine capable of writing scripts and checking results 100,000 times faster than a human. Rather than reducing or limiting testing, we now want to prioritize maximal application and validation coverage.
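One way to picture a validation is as a predicate attached to the whole application rather than to a single scripted flow: it is evaluated after every action, in every flow the bots walk. A minimal sketch under that assumption (the PageState shape is invented for illustration):

```python
from dataclasses import dataclass

# Hypothetical page-state snapshot -- names are illustrative, not AIQ's API.
@dataclass
class LineItem:
    price: float
    qty: int

@dataclass
class PageState:
    cart_items: list[LineItem]
    cart_total: float | None  # None when no cart is visible on this page

def validate_cart_total(state: PageState) -> list[str]:
    """Fires after every action, in every flow: whenever a cart is visible,
    its displayed total must equal the sum of its line items."""
    if state.cart_total is None:
        return []
    expected = sum(i.price * i.qty for i in state.cart_items)
    if abs(state.cart_total - expected) > 0.005:
        return [f"cart total {state.cart_total} != line-item sum {expected}"]
    return []
```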
When the baseline setup was complete, we were ready to execute a new AI blueprint on the next build. The blueprint kicked off and finished 42 minutes later, having found 234 unique pages and hundreds of actions and having written and executed several hundred scripts. The autonomous system follows a course much as a human would. It accomplishes this by making educated guesses about every component, from interactions with thousands of elements to page-state judgments to match scoring. Rather than relying on a single machine learning algorithm, we use 19 different machine learning algorithms, each of which attempts to solve a small problem based on past information about this application or applications in general. Bayesian curves, predator-prey models, recursive error reduction, decision trees, and other traditional algorithms are employed. In this case, the system was asked to deploy up to 100 threads (bots) in parallel, each coordinating with the others so that user flows did not overlap.
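That coordination among bots can be pictured as threads claiming user flows from a shared, deduplicated queue so that no two bots walk the same path. The following is a toy model of that idea, not the product's actual internals; every name here is our assumption.

```python
import threading
from queue import Queue, Empty

# Toy model: 100 parallel bots share one deduplicated flow queue so that
# no two bots execute the same user flow. Not AIQ's implementation.
flow_queue: Queue[int] = Queue()
seen: set[int] = set()
seen_lock = threading.Lock()

def discover(flow_id: int) -> None:
    """Enqueue a flow once, even if several bots discover it independently."""
    with seen_lock:
        if flow_id in seen:
            return
        seen.add(flow_id)
    flow_queue.put(flow_id)

def execute_flow(flow: int) -> None:
    pass                              # placeholder: walk pages, run checks

def bot() -> None:
    while True:
        try:
            flow = flow_queue.get(timeout=1)
        except Empty:
            return                    # queue drained; this bot exits
        execute_flow(flow)

for f in range(500):                  # seed with several hundred flows
    discover(f)

threads = [threading.Thread(target=bot) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```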
Figure: Example of a running AI Blueprint
The results were in after the first blueprint completed: 110 new bugs were identified. These UI and API bugs had not been found by the team's traditional automated and manual testing. They consisted of malformed API requests, API non-responses, 4xx and 5xx responses, and some very slow responses (see the sketch of per-response checks after the team's reactions below). Each of these degraded the user experience in some way, yet none had been caught through conventional techniques. The team's responses were typical:
“These bugs aren’t real.” (It’s a machine; it doesn’t post false positives.)
“These bugs didn’t matter.” (Ah, but they did, as you will see.)
“And that one was an impossibility.”
“We would have found it before.”
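For illustration, the bug categories above map naturally onto simple per-response checks. Here is a minimal sketch using the Python requests library; the timeout, slowness threshold, and function name are our assumptions, not the product's.

```python
import requests

# Illustrative checks mirroring the bug categories above: error statuses,
# non-responses, and very slow responses. Threshold and names are assumptions.
SLOW_SECONDS = 3.0

def check_response(url: str) -> list[str]:
    findings = []
    try:
        resp = requests.get(url, timeout=30)
    except requests.RequestException as exc:
        return [f"{url}: no response ({exc.__class__.__name__})"]
    if 400 <= resp.status_code < 600:                 # 4xx and 5xx responses
        findings.append(f"{url}: HTTP {resp.status_code}")
    if resp.elapsed.total_seconds() > SLOW_SECONDS:   # very slow responses
        findings.append(f"{url}: slow ({resp.elapsed.total_seconds():.1f}s)")
    return findings
```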
Figure: Example of AI Blueprint Dashboard
Each new blueprint adds data to the baseline (learning at each build), setting itself up for success on the next build. If accessors change, the system adapts to those changes (no maintenance required), as it does to changes in user flow, reporting variances to the QA team as desired. As the team dug into the results, they were able to reproduce each error on their own: AIQ hands them the scripts it wrote that triggered each bug, and these can be run in an IDE right on a developer's desktop. Each bug was verified to be genuine, and each was addressed and resolved within weeks.
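In practice, such a hand-off can take the form of a plain Selenium script a developer runs locally. A hypothetical example of what a generated reproduction script might look like (the URL and selectors are invented for illustration):

```python
# Hypothetical auto-generated reproduction script -- the URL and selectors
# are invented; a real generated script would target the actual accessors
# recorded during the blueprint run.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://shop.example.com")
    driver.find_element(By.CSS_SELECTOR, "input#search").send_keys("brake pads")
    driver.find_element(By.CSS_SELECTOR, "button#search-go").click()
    driver.find_element(By.CSS_SELECTOR, "a.product-detail").click()
    # Step that surfaced the bug: adding to cart returned an HTTP 500.
    driver.find_element(By.CSS_SELECTOR, "button#add-to-cart").click()
finally:
    driver.quit()
```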