State of AI applied to Quality Engineering 2021-22
Section 4.1: Automate & See

Chapter 5 by

Transforming gaming with AI-driven automation

Business ●●●○○
Technical ●●○○○

Over the years, one of the most popular applications of AI has been in the area of gaming. AI has not only beaten grandmasters and world champions in classic games like Chess and Go, but has also reached expert level in modern, cooperative, multiplayer video games like Dota 2. So if AI can play games, why not have AI test them too? This chapter takes you on a journey through the bleeding edge of AI for game testing. It describes the challenges in testing modern video games and explains how advances in AI and machine learning (ML) are helping to overcome those challenges. It covers examples of AI-driven automation approaches for testing the functionality, look and feel, streaming experience, and gameplay mechanics of modern video games. Through two industrial case studies, the chapter discusses some of the real-world challenges and practical benefits of this technology. The chapter concludes by providing key insights on the future of AI for game testing.

All images in this chapter are provided by

Playing Isn’t Testing


Players control


I used to think that testing video games would be a dream job. Doesn’t it seem like it would be awesome to get paid to test the newest video games and gaming add-ons? However, there’s a big difference between testing video games and playing them – the driving motivation. You’re motivated to play games because you like the achievement. You want to complete a given challenge or advance in the game to see what happens in the next level. Sometimes, it’s the rules or mechanics of the game that you find interesting, captivating, or downright addicting.

You may also play games for the social aspects. You want to interact and connect with other players, whether it be by cooperating with them or by challenging them. Whether individually or socially, you play games to be immersed. By drawing you into a unique story experience, the hard work of game designers and developers allows you to escape from real life. After all, these have not been normal times. Everyone has been dealing with the global coronavirus pandemic and, for the most part, they’ve been stuck at home. Because of this, there has been a growing appreciation for gaming products and services.

Testing games is much more than just playing them. The underlying motivation and approach tend to be completely different. Okay, perhaps to some degree you’ll still want to test and make sure the game is enjoyable, but that type of validation typically happens much earlier in the game development process, for example, during ideation, design, and storyboarding. The evaluation of the game after it has been coded has a different focus. Game testers seek to answer several questions about the quality of the implemented game. Is the gaming application functional? Are there cosmetic bugs and glitches that take away from the user experience? Is it possible to cause strange or even bad things to happen during gameplay? These are but a few of the concerns that have to be addressed daily during video game testing.

Game Testing Is Hard But Valuable

Game testing is an interesting and difficult problem to tackle, and one that many in the testing community underestimate. In comparison with enterprise applications, games are truly vast. An enterprise application might have 20 screens that each go 10 levels deep, with 10 fields on each screen. All of that is countable in that it can be mapped to the set of natural numbers. Although large, it is generally well-bounded and well-defined. Not so with games. In a video game, at any moment a player can choose to go on a new mission, equip different sets of items, or interact with other human or AI-based players. The game content itself is constantly evolving, thereby making the game an ever-moving target from an engineering standpoint. Now imagine trying to cover that kind of space with tests, whether they be functional regressions, visual checks of content, or even newly added behaviors. In this context, creating, executing, and maintaining those types of tests at scale can quickly become a resource-intensive, repetitive undertaking.

To make a hard problem even harder, there is generally little to no support for end-to-end automation of video games. As a result, the various dimensions of game testing end up being defined as manual tasks. Many gaming companies outsource testing activities so they don’t have to address these challenges upfront, but this just shifts the pain to later in the lifecycle and lets quality issues surface in production, where they require even more time, effort, and money to resolve.


Figure: Gamers statistics


There are approximately 2.5 billion gamers worldwide, many of whom are purchasing games, subscriptions, and downloadable content. But gamers tend to be finicky and fickle users. They can be impatient and quick to move on to the next game, or worse yet, they may find something else to do if they become overly discontent or frustrated with gaming. When gamers walk away, the revenue loss is substantial. Game testing seeks to prevent this kind of revenue loss in what is currently a booming industry. To put the value of the gaming industry into perspective, video game sales grew by 20 percent in 2020, and the global market is expected to be worth around $250 billion by 2025. With that much money at stake, gaming companies are really starting to pay attention to software quality, testing, and automation.

Leveraging AI for Game Testing

It’s time to dive into the main topic by describing work that is being done to apply AI to testing modern video games. Specifically, in this section I’ll cover the use of AI for build verification, scenario testing, and asset verification, along with game streaming and gameplay testing.


Figure: AI for Game testing

AI for Build Verification and Scenario Testing

A system-level build verification test (BVT) is a very basic test that helps you to determine whether a new version of an application is stable enough for further testing.  In gaming, an example BVT may involve launching the game, navigating through the menus, starting a new game session, and verifying that the player can interact with the game environment. This type of automation uses computer vision to identify and interact with the game menu elements and other widgets. With scenario testing, the idea is similar but would involve testing deeper application flows, functionality, and feature interactions.

The way this works is that autonomous and intelligent agents, commonly referred to as bots, perform functional testing of the game through a process of Explore, Model, Interact, and Learn.

  • Explore. The bots are trained on hundreds of images and/or videos to perceive the game application and environment.
  • Model. The bots create abstractions to keep track of the application state and consult those models when dealing with uncertainty and partial observability, and when generating tests.
  • Interact. The bots generate actions to test the application through the user interface, including applying test inputs and verifying observable outcomes.  
  • Learn. Humans can give the bots direct feedback on the quality of their data and actions.  Bots also learn by combining trial-and-error with a reward system—a process known as reinforcement learning.
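
To make the Learn step more concrete, here is a minimal sketch of trial-and-error learning with a reward system, using tabular Q-learning on a toy build verification flow. Everything in it is an assumption made for illustration: the `MENU` graph, the state and action names, the reward values, and the hyperparameters are hypothetical, and a real bot would perceive states through computer vision rather than read them from a dictionary.

```python
import random

# Hypothetical menu graph for a toy game: state -> {action: next_state}.
MENU = {
    "title": {"press_start": "main_menu"},
    "main_menu": {"new_game": "in_game", "open_settings": "settings"},
    "settings": {"back": "main_menu"},
    "in_game": {},  # terminal state: the build verification goal is reached
}

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error with an epsilon-greedy policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, actions in MENU.items() for a in actions}
    for _ in range(episodes):
        state = "title"
        for _ in range(20):  # cap the number of steps per episode
            actions = list(MENU[state])
            if not actions:
                break  # reached the terminal state
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            nxt = MENU[state][action]
            # Assumed rewards: +1 for starting a game session, small cost per step.
            reward = 1.0 if nxt == "in_game" else -0.05
            best_next = max((q[(nxt, a)] for a in MENU[nxt]), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

def greedy_path(q, start="title", max_steps=10):
    """Follow the highest-valued action from each state until a terminal state."""
    path, state = [start], start
    while MENU[state] and len(path) <= max_steps:
        action = max(MENU[state], key=lambda a: q[(state, a)])
        state = MENU[state][action]
        path.append(state)
    return path

print(greedy_path(train()))
```

After training, the greedy policy goes straight from the title screen through the main menu into a game session, without detouring through the settings screen.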



Figure: AI for Build Verification and Scenario Testing



AI for Asset Verification and Game Store Testing

Modern video games are highly visual and now, thanks to AI, test automation can handle validating these types of graphic-intensive applications. Visual gaming assets tend to be animated, constantly changing position and orientation on the screen, and are frequently accompanied by different types of lighting and artistic effects.  Computer vision based solutions work well to locate and classify objects under these types of conditions. You can train bots using images captured from a stream or recording of the animated object and, just like humans, the machines are able to recognize the object from any angle or with various lighting and special effects applied.  Computer vision makes it possible to identify dynamically rendered items in a video game—a task which foils traditional automation approaches.

Visual asset verification is particularly useful for testing in-game stores that allow players to purchase add-ons and downloadable content. These stores represent a direct revenue stream for gaming companies.  Let’s take a quick look at how bots verify visual assets to make sure that gamers aren’t surprised by cosmetic bugs. When testing visual assets in games, bots use two popular machine learning (ML) based object recognition techniques: localization and classification.

  • Object Localization. During object localization, the bots identify the different objects that appear on a given screen. Common practice is to identify all the objects of interest and draw bounding boxes around them. If necessary, each of the objects encapsulated by a bounding box can be cropped out into its own image for training purposes.
  • Object Classification. This is the process by which images of individual objects are labeled and then fed to an ML algorithm as training data. Part of the dataset may be purposely excluded from training and used instead to test how well the model predicts the label or class of an object it was not trained on.

Now armed with classifiers for various gaming assets, you can write a test that navigates to the game store, purchases and equips specific combinations of items, and then has the bots verify that what they see is what they expect.
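
The hold-out idea described under Object Classification can be sketched in a few lines. This is an illustration under stated assumptions, not the approach used in production: the asset names, the three-value color features, and the nearest-centroid classifier are all hypothetical stand-ins for real cropped images and a trained vision model.

```python
import random

def make_samples(label, center, count, rng, noise=0.05):
    """Generate synthetic feature vectors scattered around a class center."""
    return [([c + rng.uniform(-noise, noise) for c in center], label)
            for _ in range(count)]

def split_holdout(samples, test_fraction=0.25, seed=0):
    """Shuffle the labeled samples and set part of them aside for testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_centroids(train):
    """Compute the mean feature vector of each label in the training set."""
    grouped = {}
    for features, label in train:
        grouped.setdefault(label, []).append(features)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def holdout_accuracy(samples):
    """Train on one part of the data and score predictions on the unseen part."""
    train, test = split_holdout(samples)
    centroids = fit_centroids(train)
    hits = sum(predict(centroids, features) == label for features, label in test)
    return hits / len(test)

# Hypothetical assets described by average (red, green, blue) color values.
rng = random.Random(42)
samples = (make_samples("red_helmet", (0.9, 0.1, 0.1), 20, rng)
           + make_samples("blue_shield", (0.1, 0.1, 0.9), 20, rng))
print(holdout_accuracy(samples))
```

The important part is the split: the accuracy is measured only on samples the classifier never saw during training.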


Figure: Asset Verification: Visual Classification


The value added by the bots does not end there. Another key aspect of this technology involves teaching the bots how to detect visual differences — a process referred to as visual diffing.

  • Visual Diffing. The bots are trained on a baseline visual, allowing them to recognize a prior observation as something they would expect to see. During test execution, for example on a new release, the bots compare their test observations with the baseline expectations. If there are any differences, the bots generate an image mask that highlights, pixel by pixel, what changed, so that developers and testers can pinpoint the cosmetic bug.
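
The image mask idea can be illustrated with a plain pixel comparison. This is a simplified sketch, not the trained approach described above: it assumes grayscale images stored as lists of rows, and the `tolerance` parameter and helper names are hypothetical.

```python
def diff_mask(baseline, observed, tolerance=0):
    """Return a boolean mask that is True wherever a pixel changed."""
    same_shape = len(baseline) == len(observed) and all(
        len(a) == len(b) for a, b in zip(baseline, observed))
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return [[abs(a - b) > tolerance for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(baseline, observed)]

def changed_region(mask):
    """Return the bounding box (top, left, bottom, right) of changed pixels."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, flag in enumerate(row) if flag]
    if not coords:
        return None  # no visual difference detected
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))

baseline = [[10, 10, 10, 10],
            [10, 10, 10, 10],
            [10, 10, 10, 10]]
observed = [[10, 10, 10, 10],
            [10, 50, 12, 10],
            [10, 10, 10, 10]]

print(changed_region(diff_mask(baseline, observed)))
print(changed_region(diff_mask(baseline, observed, tolerance=5)))
```

A small tolerance is one way to keep minor rendering noise from being reported as a cosmetic bug.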




Complementary to visual game asset verification, AI is capable of testing the audio assets of a game. Instead of trying to detect differences, one approach to audio asset verification looks for similarities between a baseline sound effect and one observed during testing. The general idea is that, after accounting for differences in amplitude and timing, the model computes the cross-correlation between the two signals by keeping one signal fixed and sliding the other across it, then checking whether the two line up closely at some offset.
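
A minimal sketch of that cross-correlation check follows. It assumes short signals held as plain lists of samples, scales each one to unit energy to account for amplitude, and tries every offset to account for timing. The function names and the toy signals are hypothetical.

```python
import math

def unit_energy(signal):
    """Scale a signal so its sum of squares is 1, removing amplitude differences."""
    norm = math.sqrt(sum(x * x for x in signal))
    if norm == 0:
        raise ValueError("signal is silent")
    return [x / norm for x in signal]

def best_alignment(baseline, observed):
    """Keep the baseline fixed, slide the observed signal across it, and
    return (peak similarity, delay of the observed signal in samples)."""
    a, b = unit_energy(baseline), unit_energy(observed)
    best_score, best_delay = float("-inf"), 0
    for delay in range(-(len(a) - 1), len(b)):
        score = sum(a[i] * b[i + delay]
                    for i in range(len(a)) if 0 <= i + delay < len(b))
        if score > best_score:
            best_score, best_delay = score, delay
    return best_score, best_delay

baseline = [0, 0, 1, 3, 1, 0, 0, 0]
louder_and_late = [0, 0, 0, 0, 2, 6, 2, 0]  # same shape, twice as loud, 2 samples later
different = [1, -1, 1, -1, 1, -1, 1, -1]

print(best_alignment(baseline, louder_and_late))
print(best_alignment(baseline, different))
```

A peak similarity close to 1 suggests the observed sound matches the baseline, while a low peak suggests a different or corrupted sound. The pass threshold is a choice you would tune for your own assets.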




Now that I’ve walked you through two asset verification approaches, one for visual assets and the other for audio, it should be clear as to how this technology is being applied to game testing today.  Although at times it feels like magic, AI and subfields like computer vision are rooted in mathematics, computation, logic, and other disciplines.  So as I like to say: “AI isn’t magic, it’s just science, technology, engineering, and maybe—a little bit of math!”.

AI for Game Stream Testing

Figure: Gaming devices


Gaming has been pervasive on personal computers, game consoles, and, most recently, mobile devices.  The mobile market for games is huge because of the low barrier to entry.  Everyone these days has a smartphone, and with game streaming they can just pick up their devices and almost instantly start gaming. But what happens from a user’s standpoint when we stream games? How does the performance and overall user experience compare with running games locally? The answer lies with a combination of load and stress testing on the game streaming servers, and observing what happens with game visuals and timings. The good news is that the bots are good at tackling this problem at scale.  Here’s how it works in practice:

  • Simulate or Emulate Load. As with most end-to-end performance tests, it starts with generating load on the server or data center in the form of concurrent users or processes.  When it comes to streaming games, you need to consider the location of the servers and data centers, and who they’ll be serving.
  • Bots Run Performance Test Scenarios. With realistic load being applied to the servers and data centers, the bots can now attempt to run various game streaming tasks via the streaming application running on a PC, game console, or mobile device. 
  • Bots Provide Test Insights and Analytics. While testing at scale, the bots excel at coalescing test results using ML to produce useful performance insights and analytics. This may involve applying a number of ML classification techniques for detecting game stream bugs, and leveraging ML clustering on the raw test logs.

Although the bots produce several useful analytics during game stream server and data center testing, my favorites by far are the reports on game launch success rates and timing analysis. As the bots perform their tests, they capture performance metrics on each test action. For game stream launches, you can view success rates by data center, to see whether there is a problem for a given region or set of users, or by game title, in case the cause of the issue is a problematic game.
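
As a rough illustration of that kind of report, the sketch below groups launch results by an arbitrary key. The record fields (`data_center`, `title`, `launched`) and the sample values are hypothetical, not the schema of any real tool.

```python
from collections import defaultdict

def success_rates(results, key):
    """Return the launch success rate for each distinct value of `key`."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for record in results:
        totals[record[key]] += 1
        passes[record[key]] += int(record["launched"])
    return {group: passes[group] / totals[group] for group in totals}

# Hypothetical launch results captured by the bots.
results = [
    {"data_center": "us-east", "title": "Game A", "launched": True},
    {"data_center": "us-east", "title": "Game B", "launched": True},
    {"data_center": "eu-west", "title": "Game A", "launched": True},
    {"data_center": "eu-west", "title": "Game B", "launched": False},
]

print(success_rates(results, "data_center"))
print(success_rates(results, "title"))
```

Grouping the same records two ways is what lets you tell a regional problem apart from a problematic title.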


Figure: Game Stream Testing


With respect to timing insights and analytics, the bots may classify game stream titles as stable, underperforming, or unstable.

  • Stable. Able to successfully complete tests within 5 seconds of the average of that title over the past two weeks.
  • Underperforming. Timings for one or more tests exceeded the average for that timing over the past few weeks by 5 seconds or more.
  • Unstable. Unable to successfully complete one or more tests, thereby resulting in missing action timings.
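
These three categories translate naturally into a small rule-based check. The sketch below is one possible reading of the definitions above. It assumes timings are recorded per test action in seconds, that a failed action is recorded as `None`, and that a timing exactly 5 seconds over the average counts as underperforming. The function name and data layout are hypothetical.

```python
from statistics import mean

def classify_title(recent_timings, history_timings, threshold_s=5.0):
    """Classify a game stream title as stable, underperforming, or unstable.

    recent_timings: action -> seconds for the latest run, or None if it failed.
    history_timings: action -> list of seconds from earlier runs.
    """
    # Any failed action means a timing is missing, so the title is unstable.
    if any(timing is None for timing in recent_timings.values()):
        return "unstable"
    for action, timing in recent_timings.items():
        if timing - mean(history_timings[action]) >= threshold_s:
            return "underperforming"
    return "stable"

history = {"launch": [10, 12, 11], "load_level": [20, 20, 20]}

print(classify_title({"launch": 12.0, "load_level": 21.0}, history))
print(classify_title({"launch": 17.0, "load_level": 21.0}, history))
print(classify_title({"launch": None, "load_level": 21.0}, history))
```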


AI is finding and classifying real bugs on production game streaming servers and data centers. These include failed stream launches, blank screens, lost stream connections, latency issues, and more. As the bots run these types of tests over a period of time, they get better at recognizing issues and identifying relationships among issues.  Without AI, it would be virtually impossible for humans to connect these different pieces of information due to the large volume of data that is generated from each test run.



Another interesting aspect of this technology is that the bots are not only able to find these real-world defects and prevent them from reaching production, but can also automatically file bug reports during testing or after it completes. By using the feedback feature built into the streaming app, bots submit issues directly to the engineering teams in real time.

Last, but definitely not least, in the world of game stream testing is figuring out whether the video quality of the stream is acceptable to the end user. Testing video quality is by no means a trivial problem. Most approaches to testing video quality today are manual, where many humans watch the video stream and give it a rating on a scale of 1 to 5. The scores are then averaged to produce a metric called the mean opinion score (MOS). 
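
For reference, the MOS calculation itself is just an average of the individual ratings. The helper below is a trivial sketch with a hypothetical name and made-up ratings.

```python
def mean_opinion_score(ratings):
    """Average a list of human quality ratings, each on a scale of 1 to 5."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= rating <= 5 for rating in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return sum(ratings) / len(ratings)

print(mean_opinion_score([4, 5, 3, 4]))
```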

As you can imagine, the manual MOS approach is slow, expensive, and clearly doesn’t scale. Furthermore, traditional automation approaches to video quality focus on scanning the video stream pixel by pixel, looking for failure modes such as blank screens, vertical and horizontal lines, and blurry or pixelated segments. 



The biggest drawback of traditional video quality test automation is that the code tends to be rigid and hard to maintain, and even when it detects issues there is no easy way to map them back to the end user experience. However, using an AI-first approach to video quality, bots are able to mimic the human MOS predictions for movie and television streams. Hopefully, sometime in the not so distant future, this technology will also be applied to end-to-end testing of the game streaming user experience.

AI for Gameplay Testing

AI for Game testing


When it comes to playing games, AI has come a long way. Decades ago, the bots started out using brute force computation to play trivial games like tic-tac-toe. But today, the bots are combining self-play with machine learning, rather than brute force, to reach expert levels in more complex, intuitive games like Go, Atari games, Mario Brothers, and more.

In 2019, OpenAI Five, a team of five bots that use reinforcement learning at scale to cooperatively play Dota 2, won back-to-back games against the world champions. This was a major accomplishment, and it raises the question: If the bots can do that with AI, why not extend them with AI-driven testing capabilities? When placed in this context, the test automation problem for gameplay doesn’t really seem as hard. It’s really just a matter of bringing the previously mentioned AI for game testing technologies together into an environment that combines them with real-time, AI-based gameplay.

Let’s take a look at a real-world example. Suppose you’re tasked with testing a first-person shooter, where players engage in weapon-based combat. The game has a cooperative mode in which you can have either friendly players or enemy players in your field of view at any given time. During gameplay, your player has an on-screen heads-up display (HUD) that visually indicates health points, the kill count, and whether there is an enemy currently being targeted in the crosshair of your weapon. Here’s how you can automatically test the gameplay mechanics of this title:

  • Model the Basic Actions the Bots Can Perform. In order for the bots to learn to play the game, you must first define the different moves or steps they can take. In this example, the bots can perform actions such as moving forward and backwards, strafing left and right, jumping or crouching, aiming, and shooting their weapon. 
  • Define Bot Rewards for Reinforcement Learning. Now that the bots can perform actions in the environment, you want to set them to take random actions and then give them a positive or negative reward based on the outcome. In this example, you could specify three rewards: 
    • A positive reward for locking onto targets to encourage the bot to aim its weapon  at enemy players.
    • A positive reward for increasing the kill count so that the bot doesn’t just aim at enemies, but fires its weapon at them.
    • A negative reward for a decrease in health points to discourage the bot from taking damage.
  • Implement Real-Time Object Detection and Visual Diffing. Using images and videos from the game, you then train machine learning models that enable the bots to recognize enemies, friendlies, weapons, and any other objects of interest. In addition, you train them to report on the visual differences observed in-game when compared to the previously recorded baselines.
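
The three rewards in this example can be sketched as a function of two consecutive HUD readings. This is a minimal illustration under assumptions: the `HudState` fields, the reward weights, and the choice to reward only the moment a target is first locked are all hypothetical, and in practice the HUD values would come from the object detection models rather than being passed in directly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HudState:
    """Values read from the heads-up display at one point in time."""
    health: int
    kills: int
    enemy_in_crosshair: bool

def reward(prev, curr, lock_bonus=0.1, kill_bonus=1.0, damage_penalty=0.01):
    """Score the transition between two consecutive HUD readings."""
    total = 0.0
    # Positive reward for locking onto a target (only when the lock is new).
    if curr.enemy_in_crosshair and not prev.enemy_in_crosshair:
        total += lock_bonus
    # Positive reward for each increase in the kill count.
    total += kill_bonus * max(0, curr.kills - prev.kills)
    # Negative reward proportional to health points lost.
    total -= damage_penalty * max(0, prev.health - curr.health)
    return total

before = HudState(health=100, kills=0, enemy_in_crosshair=False)
after = HudState(health=80, kills=1, enemy_in_crosshair=True)
print(round(reward(before, after), 2))
```

The relative sizes of the weights shape the bot's behavior. For example, a larger damage penalty would tend to produce a more cautious bot.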

Figure: Bot is autonomously executing the gameplay


While the bot is autonomously executing the gameplay actions using goal-based reinforcement learning, it is also checking for cosmetic bugs using object detection and visual diffing. As you can imagine, you can take this even further by analyzing audio similarity to validate sound effects and music.  If the bot is able to accomplish the tasks without discovering any bugs, then you know that those features are working as intended — all without human intervention. Furthermore, you can give the bot goals that seek to intentionally explore the boundaries of the gameplay environment, or reward it for actions such as jumping on walls and ceilings, in hopes of breaking the game physics. A key advantage of reinforcement learning based testing is that as the application evolves or gameplay changes, the bots can autonomously re-learn how to accomplish the same tasks in the new environment.

Case Studies

Now that you have an understanding of AI for game testing technologies, it’s important to discuss their implementation in practice. Here, I present two anonymous case studies, and share real-world experiences from deploying these technologies to production.


Game Store Testing Case Study

This first case study involves a gaming industry leader that operates some of the world’s largest interactive games and has more than 50 million customer accounts. This industry leader licenses its 3D graphics technology and visual simulation software to other industries. This 3D technology supports its competitive advantage in gaming, film, and other interactive entertainment venues. The company employs hundreds of software developers worldwide and has offices and facilities across the globe. It subcontracts manual testing overseas and has several hundred testers that regularly provide testing services.

The developers and testers at this company experience constant “ping pong” errors. One engineer makes a small tweak and tests the gaming modules, and then another engineer finds a new bug. Perhaps a month later that bug is fixed, and now the first engineer notices, as if by magic, a bug that is new or similar to the one fixed a month earlier. The testing strategy being used does not adequately address real regression testing of the game end-to-end, and the company views such a complete, end-to-end full regression for games as virtually impossible.

Initially, the company does not quite believe that AI-based tools can provide the claimed automation.  However, the testing and quality assurance team of its flagship game feels that legacy tools are such an inhibitor to their speed and efficiency that they decide to invest. 

So where is their starting point? Since their games are almost entirely graphical, with hundreds to thousands of purchasable items, they begin by transforming the testing of their game store, an important revenue source. Changes to items in the game store happen daily and, because of the large scale, these changes are often not adequately tested. The result is frequent customer requests for support and refunds. A product defect can create a high volume of customer support calls and frustration on the part of the gamers.

Integrating AI for game testing into this company’s workflow has been transformative. They now have a fully self-service solution where they can define test cases for hundreds of their assets, and train new assets on the fly. The gap between the capabilities of their legacy test automation tools and modern DevOps practices for their game store has been eliminated, and they can now support the high demand for new content quickly and reliably. They went from a two-week regression testing period to running the same tests within 12 hours. Better yet, if they wish to run those same tests in 6 hours, all they have to do is double the computational resources, so they are able to scale on demand with technology rather than people. Needless to say, this gaming industry leader is now a believer in the capabilities of AI for game testing and has experienced first-hand the leap in productivity these technologies can provide.


Game Stream Testing Case Study

Our second and final case study is the gaming division of one of the largest enterprise software companies in the world. Its products include gaming hardware, such as consoles and peripherals, and gaming software, including the hosting of cloud-based digital content that streams games to its customers. These streamed games are developed by both the company’s own software development teams and third-party vendor studios.

The game streaming service has hundreds of games that need comprehensive quality assurance and testing. A substantial percentage of its applications are updated frequently and require continuous validation to maintain user experience quality. All of this adds complexity to the quality assurance and test process. Performance is essential to the streaming service, and the development team is tasked with finding answers to these tough questions: What is the measured response to an individual player for a click on any particular element? Is this adequate to maintain the satisfaction of the user experience? How can this validation and quality determination be done continuously? 

As a brand new release of their game streaming service was approaching, this company knew they needed to find the answers to their questions, and find them fast, so they also chose to invest in AI for gaming technologies. Their main focus was top-of-rack switch testing for their game streaming data centers. Using AI, not only were they able to identify performance issues when interacting with specific game titles under load, but they were also able to reproduce a network switch issue that was manifesting itself in production environments. The game streaming platform team successfully launched the new release, and reduced 20 person-days of test scripting to 10 hours of AI-driven test execution. Reflecting on the entire experience, the company reported that they would not have been able to release in time without this technology.


Hopefully you enjoyed this chapter on AI for game testing. It is impressive to see bots testing games like humans, covering the client application, server and data center performance, look and feel, sound effects and music, and the overall gameplay experience. With all the ongoing AI research in gaming, you would think that AI for game testing would be mature and widespread. However, this technology is still very much in its infancy. On the upside, the future of AI for game testing is bright. If you think about deploying testing bots across several games, or even using many bots within a single game, it becomes apparent that AI is the solution for covering this extremely large test space quickly, efficiently, and reliably.


Bots testing games like humans


So what’s next?’s mission is to test the world’s apps. The gaming domain is certainly a significant portion of the world’s apps, and hence a critical part of that mission. In the future, I believe that these technologies will be embedded in game development kits, tools, app stores, and platforms. In other words, AI for game testing will be inherent to the game design, development, and release lifecycle. I’ve already challenged my team to think along these lines, and as a result they have started building frameworks and plugins for game developers. Creating such tools will place visual diffing, audio similarity, and reinforcement learning based testing features directly into the hands of the people who need them the most, possibly revolutionizing the entire game development industry.

About the author

Tariq King

Tariq King is the Chief Scientist at, where he leads research and development of their core platform for AI-driven testing.

Tariq has over fifteen years' experience in software engineering and testing and has formerly held positions as Head of Quality, Director of Quality Engineering, Manager of Software Engineering and Test Architect. Tariq holds Ph.D. and M.S. degrees in Computer Science from Florida International University, and a B.S. in Computer Science from Florida Tech.

His areas of research are software testing, artificial intelligence, autonomic and cloud computing, model-driven engineering, and computer science education. He has published over 40 research articles in peer-reviewed IEEE and ACM journals, conferences, and workshops, and has been an international keynote speaker at leading software conferences in industry and academia.

Over the years, Tariq has served the professional software testing community as a program chair, audit chair, reviewer, committee member, and advisory board member.

About is a leader in building AI-First powered software test automation tools that help testers, developers, and business stakeholders accelerate the release of high-quality applications. replaces legacy test automation tools that don’t work well, fail often, and are hard to use. Our AI-First powered bots build the tests, scale them from one platform to many, and maintain them as your applications change. No scripting or coding is required. Our customers include some of the world’s largest technology companies, enterprise APP developers, and some of the world’s largest APP platforms manufacturers.

Visit us at