State of AI applied to Quality Engineering 2021-22

Section 5: Manage data

by Sogeti

Business ●●●●●
Technical ○○○○○



Effective test data management, which includes data mining, data generation, and data maintenance, is critical for quality engineering. Test data serves a variety of purposes. It should be available on demand in our environments, be associated with each possible test, reflect the interdependence of numerous systems, and be combinable for specific test cases.

Given the current regulatory requirements for test data management, it is critical to have a standardized mechanism for self-service, virtualization, synthetic data generation, and production data masking, as well as for demand management, governance, and metrics that measure and monitor the health of testing activities.

So how do many organizations manage their test data?

Generally, it is a lengthy process. A copy of production data is created and divided into multiple subsets, sensitive information is temporarily masked, and the data is copied to the various testing environments.
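The copy, subset, mask, and distribute steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the rows, the field names, the environment names (`sit`, `uat`), and the helpers `mask_value`, `mask_record`, and `provision` are all hypothetical, and a salted hash stands in for whatever masking technique an organization actually uses.

```python
import hashlib
import random

SENSITIVE_FIELDS = {"name", "email"}  # assumption: which columns count as sensitive


def mask_value(value, salt="demo-salt"):
    """Replace a sensitive value with a deterministic pseudonym."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return "masked_" + digest[:10]


def mask_record(record, sensitive_fields=SENSITIVE_FIELDS):
    """Return a copy of the record with its sensitive fields masked."""
    return {
        key: mask_value(val) if key in sensitive_fields else val
        for key, val in record.items()
    }


def provision(production, environments, subset_size, seed=0):
    """Copy production rows, subset them, mask them, and assign them to environments."""
    rng = random.Random(seed)
    snapshot = [dict(row) for row in production]  # step 1: copy of production
    provisioned = {}
    for env in environments:
        # Step 2: draw a subset of the snapshot for this environment.
        subset = rng.sample(snapshot, min(subset_size, len(snapshot)))
        # Steps 3 and 4: mask the subset and hand it to the environment.
        provisioned[env] = [mask_record(row) for row in subset]
    return provisioned


# Hypothetical production rows, invented for illustration only.
production_rows = [
    {"id": 1, "name": "Alice Example", "email": "alice@example.com", "balance": 120},
    {"id": 2, "name": "Bob Example", "email": "bob@example.com", "balance": 75},
    {"id": 3, "name": "Carol Example", "email": "carol@example.com", "balance": 300},
]

test_data = provision(production_rows, ["sit", "uat"], subset_size=2)
```

Even in this toy form, every environment depends on a fresh production snapshot, and the random subsets give no guarantee that the combinations a given test needs are present.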

This strategy presents two distinct challenges.

First, this approach impairs the quality of the test data. Compliance concerns arise wherever production data is left unmasked. Production data is subject to some variation. By the time data reaches quality assurance, it is often insufficient, invalid, or out of date.

Second, this approach impairs the speed and agility of testing. Quality assurance teams depend on an upstream team for their data. Testers are forced to sift through unwieldy copies of production data to locate the combinations they need. When a test is modified, the scripts and queries associated with it become unusable. Data is rarely reused because it is lost, cannibalized, or consumed.

This section discusses how artificial intelligence can help mitigate the inherent risks of test data management through synthetic test data. Synthetic data serves three goals: ensuring adequate test data coverage, complying with privacy regulations that prohibit the use of production data in test environments, and shortening the time required to create representative test data for end-to-end testing. The machine learning-powered platforms covered in this section can generate any type of data while retaining its characteristics and relationships. Data is anonymized in accordance with applicable laws.
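To make "retaining its characteristics and relationships" concrete, the following minimal sketch generates synthetic customers and orders. It is not how the platforms in this section work: it replaces machine learning with simple frequency and Gaussian sampling, and every name in it (`fit_profile`, `sample_row`, `generate`, the columns, the sample rows) is a hypothetical chosen for illustration.

```python
import random
import statistics
from collections import Counter


def fit_profile(rows, categorical, numeric):
    """Summarize each column: category frequencies or mean and standard deviation."""
    profile = {}
    for col in categorical:
        counts = Counter(row[col] for row in rows)
        profile[col] = ("cat", list(counts.keys()), list(counts.values()))
    for col in numeric:
        values = [row[col] for row in rows]
        profile[col] = ("num", statistics.mean(values), statistics.pstdev(values))
    return profile


def sample_row(profile, rng):
    """Draw one synthetic row that follows the fitted column summaries."""
    row = {}
    for col, spec in profile.items():
        if spec[0] == "cat":
            row[col] = rng.choices(spec[1], weights=spec[2], k=1)[0]
        else:
            # Gaussian sampling is a simplification; it can yield unrealistic values.
            row[col] = round(rng.gauss(spec[1], spec[2]), 2)
    return row


def generate(customer_profile, order_profile, n_customers, n_orders, seed=0):
    """Generate customers and orders; every order points at a generated customer."""
    rng = random.Random(seed)
    customers = [
        {"customer_id": i + 1, **sample_row(customer_profile, rng)}
        for i in range(n_customers)
    ]
    ids = [c["customer_id"] for c in customers]
    orders = [
        {"order_id": j + 1, "customer_id": rng.choice(ids), **sample_row(order_profile, rng)}
        for j in range(n_orders)
    ]
    return customers, orders


# Hypothetical source rows, invented for illustration only.
customer_sample = [
    {"segment": "retail", "age": 34},
    {"segment": "retail", "age": 51},
    {"segment": "business", "age": 45},
]
order_sample = [
    {"channel": "web", "amount": 20.5},
    {"channel": "store", "amount": 99.0},
    {"channel": "web", "amount": 42.0},
]

customer_profile = fit_profile(customer_sample, ["segment"], ["age"])
order_profile = fit_profile(order_sample, ["channel"], ["amount"])
customers, orders = generate(customer_profile, order_profile, n_customers=5, n_orders=20)
```

No source row is copied into the output, yet the category mix, the rough numeric range, and the customer-to-order relationship carry over. A real platform would model far richer dependencies between columns and tables.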