Why Test Data Management is More Important Than You Think

The IBM Big Data and Analytics Hub cited a case study in which a US insurance company estimated that 15% of its testing effort went into collecting test data for its back-end and front-end systems alone.

To quote the study, “For every USD 14 million delivery by the software development and QA team, a hidden USD 3 million was being spent on data management. All data management tasks included moving data from back-end systems to identifying test data, data masking of sensitive data, skipped production defects due to unavailability of correct test data, manipulation of data for different scenarios, storage of test data.”

Test data management had become a serious problem for the company, so the complete process was reviewed and evaluated, and a formal test data management process was put in place. This saved the insurance company USD 400,000 annually in testing costs.

The example above clearly shows the importance of, and the need for, proper Test Data Management (TDM), also known as Software Test Data Management.

What is test data?

According to Wikipedia: “Test data is data which has been specifically identified for use in tests, typically of a computer program.”

The test data a testing team needs to test an application can be of two types:

1. Static data: This is data that does not change after being recorded. It usually comprises non-sensitive data such as city name, PIN code, etc.

2. Dynamic data (transactional data): This data can change after being recorded and usually comprises sensitive data such as the medical history of a client, number of employees, etc.

Testing usually needs a mix of static and dynamic data. The data can come in different formats and types and from different databases, and testing may require data from several sources, depending on the specific requirements of the Application Under Test (AUT).

Most often, the data used for testing is production data, because it covers the full variety of data an application may encounter in a live environment.

Now imagine a bank whose transactional data, containing credit card numbers, mobile numbers and bank login credentials, is handed to the testing team for testing purposes.

If such critical, high-risk data is misused, legal action by customers is likely. The breach will not only cause financial loss but also cost the bank its customers' trust, which can eventually do catastrophic damage to its business.

So how do we test a business-critical banking application without raw production data, when poor test data will let serious defects slip into production?
The answer is data masking.

We use the production data after masking, or hiding, the sensitive information. Masking falls under Test Data Management (TDM), where the aim is to keep sensitive production data separate from test data.

Let us understand a bit more about test data management (TDM).

What is test data management?

Informatica defines TDM as “the creation of non-production data sets that reliably mimic an organization’s actual data so that system and application developers can perform rigorous and valid system tests.”

In simple terms, test data management (TDM) is the process of managing test data: its planning, design, storage and retrieval. TDM ensures that test data is of high quality, in an appropriate quantity and the proper format, and available in a timely manner.

There are three approaches to creating test data:

1. Copy production data

i. The actual production databases are copied or cloned in this approach.

ii. Due to the large size of the production database, it is a time-consuming process.

iii. It creates a dependency on the production environment; the testing and development teams cannot create the test data themselves.

iv. It is a high-risk process because customers' sensitive data is at stake. If a data breach happens, the resulting legal proceedings may badly hurt the business.

2. Synthetic test data generation

i. A database administrator (DBA) writes and runs SQL queries against the database tables to create the required test data.

ii. The DBA's expertise is crucial; extensive knowledge of the schema, its relationships and the database is required.

iii. It is time-consuming because writing the queries and running them on the database takes time.

iv. The DBA also needs to add negative and boundary-value conditions to the test data.

3. Data subset creation

i. Unlike the data cloning approach, only selected subsets of the production database are copied, not the whole database.

ii. This approach is time-efficient because only a subset is copied.

iii. Skilled people are required to decide what data should be copied.

iv. Data masking is an important step in data subset creation. Sensitive data is masked to rule out any mishandling.

v. Data subset creation is the most widely used approach in the test data management process. The other two are usually avoided because of the cost involved and the sensitivity of the data.
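The second approach asks for negative and boundary-value conditions alongside ordinary data. The article describes this being done with SQL by a DBA; as a rough illustration only, the same idea can be sketched in Python. The field rule (a username of 3 to 12 characters) and all function names here are assumptions made up for the example.

```python
import random
import string

# Hypothetical field rule: a username must be 3 to 12 characters long.
MIN_LEN, MAX_LEN = 3, 12


def random_name(length, rng):
    """Return a random lowercase string of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def generate_usernames(count, seed=0):
    """Build synthetic usernames: typical, boundary and negative cases."""
    rng = random.Random(seed)  # a fixed seed keeps the data set reproducible
    rows = []
    # Typical values, strictly inside the valid range.
    for _ in range(count):
        length = rng.randint(MIN_LEN + 1, MAX_LEN - 1)
        rows.append({"value": random_name(length, rng), "valid": True})
    # Boundary values: exactly the minimum and maximum allowed lengths.
    for length in (MIN_LEN, MAX_LEN):
        rows.append({"value": random_name(length, rng), "valid": True})
    # Negative values: an empty string and lengths just outside the range.
    for length in (0, MIN_LEN - 1, MAX_LEN + 1):
        rows.append({"value": random_name(length, rng), "valid": False})
    return rows
```

Each row carries the expected outcome in its "valid" flag, so a test can feed the value to the application and compare the result against the flag.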


Steps for test data management

1. Analysis of Data requirement

Test data may be needed on different interfaces of the application, and its format and type may differ across these interfaces.

So, the first step is to understand the organization's data requirements based on the test cases that will be run. This requires knowledge of the domain, the business and all the applications involved in the end-to-end process.

For example, a banking system will have a CRM system and a financial application for transactions, coupled with messaging systems for SMS and OTP. The person analyzing the test data requirements should therefore have expertise in the banking domain, the CRM and financial applications, and the messaging systems.

2. Data subset creation

As we have seen above, this is the most widely used data creation technique. The real production data is copied to provide different subsets which accommodate all the test data requirements.

The accuracy, uniqueness, consistency and referential integrity of the test data must all be preserved while copying. Data for boundary-value and negative testing is also created by modifying the subsets or adding records.
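To make the referential integrity point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns and the sample rows are all hypothetical stand-ins for a production schema; the point is only that child rows are selected through the same filter as their parents, so no copied order refers to a customer that was left behind.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
"""


def build_source():
    """Create a tiny in-memory stand-in for a production database."""
    src = sqlite3.connect(":memory:")
    src.executescript(SCHEMA)
    src.executescript("""
        INSERT INTO customers VALUES (1, 'Pune'), (2, 'Delhi'), (3, 'Pune');
        INSERT INTO orders VALUES
            (10, 1, 250.0), (11, 2, 90.0), (12, 3, 40.0), (13, 1, 15.0);
    """)
    return src


def copy_subset(src, city):
    """Copy customers from one city plus only the orders that belong to them."""
    dst = sqlite3.connect(":memory:")
    dst.executescript(SCHEMA)
    customers = src.execute(
        "SELECT id, city FROM customers WHERE city = ?", (city,)
    ).fetchall()
    dst.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    # Select child rows through the parent filter so no order is orphaned.
    orders = src.execute(
        "SELECT o.id, o.customer_id, o.amount FROM orders o "
        "JOIN customers c ON c.id = o.customer_id WHERE c.city = ?",
        (city,),
    ).fetchall()
    dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    dst.commit()
    return dst
```

Calling copy_subset(build_source(), "Pune") yields a smaller database in which every order still points at a customer that exists in the subset.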

3. Data Masking

Because we are dealing with sensitive production data, it is essential to hide customer data such as medical history, bank login information, phone numbers and credit/debit card information. Any failure to protect sensitive data may lead to compliance and regulatory issues.
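As an illustration of what masking can look like in practice, the sketch below shows two common techniques: partial redaction of a card number and keyed pseudonymization of a phone number. The record fields, the function names and the hard-coded key are assumptions for the example; a real setup would follow the organization's own masking rules and keep the key in a secrets store.

```python
import hashlib
import hmac

# Assumption: a real key would come from a secrets store, not source code.
MASKING_KEY = b"example-key-for-illustration-only"


def mask_card_number(card_number):
    """Keep the last four digits and replace the rest with 'X'."""
    digits = [ch for ch in card_number if ch.isdigit()]
    if len(digits) <= 4:
        # Too short to reveal anything safely, so hide every digit.
        return "X" * len(digits)
    return "X" * (len(digits) - 4) + "".join(digits[-4:])


def pseudonymize(value):
    """Map a value to a stable token so the same input always masks alike."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def mask_record(record):
    """Return a masked copy of a customer record; the input is not modified."""
    masked = dict(record)
    masked["card_number"] = mask_card_number(record["card_number"])
    masked["phone"] = pseudonymize(record["phone"])
    return masked
```

Because the pseudonym is deterministic for a given key, the same phone number masks to the same token in every table, which keeps joins between masked tables working.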

4. Automation and tools

In TDM, automation can be used to perform the above tasks of data cloning, data generation and data masking. Done manually, these steps are time-consuming and error-prone because of the large volumes of data involved.

Automation scripts can be written, or licensed test data management tools such as Informatica, Delphix and DATPROF can be used. Advanced tools also provide reporting, which helps the organization make better decisions about test data.

5. Maintenance and Refresh

Test data is kept in a central repository with rules for access and privileges. It needs a periodic refresh so that it reflects the latest and most relevant data. If multiple modules in a project use the same test data repository, a properly managed refresh cycle is a necessity.

Along with data refresh, the maintenance of the repository is also very important. Over a period of time, the test data may become obsolete or redundant. There has to be proper maintenance of the test data to keep it consistent, correct and available over time.

Otherwise, obsolete data will take up unnecessary storage space in the repository, and the search for relevant test data may take longer than expected.
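One simple way to spot obsolete data, sketched here under assumptions, is to record when each data set was last used and flag anything older than a cutoff. The "last_used" field, the function name and the 90-day default are hypothetical choices for the example, not something prescribed by a particular TDM tool.

```python
from datetime import date, timedelta


def split_stale(data_sets, today, max_age_days=90):
    """Separate recently used data sets from candidates for archiving."""
    cutoff = today - timedelta(days=max_age_days)
    fresh = [d for d in data_sets if d["last_used"] >= cutoff]
    stale = [d for d in data_sets if d["last_used"] < cutoff]
    return fresh, stale
```

The stale list can then be reviewed and archived or deleted as part of the regular refresh cycle.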

Why test data management is so important

Having a dedicated test data management team and a systematic TDM process in place has immense benefits for the organization and the customer.

Below are the points which depict the importance of TDM.

1. Increased test data coverage: TDM provides traceability from test data to test cases and then to requirements. This gives a bird's-eye view of the test data coverage and the defect patterns.

2. Cost reduction by finding the bugs early: As seen in the previous point, there is better test data coverage and the traceability provides a clearer picture. This helps in finding the bugs early, and the cost of production fixes is reduced.

3. Data is provisioned based on testing type: A distinctive feature of a TDM process is that the data is managed in one place. From the same repository, appropriate data can be provisioned for different testing types: functional, integration, performance, etc. This reduces redundant data copies and hence the cost of storage.

4. Data compliance and security: Governments and regulatory authorities impose strict regulations and compliance rules that everyone must follow. Because data masking is an integral part of a TDM process, data security and compliance are given top priority.

5. Reusability of data: Reusability is the most valuable feature of TDM, as it further reduces cost. Reusable data is sorted out and archived in a central repository for future use. Whenever the need for such data arises, the testers can use the archived data.

6. Fewer copies of the data: In a project, multiple teams can make multiple copies of the same production data for their own use. This results in redundant copies of the same data and wasted storage space. With TDM, all the teams use the same repository, so storage space is used efficiently.

7. Customer's trust: The key advantages of the TDM process are quality data and very good data coverage. With these in place during the testing phase, bugs are uncovered early. The result is a stable, high-quality application with minimal production defects. The customer's trust in the organization increases when the customer sees such results from adopting a TDM process.

Conclusion

Test data creation is performed by the testing team, which usually does not have direct access to production data. Even when production data is provided, it is a large chunk of raw data that cannot be used directly for testing; considerable effort is needed to sort, manage and tailor it for use.

High-quality data is a basic need for high-quality software testing. Average-quality data produces mediocre test results, and no one wants that. Test data management is the best way to address these problems.

With Agile and DevOps, testing cycles are getting shorter. Creating quality data within a cycle while also performing the testing can get really complex. To reduce cost, time and effort in the testing cycle, test data management is an ideal solution with visible results. This instils a sense of satisfaction and trust in the customer, and better business is the outcome.

Testsigma is one such test automation tool, built to enable continuous testing along with elaborate test data management. Test cases can be automated in simple English via NLP and maintained easily using the built-in self-help feature.

Testsigma recognizes the need for an efficient test data management system and has a built-in test data generation facility. It also supports multiple data sources such as JSON, Excel and built-in data tables, so you can choose the format that suits you best.

