How to Ensure a Successful Data Migration: Assessment and Automation Strategies

Rupesh KatkamwarApril 5, 2016

With a number of emerging big data platforms now out on the market, many companies are looking to migrate their data from their current relational database management platform to a big data storage structure. However, large data migration projects are inherently complex.

While there are many reasons data migration projects run into trouble, our own experience has taught us that defining an effective and efficient testing strategy is critical to a project’s overall success. In this two-part post, we examine what a solid planning and testing strategy should include to ensure success and to avoid the most frequent failures that occur during data migration.

In this post, we focus on preparing for data migration through proper planning:

Conducting a Platform Difference Assessment
The core aspect of a data migration project is the accurate transfer of a source database to a target database. To ensure that migrated data is accurately represented in the target system, it is necessary to first assess where there may be discrepancies in the values/records of each column/table in the source and target databases. This analysis should include identifying any differences in the types of data types used, time and date formats, and differences between displaying control characters. It should also assess if there are issues with NULL, BLANK, and SPACE usage between the source and target database.

Identified differences should be documented along with the appropriate procedures for addressing these inconsistencies and approved by all key stakeholders before proceeding. Additionally, conducting an assessment is important for identifying the exact scope of the migration. The assessment should focus on both historical data migration and incremental data migration using some ETL jobs.

The assessment phase should also include an evaluation of what testing tool is most appropriate, including assessing a tool’s compatibility with both the source and target database and how the tool will be integrated. In a large database migration, you need a tool that is robust enough and has a large enough database capacity to handle the database comparisons. Many tools may work fine for small tables, but when comparing extremely large tables in one go, less robust tools can take a tremendous amount of time to analyze the data or can get hung up and stop working.

Use Automation Utilities
One of the biggest challenges in any data migration process, and particularly in a large data migration project, is validating the source data against the target data. Because a data migration will involve thousands of tables and hundreds of gigabytes (GB) of data, manual validation is highly inefficient and in many cases not realistic.

While sample testing and manual verification is possible, it still leaves large amounts of the new database untested. Therefore, automating the testing process is the best solution. To automate the process, test scripts are written that compare and validate the data present in both the legacy and new database. While there are several tools on the market than can help with the automation process, QuerySurge is one tool we have found to be highly successful in automating the validation process.

QuerySurge is a collaborative data testing solution built specifically to automate the testing of big data and data warehouses while providing a holistic view of the data’s health. QuerySurge not only ensures that the data extracted from sources remains intact in the target system by analyzing and pinpointing up to 100 percent of all data differences quickly, but also provides real-time and historical view of the data’s health over time.

Whatever method you determine is best however, automation is critical to for running validation tests on large swaths of data. Ultimately, automation ensures less errors throughout the validation testing process than manual small batch data testing.

To learn more about the validation types that are necessary and how to perform these tests through automated processes, stay tuned for our next post on data migration.