Thinking Outside the Box: A Dynamic Framework Solution

Honey LahotiMarch 29, 2016

Simplify Source Data Analysis with a Dynamic Framework Solution for Better Performance with Less Expertise Required
Data management is challenging on many levels—from cleaning and formatting the data to handling large volumes of data in a limited time. Data quality issues often inhibit successful data analysis, making it difficult to maximize the value of the source data to derive solutions or actionable insights.

Recently, a retail client of ours was struggling with this very issue—how to generate high quality data for analysis. They were having trouble with issues such as bucketization of rules—a common part of the data quality analysis process. Upon in-depth analysis, we found the following causes:

  • Lack of proper ETL implementation
  • Data from different sources with prevalent data issues
  • Data quality tool performance issues

While a common approach to these types of data issues is to use a data quality tool, the tools available for this client’s data environment had performance issues due to the voluminous amount of data they needed to analyze. In addition, our client was reluctant to use a data quality tool due to their own lack of expertise. Finally, the data quality tools available on the market did not provide the flexibility the client required to accommodate different business rules.

Thinking Outside the Box: A Dynamic Framework Solution
In order to overcome the above limitations while optimizing the data quality analysis process, InfoCepts’ team carefully studied the client’s data environment, including the entire structure of the data. To resolve the issue, we developed a generalized, dynamic, process-oriented framework.

This solution included the ability to bucketize rules so they can be generalized and grouped together based on different quality attributes such as:

  • Referential Integrity
  • Nullability
  • ST Mapping
  • Duplicate records
  • Uniqueness
  • Aggregation
  • Reconciliation
  • Pattern analysis
  • Range

How It Works
To develop a dynamic framework solution, we started by creating a metadata in the database for the above bucketized set of rules. This structure allows all the dynamic information pertaining to queries—such as table name, column, or other information for validation—to be captured and fed into the metadata table for each rule. Additionally, whenever a new rule is introduced, the metadata becomes a repository and new rules are added as per the defined standards.

Based on these data quality attributes and the different parameters required to process the query on a daily basis, we then created a framework where the parameters could be approved by the client through a simple Excel file. Next, a set of stored procedures were created for each data quality attribute and a generic query was written. The query fetches the parameters from the metadata to create the query and user input file for parameters. This process helps modify the WHERE clause of the queries for dynamic execution.

The overall framework provides a completely dynamic solution. Allowing the client to validate over 155 rules for approximately 20 different combinations—meaning that around 3,100 queries (155×20) can be executed within an hour or less. Also, the results of any count mismatch are added to a table on daily basis and emailed to the concerned audience automatically through script—making it simple to find and fix errors.

Dynamic Framework Solition_Example

Benefits of a Dynamic Framework
A dynamic framework offers many benefits. Not only is its creation a one-time effort, but implementation is faster than using a data quality tool. Additionally, a dynamic framework provides considerable flexibility in terms of modularity of the framework.

Other benefits of a dynamic framework solution include:

  • Requires less skilled resources to run the solution, saving costs
  • Eliminates dependency on a data quality tool
  • Simple to understand and maintain by ETL teams
  • Resolves performance issues associated with data quality tools
  • Customizable and accommodating of new rules
  • Maintains rule repository for any future reference

Due to all of the above benefits and our own experience, we believe that a dynamic framework can not only achieve the results necessary, but that it is a better solution than investing in data quality tools to analyze incoming data.

If you’d like more information on how a dynamic framework might help you achieve better results with your source data analysis, please send an email to sales@infocepts.com.