Kaizen Challenge: Unstructured Data Processing

Maxx Duniphin, April 27, 2021

Stay Modern is not just our mission but a lifestyle at InfoCepts. In 2019, to fulfill this vision, we launched internal contests called Kaizen Challenges, named for the Japanese term meaning “change for the better” or “continuous improvement.” The contests aim to build skills and encourage innovative thinking among our associates.

Each quarter, a new challenge with a custom theme is announced to the associates. From there, teams are formed, rules are announced, and the countdown for submissions begins! I had the chance to join a team for the most recent one, Kaizen Challenge 5 (KC5), and I’d like to share what I found so appealing about the whole process.

Kaizen Challenge: Unstructured Data Processing

KC5 was all about Unstructured Data Processing and was sponsored by the InfoCepts Data Management Center of Excellence (COE). During the kickoff meeting, the COE members set our goal: define a business problem statement for creating value from unstructured data, and design an end-to-end solution. High scores would go to teams that processed complex unstructured data while relying on industry-standard data integration tools. These scoring guidelines give the contest a sense of purpose by steering effort toward unmet needs in the organization, as seen by the COE.

Our team of five agreed on a problem statement: organizations spend too much time manually confirming that the text entered on a claim matches the image of the supporting claim document. We identified expense claim confirmation as the best process to automate, because mismatches have historically been infrequent and receipts follow a semi-standard layout.

We tested the Azure Form Recognizer natural language processing (NLP) service from Microsoft and were impressed by both the text it parsed from images and its REST interface. From there, we set out as a team to connect the dots in our plan: we described our source system and how we would create value from the unstructured data.

Ultimately, we chose Talend Open Studio 7.3 as our data integration tool after reviewing its default and user-created components. Using Talend OS, our team was able to:

  1. Extract claim data and the document URI from the source SQL database.
  2. Process the images through the Azure Form Recognizer cloud endpoint.
  3. Compare the claim text with the parsed image results, loading the pass or fail indicators into the SQL database.
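
To make the comparison in step 3 more concrete, here is a minimal Python sketch of how a claim record might be checked against the fields parsed from a receipt image. This is not our Talend job. The `Claim` schema, the `merchant`, `amount`, and `date` keys, and the one-cent tolerance are all assumptions made for illustration, and the parsed dictionary simply stands in for whatever the recognition service returns.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class Claim:
    """One row of claim data from the source database (hypothetical schema)."""
    claim_id: int
    merchant: str
    amount: Decimal
    claim_date: date


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def compare(claim: Claim, parsed: dict, tolerance: Decimal = Decimal("0.01")) -> dict:
    """Compare a claim with parsed receipt fields and return pass/fail indicators.

    `parsed` is assumed to hold 'merchant', 'amount', and 'date' (ISO format);
    these key names are illustrative, not the service's actual response fields.
    """
    checks = {
        "merchant": normalize(claim.merchant) == normalize(parsed.get("merchant", "")),
        "amount": "amount" in parsed
        and abs(claim.amount - Decimal(str(parsed["amount"]))) <= tolerance,
        "date": parsed.get("date") == claim.claim_date.isoformat(),
    }
    # A claim passes only when every individual field check passes.
    return {"claim_id": claim.claim_id, "passed": all(checks.values()), "checks": checks}


if __name__ == "__main__":
    claim = Claim(1, "ACME, Inc.", Decimal("42.50"), date(2021, 3, 15))
    receipt = {"merchant": "acme inc", "amount": "42.50", "date": "2021-03-15"}
    print(compare(claim, receipt))
```

Keeping a per-field result next to the overall indicator means a failed claim can be routed to a reviewer along with the reason it failed, instead of being checked again from scratch.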

Lessons Learned from Kaizen

Because Stay Modern is our vision, our approach favors methods and ideas that drive continuous improvement. Through these Kaizen Challenges, InfoCepts’ associates are pushed to identify gaps, learn lessons, test new technologies, and ultimately grow individually and as a team.

By participating in this challenge, I learned four key things:

  1. Empower people
    My team was made up of people I usually don’t get to communicate with on a daily, or even weekly, basis. By joining this team, I was able not only to meet new people in the organization but also to learn new technologies and processes from associates on other assignments.
  2. Know your customer
    Once we identified our customer’s challenges, we were able to enhance the overall experience and create customer value.
  3. Transparency is key
    We would not have been successful as a team if we had not been transparent from the beginning about the success criteria, the tasks at hand, and whether we could complete them effectively.
  4. Have fun
    I fully believe this skill-building initiative was worthwhile. It was fun to learn new tech and build an ad hoc team in a low-stakes, purposeful game.

The spirit of kaizen is ignited at InfoCepts and we are ready for more!