This blog discusses the advent of Master Data Management (MDM) across emerging data disciplines, such as Big Data and Data Lakes, and how it fits into today’s Enterprise Data Architecture. We also look at the data challenges presented by merging Structured and Unstructured Data.



Big Data and Data Lakes are predicated upon massive data volumes, ranging from terabytes to exabytes of information. Only by sourcing and storing this data have disciplines such as advanced analytics been able to yield a new wealth of insights.

Master Data, on the other hand, represents a much smaller footprint than the data sets typically used in Big Data computing. For the purpose of this piece, we define Master Data as “a normalized set of identifiers used to describe specific parties relevant to an enterprise, such as Customers, Suppliers, Product, Location, Services, etc.” Despite these smaller volumes and the relative importance of Master Data, many companies lack a clear strategy for effectively managing these data assets and end up hemorrhaging insights that are critical to their business objectives. So, how do we ensure that we are capitalizing on the opportunities that MDM presents?

One of the best practices of MDM is to architect an independent solution that is considered the true source of enterprise master data. This single source of truth will be connected to enterprise-wide transactional applications like the company’s ERP and CRM.


MDM for Structured Data

Traditionally, MDM for Structured Data has been used to describe information about B2B customers, via traditional OTC (Order to Cash) and P2P (Procure to Pay) business processes, as depicted in Figure 1, below.


[Images: order-to-cash cycle; procure-to-pay cycle]

Figure 1: Traditional workflows for Order to Cash and Procure to Pay cycles


This process appears simple enough; however, issues arise if you don’t clearly define a Master Data strategy. Using the example above, let’s walk through a few data challenges we have seen around the use of Structured Data:

[Image: MDM-Structured Data]


Most legacy MDM tools have the capabilities to address these Master Data problems. In this blog, we don’t intend to go into the native capabilities of any specific MDM tool. Many forward-thinking enterprises have already addressed, or are addressing, these B2B Master Data challenges by leveraging a suite of MDM tools.

Below are the typical capabilities provided by legacy MDM tools to resolve the master data challenges in the Structured Data domain.

  • Data quality and validation rules
  • 3rd Party Address validation
  • Governance Workflow Engine
  • Consolidation
  • Match and Merge
  • Harmonization
  • CRUD processes
  • Metrics
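
To make the first capability on this list more concrete, here is a minimal Python sketch of what data quality and validation rules might look like for a B2B customer record. The record fields, the three rules, and the `validate` function are assumptions made for illustration; they are not taken from any specific MDM tool.

```python
import re
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    # Hypothetical fields for a B2B customer master record.
    name: str
    country: str
    postal_code: str


def validate(record):
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    if not record.name.strip():
        errors.append("name is required")
    country = record.country
    if len(country) != 2 or not country.isalpha() or not country.isupper():
        errors.append("country must be a 2-letter uppercase code")
    # Illustrative country-specific rule: US postal codes are ZIP or ZIP+4.
    if country == "US" and not re.fullmatch(r"\d{5}(-\d{4})?", record.postal_code):
        errors.append("US postal code must be ZIP or ZIP+4")
    return errors


clean = validate(CustomerRecord("Acme Corp", "US", "10001"))
dirty = validate(CustomerRecord(" ", "usa", "1"))
```

In practice, a record that fails rules like these would typically be routed to a governance workflow for correction rather than silently rejected.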

While these challenges can be addressed on an individual level, we stress the importance of a comprehensive MDM strategy, which will allow enterprises to maximize the benefits of each data driver.


The Secret Sauce: MDM for Unstructured Data

Innovation in the Master Data space, however, involves developing a strategy that incorporates unstructured data. Now, let’s walk through how MDM interacts with Unstructured Data, and the potential data issues and solutions.

The Data Lake serves as a repository for raw data describing more abstract (and often unstructured) data points such as customers’ Feelings, Sentiments, Likes, Dislikes, and Comments about your products or services. Combining this Unstructured Data with your Structured Data is the backbone of the Big Data discipline, and it also presents some of the biggest challenges for deriving business value. MDM tools and best practices can help address these challenges.

Simply put, the Data Lake is the “Madness”. MDM brings “Method to the Madness”.

There is untapped value in the Madness: because unstructured data remains relatively unexplored, it represents a tremendous upside for business users in the modern enterprise.

So, what are the critical sources that produce this Madness? In the world of Big Data, a primary source of unstructured data is a company’s interactions with its B2C customers.

  • Various channels/sources for B2C customer data:
    • Web
    • Phone and Apps
    • Social media
    • Blogs and product review sites
    • Public domains
  • For example:
    • A traditional retailer like Samsung selling a product to Joe Smith (or Joseph Smith or Joseph W. Smith or J. Smith or JW Smith)
    • These names may refer to the same person, or not; Big Data technology by itself cannot determine this


Typically, this data is not yet normalized, and at this point these transactions are attributed to different consumers. The stakes are high in converting these transactions into viable customer records, because B2C customers are as important as B2B customers. Individual B2C customers cannot be ignored; otherwise, you lose the opportunity to develop customer loyalty, as well as the opportunity for cross-selling and up-selling. As brand construction and product development become increasingly driven by customer feedback, tracked via Sentiment Analysis, ineffective use of customer logs also results in the loss of valuable information.

To walk through some of the issues that Master Data solves, let’s revisit the profile of Joe Smith, from above. As Joe continues to compile transaction records, the Data Lake gets filled with unstructured data, as described below and shown in Figure 2:

  • There is a Joe Smith who is making a purchase related to a phone of Brand X.
  • @JoeSmith is Tweeting some positive comments on Twitter.
  • J. Smith is putting pictures about his recent phone purchase on Facebook.
  • Joseph W. Smith is commenting on an article related to his Phone brand on LinkedIn.
  • Two of Joseph W. Smith’s friends “Like” the pictures on Facebook and “Like” the comments on LinkedIn about the phone.
  • JW Smith makes a phone call to Brand X’s Call Center and enquires about Phone accessories like phone cases, screen protectors, and an extra backup phone charger. A colleague of JW Smith’s, Matt Simon, creates an account on Brand X’s website and saves some phone-related items in the shopping cart.
  • Joseph Smith posts a Google review of his purchase of the phone screen protector.

The story goes on and on, and events get recorded in a multitude of different places. As you can see, there are challenges in combining these data records in a scalable manner.


Figure 2: Data Lake filling up with unstructured data through a series of random events


Let’s consider the many profiles that are generated in the series of events described above.

  • Twitter - @JoeSmith
  • Google - Joseph Smith
  • Facebook - J. Smith
  • LinkedIn - Joseph W. Smith
  • Samsung account - JW Smith

Without an effective MDM strategy, an enterprise database will regard these transactions as several scattered, random, discrete events, unrelated to each other. Conversely, by leveraging the art of combining structured and unstructured data, Data Scientists can connect these interdependent dots and create data linkages from which meaningful insights can be extracted.

We’ve established that a strong Big Data and MDM capability allows companies to collect, process, and present insights from these types of activities. Let’s discuss the first steps to constructing that capability. Let’s assume that there is a Data Lake in place which collects all this information, processes it and presents the findings alongside the analytic insights.

Regardless of how sophisticated the Data Lake is, Big Data technologies by themselves cannot intelligently discern whether @JoeSmith on Twitter is the same person as Google’s Joseph Smith, Facebook’s J. Smith, LinkedIn’s Joseph W. Smith, and Samsung’s JW Smith.
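
As a rough illustration of the matching step that MDM adds on top of the Data Lake, the Python sketch below groups the profiles listed above by a coarse key made of the first initial and the surname. The `match_key` and `group_candidates` functions and the key itself are assumptions chosen for this example; commercial MDM tools use far richer match rules. The point is only to show how name variants can be nominated as candidates for the same master record.

```python
import re
from collections import defaultdict


def match_key(raw_name):
    """Build a coarse key (first initial, surname) from a display name or handle."""
    name = raw_name.lstrip("@")
    if " " not in name:
        # Handles such as "JoeSmith" have no spaces; split where lower case meets upper case.
        name = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    tokens = re.findall(r"[A-Za-z]+", name)
    if len(tokens) < 2:
        raise ValueError(f"cannot derive a match key from {raw_name!r}")
    return (tokens[0][0].lower(), tokens[-1].lower())


def group_candidates(profiles):
    """Group source systems whose profile names share the same match key."""
    groups = defaultdict(list)
    for source, name in profiles:
        groups[match_key(name)].append(source)
    return dict(groups)


# Profiles taken from the series of events described in this article.
profiles = [
    ("Twitter", "@JoeSmith"),
    ("Google", "Joseph Smith"),
    ("Facebook", "J. Smith"),
    ("LinkedIn", "Joseph W. Smith"),
    ("Samsung account", "JW Smith"),
    ("Brand X website", "Matt Simon"),
]

candidates = group_candidates(profiles)
```

A shared key only nominates candidates. A John Smith and a Jane Smith would land in the same group, so a real match rule would weigh additional evidence such as email, phone number, or address before merging, and would send uncertain cases to a data steward for review.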


Do you know if your MDM tool and data have the capabilities to integrate and analyze a combination of both Structured and Unstructured Data? If your enterprise can derive insights out of this collection of events, then you have achieved a significant competitive advantage in the Data arms race. If not, you run the risk of losing ground, and customers, quickly.

So, what is possible when an enterprise leverages MDM? Figure 3 depicts how multiple data sources are pulled into a single source and sorted. First, all of Mr. Smith’s digital activities related to your company would point to the same customer log. Let’s imagine that Mr. Smith dials into the Call Center. What happens next?

  • The phone number is identified as the primary customer key for the Call Center.
  • When the customer calls back:
    • The system identifies the caller as Joseph Smith, the recent purchaser of a Phone case.
    • While talking to Joseph Smith, the Call Center representative accesses the analytics dashboard that describes Mr. Smith’s preferences, series of purchases, demographics, likes and dislikes, positive and negative reviews, and so on.
    • The Call Center representative is able to deliver a tailored brand of customer service to Mr. Smith, thus building customer loyalty, receiving valuable customer feedback, and increasing the lifetime value of the customer.
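
The Call Center flow above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular product: the phone number, the master ID, and the in-memory `MASTER_CUSTOMERS` store are all hypothetical, and a real deployment would query the MDM hub instead.

```python
import re

# Hypothetical master records keyed by normalized phone number,
# the primary customer key for the Call Center in this example.
MASTER_CUSTOMERS = {
    "5550100": {
        "master_id": "CUST-0001",
        "name": "Joseph Smith",
        "recent_purchases": ["Phone case"],
        "linked_profiles": ["Twitter", "Google", "Facebook", "LinkedIn"],
    },
}


def normalize_phone(raw_phone):
    """Keep digits only so that formatting differences do not break the lookup."""
    return re.sub(r"\D", "", raw_phone)


def lookup_caller(raw_phone):
    """Return the master record for an incoming call, or None if the caller is unknown."""
    return MASTER_CUSTOMERS.get(normalize_phone(raw_phone))


caller = lookup_caller("555-0100")
unknown = lookup_caller("555-0199")
```

Stripping non-digits is a deliberately simple normalization. It does not reconcile country codes or extensions, which a production key would need to handle.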


Figure 3: Serve your customer better with Big Data empowered by MDM


To drill down further into the art of normalizing and mastering the data identifiers, Figure 4 depicts how Big Data can be transformed into meaningful insights by leveraging MDM.


Figure 4: An example of MDM Value-Add for Big Data



As discussed in this article, even the most sophisticated Data Lakes can fall victim to poor data strategy, costing the enterprise a wealth of insights and customer loyalty. Only by leveraging a Master Data Management strategy that enables the intersection of Structured and Unstructured Data can businesses unlock the value that is being held across countless uncommon data sources.

It is worth noting that the most effective solutions are developed using Agile principles and can be managed through sprint cycles. Taking this approach will reduce the time taken to show Proof of Value and decrease the Total Time to Market.

In a subsequent blog, I will detail more use-cases and methodologies describing best-practices in the MDM domain. From a technical standpoint, we will discuss what goes into these Big Data transformation projects and how MDM would impact them.

Should the material in this article spark your interest in the business value that can be unlocked via MDM, please reach out to me. I’d be happy to share a few of these insights with you.

BY Pravin Bhute, Manager - MDM Lead at WorldLink US