Data is everywhere, and wherever data is used, there ought to be data governance. Data governance is the set of guidelines, both spoken and unspoken, that surround the collection, usage, storage and security of data. It is how data is managed in order to ensure its usefulness.
Data governance, as an initiative, is not a small undertaking. Dealing with data permeates virtually every level of an organization. So how do you tackle it? As with running a marathon, you don’t just get up off the couch and do it one day. A marathon runner has likely first followed a set of best practices in training – both exercise and diet – and taken time to understand what is required.
Best practices in data governance fall into three categories: strategic, technical and security. Understand and utilize all three to create a usable data governance framework and to understand what it entails.
Data governance is as much a strategy as it is a rule book. Strategic best practices center around how data governance affects your business from the inside out and the outside in. The choice to engage is a strategic one because the data is used to inform business decision-making.
Your organization may already have a strong grasp on data governance at one level, but perhaps not at every level. An interesting “maturity model” created by Gartner shows that any initiative begins with awareness. In some organizations, awareness begins in the IT department, as staff do battle with bad data while trying to create a new application. In others, it might be the CFO’s realization that the organization isn’t in compliance with regulations.
Initiatives can begin either from the top down (the CFO wanting to comply with regulations) or the bottom up (a data entry clerk who notices a bunch of duplicates). Either way, the first step towards a true data governance practice is to start talking. Making people aware that the integrity of data is a concern will start the gears turning.
Don’t bite off too much at once: In spite of the data being everywhere and the need for initiatives to be all-encompassing, starting big is likely to end in utter failure. For starters, there are too many data sets to even consider that the exact same set of rules can apply to all of them. While there may be some basic rules (e.g. don’t enter incorrect data, follow the style guide for data entry), collecting inventory information is very different from collecting customer information. Private employee records are going to have different rules than the facility's maintenance schedules.
The best way to start a data governance initiative is to pick one system, process or need as the guinea pig project. There will be enough big-thinking around that one project to keep you busy and test the waters. Keep it manageable. Once a conversation starts, people will start thinking about it in a bigger way and applying concepts to other systems, which can be approached when the time is right.
This is the most strategic part. Why bother? Why do you care about data integrity? You have to have a business reason for caring whether or not the data is accurate, secure, etc. The vision of where you are going will lead the direction of the conversations and the decisions that occur. What do you want the data to tell you, and why? Imagine you want to determine which are the most popular products over the last six months and the demographics of those who are buying them. Some possible reasons:
To answer these questions, you need different data. For some you need contact information, for others, you need competitor data. The answers you are seeking will guide the data sets needed. This is only a small part of the larger strategy. Understanding the direction your business is going and what role the data will play serves as a guidepost for all decisions. Understanding the strategy also helps to gain buy-in from those in the organization who wonder why there is all this fuss about data.
Thinking about data governance, data integrity, and being data-driven is something that has to occur at every level of an organization. You can’t just have your CEO stand at a podium and announce one day, “From now on, we’re going to take care of the data.” Like a hastily made New Year’s resolution to lose weight, it wouldn’t necessarily hurt anything, but it isn’t going to magically make change happen. This is culture change. It is not easy. It is not fast. It might be painful. The point is that your organization needs to treat it for what it is, a culture change initiative, and not just as some new set of rules and guidelines. While rules will be a part of the process, communication to shift expectations and attitudes will be just as important.
As with any initiative, knowing your business metrics and creating key performance indicators (or KPIs) that are in line with the business strategy, the desired cultural shift, and the tasks that need to happen helps you know when milestones have been reached.
When people start thinking about data governance, their first thought might be the IT department. After all, IT deals with a lot of data and seems – to some people – to have magical capabilities to make things happen.
While IT certainly plays an integral part in the process, data governance is much bigger than IT. When it comes to the technical aspect of data governance, everyone has a role. That role might be in working with IT, but it also might be in the creation of guidelines and the follow-through with making them work.
Your master data is the data that is most valuable, and that helps you find other data. In each database, certain fields will likely contain identification numbers, or some unique identifier to allow you to quickly locate a record. Master data may also be data surrounding an important part of your business (e.g. customer lists, vendor lists). For each data governance project, you may find yourself dealing with different sets of master data. In some cases, the data may have already been recognized as master data. In others, especially for new projects, determining master data might be a greater part of the task.
If you don’t already have certain standards for collecting data, this should be a top priority in your process. First, it is important to find out if your industry is regulated or affected by data collection laws. For instance, the European Union’s General Data Protection Regulation (GDPR) requires that you have a lawful basis, such as consent, before collecting personal data from people.
Aside from regulations, it is also about consistency. In databases, certain fields may be required in order for a record to be created or completed. Standardized electronic or paper forms could be developed to facilitate this process. In order to ensure that data is collected at all appropriate times, a set of criteria is needed for when and where data should be collected. For instance, a car dealership might determine that any time a potential customer reaches out through a phone call or an email, the information should be documented, and the customer’s information should be recorded. This requires those that talk to a customer to ask for that information and to enter it into a CRM.
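The idea of required fields can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names and the `missing_fields` helper are hypothetical and would be replaced by whatever your own collection standard specifies.

```python
# Hypothetical required fields for a dealership contact record.
# Your own standard decides what actually belongs in this list.
REQUIRED_FIELDS = ("name", "contact_method", "contact_value", "inquiry_date")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat None, empty strings, and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


complete = {
    "name": "Pat Jones",
    "contact_method": "email",
    "contact_value": "pat@example.com",
    "inquiry_date": "2024-05-01",
}
incomplete = {"name": "Lee Wong", "contact_method": "phone", "contact_value": " "}

print(missing_fields(complete))
print(missing_fields(incomplete))
```

A check like this could run when a form is submitted, so that an incomplete record is flagged before it reaches the CRM.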
Organizing is about figuring out where your data is and putting it into a logical order. Integration is about making sure the data is working across departments and applications. The two work hand in hand to create useful data storehouses for analytical purposes.
Organizing your data is making sure it is arranged in a usable structure. An example is the structure for statistical analysis required by several data analysis tools. Consistently employing and labeling your digital or paper filing system is another structure that allows people to find the records they need quickly. Databases are also a form of structure, but databases can be built poorly. A solid structure will avoid duplicating data and instead create different views that pull applicable information.
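As a rough sketch of "views instead of duplicates," the example below uses Python's built-in `sqlite3` module with an in-memory database. The table, view, and column names are made up for illustration; the point is only that the view reads from the single stored copy, so an update made once shows up everywhere.

```python
import sqlite3

# In-memory database with a hypothetical customers table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT,
        email TEXT,
        city TEXT
    );
    INSERT INTO customers (name, email, city) VALUES
        ('Ada Park', 'ada@example.com', 'Denver'),
        ('Ben Ruiz', 'ben@example.com', 'Austin');
    -- The view pulls only the fields a mailing team needs; no data is copied.
    CREATE VIEW mailing_list AS SELECT name, email FROM customers;
    """
)

# Update the one stored record...
conn.execute(
    "UPDATE customers SET email = ? WHERE name = ?",
    ("ada.park@example.com", "Ada Park"),
)

# ...and the view reflects the change without any extra work.
rows = conn.execute("SELECT name, email FROM mailing_list ORDER BY name").fetchall()
print(rows)
```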
Integration works hand in hand with organizing. While organizing your data, you might find that the same data exists in multiple locations (e.g. a file cabinet and several different applications used by several different departments) but with different data attached to each instance. Integrating creates a centralized way for this data – probably your master data – to be updated across all systems at once.
For example, a workplace uses a visitor registration app that requires all visitors to sign in and include contact information. This app is integrated with the Customer Relationship Management (CRM) system that is the central source for contact information and records of connection. The app allows for automatic updates of contact information in the CRM. This makes the records more useful and accurate across both systems.
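A minimal sketch of that kind of integration follows. It assumes the CRM can be modeled as a dictionary keyed by email address; a real visitor app would call the CRM vendor's own API instead, and the `sync_visitor_to_crm` function and its field names are hypothetical.

```python
def sync_visitor_to_crm(crm: dict, visitor: dict) -> dict:
    """Create or update the CRM contact that matches the visitor's email."""
    # Normalize the email so the same person is not stored twice.
    key = visitor["email"].strip().lower()
    contact = crm.setdefault(key, {"email": key, "visits": 0})
    # Only overwrite fields the visitor actually supplied at sign-in.
    for field in ("name", "phone"):
        if visitor.get(field):
            contact[field] = visitor[field]
    contact["visits"] += 1
    return contact


# Hypothetical CRM contents before anyone signs in.
crm = {
    "dana@example.com": {
        "email": "dana@example.com",
        "name": "Dana Lee",
        "phone": "555-0100",
        "visits": 2,
    }
}

# A returning visitor with a new phone number, then a first-time visitor.
sync_visitor_to_crm(crm, {"email": "Dana@Example.com ", "name": "Dana Lee", "phone": "555-0199"})
sync_visitor_to_crm(crm, {"email": "sam@example.com", "name": "Sam Cho"})

print(crm["dana@example.com"]["phone"])
print(len(crm))
```

Matching on a normalized email is one possible choice of unique identifier; an organization might just as reasonably match on a customer ID from its master data.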
Part of any data governance project is the act of cleansing the data. This is the process of cleaning up any duplicate records, making sure terminology is the same (e.g. USA vs. U.S.), adding missing information wherever possible, getting rid of out-of-date records and more. Data cleansing follows any standards already in place and any that are created during this process. Once all the standards are clear, all the data will be given a good cleaning. However, cleansing data is really no different than doing the laundry. Doing it just once isn’t enough. It must be done regularly, or the result is a bunch of dirty data that you can’t use. Scheduling and conducting ongoing cleanup on a regular basis will keep the data tidy. Clean data is the most useful kind of data for analysis.
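Three of the cleansing steps described above (standardizing terminology, dropping out-of-date records, and removing duplicates) can be sketched as a small routine. The record layout, the alias table, and the cutoff date are all assumptions made for illustration, and this version simply keeps the first copy of each duplicate; your own standards may call for a different rule.

```python
from datetime import date

# Hypothetical style guide: every variant maps to one approved spelling.
COUNTRY_ALIASES = {
    "usa": "USA",
    "u.s.": "USA",
    "u.s.a.": "USA",
    "united states": "USA",
}


def cleanse(records: list[dict], cutoff: date) -> list[dict]:
    """Standardize terms, drop stale records, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Skip records that have not been updated since the cutoff.
        if rec["last_updated"] < cutoff:
            continue
        country = rec["country"].strip()
        country = COUNTRY_ALIASES.get(country.lower(), country)
        # Treat emails that differ only by case or spacing as duplicates.
        email = rec["email"].strip().lower()
        if email in seen:
            continue
        seen.add(email)
        cleaned.append({**rec, "email": email, "country": country})
    return cleaned


records = [
    {"email": "ada@example.com", "country": "U.S.", "last_updated": date(2024, 3, 1)},
    {"email": "Ada@Example.com", "country": "USA", "last_updated": date(2024, 4, 1)},
    {"email": "old@example.com", "country": "USA", "last_updated": date(2015, 1, 1)},
    {"email": "kai@example.com", "country": "Canada", "last_updated": date(2024, 2, 1)},
]

cleaned = cleanse(records, cutoff=date(2020, 1, 1))
for rec in cleaned:
    print(rec["email"], rec["country"])
```

Because cleansing has to be repeated, a routine like this is the sort of thing that would be run on a schedule, not once.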
Of course, one of the biggest concerns with data these days centers not on the data itself, but on who can see it and use it. Is it safe from those who would use it incorrectly? Do those who need access have access? Is it being stored in the best way? Security should be a part of all data governance conversations.
While IT is an integral part of data governance in all its parts, technical security is where IT is an absolute necessity. They hold the keys to the kingdom through multiple processes. One best practice is determining which forms of security are necessary for the type of data in question. Consider:
Physical security is about where the data is. This could be digital or physical. While we’d like to think that all our data has gone digital, the fact is that paper still exists. In some organizations, paper may still even be preferred; for instance, paper has a longer life span than most digital formats when it comes to archives, and there is some debate whether faxes are more secure than email. Generally, paper is far less secure than electronic data in most situations, e.g. a paper visitor log that anyone can flip through vs. a digital visitor logbook.
Even in a mostly paperless workplace, chances are there are still several file cabinets floating around. Best practices for storage include:
The most important best practice is communicating with staff about expectations and guidelines around data governance and protection. While IT puts the structure of technical security in place and enforces it with passwords, etc., they cannot control what employees do with the data. All employees need to have an understanding of what is expected of them, what they can and cannot do with data, and what the consequences are. In order to create this set of expectations, organizations have to answer a lot of questions:
Data governance is an ongoing effort for all organizations. It is not ever complete and it is not ever perfect. Thus, one of the most important things to recognize is that the best practice is to keep practicing. Keep evaluating what is working and what isn’t. Continue communicating with employees about what is needed. Make sure IT personnel stay up to date on the latest security issues and safety measures. Tackle every new data set as it comes. If your organization continues to keep data governance in its consciousness, it will become a part of the company culture moving forward.