How to Protect Structured and Unstructured Data

Every effective PII protection effort addresses three critical imperatives – data discovery, access governance and risk mitigation. IT teams grappling with privacy mandates need to consider these factors across their unstructured and structured data contexts. And while regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) outline expectations for handling personally identifiable information (PII), they aren’t much help when it comes to the tactics you need to succeed. Let’s take a look at some effective strategies – and how they differ – across structured and unstructured data.

Data Discovery

A typical organization manages unstructured data in more than 10 million files containing everything from marketing and sales information to client contracts to employee insurance and human resources records. Discovering PII in these files remains one of the toughest data security challenges of our time, and it’s easy to understand why. It is a bit harder to understand why structured data discovery can also be difficult.

Structured databases should provide an easy map to PII – but database designs often predate modern privacy regulations and, as a result, few production databases were designed with privacy in mind. Sensitive information is often scattered across different databases, in different tables and in different fields. Sometimes, PII is duplicated across tables or in unrelated databases. Finding it all can be tougher than you think, but it’s a critical first step. PII protection starts with PII discovery.

Fortunately, emerging automated PII discovery tools can help find PII in both structured and unstructured data. In the unstructured data world, rules and end-user classification programs have long been used in an attempt to identify PII – but they have been neither effective nor manageable. Finding PII across an organization’s databases, by contrast, is a question of determining which databases and tables contain regulated data, identifying duplications and assessing risks. Recent artificial intelligence (AI) innovations show promise in automating discovery for both structured and unstructured data.
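As a rough illustration of the structured side of that task, the sketch below samples every text column in a database, flags columns whose values look like PII, and reports PII types that turn up in more than one place. It is a simple pattern-based heuristic, not the AI-driven discovery described above. The two patterns, the table names and the scan_for_pii helper are all assumptions made for the example, and it uses an in-memory SQLite database so it can run anywhere.

```python
import re
import sqlite3
from collections import defaultdict

# Hypothetical detectors for illustration only; real discovery needs many more.
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def scan_for_pii(conn, sample_size=100):
    """Return {pii_type: [(table, column), ...]} based on sampled values."""
    findings = defaultdict(list)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            rows = conn.execute(
                f'SELECT "{column}" FROM "{table}" '
                f'WHERE "{column}" IS NOT NULL LIMIT ?',
                (sample_size,),
            ).fetchall()
            values = [str(row[0]) for row in rows]
            for pii_type, pattern in VALUE_PATTERNS.items():
                # Flag a column only if every sampled value matches.
                if values and all(pattern.match(v) for v in values):
                    findings[pii_type].append((table, column))
    return dict(findings)


# Made-up schema: the same email address lives in two unrelated tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, contact TEXT, notes TEXT);
    CREATE TABLE newsletter (subscriber TEXT);
    INSERT INTO customers VALUES (1, 'ana@example.com', 'prefers phone');
    INSERT INTO newsletter VALUES ('ana@example.com');
""")

findings = scan_for_pii(conn)
# PII types found in more than one location are candidates for consolidation.
duplicated = {t: locs for t, locs in findings.items() if len(locs) > 1}
print(findings)
print(duplicated)
```

Note that neither flagged column has a name that hints at its contents, which is why the sketch inspects values rather than relying on column names alone.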

Data Access Governance

A clear and complete understanding of who can access PII, and how they can do it, is the key to understanding risk and implementing mitigation strategies. But these notions of “who and how” differ quite a bit for structured and unstructured data. For example, large-scale databases supporting web applications, such as those handling e-commerce operations, typically connect to those applications via a handful of service accounts, so tracing who has access isn’t usually a problem. Increasingly, however, API connections to databases extend access, sometimes outside the organization itself. Even where it is simple to determine who has access, each connection needs careful oversight.

Cataloging access for unstructured data is far more complicated. Empowered end users make highly consequential access control decisions, and those decisions are dispersed and ungoverned. Inappropriate sharing with external or personal email addresses, link sharing (especially unprotected or non-expiring links), files stored outside of designated locations and unclassified files that slip by data loss prevention (DLP) services are just a few ways data can be lost. Understanding and managing access in this context is an enormous governance challenge.
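To make those failure modes concrete, here is a minimal sketch that checks one file’s sharing settings against the risks listed above. The FileShare record, its field names, the company domain and the approved folder roots are all hypothetical; a real file-sharing platform exposes this information through its own API and data model.

```python
from dataclasses import dataclass
from typing import Optional

COMPANY_DOMAIN = "example.com"                        # assumption for the example
APPROVED_ROOTS = ("/shared/finance/", "/shared/hr/")  # assumption for the example


@dataclass
class FileShare:
    """Hypothetical snapshot of how one file is stored and shared."""
    path: str
    recipients: tuple = ()
    public_link: bool = False
    link_expires: Optional[str] = None  # ISO date string, or None if it never expires
    link_password: bool = False


def sharing_risks(share):
    """Return a list of human-readable reasons this share looks risky."""
    reasons = []
    external = [r for r in share.recipients
                if not r.lower().endswith("@" + COMPANY_DOMAIN)]
    if external:
        reasons.append("shared outside the organization: " + ", ".join(external))
    if share.public_link and not share.link_password:
        reasons.append("link is not password protected")
    if share.public_link and share.link_expires is None:
        reasons.append("link never expires")
    if not share.path.startswith(APPROVED_ROOTS):
        reasons.append("stored outside designated locations")
    return reasons


risky = FileShare(
    path="/home/sam/desktop/payroll.xlsx",
    recipients=("sam@example.com", "sam.personal@gmail.com"),
    public_link=True,
)
safe = FileShare(path="/shared/hr/policy.pdf", recipients=("lee@example.com",))

for reason in sharing_risks(risky):
    print(reason)
```

Checks like these only describe how a file is shared. They say nothing about whether the file contains PII, which is why sharing data needs to be combined with content-based discovery.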

As with the data discovery process, recent innovations in AI can clarify who has access and whether PII access is appropriate. Replacing legacy approaches that rely on file locations, pattern-matching rules or end-user document markup, AI can assess risk based on document content and the security practices in use for similar content.

Risk Mitigation

Security professionals, now armed with a clear understanding of what data they have and where the risks are, can develop more effective PII protection strategies. The tactics for protecting structured and unstructured data are, again, quite different. Here are some key tips for structured data risk mitigation:

  • Refactor your database to eliminate duplication, clarify data structure, and make PII discovery easier for whoever has to do the job once you’re gone.
  • Tokenize and/or encrypt sensitive fields to add an extra layer of security on top of your access control best practices.
  • Delete what you don’t need. A major PII spill of unneeded years-old data is, to be blunt, an unforced error.
  • Explore emerging technologies for API security and granular database access control. Most service accounts currently have very broad access and, consequently, poor API design or implementation can be a weak link. See what you can do to tighten things up.
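The tokenization tip above can be sketched in a few lines. This example replaces sensitive field values with keyed, deterministic tokens using HMAC-SHA256 from the Python standard library. It is one possible approach, assuming tokens do not need to be reversed; vault-based tokenization and field-level encryption are alternatives when the original value must be recoverable. The tokenize and tokenize_record helpers, the "tok_" prefix and the demo key are all illustrative assumptions.

```python
import hashlib
import hmac


def tokenize(value, key):
    """Return a deterministic, non-reversible token for a sensitive value."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:32]


def tokenize_record(record, sensitive_fields, key):
    """Copy a record, replacing the named sensitive fields with tokens."""
    return {
        field: tokenize(value, key)
        if field in sensitive_fields and value is not None
        else value
        for field, value in record.items()
    }


# Demo key only: in practice the key would come from a secrets manager.
key = b"demo-key-not-for-production"
record = {"customer_id": 42, "email": "ana@example.com", "plan": "pro"}

safe = tokenize_record(record, {"email"}, key)
print(safe)
```

Because the same input and key always produce the same token, tokenized fields can still be used for joins and duplicate detection without exposing the underlying value. The trade-off is that anyone holding the key can test guesses against a token, so the key needs the same protection as the data.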

There are also emerging tactics to consider for unstructured data:

  • Strive for least-privilege access control at the file level for all business-critical data.
  • Leverage AI-based automation to discover data and assess risk.
  • Don’t rely on folder-level security – in our research, we’ve found sensitive files in all-hands folders in nearly every organization.
  • Continuously monitor the situation. Users create thousands of new files each year, and a one-time audit is not going to cut it.
  • Look for ways to enlist your entire security stack in the PII risk management effort. With AI, for example, you can now autonomously assess risk and automatically tag files as sensitive. Those tags help data loss prevention solutions do a faster, more accurate job.

Compliance is a complex topic, and every organization’s data and regulatory environment is different. A clear understanding of how to discover, assess and protect structured and unstructured data, and of how those approaches differ, provides a foundation for an effective and manageable program to protect critical PII and regulated data.

Karthik Krishnan

Karthik Krishnan is CEO and founder of Concentric.ai.
