Data classification is the process of categorising data by its sensitivity and by the impact that unauthorised disclosure or exposure would have on the organisation. Choosing the right data classification tools is critical for organisations that handle sensitive customer, employee or financial data, intellectual property, or other regulated information. Here is a step-by-step guide to selecting the right data classification solution for your organisation:
Determine Your Data Classification Requirements
The first step is to understand your specific data classification needs. Some key aspects to consider are:
- Types of sensitive data – Does your organisation handle personal customer data such as names, addresses and Social Security numbers? Do you store employee payroll information? Do you hold intellectual property, such as source code or chemical formulas, that requires protection? Knowing which types of confidential data you hold will help narrow down the solution options.
- Classification categories – How many classification levels do you need? A typical scheme is public, internal, confidential and restricted. More categories allow finer-grained control.
- Classification scope – Will you classify broad repositories such as file shares and databases, or specific documents and fields, such as the SSN field in customer records? Broad classification requires tools that can scan entire repositories; field-level classification requires capabilities such as column-level metadata tagging.
- Automation needs – Will users classify data manually, or do you need tools that can fingerprint and automatically categorise data? Manual classification can be precise but requires significant user effort. Automated tools scale better but may be less accurate.
- Integration needs – Which other systems, such as data loss prevention (DLP), rights management or encryption solutions, must the tool integrate with for unified data security? The availability of APIs and SDKs determines how flexible that integration can be.
Document your specific requirements so you can match them against vendor offerings.
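One way to document these requirements in a form you can later compare against vendor offerings is to capture them as structured data. The sketch below (Python 3.9+) is a hypothetical schema: the class names, field names and allowed values are illustrative assumptions, not part of any product. Modelling the levels as an ordered enum makes "more sensitive than" a simple comparison.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Sensitivity(IntEnum):
    """Ordered classification levels; a higher value means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass
class ClassificationRequirements:
    """Hypothetical record of the requirements gathered in this step."""
    data_types: list[str]                      # e.g. "ssn", "payroll", "source_code"
    levels: list[Sensitivity] = field(default_factory=lambda: list(Sensitivity))
    scope: str = "repository"                  # "repository" or "field"
    automation: str = "hybrid"                 # "manual", "automatic" or "hybrid"
    integrations: set[str] = field(default_factory=set)  # e.g. {"dlp", "siem"}

    def validate(self) -> None:
        """Reject values outside the (assumed) allowed vocabulary."""
        if self.scope not in {"repository", "field"}:
            raise ValueError(f"unknown scope: {self.scope}")
        if self.automation not in {"manual", "automatic", "hybrid"}:
            raise ValueError(f"unknown automation mode: {self.automation}")
        if not self.data_types:
            raise ValueError("list at least one sensitive data type")


reqs = ClassificationRequirements(
    data_types=["ssn", "payroll", "source_code"],
    scope="field",
    automation="hybrid",
    integrations={"dlp", "encryption"},
)
reqs.validate()
print(max(reqs.levels).name)  # RESTRICTED
```

Keeping the requirements in one validated record makes it easy to reuse them as hard filters when you shortlist vendors.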
Evaluate Leading Data Classification Tools
Many vendors offer data classification solutions. I recommend evaluating both dedicated data classification tools and the classification capabilities built into broader data security platforms. Here is a comparison of leading options:
| Provider | Tool | Strengths |
|---|---|---|
| Microsoft | Azure Information Protection (now part of Microsoft Purview Information Protection) | Tight integration with the Microsoft ecosystem, automatic and manual classification options |
| Google | Cloud Data Loss Prevention (now Sensitive Data Protection) | Powerful scanning and fingerprinting capabilities, integration with BigQuery and Cloud Storage |
| Amazon Web Services | Amazon Macie | Machine-learning-based automatic discovery of sensitive data in Amazon S3, integration with wider AWS services |
| Boldon James (now part of Fortra) | Classifier | Flexible tagging at file, folder, database and column levels |
| Titus (now part of Fortra) | Data Classification Suite | Highly scalable big data discovery, classification and reporting |
| Forcepoint | Data Loss Prevention | Integrated DLP and rights management capabilities |
Look for providers that align with your specific use cases, data types, volumes and integration needs.
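A common way to do this matching is a weighted scoring matrix: first drop any candidate that fails a hard requirement (deployment model, mandatory integrations), then rank the rest by weighted evaluator ratings. The sketch below uses placeholder vendors ("Vendor A" to "Vendor C") and made-up weights and 1 to 5 ratings; none of it describes a real product.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    deployment: set[str]      # e.g. {"cloud", "on_prem"}
    integrations: set[str]    # systems it integrates with out of the box
    scores: dict[str, int]    # evaluator ratings per criterion, 1-5


# Hypothetical weights reflecting one organisation's priorities; they sum to 1.
WEIGHTS = {"discovery": 0.4, "tagging": 0.3, "reporting": 0.2, "support": 0.1}


def weighted_score(c: Candidate) -> float:
    """Weighted sum of the evaluator ratings; a missing criterion counts as 0."""
    return sum(w * c.scores.get(k, 0) for k, w in WEIGHTS.items())


def shortlist(cands: list[Candidate], needed_deployment: str,
              needed_integrations: set[str]) -> list[Candidate]:
    """Drop candidates that fail hard requirements, then rank the rest."""
    eligible = [
        c for c in cands
        if needed_deployment in c.deployment and needed_integrations <= c.integrations
    ]
    return sorted(eligible, key=weighted_score, reverse=True)


candidates = [
    Candidate("Vendor A", {"cloud"}, {"dlp"},
              {"discovery": 5, "tagging": 3, "reporting": 4, "support": 3}),
    Candidate("Vendor B", {"cloud", "on_prem"}, {"dlp", "siem"},
              {"discovery": 4, "tagging": 4, "reporting": 3, "support": 4}),
    Candidate("Vendor C", {"on_prem"}, {"dlp", "siem"},
              {"discovery": 3, "tagging": 5, "reporting": 5, "support": 5}),
]

for c in shortlist(candidates, "on_prem", {"dlp", "siem"}):
    print(c.name, round(weighted_score(c), 2))
# Vendor C 4.2
# Vendor B 3.8
```

Separating hard filters from weighted scores prevents a tool with excellent discovery from outranking one that simply cannot run where your data lives.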
Evaluate Capabilities Like Discovery, Tagging and Protection
Key capabilities to evaluate in data classification tools include:
- Data Discovery – How effectively can the tool identify sensitive data across structured and unstructured data stores? Look at machine-learning-based automatic classification, connectors for scanning databases and file shares, and optical character recognition (OCR) for extracting text from images.
- Classification and Tagging – Can data be classified into defined sensitivity levels through metadata tags? Can this be done manually, automatically or both? How granular can tagging be: file, folder, database or even column level?
- Protection – Which protection capabilities, such as encryption, rights management, redaction and DLP integration, are available for classified data? These let you create policies that actually secure sensitive data.
- Audit and Reporting – What visibility do you get into classification coverage and the locations of sensitive data? Comprehensive auditing and reporting are key for governance.
- Remediation – Can the tool quarantine, encrypt or delete misplaced sensitive data based on policies? Automated remediation reduces risk.
- Scalability – How well does the tool scale across servers, cloud environments and big data stores? Performance and scalability determine how much of your data you can realistically classify.
Choose a tool that excels in capabilities required for your use cases.
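To make discovery, tagging and remediation concrete, here is a deliberately simplified rule-based sketch: regular expressions find US Social Security numbers and candidate payment card numbers, a Luhn checksum filters out random digit runs, the highest matching level becomes the tag, and a hypothetical policy picks a remediation action. Commercial tools combine many more techniques (machine learning, dictionaries, fingerprinting, OCR); the rules, the default of INTERNAL for unmatched text, and the policy below are assumptions for illustration only.

```python
import re
from enum import IntEnum


class Sensitivity(IntEnum):
    """Ordered levels: a higher value means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical detection rules. US SSN format, excluding area numbers 000, 666
# and 900-999, group 00 and serial 0000, which are never issued.
SSN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_ok(number: str) -> bool:
    """Luhn checksum, used to discard digit runs that cannot be card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> tuple[Sensitivity, list[str]]:
    """Return a level plus the detectors that fired; unmatched text defaults to INTERNAL."""
    findings = []
    if SSN.search(text):
        findings.append("ssn")
    if any(luhn_ok(m.group()) for m in CARD.finditer(text)):
        findings.append("payment_card")
    level = Sensitivity.RESTRICTED if findings else Sensitivity.INTERNAL
    return level, findings


def remediation(level: Sensitivity, location_public: bool) -> str:
    """Hypothetical policy: quarantine restricted data found in public locations."""
    if level >= Sensitivity.RESTRICTED and location_public:
        return "quarantine"
    if level >= Sensitivity.CONFIDENTIAL:
        return "encrypt"
    return "none"


doc = "Employee 123-45-6789, card 4111 1111 1111 1111"
level, found = classify(doc)
print(level.name, found, remediation(level, location_public=True))
# RESTRICTED ['ssn', 'payment_card'] quarantine
```

When you evaluate a real tool, the questions map onto this sketch: how good are its detectors, how are findings turned into tags, and which remediation actions can policies trigger automatically.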
Consider Cloud, On-Premises and Hybrid Options
Data classification tools are available as:
- Cloud services – Classification delivered as a cloud service, such as Azure Information Protection or Amazon Macie. These require less implementation effort.
- On-premises software – You install and run classification servers on your own infrastructure, which gives more control and room for customisation.
- Hybrid solutions – Policies are defined centrally and applied when scanning both cloud and on-premises environments, giving you the flexibility to cover all of your data.
Factor in where your sensitive data resides – on-premises, cloud or both – and how much control you need versus ease of deployment.
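The hybrid model boils down to one central policy set applied through interchangeable scanners for each environment. The sketch below illustrates that architecture with a `Scanner` protocol and an in-memory stand-in for real connectors; the class names, the keyword-matching rule and the sample files are hypothetical, and real products use far richer detection than a keyword match.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Policy:
    """A centrally defined rule: label content that contains a keyword."""
    keyword: str
    label: str


class Scanner(Protocol):
    """Anything that can yield (location, text) pairs from one environment."""
    def items(self) -> Iterable[tuple[str, str]]: ...


class InMemoryScanner:
    """Hypothetical stand-in for a cloud bucket or on-premises file share connector."""

    def __init__(self, env: str, files: dict[str, str]):
        self.env = env
        self.files = files

    def items(self) -> Iterable[tuple[str, str]]:
        for path, text in self.files.items():
            yield f"{self.env}:{path}", text


def apply_policies(policies: list[Policy], scanners: list[Scanner]) -> dict[str, str]:
    """Run the same policy set against every environment and collect labels."""
    labels = {}
    for scanner in scanners:
        for location, text in scanner.items():
            for p in policies:
                if p.keyword.lower() in text.lower():
                    labels[location] = p.label
                    break  # first match wins, so list stricter policies first
    return labels


policies = [Policy("salary", "Confidential"), Policy("roadmap", "Internal")]
scanners = [
    InMemoryScanner("cloud", {"hr/q3.xlsx": "Salary bands for Q3"}),
    InMemoryScanner("onprem", {"eng/plan.docx": "Product roadmap draft",
                               "pub/faq.txt": "FAQ"}),
]
print(apply_policies(policies, scanners))
# {'cloud:hr/q3.xlsx': 'Confidential', 'onprem:eng/plan.docx': 'Internal'}
```

Because policies are defined once and scanners are pluggable, adding a new environment means adding a connector, not duplicating rules.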
Evaluate Operational Considerations
Beyond core classification capabilities, evaluate additional solution aspects like:
- Pricing model – Is pricing per user, per terabyte scanned, or tiered by volume? Per-user pricing is usually easier to forecast, whereas volume-based pricing varies with the amount of data you scan.
- Customer support – Responsive, knowledgeable support smooths deployment and ongoing management.
- Training resources – Well-documented user guides, admin manuals and online training make it easier to bring users and administrators up to speed.
- Implementation effort – How much configuration, customisation and integration work does deployment require? A quick, painless deployment shortens time to value.
Choosing a solution that aligns with your operational practices helps drive adoption.
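To see how the pricing models behave differently, here is a worked comparison with made-up figures: 500 users at 5 currency units per user per month, versus 100 units per terabyte scanned, assuming a 200 TB initial scan followed by 10 TB of incremental scanning per month. None of these prices come from a real vendor; substitute your own quotes and data volumes.

```python
def annual_cost_per_user(users: int, price_per_user_month: float) -> float:
    """Per-user licensing: cost depends only on head count."""
    return users * price_per_user_month * 12


def annual_cost_per_tb(tb_scanned_by_month: list[float], price_per_tb: float) -> float:
    """Volume licensing: cost tracks how much data is scanned each month."""
    return sum(tb_scanned_by_month) * price_per_tb


# Hypothetical inputs; replace them with your own quotes and data volumes.
users = 500
monthly_tb = [200.0] + [10.0] * 11  # full initial scan, then incremental rescans

per_user = annual_cost_per_user(users, 5.0)
per_tb = annual_cost_per_tb(monthly_tb, 100.0)
break_even = per_user / sum(monthly_tb)  # per-TB price at which both cost the same

print(per_user, per_tb, round(break_even, 2))
# 30000.0 31000.0 96.77
```

The annual totals look similar, but the volume-based cost is front-loaded: the first month's 200 TB scan accounts for 20,000 of the 31,000, while per-user cost stays flat month to month.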
Start With a Pilot Project
Once you have shortlisted suitable data classification tools, it is best to start with a pilot rather than an enterprise-wide deployment. Key steps for the pilot include:
- Start with a limited data set and user group, for example one file server, SharePoint site or database.
- Test core functionality such as data discovery, scanning, tagging and protection.
- Validate integration with other systems such as DLP, encryption and SIEM solutions.
- Measure the overhead on networks, servers and end-user productivity.
- Solicit feedback from pilot users and administrators to refine processes.
The pilot lets you validate the solution, iron out issues and establish best practices before a wider rollout.
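A pilot is also the natural place to quantify the accuracy trade-off of automated classification: have reviewers manually label a sample of pilot files, then compare the tool's results against those labels. The sketch below computes precision and recall for a binary "sensitive or not" decision; the file names and labels are invented for illustration, and a real pilot sample should be much larger.

```python
def precision_recall(predicted: dict[str, bool], truth: dict[str, bool]) -> tuple[float, float]:
    """Compare the tool's 'sensitive' flags with a manually reviewed sample.

    Only items in the reviewed sample (truth) are scored; items the tool did
    not report are treated as predicted non-sensitive.
    """
    tp = sum(1 for k, v in truth.items() if v and predicted.get(k, False))
    fp = sum(1 for k, v in truth.items() if not v and predicted.get(k, False))
    fn = sum(1 for k, v in truth.items() if v and not predicted.get(k, False))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical pilot sample: True means a reviewer judged the file sensitive.
truth = {"a.docx": True, "b.xlsx": True, "c.pdf": True, "d.csv": True,
         "e.txt": False, "f.pptx": False, "g.md": False}
# What the tool under evaluation flagged as sensitive.
predicted = {"a.docx": True, "b.xlsx": True, "c.pdf": True, "d.csv": False,
             "e.txt": True, "f.pptx": True, "g.md": False}

print(precision_recall(predicted, truth))  # (0.6, 0.75)
```

Low recall means sensitive data slips through unlabelled, which is a security risk; low precision means over-classification, which adds user friction and noise in downstream tools such as DLP.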
Choosing the right data classification solution takes research, planning and piloting, but the resulting improvement in data security and compliance posture makes the effort well worthwhile for protecting your organisation’s critical information assets. With some upfront diligence, you can implement data classification capabilities that meet your specific requirements.