How to Reduce Your Cyber Risk With Data Masking

February 23, 2024

What is Data Masking?

Data masking is the process of disguising sensitive or confidential data in order to reduce privacy and security risks. It helps organizations share sample data for development, testing, and troubleshooting purposes without exposing real data.

Data masking replaces the original data with fictional but realistic-looking data. The data format and type are preserved, and basic statistical properties stay close to the original, so applications continue to operate as expected while the underlying data is protected.

Common data masking techniques include:

Substitution – Replacing data with fictional information. For example, replacing 123 Main St. with 789 Elm St.
Shuffling – Mixing up data by switching values between records. For example, swapping ages or names between records.
Generation – Creating fictional data that appears real. For example, generating fake names, addresses, and credit card numbers.
Encryption – Transforming data into unreadable cipher text. Requires decryption to reveal the original data.
Tokenization – Substituting sensitive data with non-sensitive replacements or tokens that have no extrinsic value.
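
As a rough illustration of substitution, the sketch below maps each real address to one entry in a small pool of fictional addresses. The pool, the secret key, and the function name substitute are assumptions made up for this example; they do not come from any particular masking product. A keyed hash is used so the same input always maps to the same fictional value, which keeps joins between masked tables consistent.

```python
import hashlib
import hmac

# Hypothetical secret and address pool, for illustration only.
SECRET_KEY = b"replace-with-a-real-secret"
FAKE_ADDRESSES = ["789 Elm St.", "456 Oak Ave.", "321 Pine Rd.", "654 Maple Dr."]


def substitute(value: str, pool: list[str], key: bytes = SECRET_KEY) -> str:
    """Pick a fictional replacement for `value`, deterministically."""
    # The keyed hash makes the mapping repeatable but hard to guess without the key.
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]


masked_address = substitute("123 Main St.", FAKE_ADDRESSES)
print(masked_address)  # one of the entries in FAKE_ADDRESSES
```

With a pool this small, many real addresses share the same replacement. A real deployment would likely use a much larger pool.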

Why is Data Masking Important?

There are several key reasons why organizations should implement data masking:

Privacy – Masking protects sensitive customer data such as names, addresses, and Social Security numbers. This reduces the risk of accidental data leaks.
Compliance – Many regulations like HIPAA and GDPR require protecting personal data. Masking helps meet compliance mandates.
Security – Masking reduces the impact of a breach of non-production systems, because an attacker who reaches them finds only fictional data.
Productivity – Developers gain access to more sample data for testing without security bottlenecks.
Cost Savings – Organizations avoid the cost of applying production-grade security controls to every copy of live data used for development and testing.

How Does Data Masking Work?

There are several techniques used to mask data:

De-identification

This removes or replaces identifying information like names, IDs, and addresses so the data cannot easily be traced back to individuals. For example, replacing real names with placeholder names like “John Doe”.
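
A minimal sketch of de-identification might look like the following. The field names in IDENTIFYING_FIELDS and the "REDACTED" placeholder are assumptions chosen for this example. The function returns a copy, so the original record is left untouched.

```python
# Hypothetical set of direct-identifier field names.
IDENTIFYING_FIELDS = {"name", "customer_id", "address"}


def de_identify(record: dict) -> dict:
    """Return a copy of `record` with direct identifiers replaced by a placeholder."""
    return {
        field: ("REDACTED" if field in IDENTIFYING_FIELDS else value)
        for field, value in record.items()
    }


record = {
    "name": "Jane Smith",
    "customer_id": "C-1001",
    "address": "123 Main St.",
    "age": 42,
}
clean = de_identify(record)
print(clean)
```

Fields such as age or ZIP code are left in place here. In combination they can still narrow down who a record belongs to, so they may need masking as well.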

Encryption

Encrypting data transforms it into unreadable ciphertext. However, the data must be decrypted with a key before it can be used, which may make encryption unsuitable for some test systems.

Tokenization

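Tokenization substitutes sensitive data values with non-sensitive tokens or surrogates. For example, a credit card number could be replaced with a surrogate that keeps only the last four digits, such as XXXX-XXXX-XXXX-5692. The mapping between each token and its original value is kept in a secured store, so the token has no meaning outside the masking system.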
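
To make the idea concrete, here is a small in-memory sketch of a token store. The TokenVault class, its method names, and the "tok_" prefix are hypothetical. A real system would keep the mapping in a hardened, access-controlled store rather than in a Python dictionary.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token store, for illustration only."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to the same token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # extremely unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only callers with access to the vault can recover the original value.
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")
print(token)  # random, e.g. "tok_" followed by 16 hex characters
```

The token is random, so nothing about the card number can be derived from it without the vault.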

Data Shuffling

Data shuffling mixes up data by switching values between records. For example, shuffling ages between customer records. This preserves the general distribution of values.
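
The following sketch shuffles one column across a list of records. The shuffle_column helper and the sample records are assumptions for this example. Because the values are only rearranged, the set of ages (and therefore their distribution) stays exactly the same.

```python
import random


def shuffle_column(records: list[dict], field: str, seed=None) -> list[dict]:
    """Return new records with the values of `field` shuffled between rows."""
    rng = random.Random(seed)  # a seed makes the result repeatable for testing
    values = [row[field] for row in records]
    rng.shuffle(values)
    return [{**row, field: value} for row, value in zip(records, values)]


customers = [
    {"name": "A", "age": 34},
    {"name": "B", "age": 51},
    {"name": "C", "age": 27},
    {"name": "D", "age": 45},
]
shuffled = shuffle_column(customers, "age", seed=7)
print(shuffled)
```

A shuffle can leave some rows with their original value, and on very small tables the original pairing is easy to guess, so shuffling is usually better suited to larger data sets.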

Data Generation

Sophisticated tools can generate fictional but realistic-looking data for names, addresses, Social Security numbers, and similar fields. This fictional data can closely mimic real data patterns.
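
As a simplified sketch of data generation, the code below builds a fictional person with a random name and a 16-digit card number that passes the Luhn checksum, which is the check many applications run on card numbers. The name lists and function names are made up for this example, and the generated numbers are random digits, not real accounts.

```python
import random

# Hypothetical name pools, for illustration only.
FIRST_NAMES = ["Alex", "Jordan", "Taylor", "Morgan"]
LAST_NAMES = ["Reed", "Lane", "Hayes", "Brooks"]


def _luhn_sum(digits: str, double_from: int) -> int:
    """Sum digits right to left, doubling those at positions matching `double_from`."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if position % 2 == double_from:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total


def luhn_check_digit(payload: str) -> int:
    """Compute the check digit to append to `payload`."""
    return (10 - _luhn_sum(payload, 0) % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """Check a full number, including its check digit."""
    return _luhn_sum(number, 1) % 10 == 0


def generate_person(rng: random.Random) -> dict:
    """Build one fictional person record."""
    payload = "".join(str(rng.randrange(10)) for _ in range(15))
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "card_number": payload + str(luhn_check_digit(payload)),
    }


person = generate_person(random.Random(1))
print(person)
```

Dedicated tools go further, for example by matching regional name and address formats, but the principle is the same: the output looks valid to the application without corresponding to a real person.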

Implementing Data Masking

Here are key steps for implementing data masking:

Identify Sensitive Data

Document systems, databases, and files that contain private or regulated data. Prioritize this data for masking based on level of sensitivity and risk.
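
One simple way to start this inventory is to scan sample rows for values that look sensitive. The sketch below is only a starting point: the two regular expressions and the classify_columns helper are assumptions for this example, and pattern matching will miss data that does not follow a fixed format.

```python
import re

# Hypothetical patterns; real scans usually cover many more data types.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def classify_columns(rows: list[dict]) -> dict:
    """Map each column name to the set of sensitive data types found in it."""
    findings: dict[str, set] = {}
    for row in rows:
        for column, value in row.items():
            for label, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    findings.setdefault(column, set()).add(label)
    return findings


sample_rows = [
    {"id": 1, "contact": "jane@example.com", "tax_id": "123-45-6789", "note": "called twice"},
    {"id": 2, "contact": "sam@example.org", "tax_id": "987-65-4321", "note": "no answer"},
]
print(classify_columns(sample_rows))
```

Columns flagged this way can then be reviewed by hand and ranked by sensitivity and risk.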

Select Masking Techniques

Choose masking approaches that are appropriate for each data type. For example, substitution for names and addresses, shuffling for ages, etc.
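
The choice of technique per field can be written down as a simple rule table. In this sketch, the RULES dictionary, the field names, and both helper functions are hypothetical; a masking platform would have its own rule format.

```python
def mask_digits_keep_last4(value: str) -> str:
    """Replace every digit except the last four with X, keeping separators."""
    total_digits = sum(char.isdigit() for char in value)
    seen = 0
    masked = []
    for char in value:
        if char.isdigit():
            seen += 1
            masked.append(char if seen > total_digits - 4 else "X")
        else:
            masked.append(char)
    return "".join(masked)


# Hypothetical rule table: field name -> masking function.
RULES = {
    "name": lambda value: "John Doe",
    "card_number": mask_digits_keep_last4,
}


def apply_rules(record: dict, rules: dict) -> dict:
    """Return a copy of `record` with each rule applied to its field."""
    return {
        field: rules[field](value) if field in rules else value
        for field, value in record.items()
    }


row = {"name": "Jane Smith", "card_number": "4111-1111-1111-5692", "age": 42}
print(apply_rules(row, RULES))
# {'name': 'John Doe', 'card_number': 'XXXX-XXXX-XXXX-5692', 'age': 42}
```

Keeping the rules in one table makes it easier to review which technique is applied to which field.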

Configure Masking Tools

Set up masking tools and rules to automate masking workflows. Vendors like Informatica, Delphix, and CA Technologies (now part of Broadcom) provide data masking platforms.

Mask Non-production Environments

Apply data masking to clone or refresh non-production environments like development, QA, analytics, and training systems.

Limit Access

Restrict access to original data and masking tools to authorized individuals only. This prevents inadvertent exposure of real data.

Test Systems

Verify that masked data works correctly by testing the systems and queries that consume it. Fine-tune masking rules if needed.
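
Part of this verification can be automated. The sketch below compares original and masked rows and reports two kinds of problems: a sensitive value that came through unchanged, and a masked value that no longer matches the expected format. The validate_masking helper, the field names, and the format pattern are assumptions for this example.

```python
import re


def validate_masking(original_rows, masked_rows, sensitive_fields, formats):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if len(original_rows) != len(masked_rows):
        problems.append("row count changed")
    for index, (original, masked) in enumerate(zip(original_rows, masked_rows)):
        for field in sensitive_fields:
            if masked[field] == original[field]:
                problems.append(f"row {index}: {field} was not masked")
            pattern = formats.get(field)
            if pattern and not re.fullmatch(pattern, str(masked[field])):
                problems.append(f"row {index}: {field} has unexpected format")
    return problems


original = [{"name": "Jane Smith", "ssn": "123-45-6789"}]
masked = [{"name": "John Doe", "ssn": "987-65-4321"}]
formats = {"ssn": r"\d{3}-\d{2}-\d{4}"}

print(validate_masking(original, masked, ["name", "ssn"], formats))  # []
```

If shuffling is used, a value can legitimately end up unchanged in a few rows, so the "was not masked" check may need to be relaxed for shuffled fields.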

Ongoing Masking

Schedule periodic masking to refresh non-production environments with new fictional data. Monitor for new sensitive data requiring masking.

Benefits of Data Masking

Privacy – Masking reduces the risk of unauthorized access to and disclosure of personal data.
Security – Masking minimizes the impact of potential data breaches involving non-production systems.
Productivity – Masking frees up access to sample test data without lengthy security reviews.
Cost Savings – Masking avoids the expense of securing live data copies for testing and development.
Agility – Enables faster refresh of test environments with new masked data on demand.
Compliance – Helps meet data privacy regulations and internal policies.

Conclusion

Data masking substitutes fictional data for sensitive information to lower privacy and security risks.
Key techniques include substitution, shuffling, encryption, and tokenization.
Masking is applied to non-production environments like development and QA systems.
Organizations can accelerate development and testing while protecting confidential data with robust data masking practices.