Domain 2: Cloud Data Security Module 18 of 70

Module 18: Hashing, Tokenization, and Data Obfuscation

CCSP Domain 2: Cloud Data Security, Section B
The exam tests whether you can choose the right data protection technique for the right scenario. Encryption, hashing, tokenization, and masking each solve different problems. Applying the wrong technique can be as dangerous as applying none: for example, hashing a field that the business must read back later makes that data unrecoverable.

Hashing

A hash function produces a fixed-length output (digest) from variable-length input. Hashing is one-way — you cannot reverse a hash to recover the original data. The exam tests hashing for integrity verification and password storage, not for data confidentiality.
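The fixed-length and integrity properties described above can be seen in a minimal sketch using Python's standard `hashlib` module. The helper name `sha256_hex` and the sample inputs are illustrative assumptions, not part of the module.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Inputs of very different sizes produce digests of the same length.
short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"a" * 1_000_000)
print(len(short_digest), len(long_digest))  # 64 64

# Integrity check: a one-character change yields a different digest.
original = sha256_hex(b"quarterly-report-v1")
tampered = sha256_hex(b"quarterly-report-v2")
print(original == tampered)  # False
```

Note that nothing in this sketch can turn a digest back into its input, which is why hashing verifies integrity rather than protecting confidentiality.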

Exam-Relevant Hashing Concepts

  • SHA-256 (SHA-2 family) and SHA-3: Current standard hash algorithms. The exam treats MD5 and SHA-1 as deprecated for security purposes because practical collision attacks exist against both.
  • Salting: Adding a unique random value to each input before hashing, so that identical passwords produce different digests and precomputed (rainbow table) attacks fail. Essential for password hashing.
  • HMAC: Hash-based message authentication code. Combines a hash with a secret key to provide both integrity and authentication (proof that the sender holds the key). The exam may test HMAC for API request signing.

Exam trap: Hashing is NOT encryption. You cannot "decrypt" a hash. If a question asks how to protect data confidentiality, hashing is not the answer. Hashing protects integrity: it verifies that data has not been modified.
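Salting and HMAC from the list above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: the module names salting only, so the use of PBKDF2 as the password-hashing function, the iteration count, and all function names here are illustrative choices rather than exam requirements.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative value, not a recommendation


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Hash a password with a per-password random salt; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # a fresh salt for every stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)


def sign_request(secret_key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 signature over an API request body."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify_signature(secret_key: bytes, message: bytes, signature: str) -> bool:
    """Check that the message was signed by a holder of secret_key."""
    return hmac.compare_digest(sign_request(secret_key, message), signature)
```

Because each call to `hash_password` draws a new salt, two users with the same password end up with different stored digests. A request whose body changes after signing fails `verify_signature`, and so does a request signed with the wrong key.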

Tokenization

Tokenization replaces sensitive data with non-sensitive tokens. Unlike encryption, there is no mathematical relationship between the token and the original data. A tokenization system maintains a secure vault that maps tokens to original values.
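The vault concept can be illustrated with a minimal in-memory sketch. The `TokenVault` class is hypothetical: a real vault would be a hardened, access-controlled service with persistent storage, not a Python dictionary.

```python
import secrets


class TokenVault:
    """Toy token vault: maps random tokens to original values."""

    def __init__(self) -> None:
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return the existing token for value, or issue a new random one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        # The token is random, so it has no mathematical link to the value.
        token = secrets.token_urlsafe(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; only the vault can do this."""
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("4111111111111111")  # well-known test card number
print(vault.detokenize(token))  # 4111111111111111
```

The only way back from token to value is the lookup table, which is why the security of the whole scheme rests on protecting the vault.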

Cloud Tokenization Use Cases

The exam frequently tests tokenization for payment card data (PCI DSS scope reduction). When credit card numbers are tokenized, systems that store or process only tokens, and cannot reach the vault to detokenize them, can be removed from PCI DSS scope because tokens are not cardholder data. The token vault that maintains the mapping remains in scope, so the compliance effort concentrates on a much smaller part of the environment.

Tokenization vs. encryption: Ciphertext is mathematically derived from the plaintext and a key, and it keeps the original data format only when format-preserving encryption (FPE) is used. A token can keep the original format (for example, 16 digits) while having no mathematical relationship to the original value. In cloud environments, tokenization is often preferred for reducing compliance scope.
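A format-preserving token can be sketched as follows. This is an illustrative assumption about one possible scheme (random digits, with the last four kept for display), not a description of any particular product; the function name `format_preserving_token` is hypothetical.

```python
import secrets


def format_preserving_token(pan: str, keep_last: int = 4) -> str:
    """Replace all but the last keep_last digits with random digits.

    Separators such as dashes or spaces stay in place, so the token
    has the same length and layout as the original card number.
    """
    total_digits = sum(ch.isdigit() for ch in pan)
    to_randomize = max(total_digits - keep_last, 0)
    out = []
    digits_seen = 0
    for ch in pan:
        if ch.isdigit():
            digits_seen += 1
            if digits_seen <= to_randomize:
                out.append(secrets.choice("0123456789"))  # random, not derived
            else:
                out.append(ch)  # trailing digits kept for display
        else:
            out.append(ch)
    return "".join(out)


token = format_preserving_token("4111-1111-1111-1234")
print(len(token), token[-4:])  # 19 1234
```

A system built on this sketch would still need to record the token-to-value mapping in the vault and handle the rare case where two values receive the same token.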

Data Masking

Data masking replaces sensitive data with realistic but fictional data. Unlike tokenization, there is no vault: the original value cannot be recovered from a masked value. The exam tests two types:

  • Static masking: Creates a permanently masked copy of the data. Used for non-production environments (development, testing, training).
  • Dynamic masking: Masks data in real-time based on user privileges. A privileged user sees the real data; an unprivileged user sees masked values. Used for production environments with role-based visibility.
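The difference between the two masking types can be sketched in a few lines. All names here (`PRIVILEGED_ROLES`, `FAKE_NAMES`, `static_mask`, `dynamic_view`) and the record layout are hypothetical, chosen only for illustration.

```python
import random

PRIVILEGED_ROLES = {"dba_admin", "fraud_analyst"}  # hypothetical role names
FAKE_NAMES = ["Alex Doe", "Sam Roe", "Pat Poe"]    # fictional replacement values


def static_mask(records, seed=None):
    """Build a permanently masked copy for non-production use.

    The fake values are generated randomly, not derived from the
    originals, so the copy cannot be reversed.
    """
    rng = random.Random(seed)
    masked = []
    for _ in records:
        fake_ssn = "{:03d}-{:02d}-{:04d}".format(
            rng.randint(0, 999), rng.randint(0, 99), rng.randint(0, 9999)
        )
        masked.append({"name": rng.choice(FAKE_NAMES), "ssn": fake_ssn})
    return masked


def dynamic_view(record, role):
    """Return the record as a given role is allowed to see it.

    The stored record is never modified; masking happens at read time.
    """
    if role in PRIVILEGED_ROLES:
        return dict(record)
    return {**record, "ssn": "***-**-" + record["ssn"][-4:]}


record = {"name": "Jane Smith", "ssn": "123-45-6789"}
print(dynamic_view(record, "support")["ssn"])  # ***-**-6789
```

In the static case the masked copy is a separate dataset that can be handed to developers. In the dynamic case there is one production dataset and the caller's role decides what is shown.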

Anonymization and Pseudonymization

Anonymization

Irreversibly removing all identifying information so that the data subject cannot be identified. True anonymization means the data is no longer personal data under GDPR. The exam tests whether you understand that anonymization is permanent — if re-identification is possible, it is not true anonymization.

Pseudonymization

Replacing identifiers with pseudonyms while maintaining the ability to re-identify subjects using additional information stored separately. Pseudonymized data is still personal data under GDPR because re-identification is possible. The exam tests this distinction — pseudonymization reduces risk but does not eliminate GDPR obligations.
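Pseudonymization with separately stored re-identification data can be sketched as below. The function names, the `email` field, and the `subj-` prefix are illustrative assumptions.

```python
import secrets


def pseudonymize(records, id_field="email"):
    """Replace the identifier in each record with a random pseudonym.

    Returns (pseudonymized_records, key_table). The key_table is the
    "additional information" and must be stored separately from the data.
    """
    key_table = {}       # pseudonym -> original identifier
    assigned = {}        # original identifier -> pseudonym (keeps subjects consistent)
    pseudonymized = []
    for record in records:
        original = record[id_field]
        if original not in assigned:
            pseudonym = "subj-" + secrets.token_hex(8)
            assigned[original] = pseudonym
            key_table[pseudonym] = original
        pseudonymized.append({**record, id_field: assigned[original]})
    return pseudonymized, key_table


def reidentify(pseudonym, key_table):
    """Recover the original identifier; possible only with the key table."""
    return key_table[pseudonym]


rows = [
    {"email": "ana@example.com", "visits": 3},
    {"email": "ana@example.com", "visits": 5},
    {"email": "raj@example.com", "visits": 1},
]
data, keys = pseudonymize(rows)
print(reidentify(data[0]["email"], keys))  # ana@example.com
```

As long as the key table exists anywhere, re-identification remains possible, which is why the pseudonymized dataset is still personal data. Deleting the key table does not by itself make the data anonymous if the remaining fields can still single out a person.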

Choosing the Right Technique

Need                                                   Technique
Verify data has not been modified                      Hashing
Protect data confidentiality (reversible)              Encryption
Reduce PCI DSS scope                                   Tokenization
Safe test data from production                         Static masking
Role-based data visibility                             Dynamic masking
Remove GDPR obligations                                Anonymization
Reduce risk while keeping re-identification ability    Pseudonymization

Key Takeaways

Each technique serves a different purpose. Hashing protects integrity. Encryption protects confidentiality. Tokenization reduces compliance scope. Masking enables safe data use. Anonymization removes regulatory obligations. Pseudonymization reduces risk while preserving utility. The exam tests whether you can match the technique to the requirement.
