Data Masking vs. Encryption

When you develop an application or website, or digitize business processes, you must choose how to protect sensitive data. Two common solutions are data masking and data encryption. This article explains both approaches in detail, the differences between them, and which one is more likely to fit your requirements.

What is data masking?

Data masking, or obfuscation, is a privacy protection technique that completely or partially replaces actual data with fake values that look real. Alternatively, unreadable symbols can replace the sensitive information.

In medical or popular science literature, you may have noticed that authors replace patients' actual names and surnames with random names or initials.

Another example of masking is hiding the information that helps identify a person. Suppose an organization's database includes an employee spreadsheet that contains social insurance numbers, names, and surnames. Data masking makes it possible to replicate the source database while hiding the social insurance numbers.

An example of data masking: replacing social insurance numbers with ‘x.’ Source
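The 'x' replacement shown above can be sketched in a few lines of Python. This is only an illustration: the function name `mask_ssn` and its `visible` parameter are assumptions made for this example, not part of any particular masking tool.

```python
def mask_ssn(ssn: str, visible: int = 0, mask_char: str = "x") -> str:
    """Replace digits with mask_char, keeping separators and the last `visible` digits."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    to_mask = total_digits - visible  # how many leading digits to hide
    out = []
    for ch in ssn:
        if ch.isdigit() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)  # separators and trailing visible digits stay as-is
    return "".join(out)


print(mask_ssn("123-45-6789"))             # xxx-xx-xxxx
print(mask_ssn("123-45-6789", visible=4))  # xxx-xx-6789
```

Keeping the separators means the masked value still has the shape of the original, which matters for the "representative data" requirement discussed below.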

Data masking is typically employed for well-structured information, since this approach preserves the characteristics and integrity of the source data. Masked data is well protected against leaks and theft, yet it can still be used for analytics, training, testing, and similar purposes.

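There are countless data types that masking can protect; the most common are:

  • PII: personally identifiable information;
  • PHI: protected health information;
  • PCI DSS: payment card information covered by the Payment Card Industry Data Security Standard;
  • Intellectual property, including technical data regulated under ITAR (the International Traffic in Arms Regulations).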

Data masking essentials

Data masking must follow several rules for the efficient protection of critical information:
  • Data masking must be irreversible. Once data is masked, it must not be possible to recover the original values from it. If masked data can be unmasked, that is a critical security issue, especially if the data is freely accessible.
  • Data must be representative. Whatever approach is used, masked data must stay relevant and keep the characteristics and structure of the original. For instance, if the data is in a spreadsheet, the replicated file must be a spreadsheet too. If it stores patient information, the masked data must stay in that category: names of people and cities must still read as names of people and cities, and attributes such as geolocation and gender must be preserved. The information must remain readable and keep its numerical structure.
  • Only private data is masked. Masking does not have to be applied to every field. For example, in a client record it is not necessary to conceal gender and place of residence: masking them would make the process more complex and expensive while making the data less useful for analytics and training.
  • Data masking must be automatic. Since data changes frequently, masking must be automated. Otherwise it becomes complex, time-consuming, expensive, and inefficient.
  • The integrity must not be compromised. Data masking must not compromise the integrity of databases. For example, if a credit card number is the primary key of a table and it is encrypted for masking, each instance of that card number must be encrypted in the same way.
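The integrity rule can be illustrated with a deterministic surrogate: the same card number always maps to the same masked value, so joins across tables keep working. The sketch below uses a keyed hash (HMAC-SHA256) from the Python standard library. The key, the function name `mask_card_number`, and the sample orders are all hypothetical, and the 16-digit surrogates are not valid card numbers.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-change-me"  # hypothetical; real keys belong in a secret store


def mask_card_number(card_number: str, key: bytes = SECRET_KEY) -> str:
    """Map a card number to a stable 16-digit surrogate using a keyed hash."""
    digest = hmac.new(key, card_number.encode("utf-8"), hashlib.sha256).hexdigest()
    # Reduce the digest to 16 decimal digits so the value keeps a card-like shape.
    return str(int(digest, 16) % 10**16).zfill(16)


orders = [
    ("4111111111111111", "book"),
    ("5500000000000004", "lamp"),
    ("4111111111111111", "pen"),
]
masked_orders = [(mask_card_number(card), item) for card, item in orders]

# The first and third orders still share one (masked) card number.
print(masked_orders[0][0] == masked_orders[2][0])  # True
```

Without the key, the surrogate cannot be turned back into the card number, which fits the irreversibility rule as well. Because the digest is truncated to 16 digits, two different cards could in principle receive the same surrogate, so a real system that uses the value as a primary key would need a collision check.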

Data masking methods

Numerous data masking methods can help in various scenarios, depending on the nature and volume of the data. Let’s have a look at the most recognized ones.

Data encryption. It is probably the most complex and secure method of hiding data. The data is hidden with an encryption algorithm, and a key is needed to decrypt it. This method suits a company's production data, since it requires frequent decryption. The prime drawback is that if the encryption key is compromised, any unauthorized party can decrypt the data and access the company's source data.

An example of data masking via encryption. Source

Data scrambling. This method randomly rearranges characters and digits, thus hiding the source content from unauthorized access. It fits only specific data types, and the private data may not be as protected as you expect. For instance, if an order has the identification number 889351, it will become 938581 after scrambling, yet anyone who knows the price can easily work out what the client bought. Scrambling is only suitable for scenarios where private data is not linked with other data in your database or in other open sources.

An example of scrambling for data masking. Source
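Scrambling amounts to a random permutation of the characters in a value. A minimal Python sketch is shown here; the helper name `scramble` is an assumption for illustration, and the fixed seed is used only so the demo is repeatable.

```python
import random


def scramble(value: str, rng: random.Random) -> str:
    """Return the characters of `value` in a random order."""
    chars = list(value)
    rng.shuffle(chars)  # in-place random permutation
    return "".join(chars)


rng = random.Random(42)  # fixed seed for a repeatable demo only
scrambled = scramble("889351", rng)

# Same digits, different order (the order may occasionally be unchanged by chance).
print(scrambled, sorted(scrambled) == sorted("889351"))
```

For real data, `random.SystemRandom()` would be a more appropriate source of randomness than a seeded generator. Note that the scrambled value keeps exactly the same digits as the original, which is one reason this method gives limited protection.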

Substituting data. This approach masks source data by substituting another realistic value, for example, changing ‘Den’ to ‘Pitter.’ It is one of the best masking strategies, since attackers will not always realize they are dealing with fake data. Its disadvantages are that it is challenging to implement in an automated algorithm, and the substituted data may not fit. For instance, Den is three characters long, while Pitter is six; if the spreadsheet column has room for only three characters, the other three will overflow it.

An example of substituting data. Source
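One way to automate substitution while respecting the column width is to pick replacements from a list and filter them by length. The sketch below is an assumption-laden illustration: the list `FAKE_NAMES`, the function `substitute_name`, and the `max_len` parameter are all hypothetical.

```python
import hashlib
from typing import Optional

FAKE_NAMES = ["Pitter", "Alex", "Sam", "Kim"]  # hypothetical replacement list


def substitute_name(real_name: str, max_len: Optional[int] = None) -> str:
    """Pick a stable fake name, optionally limited to the column width."""
    candidates = [n for n in FAKE_NAMES if max_len is None or len(n) <= max_len]
    if not candidates:
        raise ValueError("no replacement fits the column width")
    # Hash the real name so the same input always gets the same replacement.
    digest = hashlib.sha256(real_name.encode("utf-8")).hexdigest()
    return candidates[int(digest, 16) % len(candidates)]


fake = substitute_name("Den", max_len=3)
print(fake, len(fake) <= 3)
```

With `max_len=3`, a six-character name such as 'Pitter' is never chosen, so the replacement cannot overflow a three-character column.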

Data generalization. This widespread data masking method reduces the level of detail in the data to preserve privacy. Its goal is to replace specific values with more general yet semantically correct ones; for instance, replacing ‘age 25’ with ‘age from 20 to 30.’

An example of a generalization method of data masking. Source
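Generalizing an age into a range is simple arithmetic. The sketch below uses non-overlapping ten-year buckets, so 25 falls into '20-29'; the function name and the bucket width are assumptions for this example.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with the range (bucket) it falls into."""
    low = (age // width) * width  # lower bound of the bucket
    return f"{low}-{low + width - 1}"


print(generalize_age(25))  # 20-29
print(generalize_age(30))  # 30-39
```

Wider buckets give more privacy but less analytical detail, so the width is a trade-off to choose per dataset.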

Data randomization. Randomization is similar to substitution, except that it uses values from the same column instead of synthetic ones. For example, it will shuffle the values in the employees’ age column. The data will look accurate, but it will not reveal personal information. On the other hand, randomized data is vulnerable to reverse engineering if someone notices that the values have been shuffled.

An example of data randomization for masking. Source
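Reusing values from the same column can be sketched as a shuffle of that one column across rows. The sample records and the helper `shuffle_column` below are hypothetical, and the seed is fixed only to make the demo repeatable.

```python
import random


def shuffle_column(rows, column, rng):
    """Return new rows with the values of one column randomly reassigned."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    # Build new dicts so the original rows are left untouched.
    return [{**row, column: value} for row, value in zip(rows, values)]


employees = [
    {"name": "Ann", "age": 25},
    {"name": "Bob", "age": 41},
    {"name": "Eve", "age": 33},
]
masked = shuffle_column(employees, "age", random.Random(7))

# Column statistics are preserved even though rows no longer match.
print(sorted(r["age"] for r in masked))  # [25, 33, 41]
```

Aggregate figures such as the average age stay exactly the same, which is what keeps the masked column useful for analytics.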

Resetting data. With this approach, a column in the spreadsheet is set to Null, so no unauthorized user can view its values. It’s a straightforward strategy with drawbacks: loss of data integrity, masked data that is harder to use, and possible failures if the nulled column is linked to a function or process.

An example of resetting data. Source
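Resetting a column is the simplest method to implement. In this hypothetical Python sketch, `None` plays the role of Null, and the helper name `null_out` and the sample rows are assumptions.

```python
def null_out(rows, columns):
    """Return new rows in which the given columns are set to None."""
    return [
        {key: (None if key in columns else value) for key, value in row.items()}
        for row in rows
    ]


rows = [
    {"name": "Ann", "ssn": "123-45-6789"},
    {"name": "Bob", "ssn": "987-65-4321"},
]
masked_rows = null_out(rows, {"ssn"})
print(masked_rows[0])  # {'name': 'Ann', 'ssn': None}
```

Any code that later expects a string in the `ssn` field has to handle `None`, which is the kind of breakage mentioned above.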

Data aging. With this method, dates are shifted forward or backward. For instance, ‘January 1, 2021’ becomes ‘April 7, 2018’ or ‘February 2, 2022’. It is not a widespread method, yet it is useful for confusing competitors.
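Date aging is a shift by some number of days. In the sketch below, the offsets of -1000 and +397 days were chosen because they reproduce the two dates from the example; the function name `age_date` is an assumption.

```python
from datetime import date, timedelta


def age_date(value: date, shift_days: int) -> date:
    """Shift a date backward (negative) or forward (positive) by whole days."""
    return value + timedelta(days=shift_days)


original = date(2021, 1, 1)
print(age_date(original, -1000))  # 2018-04-07
print(age_date(original, 397))    # 2022-02-02
```

Applying one offset to every date in a record keeps the intervals between events intact, while a different offset per record hides the real timeline better.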

Pseudonymization. This term is used by the EU General Data Protection Regulation (GDPR). Under the regulation, pseudonymization means processing personal data so that it can no longer be attributed to a specific person without additional information, which is kept separately. This generally means removing all direct identifiers and avoiding combinations of identifiers that, together, can be used to identify a person.

What is data encryption?

Data encryption transforms readable information (plaintext) into an unreadable form (ciphertext). Encrypted data must be decrypted with a key or password to become readable again. On the internet, encryption is usually handled automatically by dedicated algorithms. For instance, the connection between your browser and the website where you are reading this text is encrypted automatically because the site uses an SSL/TLS certificate.

A straightforward example of encrypted information

Unlike data masking, encryption places no requirements on structure. One can encrypt any set of numbers, letters, or symbols in any order. In short, it is possible to encrypt any data in any volume. Moreover, data can be encrypted both at rest (in storage) and in transit (while being shared). The device performing the encryption must have enough computing power to finish within a reasonable time.

Two types of data encryption

Data encryption turns data into a sequence of seemingly random, unrecognizable characters. The encrypted information is then passed to the recipient, who holds the decryption key that turns the ciphertext back into plain text.

Symmetric encryption. If encryption and decryption require the same key, it is the symmetric type of encryption. It is usually employed for protecting data at rest, or wherever the secret key can be passed securely to the receiving party.

Symmetric encryption employs the same secret key. Source
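The "same key in both directions" idea can be shown with the simplest symmetric scheme, a one-time pad built on XOR. This is an illustration only, using just the Python standard library; production systems would normally rely on a vetted algorithm such as AES from a maintained cryptography library. The helper name `xor_bytes` and the sample message are assumptions.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of equal length (one-time pad; illustration only)."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must match the message length")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"account balance: 1024"
key = secrets.token_bytes(len(message))  # the shared secret key

ciphertext = xor_bytes(message, key)     # encrypt with the key
recovered = xor_bytes(ciphertext, key)   # decrypt with the same key

print(recovered == message)  # True
```

Both parties must hold the same `key`, which is why passing it securely is the central problem of symmetric encryption.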

Asymmetric encryption. In this case, two mathematically linked keys are involved: a public key and a private key. The public key is used for encryption and can be shared openly without threatening sensitive data. The private key performs the decryption, so if hackers compromise it, data leaks will follow.

To get the most out of asymmetric encryption, you will need a key management system that relies on public key infrastructure (PKI) to keep keys secure and trustworthy.

Asymmetric encryption employs two different keys. Source
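The relationship between the two keys can be seen in a textbook RSA example with tiny primes. The numbers below are far too small to be secure and there is no padding, so treat this purely as arithmetic that shows a public key encrypting and a private key decrypting. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
# Toy RSA with tiny primes: for illustration only, never for real data.
p, q = 61, 53
n = p * q                # 3233, part of both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent: (e, n) is the public key
d = pow(e, -1, phi)      # 2753, private exponent: (d, n) is the private key

message = 65
ciphertext = pow(message, e, n)    # anyone with the public key can encrypt
recovered = pow(ciphertext, d, n)  # only the private key holder can decrypt

print(ciphertext, recovered)  # 2790 65
```

Publishing `(e, n)` does not reveal `d` when the primes are large enough, which is why the public key can be shared freely.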

Masking vs. encryption 

Data masking protects data by deleting part of the private data or substituting it with other values of an analogous structure. Encryption, on the other hand, employs complex algorithms to transform confidential information into an unstructured set of symbols (ciphertext), so the source information becomes unreadable without the secret key.

Both tools have the same purpose: protecting data. Since they take different approaches to the same goal, it is vital to understand their differences, advantages, and disadvantages, and which is better for a particular case.

Data employment. Masking protects information at every stage, whether it is at rest or in use. It is possible to mask all data at once, or only a separate unit, a category, or the data about a specific person. Moreover, masked data has no identifiable links to the real data; hence, it is safe for public sharing and protected from hackers.

Therefore, masking is an excellent instrument for protecting confidential information in public sources; for example, names and surnames of patients in medical literature, or credit card and social insurance numbers in bank, tax, or police reports. Masking is also highly recommended for meeting the personal data protection requirements of PCI DSS, CCPA, GDPR, HIPAA, and similar regulations.

Data encryption is perfect for protecting unstructured data (text, images, audio, video, etc.), both at rest and in use. This is why network traffic is encrypted rather than masked: it is easier, faster, and cheaper.

The requirements for data. As mentioned above, masking requires well-structured data. In other words, you must be able to locate the sensitive information within the structure and substitute it with synthetic data or unreadable symbols. This is frequently the main difficulty in employing masking.

For example, if you have to record and share a private video, masking the confidential details (car plates, faces, logos) will be laborious, even for a video published on YouTube. It is much easier to encrypt the video and share it than to mask every required object.

Irreversibility. Masked data cannot be restored, while encrypted information can be decrypted with the secret key. This is essential when, for example, information about an investigation must be shared, yet the details of the witnesses or the detective must stay hidden.

Selectivity. Another essential difference between masking and encryption is that masking can be applied to particular elements. Namely, masking protects specific information like names, surnames, locations, and account balances.

Selectable masking of confidential data in declassified documents. Source 1, Source 2

Resource intensity. Masking requires structured data, so structuring takes additional resources. Encryption has no such requirement; thus, the process is faster and easier. Still, if the source data is already structured, there is no difference in the resources spent on masking or encryption.

Security. Each approach has its pros and cons. The irreversibility of masking makes it perfect for protecting confidential information in public reports and databases. For instance, Google Maps masks faces and car plates. Furthermore, masking makes it possible to set different access levels for users.

For example, if you have a database of store customers:

  • Call-center employees can see the contact data of your clients (phone numbers, emails, names, and surnames).
  • Financial office employees see the full name, contact data, and purchase statistics.
  • Security specialists can see any information, including bank cards and accounts. 
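The access levels in this list can be sketched as a per-role set of visible fields, with everything else masked. The role names, field names, and sample record below are hypothetical and serve only to illustrate the idea.

```python
ROLE_VISIBLE_FIELDS = {
    "call_center": {"name", "phone", "email"},
    "finance": {"name", "phone", "email", "purchases"},
    "security": {"name", "phone", "email", "purchases", "card_number"},
}


def view_for(role: str, record: dict, mask: str = "****") -> dict:
    """Return a copy of the record with fields hidden from this role masked."""
    visible = ROLE_VISIBLE_FIELDS[role]
    return {
        field: (value if field in visible else mask)
        for field, value in record.items()
    }


customer = {
    "name": "Den Smith",
    "phone": "555-0100",
    "email": "den@example.com",
    "purchases": 12,
    "card_number": "4111111111111111",
}

print(view_for("call_center", customer)["card_number"])  # ****
print(view_for("security", customer)["card_number"])     # 4111111111111111
```

The stored record is never changed; each role simply receives its own masked view of it.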
Encryption protects data from reverse engineering and unmasking, since encrypted data is unreadable. Those who do not have the secret key will see only a random set of numbers and characters that conveys no meaning.

Availability. Both data masking and encryption are now available to every internet user; the web is full of suitable applications, paid and free. However, masking can be difficult if your data is structured in a way the application (the masking algorithm) does not recognize.

Data encryption, by contrast, usually requires little effort. For example, you can simply pack your data into a RAR or ZIP archive and set a password on it; that's how most people encrypt their sensitive data.

Final thoughts

Data masking and encryption are both excellent instruments for data protection. Understanding their differences will help you select a solution for securing your critical information. A well-grounded decision will give you a solution that meets your requirements and the needs of your business.

Merehead provides professional development services for data masking and encryption. If you have questions, contact us for a free consultation.
