Tokenization, Encryption and Compensating Controls
Encryption
PII is often encrypted with a symmetric key. Encryption requires a key, an algorithm like AES, and some data (cleartext). Using the key + AES you can encrypt the cleartext, resulting in ciphertext. Using the key + AES and passing in the ciphertext, you get the cleartext back (decryption). Sounds simple and straightforward, right? No. Encryption is Hard. Symmetric encryption has many challenges (see the sketch after the list below):
- How many keys should we have?
- What systems should have access to what keys?
- How often will the keys be rotated?
- Where will the encryption happen?
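A minimal sketch of that encrypt/decrypt round trip, using the JDK's javax.crypto with AES-GCM. The inline key generation, class name, and sample e-mail are just for illustration; in a real system the key would come from a KMS/HSM, never from code:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class AesExample {
    public static void main(String[] args) throws Exception {
        // Generate a 256-bit symmetric key (for the sketch only; real keys live in a KMS/HSM)
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        String cleartext = "john.doe@example.com"; // hypothetical PII

        // Encrypt: key + AES(GCM) + cleartext -> ciphertext
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(cleartext.getBytes(StandardCharsets.UTF_8));
        System.out.println("ciphertext: " + Base64.getEncoder().encodeToString(ciphertext));

        // Decrypt: key + AES(GCM) + ciphertext -> cleartext
        Cipher decipher = Cipher.getInstance("AES/GCM/NoPadding");
        decipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        String decrypted = new String(decipher.doFinal(ciphertext), StandardCharsets.UTF_8);
        System.out.println("cleartext:  " + decrypted);
    }
}
```

The hard parts are exactly the questions above: the code is trivial, but deciding how many keys exist, who can use them, and how they rotate is not.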
Encryption often has performance implications. Basically, we can encrypt on the datastore side (if the datastore offers such a feature) or on the application side, also known as application-side encryption.
Datastore-side encryption: Has some benefits; for instance, it is transparent to the application. But what happens when multiple applications access the same shared database? They will all see the same data, so with a Distributed Monolith datastore-side encryption can be very problematic. The main benefit here is that we keep the datastore's full set of features; for instance, in a relational database we can still do joins, and we can keep doing engineering as usual.
Application-side encryption: It can be more secure; we can have more keys, and different applications can use different keys, so even with a Distributed Monolith we can reduce the blast radius, but only to some degree. For instance, if we have 5 applications that need to access the same table, there is nothing we can do; we need to share the same key across all 5 applications. The benefit comes when applications do not share the same tables, or different applications use different tables. Now the big drawback is that you can't use a bunch of features on the datastore: joins might be completely lost, full-text search is over, and considering Redis, a bunch of commands will not work. Another drawback: what if engineers forget to encrypt data before sending it to the DB? The sketch below shows one way to reduce that risk.
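Here is a minimal Java sketch (hypothetical class names, not any library's API): wrapping the encrypted bytes in a dedicated type means the persistence layer never accepts cleartext PII, so forgetting to encrypt becomes a compile error rather than a data leak.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// Value type holding only encrypted bytes; cleartext PII never reaches the repository.
final class EncryptedValue {
    final byte[] iv;
    final byte[] ciphertext;
    EncryptedValue(byte[] iv, byte[] ciphertext) { this.iv = iv; this.ciphertext = ciphertext; }
}

final class PiiCrypto {
    private final SecretKey key; // ideally a different key per application/service

    PiiCrypto(SecretKey key) { this.key = key; }

    EncryptedValue encrypt(String cleartext) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return new EncryptedValue(iv, cipher.doFinal(cleartext.getBytes(StandardCharsets.UTF_8)));
    }
}

// The repository only accepts EncryptedValue; passing a raw String will not compile.
interface CustomerRepository {
    void saveEmail(String customerId, EncryptedValue encryptedEmail);
}
```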
Compensating Controls
Compensating controls can be used instead of encryption or even as a solution to reduce the number of migrations. However, the best thing to do from a security perspective would be to use encryption alongside compensating controls.
Some examples of compensating controls are: segregating responsibilities, segregating environments, having strong logging and monitoring to detect attacks and anomalies very fast, and applying the least privilege principle consistently: only giving access to the exact roles needed, avoiding resource *, wide-open permissions, and shared super users like admin or root. A simple form of compensating control, besides AWS IAM, is security groups; they are simple and effective.
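As an illustration of the logging and monitoring angle, here is a minimal, hypothetical sketch (the names and the threshold are mine, not from any product) that audit-logs every PII decryption so anomalies can be spotted quickly:

```java
import java.time.Instant;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

final class AuditedPiiAccess {
    private static final Logger AUDIT = Logger.getLogger("pii-audit");
    // Reset periodically by a scheduled job (not shown here).
    private final AtomicLong decryptsLastMinute = new AtomicLong();

    // Every decryption goes through here: who asked, which field, and when.
    String decryptWithAudit(String caller, String field, DecryptFn decrypt) {
        AUDIT.info(() -> Instant.now() + " caller=" + caller + " field=" + field + " action=decrypt");
        long count = decryptsLastMinute.incrementAndGet();
        if (count > 1000) { // arbitrary threshold for the sketch
            AUDIT.warning(() -> "possible anomaly: " + count + " decrypts in the last minute, caller=" + caller);
        }
        return decrypt.run();
    }

    @FunctionalInterface
    interface DecryptFn { String run(); }
}
```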
Compensating controls might be the only measure in old and legacy systems where you might not have the source code and therefore have no better way to protect the application from within. One good thing about compensating controls is that they can often be applied on the infrastructure side instead of the application side. Depending on your team topology, compensating controls might be owned by DevOps / Infosec teams instead of the service teams. PCI environments are a classical example of compensating controls.
Tokenization
Tokenization is a very interesting alternative to encryption and compensating controls. Tokenization still requires encryption and compensating controls as well; however, encryption tends to be centralized in the tokenization server or vault.
Tokenization is growing nowadays, and there are open-source solutions like Open Privacy Vault and startups like Skyflow. It is not hard to build a tokenization solution; what is not so easy is certifying a PCI environment. Building the solution itself is not so complex: you have a token, which could be a simple UUID, and that token is linked to a PII data point, which should be stored encrypted. Using the token, we can decrypt and recover the original data.
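To make that concrete, here is a minimal in-memory sketch in Java (class and method names are hypothetical, not Skyflow's or any vault's API): tokenize() encrypts the PII, stores it under a random UUID, and returns the token; detokenize() reverses it. A real vault would persist the mapping, centralize key management, and sit behind strict access controls.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class SimpleTokenVault {
    private final SecretKey key;
    private final Map<String, byte[]> store = new ConcurrentHashMap<>(); // token -> iv + ciphertext

    public SimpleTokenVault() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        this.key = kg.generateKey(); // encryption is centralized: only the vault holds the key
    }

    public String tokenize(String pii) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = cipher.doFinal(pii.getBytes(StandardCharsets.UTF_8));
        byte[] record = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, record, 0, iv.length);
        System.arraycopy(ct, 0, record, iv.length, ct.length);
        String token = UUID.randomUUID().toString();   // the token itself reveals nothing
        store.put(token, record);
        return token;                                  // services pass the token around, not the PII
    }

    public String detokenize(String token) throws Exception {
        byte[] record = store.get(token);
        byte[] iv = new byte[12];
        System.arraycopy(record, 0, iv, 0, 12);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return new String(cipher.doFinal(record, 12, record.length - 12), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        SimpleTokenVault vault = new SimpleTokenVault();
        String token = vault.tokenize("4111-1111-1111-1111"); // hypothetical card number
        System.out.println("token: " + token);
        System.out.println("detokenized: " + vault.detokenize(token));
    }
}
```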
Tokenization shines when we do not need to detokenize very often. If your use cases require constant detokenization, classical encryption would be a better fit. Tokenization also makes a lot of sense for Big Data. Tokenization requires compensating controls as well; otherwise it would not be secure. Tokenization also needs to be approached with caution: for instance, if you have a shared tokens database where all services have read/write access, you can create a Distributed Monolith.
Wrapping up
Cheers,
Diego Pacheco