GDPR in a world of Micro Services

June 17th, 2019 | Software Architecture
Header image: 'gdpr keyboard' by Pete Linforth on Pixabay

Let me start by stating: I'm not a lawyer and I'm not providing legal advice. The article below is just some thoughts on my understanding of the GDPR and micro services architecture.

In May 2018 the General Data Protection Regulation (GDPR) became enforceable. But how do we reconcile this regulation with the world of micro services, which seems to be the default architecture to start with nowadays?

A lot of us have heard about parts of the GDPR, such as the rule that you are not allowed to process data for any purpose other than the one the data subject was made aware of. But let's have a look at the principles of the GDPR. I translated them from the Dutch Wikipedia page on the AVG (Algemene verordening gegevensbescherming, the Dutch name for the GDPR), because the English Wikipedia page lacks an easy-to-read list of principles.

Transparency
Purpose limitation
Data minimization
Accuracy
Storage limitation
Integrity and confidentiality

In this article I cover a couple of the principles above in the context of a micro services architecture.

The GDPR also gives the data subject a number of privacy rights, of which I only cover the right to data portability in this article.

Right of access by the data subject
Right to rectification
Right to be forgotten
Right to restriction of processing
Right to data portability
Right to object

The purpose limitation principle

Purpose limitation means you're not allowed to use the data for purposes the subject hasn't been made aware of. So if I collect the e-mail addresses of customers who ordered something, I can't just start using those addresses for a marketing campaign. We all know this; that's why most order flows ask whether you want to be informed about new products or something similar.

Our world has grown a bit, and suddenly we have a large number of events flowing through our landscape. My subscription micro service listens to CustomerCreated and CustomerUpdated events to gather a small amount of data about customers that have a subscription. One of the fields we store is the e-mail address, because we want to inform customers when their subscription is about to end. This sounds like a reasonable case, and I don't feel we are breaking any rules: they bought the subscription, so it seems legitimate to tell them it is ending.
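To make this setup concrete, here is a minimal sketch of a subscription service keeping its own copy of customer data. The event shape (plain dicts with `type`, `customer_id` and `email` keys) and the class names are hypothetical; a real service would consume from a message broker and persist to its own database. Note that it deliberately copies only the e-mail address and ignores everything else in the event.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CustomerContact:
    # Hypothetical local read model: only the field this service needs.
    customer_id: str
    email: str


class SubscriptionCustomerProjection:
    """Local copy of customer data, kept up to date from customer events.

    The event shape used here is an assumption for illustration,
    not a real message contract.
    """

    def __init__(self) -> None:
        self._contacts: dict[str, CustomerContact] = {}

    def handle(self, event: dict) -> None:
        if event["type"] in ("CustomerCreated", "CustomerUpdated"):
            # Copy only the e-mail address; other fields in the event
            # (phone, address, ...) are ignored on purpose.
            self._contacts[event["customer_id"]] = CustomerContact(
                customer_id=event["customer_id"], email=event["email"]
            )

    def email_for(self, customer_id: str) -> str | None:
        contact = self._contacts.get(customer_id)
        return contact.email if contact else None
```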

Half a year later a smart colleague comes up with the idea to add a little extra information to the subscription-ending e-mail: information about other subscription-based products. Does this sound legitimate? I'm not so sure anymore. Even though the subscription service has a copy of the customer data, kept up to date through eventual consistency, it doesn't own that data. So it might not even know which kinds of usage the subject agreed to.
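One possible way to let the subscription service know, sketched under a loud assumption: the customer service, as the owner of the data, publishes the purposes the subject consented to on its events (here a hypothetical `consented_purposes` field), and consumers check them before using the data for anything beyond the original purpose. The `Purpose` values and function names are made up for illustration.

```python
from __future__ import annotations

from enum import Enum


class Purpose(Enum):
    # Hypothetical purpose catalogue; a real one would follow company policy.
    SERVICE_COMMUNICATION = "service_communication"
    MARKETING = "marketing"


def allowed_purposes(event: dict) -> set[Purpose]:
    # Assumes the owning customer service includes the subject's consents
    # as a list of purpose names in a "consented_purposes" field.
    return {Purpose(p) for p in event.get("consented_purposes", [])}


def build_subscription_ending_mail(email: str, purposes: set[Purpose]) -> dict:
    # Telling customers their subscription ends is part of the service itself.
    body = ["Your subscription is about to end."]
    # Product suggestions are marketing: only add them with explicit consent.
    if Purpose.MARKETING in purposes:
        body.append("You might also like our other subscriptions.")
    return {"to": email, "body": "\n".join(body)}
```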

If you're a small company with a modest set of maybe 100 micro services, we can investigate which purposes are allowed and which aren't, but will we? And even more importantly, most of the time we never checked whether our usage was allowed; it simply happened, without anyone doing it on purpose.

Company-wide policies that make clear how personal data may be used would prevent accidents like this. And as a good reminder: having the data doesn't necessarily mean you own the data.

The storage limitation principle

I understand the storage limitation principle as: when to delete which data. You're not allowed to keep all data forever, so it's time to clean up. The actual timelines for how long you're allowed to keep data depend on rules that may or may not apply to your specific case. But we do know we need to clean up.

Let's say we have an order micro service that keeps track of the orders that have been placed. Successfully placed orders have to be stored for two years after fulfillment for warranty reasons. Cancelled orders should be stored for no longer than three months.
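These two rules can be expressed as data plus a small check. A minimal sketch, assuming each order records the date it was fulfilled or cancelled; `Order`, `OrderStatus`, `RETENTION` and `is_expired` are hypothetical names, and "two years" and "three months" are approximated as 730 and 90 days.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class OrderStatus(Enum):
    OPEN = "open"
    FULFILLED = "fulfilled"
    CANCELLED = "cancelled"


@dataclass
class Order:
    order_id: str
    status: OrderStatus
    # Date the order was fulfilled or cancelled; None while still open.
    closed_on: date | None


# Retention per final status (approximated in days, an assumption).
RETENTION = {
    OrderStatus.FULFILLED: timedelta(days=730),  # warranty: two years
    OrderStatus.CANCELLED: timedelta(days=90),   # at most three months
}


def is_expired(order: Order, today: date) -> bool:
    """Return True when the order has outlived its retention period."""
    period = RETENTION.get(order.status)
    if period is None or order.closed_on is None:
        return False  # open orders are never cleaned up
    return today > order.closed_on + period
```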

Okay, those rules are clear. But our micro service is not only responsible for storing the orders; it's also responsible for cleaning them up. And this is not the only place where we need to clean up data. It's everywhere: don't forget we have 80+ databases. So each micro service needs a way to handle cleanup, preferably in a similar way across services. A micro service knows everything about its own data, so it also knows when to clean up which data.
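A minimal sketch of what "a similar way" could look like: a small shared contract that every service implements, plus one shared runner that a scheduler could call. The `RetentionCleanup` interface and `run_cleanup` function are hypothetical, not an existing library, and the in-memory implementation stands in for a service's real database.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from datetime import date


class RetentionCleanup(ABC):
    """Hypothetical shared contract every micro service could implement.

    The service owns its data, so it alone decides what has expired;
    the shared runner only provides a uniform way to trigger cleanup.
    """

    @abstractmethod
    def find_expired(self, today: date) -> list[str]:
        """Return the ids of records whose retention period has passed."""

    @abstractmethod
    def delete(self, ids: list[str]) -> None:
        """Delete the given records from the service's own store."""


def run_cleanup(service: RetentionCleanup, today: date) -> int:
    # Shared tooling: one entry point a scheduler can call for any service.
    expired = service.find_expired(today)
    if expired:
        service.delete(expired)
    return len(expired)


class InMemoryOrderCleanup(RetentionCleanup):
    # Example implementation: order id -> last day the order may be kept.
    def __init__(self, delete_after: dict[str, date]) -> None:
        self.delete_after = delete_after

    def find_expired(self, today: date) -> list[str]:
        return [oid for oid, limit in self.delete_after.items() if today > limit]

    def delete(self, ids: list[str]) -> None:
        for oid in ids:
            del self.delete_after[oid]
```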

Oh wait. To stay independent, our order micro service also holds some pieces of customer data, which are owned by the customer micro service. How should we clean those up? The cleanup of customer data has its own set of rules, known by the customer micro service. But does the order micro service need to clean up this data as well? Yes, it does. Fortunately this doesn't have to be hard: we filled our order database with partial customer data from events like CustomerCreated and CustomerUpdated, so why not remove that customer data from our order database by reacting to CustomerDeleted events? Sounds like a plan to me. However, it does limit the way the customer micro service can clean up its data. You can't just use a simple scheduled SQL job running T-SQL to delete rows from tables, because that won't publish any events.
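The flow can be sketched with an in-memory event bus standing in for the real message broker. All class names are hypothetical. The point is that the customer service deletes through application code that also publishes `CustomerDeleted`, and the order service reacts by dropping its partial copy of that customer.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class EventBus:
    """Minimal in-memory publish/subscribe bus standing in for a broker."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: dict) -> None:
        for handler in self._handlers[event["type"]]:
            handler(event)


class CustomerService:
    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.customers: dict[str, dict] = {}

    def delete_customer(self, customer_id: str) -> None:
        # Deleting through application code (not a bare SQL job) lets us
        # publish the event that downstream copies depend on.
        self.customers.pop(customer_id, None)
        self.bus.publish({"type": "CustomerDeleted", "customer_id": customer_id})


class OrderService:
    def __init__(self, bus: EventBus) -> None:
        # customer_id -> partial customer copy (e.g. name, e-mail)
        self.customer_copies: dict[str, dict] = {}
        bus.subscribe("CustomerDeleted", self.on_customer_deleted)

    def on_customer_deleted(self, event: dict) -> None:
        # Remove only the copied customer data; the orders themselves
        # follow the order service's own retention rules.
        self.customer_copies.pop(event["customer_id"], None)
```

In a real system the delete and the publish should succeed or fail together (for example with a transactional outbox); otherwise a crash between the two leaves stale copies behind.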

Some companies chose the Event Sourcing pattern. How would we clean up data in the event timeline, which is meant to be an append-only log?

This is a completely different ball game compared to a monolith with a single database. I think it's important to provide development teams with patterns and tooling that every micro service can use to implement data cleanup rules easily.

The right to data portability

The subject has the right to receive the personal data concerning him or her, which he or she has provided, in a structured, commonly used and machine-readable format. But this personal data may live in hundreds of databases, some relational, others document databases. That makes it hard to write a single application that gathers all the data from the different databases. Besides, you don't want to change the data portability service every time a new micro service is created.

So it sounds like every micro service should be responsible for providing its own personal data for data portability. Define message contracts that each micro service must adhere to when providing personal data for a portability request, and combine this with some form of registration, so that each micro service that holds personal data can register itself with the data portability micro service.
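A minimal sketch of the contract-plus-registration idea, assuming in-process calls stand in for the message exchange between services. `PersonalDataProvider`, `DataPortabilityService` and `OrderDataProvider` are hypothetical names, and JSON is picked as the commonly used, machine-readable format.

```python
from __future__ import annotations

import json
from typing import Protocol


class PersonalDataProvider(Protocol):
    """Hypothetical contract each service holding personal data adheres to."""

    service_name: str

    def export_personal_data(self, subject_id: str) -> dict: ...


class DataPortabilityService:
    def __init__(self) -> None:
        self._providers: dict[str, PersonalDataProvider] = {}

    def register(self, provider: PersonalDataProvider) -> None:
        # Services register themselves, so adding a new micro service
        # requires no change to this class.
        self._providers[provider.service_name] = provider

    def export(self, subject_id: str) -> str:
        # Combine every registered service's contribution into one JSON document.
        data = {
            name: provider.export_personal_data(subject_id)
            for name, provider in sorted(self._providers.items())
        }
        return json.dumps({"subject_id": subject_id, "data": data}, indent=2)


class OrderDataProvider:
    # Example provider owned by the (hypothetical) order micro service.
    service_name = "orders"

    def __init__(self, orders_by_customer: dict[str, list[str]]) -> None:
        self.orders_by_customer = orders_by_customer

    def export_personal_data(self, subject_id: str) -> dict:
        return {"order_ids": self.orders_by_customer.get(subject_id, [])}
```

Adding a new micro service then only means registering another provider; the portability service itself stays unchanged.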

In itself this isn't a difficult feature to tackle. It can be implemented with a pattern that gathers data from multiple micro services and combines it into a single data set, for example Scatter-Gather.
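A sketch of the gathering step as a scatter-gather with Python's `asyncio`: all services are asked in parallel, each with a timeout, and services that fail or time out are reported rather than silently left out. Treating each service as an async function of the subject id is an assumption; in practice it would be a request/reply over your messaging infrastructure. `scatter_gather` and `Provider` are hypothetical names.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# A provider: subject id -> that service's personal data (assumed shape).
Provider = Callable[[str], Awaitable[dict]]


async def scatter_gather(
    providers: dict[str, Provider], subject_id: str, timeout: float = 5.0
) -> tuple[dict[str, dict], list[str]]:
    """Ask every provider in parallel and combine the replies.

    Returns the collected data per service plus the names of services
    that raised an error or did not answer within `timeout` seconds.
    """
    names = list(providers)
    replies = await asyncio.gather(
        *(asyncio.wait_for(providers[n](subject_id), timeout) for n in names),
        return_exceptions=True,  # one failing service must not abort the rest
    )
    collected: dict[str, dict] = {}
    missing: list[str] = []
    for name, reply in zip(names, replies):
        if isinstance(reply, BaseException):
            missing.append(name)
        else:
            collected[name] = reply
    return collected, missing
```

Returning the missing services matters here: an export that quietly omits a database would be incomplete without anyone noticing.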

Micro Services are hard

Yes, to be honest, a micro services architecture makes some things really hard. Just a reminder: why did we choose a micro services architecture again?

Independently develop and deploy services
Speed and agility
Better code quality
Code created/organized around business functionality
Increased productivity
Easier to scale
Freedom (in a way) to choose the implementation technology/language