
When GDPR (DSGVO) meets Logging (in Java & Spring)

The General Data Protection Regulation (GDPR; German: Datenschutz-Grundverordnung, DSGVO) regulates how personal data has to be processed. It therefore also affects the people who implement the applications that process this data: developers. Even something as simple as writing logs is affected. Let’s find out how.

Rough tech-stack

  • Java 11
  • Spring Boot
  • Elasticsearch & Kibana
  • Slf4j & Lombok

Clash of logs & data protection

Logging platforms such as the Elastic Stack have made it easy to collect and store logs. I was using the Elastic Stack when I ran into the clash between logging and the GDPR. With Slf4j, one of the best-known logging libraries for Java, writing logs is simple. For instance, when a new customer is to be processed:

public ResponseEntity<Void> addCustomer(@RequestBody Customer customer) {
   log.info("New customer received: {}", customer);
   customerService.processNewCustomer(customer);
   return ResponseEntity.ok().build();
}

Slf4j automatically calls the toString() method of the Customer object to fill the {} placeholder (see line 2). The printed log could look like this:
CustomerController : New customer received: Customer(type=B2C, ident=014004, email=john.doe@esentri.com)

If logs on level INFO are aggregated by the logging platform of your choice, this entry stores John Doe's email address, and we may have just violated the GDPR with a single line of code.
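
To make the leak concrete, here is a minimal plain-Java sketch of what happens to the {} placeholder. The format helper is a hypothetical, much simplified stand-in for Slf4j's message formatting (not its actual implementation), and Customer is a hand-written class whose fields are assumed from the log line above.

```java
// Simplified stand-in for placeholder substitution; NOT Slf4j's real implementation.
final class PlaceholderDemo {

    // Hand-written Customer; field names are assumed from the log output in the post.
    static final class Customer {
        private final String type;
        private final String ident;
        private final String email;

        Customer(String type, String ident, String email) {
            this.type = type;
            this.ident = ident;
            this.email = email;
        }

        @Override
        public String toString() {
            return "Customer(type=" + type + ", ident=" + ident + ", email=" + email + ")";
        }
    }

    // Replaces each "{}" with String.valueOf(arg), which calls arg.toString().
    static String format(String pattern, Object... args) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        for (Object arg : args) {
            int at = pattern.indexOf("{}", from);
            if (at < 0) {
                break; // more arguments than placeholders: ignore the rest
            }
            out.append(pattern, from, at).append(String.valueOf(arg));
            from = at + 2;
        }
        return out.append(pattern.substring(from)).toString();
    }

    public static void main(String[] args) {
        Customer customer = new Customer("B2C", "014004", "john.doe@esentri.com");
        // The email ends up in the message because toString() includes every field.
        System.out.println(format("New customer received: {}", customer));
    }
}
```

The logging call itself never mentions the email. It is the object's toString() that decides what ends up in the log, which is why the fix has to start there.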

Let’s fix this!

Now we want to hide log content depending on the active logging level. As always when writing code, several solutions are possible. After a brief discussion in the team, we decided to make the toString() method depend on the logging level. Other solutions and their disadvantages are presented later on.

In Customer.java we add a method annotated with @ToString.Include:

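// Takes the place of the email field in the Lombok-generated toString() output.
// Requires a logger named "log" in Customer, e.g. via Lombok's @Slf4j.
@ToString.Include(name = "email")
private String emailForLogs() {
   if (log.isDebugEnabled()) {
      return this.email;
   }
   return "HIDDEN FOR LOGS";
}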

With the logging level set to DEBUG or a more verbose level such as TRACE, the email is printed to the logs. With a less verbose level such as INFO, the log now looks like this:
CustomerController : New customer received: Customer(type=B2C, ident=0000014004, email=HIDDEN FOR LOGS)

Of course, logs on level DEBUG mustn’t be persisted. Otherwise, this solution still violates the GDPR.
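
For readers who do not use Lombok, the same idea can be sketched in plain Java. This is only an illustration under assumptions: debugEnabled is a hypothetical BooleanSupplier standing in for log.isDebugEnabled(), so the sketch runs without any logging library, and the class name is made up.

```java
import java.util.function.BooleanSupplier;

// Plain-Java sketch of a level-dependent toString(); not Lombok-generated code.
final class LevelAwareCustomer {
    private final String type;
    private final String ident;
    private final String email;
    // Assumption: stands in for log.isDebugEnabled(), so no logging library is needed.
    private final BooleanSupplier debugEnabled;

    LevelAwareCustomer(String type, String ident, String email, BooleanSupplier debugEnabled) {
        this.type = type;
        this.ident = ident;
        this.email = email;
        this.debugEnabled = debugEnabled;
    }

    // Same decision as emailForLogs() in the post: real value only when DEBUG is enabled.
    private String emailForLogs() {
        return debugEnabled.getAsBoolean() ? email : "HIDDEN FOR LOGS";
    }

    @Override
    public String toString() {
        return "Customer(type=" + type + ", ident=" + ident + ", email=" + emailForLogs() + ")";
    }

    public static void main(String[] args) {
        // First line simulates level INFO, second line simulates level DEBUG.
        System.out.println(new LevelAwareCustomer("B2C", "014004", "john.doe@esentri.com", () -> false));
        System.out.println(new LevelAwareCustomer("B2C", "014004", "john.doe@esentri.com", () -> true));
    }
}
```

Because the check runs each time toString() is called, a logging level changed at runtime takes effect with the next log statement.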

Why @ToString.Include?

Advantages of this solution:

  • Code in CustomerController.java did not change at all
  • Code in Customer.java is simple and easy to read
  • When the logging level is set to DEBUG in production, the email can still be read from the command line. Since debug logs should not be shipped to your log storage anyway (to avoid flooding it), the email is not stored, or at least not for longer than two weeks, so the GDPR is met.

Other situations could demand other solutions. That’s why I will list the alternatives we considered and why we didn’t go for them:

Static helper class

A static method that returns a manipulated Customer and is called like this:

log.info("Customer received: {}", LogHelper.hideEmail(customer));

Downsides:

  • LogHelper has to be used in every log statement that prints a customer, which is easy to forget
  • It adds an extra class and more code
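
The post only shows the call site, so the following is a hedged sketch of what such a helper might look like. LogHelper.hideEmail is taken from the call above, while its body, the constructor of Customer, and the choice to return a masked copy instead of mutating the original are assumptions made for illustration.

```java
final class LogHelperDemo {

    // Hand-written Customer; field names are assumed from the log output in the post.
    static final class Customer {
        final String type;
        final String ident;
        final String email;

        Customer(String type, String ident, String email) {
            this.type = type;
            this.ident = ident;
            this.email = email;
        }

        @Override
        public String toString() {
            return "Customer(type=" + type + ", ident=" + ident + ", email=" + email + ")";
        }
    }

    // Hypothetical helper: returns a copy with the email masked, leaving the original intact.
    static final class LogHelper {
        private LogHelper() {
            // static utility, no instances
        }

        static Customer hideEmail(Customer customer) {
            return new Customer(customer.type, customer.ident, "HIDDEN FOR LOGS");
        }
    }

    public static void main(String[] args) {
        Customer customer = new Customer("B2C", "014004", "john.doe@esentri.com");
        // Every log statement has to remember to wrap the customer like this.
        System.out.println("Customer received: " + LogHelper.hideEmail(customer));
    }
}
```

Returning a copy keeps the real email available to the business logic, but it also shows the first downside: nothing stops a later log statement from passing the unwrapped customer.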

Exclude email from toString()

Annotate the email field in Customer.java like this:

@ToString.Exclude
private String email;

Downsides:

  • Even on level DEBUG the email will not be printed
  • Whenever toString() is called on a Customer, the email will not be printed

Logging field by field

Customer information could be logged field by field, like this:

log.info("New customer of type '{}' and ident '{}' received.", customer.getType(), customer.getIdent());

Downsides:

  • Again, even on level DEBUG the email will not be printed
  • A real-life class may have many more fields that are important to log, which bloats the code

It would be interesting to know whether some of you have had the same issue and how you solved it. The small project containing the chosen solution is available in my GitHub repository.