The European Union’s General Data Protection Regulation (EU GDPR) became applicable on 25 May 2018. Many of us remember the influx of marketing emails around this time, with companies updating their privacy policies and asking for the consent of around 450 million Europeans to continue using their personal data. An often misunderstood participant in this compliance quest is log data, a source potentially rich in protected personal data. So, how does the GDPR apply to an organization’s log data? And how does an organization’s log data go from risk-abundant liability to unlikely hero?
What is the GDPR?
The GDPR is a regulation that applies directly in all member states of the European Union. When it took effect in 2018, it was one of the most far-reaching and radical data protection and privacy regulations ever created. It enshrines into law the right of individuals in the EU to the protection of their personal data.
This Regulation protects fundamental rights and freedoms of natural persons and in particular their right to the protection of personal data.
It has since influenced and become the benchmark for many other data protection and privacy laws, most notably the United Kingdom’s own version of the GDPR (which came into effect after the UK exited the EU) and the California Consumer Privacy Act (CCPA) in the United States.
Who does it apply to?
Although the GDPR is a European law, its reach extends well beyond Europe. It has extraterritorial effect, meaning that the regulation also applies to organizations outside of the EU that handle the personal data of people in the EU. For example, if a company based in the United States collects the personal data of an individual in the EU, then that company must comply with the GDPR or else risk the established penalties. Penalties for noncompliance can be as much as €20 million or 4% of annual global turnover, whichever is higher.
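The "whichever is higher" rule is simple arithmetic. As a minimal sketch, assuming turnover is expressed in whole euros (the function and constant names are illustrative and do not come from the regulation text):

```python
# Upper bound of a GDPR fine under the "whichever is higher" rule.
# Names are hypothetical; the figures are the ones quoted in the text.
FIXED_CAP_EUR = 20_000_000   # EUR 20 million
TURNOVER_PERCENT = 4         # 4% of annual global turnover


def max_fine_eur(annual_global_turnover_eur: int) -> int:
    """Return the maximum possible fine in whole euros."""
    turnover_cap = annual_global_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_cap)


if __name__ == "__main__":
    # 4% of 100 million is 4 million, so the fixed cap is the higher value.
    print(max_fine_eur(100_000_000))
    # 4% of 2 billion is 80 million, which exceeds the fixed cap.
    print(max_fine_eur(2_000_000_000))
```

For smaller organizations the fixed cap dominates, while for large ones the turnover-based cap does.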
In outlining its scope, the GDPR defines three personas: the data subject, the data controller, and the data processor.
The data subject is the natural person whose personal data is being held. The data controller and processor are entities that collect, store, or manipulate the personal data of the subject. Organizations, regardless of legal jurisdiction, that control or process the personal data of individuals in the EU are compelled to understand and follow the GDPR.
What constitutes personal data?
Although the term personal data is left deliberately vague, to keep the definition as wide as possible, there are some concrete examples given in the GDPR text.
(1) ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
Any information which could identify an individual is considered personal data. Of particular significance to log data, IP addresses are included in this definition.
It is especially important that data controllers determine and document the lawfulness and necessity of collecting any such personal data.
GDPR and log data
The GDPR is composed of 99 articles, which prescribe the legal obligations and rights of data controllers, processors, and subjects. Of these, four are particularly pertinent to log data. We will discuss the compliance problems that each of these articles raises for log data.
Article 5 - Principles
Article 5 of the GDPR lays out the fundamental principles that should be upheld by data controllers and processors when dealing with personal data.
Data minimization
Data controllers must collect the least amount of data on a subject for their purposes.
[Personal data shall be:] (c) adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (‘data minimisation’);
It is up to each controller to determine and document what data is required and the reasoning behind it. This is a good security practice in itself—decreasing the amount of data decreases the attack surface of an organization.
Logs include a large amount of data, some of which can be classified as personal data. This is especially true given the huge volume of log data that is collected in some organizations. For example, IP addresses and geolocation data, when traceable to a natural person, are considered personal data. While it is often crucial to capture and store IP addresses (you may be legally required to store this information to comply with other regulations), there may be instances where filtering data out of the log is more appropriate, for example, redacting email addresses or phone numbers.
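To illustrate the kind of filtering described above, here is a minimal Python sketch that redacts email addresses and phone numbers from a log line while leaving IP addresses intact. The patterns, the placeholder tokens, and the sample log line are assumptions made for this example; in particular, the phone pattern only recognizes international numbers written with a leading `+`, and real log formats will likely need their own rules.

```python
import re

# Hypothetical patterns for this sketch; tune them to your own log formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Only matches numbers with a leading "+", so IP addresses and timestamps
# are not mistaken for phone numbers.
PHONE_RE = re.compile(r"\+\d(?:[\s-]?\d){7,14}")


def redact(line: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens."""
    line = EMAIL_RE.sub("[EMAIL]", line)
    line = PHONE_RE.sub("[PHONE]", line)
    return line


if __name__ == "__main__":
    sample = (
        "2024-05-01 12:00:00 login failed for jane.doe@example.com "
        "from 192.0.2.10, callback +44 20 7946 0958"
    )
    print(redact(sample))
```

Redacting at the point of collection means the personal data never reaches long-term storage, which is usually simpler than removing it later.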
Storage limitation, or retention
The GDPR limits how long personal data can be retained: it can only be legally stored for as long as the purpose it was collected for requires.
[Personal data shall be:] (e) kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed … (‘storage limitation’);
Exceeding that timeframe without reason, even accidentally, constitutes a breach of the regulation. Again, deleting old, useless data is both a good security practice and a practical way to decrease storage space and the associated costs.
Logs are often archived for extended periods with minimal viewing or auditing. They might never be deleted, long forgotten on tape drives or hard disks. Without a centralized overview of its log data, an organization could fall afoul of the GDPR.
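A retention policy is easier to honor when it is automated. The following Python sketch finds and deletes log files whose last modification time is older than a retention period. The 90-day default, the `*.log` naming convention, and the function names are assumptions for illustration only; the GDPR does not prescribe a specific period, so the value must come from your own documented purposes.

```python
import time
from pathlib import Path

# Hypothetical retention period; set it according to your documented purposes.
RETENTION_DAYS = 90
SECONDS_PER_DAY = 86_400


def expired_logs(log_dir, retention_days=RETENTION_DAYS, now=None):
    """Return *.log files in log_dir last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    return sorted(
        path
        for path in Path(log_dir).glob("*.log")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def purge_expired(log_dir, retention_days=RETENTION_DAYS, now=None):
    """Delete expired log files and return the list of removed paths."""
    removed = expired_logs(log_dir, retention_days, now)
    for path in removed:
        path.unlink()
    return removed
```

Calling `expired_logs` first gives a dry run, which is a sensible check before anything is deleted. Archives on tape or in backups fall outside this sketch and need their own process.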
Integrity and confidentiality
Maintaining the integrity and confidentiality of personal data is paramount. It is perhaps the single most important thread that runs through the entire GDPR text.
[Personal data shall be:] (f) processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures (‘integrity and confidentiality’).
Breaking down these terms, confidentiality means only those who are authorized should be able to see the data, while integrity means only those who are authorized should be able to edit the data. These are two fundamental information security principles.
In a large IT infrastructure environment, huge volumes of log data can be produced. Oftentimes, this log data is transmitted unencrypted through a network or stored on a computer or server with little access control. Where logs include personal data, they must remain secure.
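One common way to make tampering with stored logs detectable is a hash chain, in which each entry's digest also covers the digest of the entry before it. The Python sketch below illustrates the idea under simple assumptions: the function names are hypothetical, the chain is unkeyed, and it only helps if the digests (or at least the last one) are kept somewhere an attacker cannot rewrite. It addresses integrity only; confidentiality still requires encryption and access control.

```python
import hashlib


def chain_digests(lines, seed=b""):
    """Return one hex digest per line, each covering all previous lines."""
    digests = []
    previous = hashlib.sha256(seed).digest()
    for line in lines:
        previous = hashlib.sha256(previous + line.encode("utf-8")).digest()
        digests.append(previous.hex())
    return digests


def verify_chain(lines, digests, seed=b""):
    """Return True if the lines still produce the recorded digests."""
    return chain_digests(lines, seed) == digests


if __name__ == "__main__":
    log = ["user alice logged in", "user alice changed settings"]
    recorded = chain_digests(log)
    print(verify_chain(log, recorded))            # unchanged log
    log[0] = "user mallory logged in"
    print(verify_chain(log, recorded))            # edited log
```

Because every digest depends on its predecessor, editing or removing an early entry changes every digest after it.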
Article 30 - Record of processing activities
Article 30 of the GDPR compels organizations to audit and record how personal data is being used.
Each controller and, where applicable, the controller’s representative, shall maintain a record of processing activities under its responsibility.
This includes a trail of where the personal data is transmitted and stored, alongside who has accessed it and for what reasons.
Here, logs are both a problem and a solution. A problem, in that they can still hold personal data in their own right, and must be treated carefully in line with the regulation. But also a solution, whereby log data can be mobilized to help create an audit trail of how personal data has been processed.
Article 32 - Security of processing
Article 32 expands on the fundamental principle of integrity and confidentiality set out in Article 5.
[T]he controller and the processor shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk….
Further emphasis is placed on the encryption of personal data as a means of compliance. In addition, anonymization and pseudonymization are stated as examples of best practice for handling personal data wherever possible.
[including] (a) the pseudonymisation and encryption of personal data;
Anonymization and pseudonymization can ease some of the strict controls required to process personal data. Anonymization irreversibly removes all identifying information, whereas pseudonymization extracts and segregates the identifying subset of the personal data such that the data subject can no longer be identified without that additional information. In the case of pseudonymization, the identifying subset must be stored separately and securely from the rest. Logs, which can harvest personal data as a matter of course, both purposefully and erroneously, should be carefully inspected and, where necessary, anonymized or pseudonymized.
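As a rough illustration of pseudonymization applied to logs, the Python sketch below replaces IPv4 addresses with a keyed hash (HMAC-SHA256). The same address always maps to the same pseudonym, so events can still be correlated, but the original value is not stored in the log. The function names, the `ip-` prefix, the truncation to 16 hex characters, and the simple IPv4 pattern are all assumptions for this example. The secret key plays the role of the additional information and would need to be stored separately and securely.

```python
import hashlib
import hmac
import re

# Simple IPv4 pattern for this sketch; it does not validate octet ranges.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymize_ip(ip: str, key: bytes) -> str:
    """Map an IP address to a stable pseudonym using a keyed hash."""
    tag = hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()
    return "ip-" + tag[:16]


def pseudonymize_line(line: str, key: bytes) -> str:
    """Replace every IPv4 address in a log line with its pseudonym."""
    return IPV4_RE.sub(lambda match: pseudonymize_ip(match.group(0), key), line)


if __name__ == "__main__":
    # Hypothetical key for demonstration; load the real one from a secure store.
    demo_key = b"replace-with-a-secret-key"
    print(pseudonymize_line("GET /index.html from 203.0.113.7", demo_key))
```

A keyed hash is used rather than a plain hash because the IPv4 address space is small enough to be enumerated; without the key, an attacker cannot simply hash every possible address and compare.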
This article also includes provisions to maintain the availability and resilience of systems that process personal data.
[including] (b) the ability to ensure the ongoing confidentiality, integrity, availability and resilience of processing systems and services;
Log data can inform IT infrastructure teams about the availability (whether a system is working properly for authorized users) and resilience (whether a system can be restored to an available state effectively) of a system. Analysis of this log data can provide insights into how systems and services are performing.
Article 33 - Breach notification
Article 33 (and, similarly, Article 34) compels organizations to report any data breaches that affect personal data.
In the case of a personal data breach, the controller shall without undue delay and, where feasible, not later than 72 hours after having become aware of it, notify the personal data breach to the supervisory authority …
Breaches must generally be reported to the supervisory authority and, where they pose a high risk to individuals, to the data subjects whose personal data has been affected. Failure to report a data breach can result in penalties being enforced on an organization.
Knowing this, it is imperative to have an overview of an organization’s security. Logs, which record the inner workings of a system, are an essential dataset to provide this visibility. Log data can be collected and centralized in an analysis platform to conduct investigations and determine whether a breach has occurred.
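The 72-hour window in Article 33 starts when the controller becomes aware of the breach, so an incident workflow can track it with simple date arithmetic. The Python sketch below is only an illustration: the function names are hypothetical, timestamps are assumed to be timezone-aware, and the result is a planning aid rather than legal advice.

```python
from datetime import datetime, timedelta, timezone

# The window quoted in Article 33: 72 hours after becoming aware of the breach.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Return the latest time to notify the supervisory authority."""
    return aware_at + NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Return the hours left before the deadline (negative if overdue)."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


if __name__ == "__main__":
    detected = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
    print(notification_deadline(detected).isoformat())
    print(hours_remaining(detected, detected + timedelta(hours=24)))
```

Timestamps taken from centralized logs are a natural source for `aware_at`, provided the clocks of the contributing systems are synchronized.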
How NXLog can help with GDPR compliance
NXLog is a powerful and highly scalable, multi-platform log collection solution. Choosing NXLog to collect and centralize your organization’s log data is a great step toward ensuring continued GDPR compliance.
- Centralizing log data from many sources
In modern IT environments, the devices generating logs can number in the thousands. Operating systems, networked devices, security tools, and applications all generate log data. Collecting and centralizing log data in a single place allows an organization to manage retention periods and ensure the integrity of personal data.
- Third-party integration
NXLog integrates with many third-party tools. Send your organization’s logs securely to a SIEM or archive database with extensive built-in integration modules.
- Encryption via TLS
Using NXLog, log data can be encrypted in transit with TLS. Encrypting logs in transit protects any personal data they contain from interception, helping organizations maintain confidentiality.
- Log filtering and substitution
To minimize the amount of personal data that is collected, NXLog can filter or substitute information in log data. This feature can be used to anonymize or pseudonymize personal data at the log source.
- Secure Raijin database
NXLog integrates directly with Raijin databases. Raijin is a fast, secure, schemaless database that is especially suited for storing log data. It uses encryption to maintain the confidentiality of personal data at rest.
- File integrity monitoring
With NXLog, you can alert your security team when a file or set of files is modified unexpectedly. Monitoring files in this way helps protect the integrity of any personal data they contain.
- Email notifications in real time
NXLog can also send email notifications in real time to alert security teams to a problem as it occurs. With a tight 72-hour timescale for reporting a breach, it is especially important to receive information in a timely manner.
Check with your IT infrastructure team today whether your organization’s log data is GDPR compliant.