This bulletin provides a high-level discussion of the de-identification of personal information under the recently proposed federal Consumer Privacy Protection Act (CPPA). This bulletin will be of interest to privacy officers, data administrators, service providers, and organizations that engage service providers to process personal information.
On November 17, 2020, the Government of Canada tabled Bill C-11, an Act to enact the Consumer Privacy Protection Act and the Personal Information and Data Protection Tribunal Act and to make consequential and related amendments to other Acts, or Digital Charter Implementation Act, 2020 (Act), which aims to protect the privacy of Canadians while promoting data-driven innovation. The Act would repeal the Personal Information Protection and Electronic Documents Act, SC 2000, c 5 (PIPEDA) and replace it with the CPPA.
Unlike PIPEDA, the CPPA expressly recognizes the concept of the de-identification of personal information.
1. What is De-identification of Personal Information?
Under the CPPA, de-identification of personal information is the processing of a dataset containing personal information to create a new dataset (in this bulletin, a 'de-identified dataset') where the original individuals can no longer be identified, in reasonably foreseeable circumstances, from the de-identified dataset alone or in combination with other information. For example, de-identification would require at least the removal of all individual names and other similar identifiers from the original dataset (removal of direct identifiers), and often requires more substantial processing of the original dataset (removal of indirect identifiers).
Organizations frequently use de-identification as a means to protect the privacy of individuals while still extracting something of value from the personal information. For example, a health researcher might share de-identified patient data with other health researchers. A bank might share de-identified personal information with its marketing department. A website might share de-identified personal information with advertisers.
But there is usually at least some risk that individuals might be re-identified from a de-identified dataset (in this bulletin, re-identification).
Data administrators once believed that sufficient de-identification could be achieved by simply removing names and other direct identifiers, and perhaps by making small modifications to the remaining data (such as generalizing a birth date to only the year of birth, or providing only the first three characters of a postal code). But multiple studies have shown that it is often unexpectedly possible to re-identify individuals from purportedly de-identified datasets, typically with the use of other data.
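The traditional approach described above can be illustrated with a minimal Python sketch. The field names (`name`, `email`, `birth_date`, `postal_code`, `diagnosis`) and the sample record are hypothetical and chosen only for illustration; the CPPA does not prescribe any particular schema or technique, and, as the studies discussed in this section show, processing of this kind is often not sufficient on its own.

```python
from datetime import date

# Hypothetical direct identifiers; a real dataset would need its own review.
DIRECT_IDENTIFIERS = {"name", "email"}


def generalize_record(record: dict) -> dict:
    """Drop direct identifiers and coarsen two indirect identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Generalize a full birth date to the year of birth only.
    out["birth_year"] = out.pop("birth_date").year
    # Keep only the first three characters of the postal code.
    out["postal_prefix"] = out.pop("postal_code").replace(" ", "")[:3].upper()
    return out


# Fabricated sample record, for illustration only.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "birth_date": date(1980, 7, 14),
    "postal_code": "K1A 0B1",
    "diagnosis": "asthma",
}
generalized = generalize_record(record)
```

Here `generalized` retains the diagnosis, the year of birth and the three-character postal prefix. Those remaining fields are still indirect identifiers and may permit re-identification when combined with other data.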
As one example, in the mid-1990s the Massachusetts State Group Insurance Commission publicly released health insurance records for state employees, after removing names and other explicit identifiers. The Governor assured the public that privacy was protected by the deletion of these identifiers. However, by combining publicly available voter rolls with the de-identified dataset, a researcher was able to quickly identify the Governor’s health records, including diagnoses and prescriptions.
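A linkage of this kind can be sketched in a few lines of Python. The records below are fabricated, and the choice of matching fields (ZIP code, date of birth and sex) is an assumption made for illustration; it is not a reconstruction of the actual Massachusetts datasets or of the researcher's method.

```python
# Fabricated toy records, for illustration only.
released = [
    {"zip": "12345", "birth_date": "1950-01-15", "sex": "M", "diagnosis": "condition A"},
    {"zip": "12346", "birth_date": "1962-03-02", "sex": "F", "diagnosis": "condition B"},
]
voter_roll = [
    {"name": "Person One", "zip": "12345", "birth_date": "1950-01-15", "sex": "M"},
    {"name": "Person Two", "zip": "12347", "birth_date": "1971-11-20", "sex": "F"},
]

# Assumed indirect identifiers shared by both datasets.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(released_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Pair each released row with a name when exactly one public record matches."""
    index = {}
    for person in public_rows:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)
    matches = []
    for row in released_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # an unambiguous match re-identifies the row
            matches.append((candidates[0]["name"], row))
    return matches


reidentified = link(released, voter_roll)
```

In this toy example one released row matches exactly one named voter, so that row's diagnosis is attached to a name even though the released dataset contained no names.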
According to a landmark study, 87% of Americans can be uniquely identified from their ZIP code, date of birth (including year), and sex. According to another important study, more than 80% of users of a flagship subscription DVD rental service could be uniquely identified by when and how they rated any three movies they had rented.
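An organization can estimate this kind of uniqueness in its own data before release. The following Python sketch, using fabricated records and hypothetical field names, computes the fraction of records that are unique on a chosen set of indirect identifiers, together with the size of the smallest group sharing the same values (the "k" in what the de-identification literature calls k-anonymity). It is a rough diagnostic under stated assumptions, not a legal test under the CPPA.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Fraction of records whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


def smallest_group(records, keys):
    """Size of the smallest group of records sharing the same `keys` values."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return min(combos.values())


# Fabricated toy records, for illustration only.
toy = [
    {"zip": "12345", "birth_date": "1950-01-15", "sex": "M"},
    {"zip": "12345", "birth_date": "1950-01-15", "sex": "M"},
    {"zip": "12345", "birth_date": "1962-03-02", "sex": "F"},
    {"zip": "54321", "birth_date": "1971-11-20", "sex": "F"},
]
keys = ("zip", "birth_date", "sex")

fraction = unique_fraction(toy, keys)  # 0.5: two of the four records are unique
k = smallest_group(toy, keys)          # 1: at least one record stands alone
```

A high unique fraction, or a smallest group of size 1, indicates that those records could be singled out by anyone holding another dataset with the same fields.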
The emerging field of de-identification science suggests that re-identification of individuals, rather than being hard, is sometimes surprisingly easy. Robust de-identification turns out to be difficult, requiring both experience and sophisticated knowledge. While de-identification processes are rapidly improving, de-identification cannot be guaranteed. There is often at least some residual risk that re-identification might be possible, if not with techniques and data that are available today, then perhaps with techniques and data that might become available in the future. It is difficult for organizations to predict the type and amount of external information that an adversary will be able to access, and that difficulty may result in some residual re-identification risk, even for the best-intentioned de-identification processes.
2. De-identification of Personal Information under PIPEDA
PIPEDA does not expressly address the de-identification of personal information, but the concept of de-identification is implicit and PIPEDA contemplates anonymization as one way to deal with personal information that is no longer required to serve the identified purpose.
Under PIPEDA (and under the CPPA), personal information is defined as information about an identifiable individual. If an individual cannot be identified from a dataset (such as a de-identified dataset), then the dataset is not personal information and is not regulated by PIPEDA.
But what if there is some small risk that an individual might be identified from a dataset?
Canadian courts have stated that the definition of personal information must be given a broad and expansive interpretation, and have found that information will be about an identifiable individual “where there is a serious possibility that an individual could be identified through the use of that information, alone or in combination with other information” (in this bulletin, the ‘serious possibility’ test for personal information). Similarly, the Office of the Privacy Commissioner of Canada (OPC) has held that personal information that has been de-identified does not qualify as anonymous information if there is a serious possibility of linking the de-identified data back to an identifiable individual (in this bulletin, the ‘no serious possibility’ de-identification test).
The ‘no serious possibility’ de-identification test recognizes that, under PIPEDA, the risk of re-identification does not need to be completely eliminated for a de-identified dataset to cease to be personal information.
If the CPPA is adopted in its current form, then, for consistency with the definition of de-identification, the ‘serious possibility’ test should become the ‘reasonably foreseeable circumstances’ test, so that information will be about an identifiable individual where there are reasonably foreseeable circumstances under which an individual could be identified through the use of that information, alone or in combination with other information (paraphrasing the ‘serious possibility’ test).
In the absence of express provisions in PIPEDA relating to de-identification, there are open issues. For example, does an organization need an express or implied consent to create a de-identified dataset from personal information? To collect, use or disclose a de-identified dataset? Are there any rules or standards governing the de-identification process? Are there any restrictions on what an organization is permitted to do with a de-identified dataset? Are there any restrictions on what a recipient of a de-identified dataset is permitted to do?
3. De-identification of Personal Information Under the CPPA
The definition of personal information remains unchanged under the CPPA.
The CPPA defines de-identification as follows:
“de-identify means to modify personal information — or create information from personal information — by using technical processes to ensure that the information does not identify an individual or could not be used in reasonably foreseeable circumstances, alone or in combination with other information, to identify an individual.”
The CPPA would replace the ‘no serious possibility’ de-identification test under PIPEDA with the ‘no reasonably foreseeable circumstances’ test. Note that the CPPA does not expressly address whether de-identified information is, or is not, personal information. (In contrast, the GDPR provides that the GDPR does not apply to de-identified information.)
(Note that de-identification under the CPPA is not the same as de-identification under Quebec’s Bill 64, An Act to modernize legislative provisions as regards the protection of personal information. Bill 64 provides that personal information is de-identified if it no longer allows the individual to be directly identified, requiring the removal of direct identifiers. In Bill 64, the concept of “anonymize” is closer in spirit to the CPPA concept of de-identification, as anonymized data requires that the individual be irreversibly no longer identifiable, both directly and indirectly, thus requiring the removal of both direct identifiers and indirect identifiers. For more information about Bill 64’s proposals on de-identification, see our bulletin: Technological and Legal Overview of the Concepts of “De-identified” and “Anonymized” Information under Bill 64.)
Section 20 of the CPPA provides that knowledge and consent are not required to de-identify personal information:
“An organization may use an individual’s personal information without their knowledge or consent to de-identify the information.”
Section 74 of the CPPA specifies a general standard for measures used by an organization to de-identify personal information:
“An organization that de-identifies personal information must ensure that any technical and administrative measures applied to the information are proportionate to the purpose for which the information is de-identified and the sensitivity of the personal information.”
As a consequence, an organization must ensure that the de-identified dataset: (1) does not identify an individual (removal of direct identifiers); and (2) could not be used in reasonably foreseeable circumstances, alone or in combination with other information, to identify an individual (removal of indirect identifiers), in each case using measures that are proportionate to the purpose for which the personal information is de-identified and the sensitivity of that personal information.
The Governor in Council has general regulation-making authority under the CPPA, but does not have express authority to make regulations establishing requirements for the means used to de-identify personal information. In contrast, the Ontario Personal Health Information Protection Act (Ontario PHIPA) grants express authority to make regulations “governing the de-identification of personal health information and the collection, use and disclosure of de-identified information” (no such regulations have been enacted).
Organizations that de-identify personal information should consult with experienced experts in the field of data de-identification and re-identification. Even then, organizations should recognize the real risk that an attacker might use an unanticipated approach or a previously unknown dataset to launch a re-identification attack. Under the CPPA, with the benefit of 20/20 hindsight, the OPC or the new Personal Information and Data Protection Tribunal (which Bill C-11 would also create) may nonetheless decide that the unanticipated approach or the previously unknown dataset should have been reasonably foreseeable by the organization and thus that the dataset was never de-identified for the purposes of the CPPA.
4. Legal Effect of De-identification Under the CPPA
Once a de-identified dataset is created, can an organization then collect, use and disclose that de-identified dataset for any purpose, without notice or consent? The answer is likely yes, subject to certain specified exceptions and the prohibition on re-identification discussed below.
Unlike some provincial personal health information laws, such as Alberta’s Health Information Act (Alberta HIA), the CPPA does not expressly say that an organization can collect, use and disclose a de-identified dataset for any purpose (except as expressly prohibited). Rather, the CPPA specifies three situations in which an action can only be taken if the personal information is de-identified.
- An organization is permitted to use an individual’s personal information without their knowledge or consent for the organization’s internal research and development purposes, if the information is de-identified before it is used.
  - It appears to be the intention that an organization is permitted to use a de-identified dataset for research and development purposes, with or without the consent of the individuals. However, does the provision also prevent an organization from using personal information for research and development purposes, with an informed consent, but without de-identification?
  - What does “internal research and development” mean? Does it extend to such activities as developing customer profiles for marketing purposes and, if so, how can that be done from a de-identified dataset?
- An organization that is a party to a prospective business transaction may use and disclose an individual’s personal information without their knowledge or consent if: (a) the information is de-identified before it is used or disclosed and remains so until the transaction is completed; and (b) certain other conditions are met.
- An organization may disclose an individual’s personal information without their knowledge or consent if: (a) the personal information is de-identified before the disclosure is made; (b) the disclosure is made for a socially beneficial purpose; and (c) the disclosure is made to an organization specified in the CPPA or in its regulations.
The above provisions introduce ambiguity about the extent to which the CPPA might still apply to de-identified information. Does the existence of these three specific conditionally-permitted uses of de-identified datasets imply that an organization is not otherwise permitted to collect, use or disclose de-identified datasets without the consent of the original individuals?
We would expect that the answer should be no. A properly de-identified dataset appears not to be personal information and should not be regulated by the CPPA. Although the CPPA does not expressly authorize the collection, use and disclosure of de-identified datasets for other purposes, the CPPA does not expressly prohibit the collection, use and disclosure of de-identified datasets for other purposes. The specific conditionally-permitted uses of de-identified datasets described in the CPPA should be viewed as limiting the right to use de-identified datasets only in the specific situations described in those provisions, and should not imply any general prohibition on the use of de-identified datasets in other situations.
However, we would nonetheless prefer that legislators clarify the intent and application of the de-identification provisions, for example by including an express statement that an organization can collect, use and disclose de-identified datasets without notice or consent, subject to these three exceptions and the prohibition on re-identification discussed below.
Indeed, given the ambiguity in the legislation, our answer above is not without some risk. Unlike the Alberta HIA, the CPPA does not say that an organization can collect, use and disclose a de-identified dataset for any purpose (except as expressly prohibited). Moreover, in its Proposals to modernize the Personal Information Protection and Electronic Documents Act (part of the federal government’s Canada’s Digital Charter initiative), the Government of Canada states:
“… That said, a risk-based approach, in which de-identified information could be defined and its use allowed in certain specified circumstances, with penalties for re-identification, could be taken to both address privacy concerns and enable innovation. …”
This language, and other similar statements in the Proposals document, suggest that the government, at the time, was contemplating an approach that limited the rights to use de-identified data to specific circumstances.
5. Service Providers
Does the CPPA right to de-identify personal information without consent (or knowledge) also apply to a service provider that might wish to de-identify personal information and use that de-identified information for its own purposes (where the contract with the controlling organization permits or does not prohibit that de-identification)? The answer is not entirely clear.
The right to de-identify personal information is not expressly limited to controlling organizations. However, the right to de-identify personal information is in Part 1 of the CPPA, which does not apply to service providers (at least, in respect of personal information that is transferred to the service provider). On the other hand, the right to de-identify personal information is a clarification of, or an exception to, other rights in Part 1 which also do not apply to service providers.
While there are some arguments available to a service provider, there remains a real risk that the Commissioner will construe the de-identification right narrowly, and require that a service provider have either the consent of the controlling organization or the consent of the individuals.
Service providers will typically want to include an express right to de-identify personal information in their service contracts with controlling organizations.
For more information about CPPA issues affecting service providers, see our bulletin: The New Consumer Privacy Protection Act - Key Terms For Service Providers.
6. Prohibition on Re-identification
Section 75 of the CPPA prohibits re-identification of individuals from de-identified datasets:
“An organization must not use de-identified information alone or in combination with other information to identify an individual, except in order to conduct testing of the effectiveness of security safeguards that the organization has put in place to protect the information.”
As discussed above, there is usually at least some risk that individuals can be re-identified from a de-identified dataset. As a consequence, the right to create and use de-identified information should be coupled with a strong legislative prohibition on re-identification.
The legislative prohibition on re-identification, while necessary, does not provide absolute protection. Re-identification can happen in the shadows, on the dark web, in other jurisdictions. It will not be easy for organizations or regulators to determine if a de-identified dataset has been re-identified by another actor. The OPC may not have sufficient personnel or sufficient budget to properly investigate and prosecute more than a few bad actors, and the OPC will face additional challenges proceeding against actors that are located outside Canada. The conditions precedent to initiating litigation (discussed below) may preclude class action litigators from policing the prohibition. In the end, it is possible that the prohibition on re-identification may be constrained by the practical difficulties of enforcement.
There are, perhaps, some drafting issues with the CPPA prohibition on re-identification.
- In situations where the collection of personal information is permitted by express or implied consent, perhaps the re-identification of the subject from the de-identified information should also be permitted? It is only a different means of collecting the information.
- It would be advantageous if the OPC was permitted to approve exceptions to the prohibition on re-identification. For example, imagine that health care researchers working with a de-identified dataset determined that some individuals have genetic markers suggesting a dangerous but treatable health risk. It would be in the best interests of those individuals to be notified of that risk so that they are able to seek treatment. Ideally, there should be a mechanism by which the OPC could approve re-identification of the individuals in such circumstances.
- It would also be advantageous for the Governor in Council to have the authority to make regulations to specify persons and circumstances in which re-identification is permitted.
There is a related drafting issue. Section 6(2) of the CPPA expressly provides that the CPPA applies in respect of personal information that is collected, used or disclosed internationally by an organization. Because de-identified information is likely no longer personal information, it would be preferable if section 6(2) was amended to also expressly provide that the CPPA applies in respect of de-identified information that is created from Canadian personal information, regardless of whether the de-identification occurs inside or outside Canada, and regardless of whether any re-identification occurs inside or outside Canada.
7. Offences and Damages
Section 125 of the CPPA provides that every organization that knowingly contravenes the prohibition on re-identification is: (a) guilty of an indictable offence and liable to a fine not exceeding the higher of $25,000,000 and 5% of the organization’s gross global revenue in its previous financial year; or (b) guilty of an offence punishable on summary conviction and liable to a fine not exceeding the higher of $20,000,000 and 4% of the organization’s gross global revenue in its previous financial year.
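The "higher of" formula in section 125 can be made concrete with a short Python sketch. The function name and the revenue figures are hypothetical; the code computes only the statutory ceiling described above, not the fine a court would actually impose.

```python
def max_fine(gross_global_revenue: int, indictable: bool) -> int:
    """Ceiling on the fine: the higher of a fixed amount and a share of revenue.

    Amounts are in whole dollars. The fixed amounts and percentages are those
    described for section 125 of the CPPA as proposed in Bill C-11.
    """
    fixed, percent = (25_000_000, 5) if indictable else (20_000_000, 4)
    return max(fixed, gross_global_revenue * percent // 100)


# Hypothetical organizations with $1 billion and $100 million in revenue.
large_indictable = max_fine(1_000_000_000, indictable=True)   # 50_000_000
large_summary = max_fine(1_000_000_000, indictable=False)     # 40_000_000
small_indictable = max_fine(100_000_000, indictable=True)     # 25_000_000
```

For the larger organization the percentage governs; for the smaller one the fixed dollar amount is the higher figure.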
An individual who is affected by an organization’s contravention of this prohibition may have a cause of action against the organization for damages for loss or injury, if: (a) the Commissioner makes a finding that the organization has contravened the CPPA and (i) the finding is not appealed, or (ii) the Personal Information and Data Protection Tribunal ('Tribunal') dismisses an appeal of the finding; or (b) the Tribunal makes a finding that the organization has contravened the CPPA; or (c) the organization is convicted of an offence under section 125 of the CPPA.
The conditions that must be met before the commencement of litigation may discourage class action litigation.
The CPPA provisions relating to de-identification and re-identification are a much needed update to Canadian federal privacy laws, but there remain areas where additional clarity would be helpful, for example:
- For consistency with the definition of de-identified information, the definition of personal information should be amended to say “information about an identifiable individual, but not including information where there are no reasonably foreseeable circumstances under which an individual could be identified through the use of that information, alone or in combination with other information”.
- The CPPA, like the GDPR, should state that de-identified information is not personal information, and that the CPPA does not apply to de-identified information (except where expressly stated to apply).
- The CPPA should allow the Governor in Council to make regulations governing the de-identification of personal information, and the collection, use and disclosure of de-identified information. The standard for de-identification is necessarily general. It is consequently difficult for an organization to know whether it has achieved that standard. It would be useful if the CPPA allowed the Governor in Council to make regulations that provided greater specificity. Regulations might follow emerging ISO or NIST standards. Regulations might vary from industry to industry. For example, there might be different detailed requirements for personal health information and for personal financial information.
- The CPPA should be amended to eliminate any ambiguity as to whether an organization can collect, use and disclose a de-identified dataset for any purpose (beyond the specified circumstances specifically restricted or prohibited by the CPPA).
- The CPPA should allow the OPC to authorize re-identification from a de-identified dataset, if the OPC concludes that doing so is in the best interest of the affected individuals or the public.
- The CPPA should also grant the Governor in Council the authority to make regulations to specify persons and circumstances in which re-identification is permitted.
- Section 6(2) of the CPPA should be amended to expressly provide that the CPPA applies in respect of de-identified information that is created from Canadian personal information, regardless of whether the de-identification occurs inside or outside Canada, and regardless of whether any re-identification occurs inside or outside Canada.
The views expressed in this bulletin are the personal views of the authors, and do not necessarily reflect the views of Fasken or its clients.