Information security technology - Guide for evaluating the effectiveness of personal information de-identification
1 Scope
This document provides guidance for classifying and evaluating the effectiveness of personal information de-identification.
This document is applicable to personal information de-identification activities as well as personal information security management, supervision and evaluation.
2 Normative references
The following documents contain provisions which, through reference in this text, constitute provisions of this document. For dated references, only the edition cited applies. For undated references, the latest edition (including any amendments) applies.
GB/T 25069-2022 Information security techniques - Terminology
GB/T 35273-2020 Information security technology - Personal information security specification
GB/T 37964-2019 Information security technology - Guide for de-identifying personal information
3 Terms and definitions
For the purposes of this document, the terms and definitions given in GB/T 25069-2022, GB/T 35273-2020, GB/T 37964-2019 and the following apply.
3.1
personal information
all kinds of information related to identified or identifiable natural persons, recorded electronically or otherwise
Note: It does not include anonymized information.
[Source: GB/T 35273-2020, 3.1, modified]
3.2
personal information subject
natural person identified by or connected to personal information
[Source: GB/T 35273-2020, 3.3]
3.3
de-identification
process of technically processing personal information so that, without additional information, the personal information subject cannot be identified or connected
[Source: GB/T 35273-2020, 3.15]
3.4
microdata
structured dataset in which each record (row) corresponds to a personal information subject and each field (column) in the record corresponds to an attribute
[Source: GB/T 37964-2019, 3.4]
3.5
identifier
one or more attributes of microdata that can be used to uniquely identify the personal information subject
Note: Identifiers are classified into direct identifiers and quasi-identifiers.
[Source: GB/T 37964-2019, 3.6]
3.6
direct identifier
attribute of microdata that can be used on its own to identify the personal information subject in a specific context
Note: See Annex A for common direct identifiers.
[Source: GB/T 37964-2019, 3.7]
3.7
quasi-identifier
attribute of microdata that, when combined with other attributes, can be used to uniquely identify the personal information subject
Note: See Annex B for common quasi-identifiers, and Annex C for identification of quasi-identifiers.
[Source: GB/T 37964-2019, 3.8]
3.8
re-identification
process of re-associating a de-identified dataset with the original personal information subject or with a group of personal information subjects
[Source: GB/T 37964-2019, 3.9]
3.9
completely public sharing
release of data, usually directly to the public via the Internet, after which the data is difficult to recall
[Source: GB/T 37964-2019, 3.12]
3.10
controlled public sharing
sharing of data in which the use of the data is constrained through a data use agreement
[Source: GB/T 37964-2019, 3.13]
3.11
enclave public sharing
sharing of data within a physical or virtual enclave, out of which data cannot flow
[Source: GB/T 37964-2019, 3.14]
3.12
re-identification risk
identifiability
probability that the personal information subject can be identified from the data
3.13
equivalence class
set of records (rows) in microdata that have the same attribute values for all quasi-identifiers
3.14
acceptable risk threshold
critical value set for re-identification risk
Note: When the re-identification risk is greater than this value, mitigation measures (including de-identification) and emergency measures shall be taken to bring the risk within a controllable range.
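As an illustration of terms 3.12 and 3.13, the following minimal sketch groups microdata records into equivalence classes and estimates a re-identification risk. The estimator used here (1 divided by the size of the smallest equivalence class) is an assumption borrowed from K-anonymity-style analysis and is not a formula prescribed by this clause. The function names, field names and sample records are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same values for all quasi-identifiers."""
    classes = defaultdict(list)
    for record in records:
        # The tuple of quasi-identifier values is the key of the class.
        key = tuple(record[q] for q in quasi_identifiers)
        classes[key].append(record)
    return dict(classes)


def max_reidentification_risk(records, quasi_identifiers):
    """Assumed estimator: 1 / size of the smallest equivalence class."""
    classes = equivalence_classes(records, quasi_identifiers)
    if not classes:
        return 0.0  # no records, nothing to re-identify
    smallest = min(len(rows) for rows in classes.values())
    return 1.0 / smallest


# Hypothetical microdata: one row per personal information subject.
records = [
    {"age_band": "30-39", "postcode": "1000**", "diagnosis": "A"},
    {"age_band": "30-39", "postcode": "1000**", "diagnosis": "B"},
    {"age_band": "40-49", "postcode": "1000**", "diagnosis": "A"},
    {"age_band": "40-49", "postcode": "1000**", "diagnosis": "C"},
]
QUASI_IDENTIFIERS = ["age_band", "postcode"]  # hypothetical choice

classes = equivalence_classes(records, QUASI_IDENTIFIERS)
risk = max_reidentification_risk(records, QUASI_IDENTIFIERS)
```

In this sample, the records form two equivalence classes of two rows each, so the assumed estimator yields a risk of 0.5. The result would then be compared with the acceptable risk threshold chosen for the evaluation.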
4 Classification of effectiveness of personal information de-identification
The identifiability of personal information is classified into 4 classes according to whether the data can be used to directly identify the personal information subject and, if not, how probable it is that the data can be used to identify the personal information subject. The classes, detailed in Table 1, distinguish the effectiveness of personal information de-identification.
Table 1 Classification of personal information identifiability into 4 classes
Classification | Classification basis
Class 1 | Contains direct identifiers, which can be used to directly identify the personal information subject in a specific context
Class 2 | Contains no direct identifiers but contains quasi-identifiers, and the re-identification risk is higher than or equal to the acceptable risk threshold
Class 3 | Contains no direct identifiers but contains quasi-identifiers, and the re-identification risk is lower than the acceptable risk threshold
Class 4 | Contains no identifiers
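The decision logic of Table 1 can be sketched as a small function. This is an illustrative reading of the table, not a normative procedure: the function name, parameter names and the sample threshold value are hypothetical, and how the re-identification risk itself is measured is left outside the sketch.

```python
def classify_identifiability(has_direct_identifiers,
                             has_quasi_identifiers,
                             reidentification_risk,
                             acceptable_risk_threshold):
    """Return the identifiability class (1 to 4) following Table 1."""
    if has_direct_identifiers:
        return 1  # direct identifiers present
    if has_quasi_identifiers:
        # Table 1 places a risk equal to the threshold in Class 2.
        if reidentification_risk >= acceptable_risk_threshold:
            return 2
        return 3
    return 4  # no identifiers at all


# Hypothetical threshold, chosen only for this example.
THRESHOLD = 0.05

class_raw = classify_identifiability(True, True, 1.0, THRESHOLD)
class_risky = classify_identifiability(False, True, 0.05, THRESHOLD)
class_safe = classify_identifiability(False, True, 0.01, THRESHOLD)
class_clean = classify_identifiability(False, False, 0.0, THRESHOLD)
```

With these inputs the four calls yield Class 1, Class 2, Class 3 and Class 4 respectively. The second call shows the boundary case: a risk equal to the threshold falls in Class 2.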
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Classification of effectiveness of personal information de-identification
5 Process for evaluating the effectiveness of personal information de-identification
6 Evaluation practice
6.1 Preparation for evaluation
6.2 Qualitative evaluation
6.3 Quantitative evaluation
6.4 Formation of evaluation conclusion
6.5 Communication and negotiation
6.6 Evaluation process document management
Annex A (Informative) Examples of direct identifiers
Annex B (Informative) Examples of quasi-identifiers
Annex C (Informative) Quasi-identifier identification
Annex D (Informative) Example of de-identification effectiveness evaluation based on K-anonymity model
Bibliography