= SOP Pseudonymised

Tags: [[SOP_Category]]

== Tools

- [[https://www.openpseudonymiser.org/|OpenPseudonymiser]]

== Sources

- [[https://ico.org.uk/media/1061/anonymisation-code.pdf|ICO Anonymisation Code of Practice]]
- [[http://webarchive.nationalarchives.gov.uk/20130502102046/http://www.connectingforhealth.nhs.uk/systemsandservices/pseudo/ref1term.pdf|Connecting for Health Pseudonymisation Implementation Project]]

== General Principle

Individuals should not be identifiable or re-identifiable from the data that we publish, release to other organisations (including University of Leicester employees) or use internally, unless identification is absolutely necessary, for example to mail the individuals or search their medical records. Therefore, identifiable fields should be removed from data wherever possible.

== Identifiable Data

=== Names and addresses

Name and address fields should not be released unless required for the specific purpose of the data, such as mailing the participant.

=== Postcode

Postcodes can lead to re-identification, depending on the number of households within the postcode and the other fields released along with it. According to the [[https://ico.org.uk/media/1061/anonymisation-code.pdf|ICO Anonymisation Code of Practice]], the approximate number of households at each level of the postcode is:

- Full postcode = approx 15 households (although some postcodes only relate to a single property)
- Postcode minus the last digit = approx 120/200 households
- Postal sector = 4 outbound digits + 1 inbound gives approx 2,600 households
- Postal district = 4 outbound digits approx 8,600 households

Therefore, a postcode should be truncated to increase its anonymity.

==== Alternatives to Postcodes

If possible, an alternative to postcode should be used.
These include:

- Townsend Deprivation Index
- LSOA (Lower Layer Super Output Area)

For both see [[Townsend Deprivation Index]]

=== Date of Birth

Date of birth should not be released unless necessary, especially when combined with other fields such as postcode and gender. Instead, use one of the following fields:

- Age
- Year of Birth

Both of these values should also be banded where possible. For example, age=40-45.

=== Identifiers

NHS Numbers and UHL System Numbers should not be released unless they are required, for example for linking to UHL data warehouse data or accessing imaging files.

All participants must have a unique [[Participant Study Identifier]] that is only linked to other identifiers within the LCBRU IT systems. This can be released.

=== Free Text and Notes Fields

Free text note fields should not be released, as there is a possibility that the text will contain personally identifiable information.

It is possible to de-identify free text with a high degree of success. However, this is currently never 100% effective, and the LCBRU does not have a tool to carry out this task. As an alternative, coded information should be extracted from the free text using computer-based and manual processes.

[[BackLinks]]
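The postcode truncation and age / year of birth banding described in the Identifiable Data section could be sketched in Python roughly as follows. This is an illustration only, not an LCBRU tool: the function names {{{truncate_postcode}}} and {{{band}}}, the simplified postcode pattern and the default band width of 5 are all assumptions made for the example. Note that the sketch produces non-overlapping bands (for example 40-44), which is one possible reading of the "age=40-45" example.

```python
import re

# Simplified pattern for a UK postcode (an assumption for this sketch; it does
# not cover every special case): outward code, optional space, then the inward
# code split into its sector digit and two-letter unit.
_POSTCODE = re.compile(r"^([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9])([A-Z]{2})$")


def truncate_postcode(postcode, level="district"):
    """Return the postal district (outward code) or postal sector of a postcode."""
    match = _POSTCODE.match(postcode.strip().upper())
    if match is None:
        # Do not echo the value back, as it may be identifiable.
        raise ValueError("unrecognised postcode format")
    outward, sector_digit, _unit = match.groups()
    if level == "district":
        return outward
    if level == "sector":
        return f"{outward} {sector_digit}"
    raise ValueError("level must be 'district' or 'sector'")


def band(value, width=5):
    """Place an age or year of birth into a non-overlapping band of the given width."""
    if value < 0 or width <= 0:
        raise ValueError("value must be non-negative and width must be positive")
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


if __name__ == "__main__":
    print(truncate_postcode("LE1 7RH"))            # postal district
    print(truncate_postcode("LE1 7RH", "sector"))  # postal sector
    print(band(42))                                # age band
    print(band(1978))                              # year of birth band
```

Truncating to postal district gives the largest household count of the levels listed above, so it is used as the default here; the sector level is offered for cases where finer geography is justified.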