Version 3 (modified by Richard Bramley, 8 years ago)

SOP Pseudonymised

Tags: SOP_Category

Tools

Sources

General Principle

Individuals should not be identifiable or re-identifiable from the data that we publish, release to other organisations (including University of Leicester employees) or use internally unless absolutely necessary, for example to mail the individuals or search their medical records.

Therefore, we need to remove identifiable fields from data wherever possible.

Identifiable Data

Names and addresses

Name and address fields should not be released unless required for the specific purpose of the data, such as mailing the participant.

Postcode

A postcode can lead to re-identification, depending on the number of households within the postcode and on the other fields released alongside it. According to the ICO Anonymisation Code of Practice, the approximate number of households at each level of postcode is:

  • Full postcode = approx 15 households (although some postcodes only relate to a single property)
  • Postcode minus the last character = approx 120/200 households
  • Postal sector (outward code, up to 4 characters, plus the first character of the inward code) = approx 2,600 households
  • Postal district (outward code only, up to 4 characters) = approx 8,600 households

Therefore, where a postcode must be released, it should be truncated to reduce the risk of re-identification.
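The truncation levels listed above can be sketched in Python as follows. This is a minimal illustration, not an LCBRU tool: the function name `truncate_postcode` and the level names `district`, `sector` and `minus_last` are assumptions made for this example. It relies only on the fact that the inward part of a UK postcode is always the last three characters.

```python
import re


def truncate_postcode(postcode: str, level: str = "district") -> str:
    """Return a coarser form of a UK postcode (hypothetical helper).

    level: "district"   -> outward code only, e.g. "LE1"
           "sector"     -> outward code + first inward character, e.g. "LE1 7"
           "minus_last" -> full postcode without its last character
    """
    # Normalise: remove all whitespace and upper-case the value.
    compact = re.sub(r"\s+", "", postcode).upper()
    # A UK postcode has 5 to 7 characters once spaces are removed.
    if not 5 <= len(compact) <= 7:
        raise ValueError(f"unexpected postcode length: {postcode!r}")

    # The inward code is always the final three characters.
    outward, inward = compact[:-3], compact[-3:]

    if level == "district":
        return outward
    if level == "sector":
        return f"{outward} {inward[0]}"
    if level == "minus_last":
        return f"{outward} {inward[:2]}"
    raise ValueError(f"unknown truncation level: {level!r}")
```

The sketch checks only the length of the input, not whether the postcode actually exists, so values should still be validated before release.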

Alternatives to Postcodes

If possible, an alternative to the postcode should be used. Alternatives include:

  • Townsend Deprivation Index
  • LSOA - Lower Layer Super Output Area

For both, see Townsend Deprivation Index.

Date of Birth

Date of birth should not be released unless necessary, particularly in combination with other fields such as postcode and gender. Instead, use one of the following fields:

  • Age
  • Year of Birth

Both of these values should also be banded where possible. For example, age=40-45.
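As an illustration of replacing a date of birth with a banded value, the sketch below derives an age in whole years and then places it in a fixed-width band. The names `age_on` and `band` are hypothetical, and the choice of non-overlapping bands (40-44, 45-49 and so on) is an assumption made for this example so that each value falls in exactly one band; a study may define its bands differently.

```python
from datetime import date


def age_on(date_of_birth: date, reference: date) -> int:
    """Age in completed years on the reference date."""
    # Subtract one year if the birthday has not yet occurred in the reference year.
    before_birthday = (reference.month, reference.day) < (
        date_of_birth.month,
        date_of_birth.day,
    )
    return reference.year - date_of_birth.year - int(before_birthday)


def band(value: int, width: int = 5) -> str:
    """Place a non-negative integer (age or year of birth) in a band of the given width."""
    if value < 0 or width < 1:
        raise ValueError("value must be >= 0 and width must be >= 1")
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


# Example: a participant born on 15 June 1980, assessed on 1 March 2023.
age = age_on(date(1980, 6, 15), date(2023, 3, 1))
age_band = band(age)              # 5-year age band
birth_band = band(1980, width=10) # 10-year band for year of birth
```

The same `band` function serves both fields; a wider band gives more protection at the cost of less detail.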

Identifiers

NHS Numbers and UHL System Numbers should not be released, unless they are required - for example, for linking to UHL data warehouse data or accessing imaging files.

All participants must have a unique Participant Study Identifier that is only linked to other identifiers within the LCBRU IT systems. This can be released.
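The principle of releasing only the Participant Study Identifier alongside non-identifiable fields might be sketched as a simple filter applied to each record before release. The field names below (`nhs_number`, `uhl_system_number`, `participant_study_id` and so on) are hypothetical and do not describe the actual LCBRU data model; the `keep` parameter is an assumption that models the "unless required" exceptions described in this SOP.

```python
# Hypothetical names for fields this SOP treats as identifiable.
IDENTIFIABLE_FIELDS = frozenset(
    {
        "name",
        "address",
        "postcode",
        "date_of_birth",
        "nhs_number",
        "uhl_system_number",
        "notes",
    }
)


def pseudonymise_record(record: dict, keep: frozenset = frozenset()) -> dict:
    """Return a copy of the record without identifiable fields.

    Fields named in `keep` are retained even if identifiable, for cases
    where the purpose of the release specifically requires them.
    """
    blocked = IDENTIFIABLE_FIELDS - keep
    return {key: value for key, value in record.items() if key not in blocked}


record = {
    "participant_study_id": "STUDY0001",  # made-up example value
    "name": "Example Person",
    "nhs_number": "0000000000",           # made-up example value
    "age_band": "40-44",
}

released = pseudonymise_record(record)
# For linkage to other hospital data, the NHS number may be explicitly retained.
for_linkage = pseudonymise_record(record, keep=frozenset({"nhs_number"}))
```

Listing the blocked fields explicitly means a newly added identifiable field is released unless the list is updated; an allow-list of releasable fields would be the more cautious design.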

Free Text and Notes Fields

Free text and notes fields should not be released, as the text may contain personally identifiable information. Free text can be de-identified with a high degree of success, but no current method is 100% effective, and the LCBRU does not have a tool to carry out this task.

As an alternative, coded information should be extracted from the free text using computer-based and manual processes.
