Changes between Version 11 and Version 12 of i2b2 AUG 2013
- Timestamp:
- 06/24/13 11:53:40
i2b2 AUG 2013
Entities are identified by a combination of normalisation and longest term matching.

Normalisation is the process whereby a term is transformed into a canonical form of words that will match a large number of potential variants. The process involves removing noise words, standardising inflections and derivatives (e.g., removing plurals), removing punctuation, converting to lower case, and sorting the words into alphabetical order.

To extract the most meaning from the text, the matcher tries to match the term with the greatest number of words. For example, 'left atrium' is preferred to just 'atrium'.

…

=== [=#NLP2 Ontology-based De-identification of Clinical Narratives] ===

Presentation showing a method to remove Protected Health Information (PHI) from free-text fields, using the Apache cTAKES lexical annotation tool.

The usual approach to de-identifying free text is to train software to recognise personal information. However, the number of training examples available is usually quite small. This team reversed the task by training the software to recognise non-PHI data instead.

Pipeline:

 1. Annotate the text with cTAKES.
 1. Check the frequency of each term in medical journal articles.
 1. Match terms to ontologies. Diseases (and other terms) named after people can be a problem, but a match on a term of more than one word implies that it is not a name. For example, 'Hodgkin's lymphoma' would not match 'Mr Hodgkin'.
 1. Remove items on known PHI lists (presumably the person's name, address, etc.).
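The reversed pipeline above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the presenters' implementation: a naive regex tokenizer stands in for cTAKES, and `JOURNAL_FREQUENCY`, `ONTOLOGY_TERMS`, `FREQUENCY_THRESHOLD` and `deidentify` are hypothetical names filled with toy data. It also assumes that words frequent in journal articles are non-PHI, which the presentation notes do not state explicitly. Any token that is not positively identified as non-PHI is redacted.

```python
import re

# Hypothetical stand-ins for the real resources (assumptions for illustration):
# word counts from medical journal articles, multi-word ontology terms, and a
# threshold above which a word is treated as ordinary medical vocabulary.
JOURNAL_FREQUENCY = {"patient": 9500, "presented": 4200, "with": 99000, "lymphoma": 3100}
ONTOLOGY_TERMS = {("hodgkin's", "lymphoma"), ("left", "atrium")}
FREQUENCY_THRESHOLD = 1000
REDACTED = "[REDACTED]"


def deidentify(text, known_phi):
    """Keep only tokens positively identified as non-PHI; redact the rest."""
    # Step 1 (stand-in for cTAKES annotation): a naive word tokenizer.
    tokens = re.findall(r"[A-Za-z']+", text)
    lowered = [t.lower() for t in tokens]

    # Step 2 (assumed rule): words common in the journal literature are non-PHI.
    safe = [JOURNAL_FREQUENCY.get(w, 0) >= FREQUENCY_THRESHOLD for w in lowered]

    # Step 3: multi-word ontology matches are treated as non-PHI, so
    # "Hodgkin's lymphoma" is kept even though "Hodgkin" alone is not.
    for term in ONTOLOGY_TERMS:
        size = len(term)
        if size < 2:
            continue  # a single word could be a name, so it proves nothing
        for i in range(len(lowered) - size + 1):
            if tuple(lowered[i:i + size]) == term:
                for j in range(i, i + size):
                    safe[j] = True

    # Step 4: anything on the known PHI list is always redacted.
    phi = {p.lower() for p in known_phi}
    for i, word in enumerate(lowered):
        if word in phi:
            safe[i] = False

    return " ".join(t if ok else REDACTED for t, ok in zip(tokens, safe))


print(deidentify("Mr Hodgkin presented with Hodgkin's lymphoma", {"Hodgkin"}))
# → [REDACTED] [REDACTED] presented with Hodgkin's lymphoma
```

Redacting by default is the point of the reversed task: the system only needs to know what ordinary clinical language looks like, rather than learning every form PHI can take. The sketch rejoins tokens with single spaces, so original punctuation and spacing are not preserved.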
=== [=#NLP3 Ontology-based Discovery of Disease Activity from the Clinical Record] ===

=== [=#NLP4 Ontology Normalisation of the Clinical Narrative] ===