Changes between Version 11 and Version 12 of i2b2 AUG 2013


Timestamp:
06/24/13 11:53:40 (11 years ago)
Author:
Richard Bramley
Comment:

--

  • i2b2 AUG 2013

    v11 v12  
    5252Entities are identified by a combination of normalisation and longest term matching.
    5353
    54 Normalisation is the process whereby a term is manipulated to produce a form of words that will match a large number of potential matches.  The process involves removing noise words, standardising inflections and derivatives (e.g., remove plural), converting to lower case, and sorting the words into alphabetical order.
     54Normalisation is the process whereby a term is manipulated to produce a form of words that will match a large number of potential matches.  The process involves removing noise words, standardising inflections and derivatives (e.g., remove plural), removing punctuation, converting to lower case, and sorting the words into alphabetical order.
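
The normalisation steps listed above can be sketched roughly as follows. This is a minimal illustration only: the noise-word list and the `naive_singular` helper are hypothetical stand-ins, since the notes do not say which words are removed or how inflections are standardised.

```python
import re

# Hypothetical noise-word list; the notes do not specify the real one.
NOISE_WORDS = {"a", "an", "and", "of", "the", "in", "on", "with"}


def naive_singular(word: str) -> str:
    """Crude stand-in for inflection handling: strip a trailing 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalise(term: str) -> str:
    """Normalise a term: lower case, no punctuation, no noise words, sorted."""
    # Convert to lower case, then replace punctuation with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", term.lower())
    # Drop noise words.
    words = [w for w in cleaned.split() if w not in NOISE_WORDS]
    # Standardise inflections (here: plurals only, very approximately).
    words = [naive_singular(w) for w in words]
    # Sort alphabetically so that word order does not affect matching.
    return " ".join(sorted(words))
```

With this sketch, `normalise("Fractures of the Left Femur!")` and `normalise("left femur fracture")` both produce the same key, which is the point of normalising before matching.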
    5555
    5656To extract the most meaning from the text, the matcher prefers the term with the greatest number of matching words.  For example, 'left atrium' is preferred over just 'atrium'.
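
One way to picture longest term matching is a greedy scan that tries the longest candidate phrase first at each position. The sketch below is an assumption about how such matching might work, not the presenters' implementation; the `lexicon` set, the `max_words` limit and both function names are hypothetical.

```python
def longest_match(tokens, start, lexicon, max_words=5):
    """Return (term, end) for the longest lexicon term at `start`, or None."""
    limit = min(len(tokens), start + max_words)
    # Try the longest span first and shrink until something matches.
    for end in range(limit, start, -1):
        candidate = " ".join(tokens[start:end])
        if candidate in lexicon:
            return candidate, end
    return None


def find_entities(text, lexicon):
    """Scan the text left to right, keeping the longest match at each position."""
    tokens = text.lower().split()
    found, i = [], 0
    while i < len(tokens):
        match = longest_match(tokens, i, lexicon)
        if match:
            term, i = match  # Skip past the matched words.
            found.append(term)
        else:
            i += 1
    return found
```

Given a lexicon containing both 'atrium' and 'left atrium', the text "Dilated left atrium noted" yields 'left atrium' rather than 'atrium'. A fuller version would normalise both the lexicon entries and the candidate phrases before comparing them.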
     
    108108
    109109=== [=#NLP2 Ontology-based De-identification of Clinical Narratives] ===
     110
     111A presentation on a method for removing Protected Health Information (PHI) from free-text fields using the Apache cTAKES lexical annotation tool.
     112
     113The normal method for attempting to de-identify free text is to train software to recognise personal information.  However, the number of training examples available is usually quite small.  This team attempted to reverse the task by training the software to recognise non-PHI data.
     114
     115Pipeline:
     116
     1171. Annotate the text with cTAKES.
     1181. Check the frequency of each term in medical journal articles.
     1191. Match terms to ontologies.  Diseases and other concepts named after people can be a problem, but a match on a term of more than one word implies that it is not a name.  For example, 'Hodgkins Lymphoma' would not match 'Mr Hodgkins'.
     1201. Remove items from known PHI lists - presumably the person's name, address, etc.
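
The idea of recognising non-PHI rather than PHI can be illustrated with a small sketch of the last three pipeline steps. Everything here is an assumption made for illustration: the cTAKES annotation step is replaced by plain whitespace tokenisation, and the `deidentify` function, its parameters, the `min_freq` threshold and the `[REDACTED]` placeholder are all hypothetical. The choice to let the known PHI list override the other checks is also an assumption, erring on the side of redaction.

```python
def deidentify(text, ontology_terms, journal_freq, phi_list, min_freq=5):
    """Redact every token that is not recognised as non-PHI."""
    tokens = text.split()
    # Lower-cased keys with trailing punctuation stripped, used for lookups.
    keys = [t.strip(".,;:").lower() for t in tokens]
    safe = [False] * len(tokens)

    # Ontology step: tokens inside a multi-word ontology term are non-PHI.
    for size in (3, 2):
        for i in range(len(keys) - size + 1):
            if " ".join(keys[i:i + size]) in ontology_terms:
                for j in range(i, i + size):
                    safe[j] = True

    out = []
    for tok, key, is_safe in zip(tokens, keys, safe):
        # Frequency step: words common in medical journals are non-PHI.
        if not is_safe and journal_freq.get(key, 0) >= min_freq:
            is_safe = True
        # PHI-list step (assumed to override the checks above).
        if key in phi_list:
            is_safe = False
        out.append(tok if is_safe else "[REDACTED]")
    return " ".join(out)
```

For the text "Mr Hodgkins has Hodgkins lymphoma." with an ontology containing 'hodgkins lymphoma', the disease name is kept because it matches as a two-word term, while the standalone 'Hodgkins' is redacted.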
     121
    110122=== [=#NLP3 Ontology-based Discovery of Disease Activity from the Clinical Record] ===
    111123=== [=#NLP4 Ontology Normalisation of the Clinical Narrative] ===