
Version 13 (modified by Richard Bramley, 12 years ago)

i2b2 AUG 2013

Program

NLP Workshop

UMLS Ontologies and Ontology Resources

Presentation showing how UMLS resources can be used with NLP to extract information from free text.

NLP has two stages:

Entity Recognition - Identifying important terms within text
Relationship Extraction - linking entities together

Entity Recognition

There are three major problems when identifying entities within a text:

Entities are missed
Entities are partially matched - part of the term is matched but another part is missed, leading to incomplete information or loss of context. For example, in the term 'bilateral vestibular' only the second word may be matched.
Ambiguous terms - terms that have more than one meaning.

Entities are identified by a combination of normalisation and longest term matching.

Normalisation is the process of transforming a term into a standard form, so that it matches as many variant forms of the same term as possible. The process involves removing noise (stop) words, standardising inflections and derivations (e.g., removing plurals), removing punctuation, converting to lower case, and sorting the words into alphabetical order.
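The steps above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical stop-word list and a naive plural-stripping rule in place of a real lexicon-based tool such as LVG. The function names `uninflect` and `normalise` are illustrative only.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real tools use curated lists.
STOP_WORDS = {"of", "the", "and", "in", "a", "an", "to", "with"}


def uninflect(word: str) -> str:
    """Naive plural stripping, standing in for lexicon-based uninflection."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalise(term: str) -> str:
    # Convert to lower case and replace punctuation with spaces.
    text = re.sub(r"[^\w\s]", " ", term.lower())
    # Drop noise (stop) words and reduce each remaining word to a base form.
    words = [uninflect(w) for w in text.split() if w not in STOP_WORDS]
    # Sort alphabetically so that word order no longer matters.
    return " ".join(sorted(words))
```

With this sketch, "Cancer of the Lungs" and "lung cancer" both normalise to "cancer lung", so either phrasing in the text would match the same dictionary entry. The naive plural rule will mangle words such as "diabetes", which is why real systems rely on a lexicon.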

To extract as much meaning as possible from the text, the longest matching term is preferred, i.e. the candidate containing the greatest number of matching words. For example, 'left atrium' is matched rather than just 'atrium'.
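Longest-term matching can be sketched as a greedy left-to-right scan. This version assumes the text has already been split into lower-case words, and that the dictionary of known terms is a set of word tuples. That dictionary format, the `max_len` limit and the function name are assumptions for illustration. At each position the longest window is tried first, so 'left atrium' wins over 'atrium'.

```python
def longest_match(tokens, lexicon, max_len=5):
    """Greedy longest-match lookup.

    tokens: list of lower-case words from the text.
    lexicon: set of known terms, each stored as a tuple of words (hypothetical format).
    Returns a list of (start, end, term) spans, end exclusive.
    """
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest window first and shrink it until a term matches.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in lexicon:
                matches.append((i, i + n, " ".join(candidate)))
                i += n  # skip past the matched words
                break
        else:
            i += 1  # no term starts at this word; move on
    return matches
```

For the words of "dilated left atrium noted" and a lexicon containing both ("atrium",) and ("left", "atrium"), the sketch returns a single span covering "left atrium".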

Types of Resources useful for Entity Recognition

There are several types of resource:

Lexical resources - lists of terms with their variant spellings, derivations and inflections, each associated with its part of speech. These can cover general English or include specialist medical terms.
Ontologies - set of entities with relationships between the entities.
Technical resources - sets of terms and identifiers used to map a term to an ontology.
Hybrid - a mixture of lexical resources and ontologies. Strictly speaking they are not ontologies, as the relationships may not always be true (e.g., a child may not always be part of its parent). They are useful for finding terms, but should not be used for aggregation.
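A lexical resource, as described above, can be pictured as a set of entries, each holding a base form, a part of speech and its variants. The following is a minimal sketch of such a structure. The `LexicalEntry` class, the index and the sample entries are hypothetical and do not reflect the file format of any real lexicon. Looking up a word in text then yields its base form and part of speech. A single form may belong to more than one entry, which is one source of ambiguity.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """One entry in a hypothetical lexical resource."""
    base: str                                         # base (citation) form
    pos: str                                          # part of speech, e.g. "noun"
    variants: set[str] = field(default_factory=set)   # spellings, inflections, derivations


def build_variant_index(entries):
    """Map every known form (lower-cased) to the entries it may belong to."""
    index: dict[str, list[LexicalEntry]] = {}
    for entry in entries:
        for form in {entry.base, *entry.variants}:
            index.setdefault(form.lower(), []).append(entry)
    return index


def lookup(index, word):
    """Return the (base form, part of speech) pairs for a word found in text."""
    return [(e.base, e.pos) for e in index.get(word.lower(), [])]


# Hypothetical sample entries.
ENTRIES = [
    LexicalEntry("tumour", "noun", {"tumor", "tumours", "tumors"}),
    LexicalEntry("dilate", "verb", {"dilates", "dilated", "dilating"}),
    LexicalEntry("dilated", "adjective"),
]
```

Here `lookup(build_variant_index(ENTRIES), "Tumors")` gives the base form "tumour" as a noun. Looking up "dilated" returns both a verb and an adjective reading.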

Lexical Resources

UMLS SPECIALIST Lexicon - medical and general English.
WordNet - general English.
LVG (Lexical Variant Generation) - SPECIALIST tool for generating lexical variants.
BioLexicon - EU project. Less general; mainly focused on genes.
BioThesaurus - focused on proteins and genes.
RxNorm - drug specific.

Ontological Resources

UMLS Semantic Network

Terminology Resources

MetaThesaurus
- Groups terms from many source ontologies.
- Produces a graph of all the relationships between them.
- The graph is not acyclic and contains contradictions, because it reproduces its source ontologies exactly.
- Allows terms to be mapped between standards.
RxNorm
- Maps between many drug lists.
- Maps between branded and generic drug names.
MetaMap
- Free under a licence agreement.
- Based on the UMLS MetaThesaurus.
- Parses text to find terms.
- Used in IBM's Watson tool.
- Terms can be translated between various standards, including SNOMED.
- Handles negation and disambiguation of terms.
TerMine
WhatIzIt
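The mapping between standards offered by the MetaThesaurus (and, for drugs, RxNorm) works by attaching terms from different source vocabularies to a shared concept. A minimal sketch of translating a code through that shared concept follows. The vocabularies, codes and concept identifiers are invented for illustration and do not reflect the real UMLS files or identifier formats. The sketch also assumes each source code belongs to a single concept, which real data does not guarantee.

```python
from collections import defaultdict

# Hypothetical concept table: (concept id, source vocabulary, source code, name).
ROWS = [
    ("CONCEPT_1", "VOCAB_A", "A-100", "Heart attack"),
    ("CONCEPT_1", "VOCAB_B", "B-7", "Myocardial infarction"),
    ("CONCEPT_2", "VOCAB_A", "A-200", "High blood pressure"),
    ("CONCEPT_2", "VOCAB_B", "B-9", "Hypertension"),
]


def build_crosswalk(rows):
    """Index the rows both by source code and by shared concept."""
    code_to_concept = {}  # simplification: one concept per source code
    concept_to_codes = defaultdict(lambda: defaultdict(list))
    for concept, vocab, code, _name in rows:
        code_to_concept[(vocab, code)] = concept
        concept_to_codes[concept][vocab].append(code)
    return code_to_concept, concept_to_codes


def translate(vocab_from, code, vocab_to, code_to_concept, concept_to_codes):
    """Translate a code from one vocabulary to another via their shared concept."""
    concept = code_to_concept.get((vocab_from, code))
    if concept is None:
        return []  # code not present in the table
    # A concept may have several codes in the target vocabulary, so return a list.
    return list(concept_to_codes[concept].get(vocab_to, []))
```

With this table, translating "A-100" from VOCAB_A to VOCAB_B goes through CONCEPT_1 and yields ["B-7"]. Mapping a branded drug name to its generic name follows the same pattern, with both names attached to one drug concept.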

Relationship Extraction

SemRep

Orbit Project

The Orbit Project is the Online Registry of Biomedical Informatics Tools.

Ontology-based De-identification of Clinical Narratives

Presentation of a method for removing Protected Health Information (PHI) from free-text fields, using the Apache cTAKES lexical annotation tool.

The usual approach to de-identifying free text is to train software to recognise personal information. However, the number of training examples available is usually quite small. This team attempted to reverse the task by training the software to recognise non-PHI data instead.

Pipeline:

cTAKES annotation.
Frequency of the term in medical journal articles.
Matching of terms to ontologies. Diseases (and similar terms) named after people can be a problem, but a match on a term of more than one word implies that the word is not being used as a name. For example, the ontology term 'Hodgkins Lymphoma' would not match the text 'Mr Hodgkins'.
Removal of items on known PHI lists - presumably the person's name, address, etc.
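The "recognise non-PHI" idea behind this pipeline can be sketched as a filter. It keeps a word only when there is positive evidence that it is not PHI, and redacts everything else. This is a minimal sketch, not the team's implementation. The cTAKES step is not modelled, and the journal-frequency table, the `min_freq` threshold, the ontology term list and the function names are hypothetical stand-ins. The known-PHI list is checked first so that it always wins.

```python
import re

REDACTED = "[REDACTED]"


def multiword_matches(words, ontology_terms):
    """Indices of words covered by an ontology term of two or more words.

    Single-word matches are ignored, because a lone word such as 'Hodgkins'
    may be a person's name.
    """
    lowered = [w.lower() for w in words]
    covered = set()
    for term in ontology_terms:
        parts = term.lower().split()
        if len(parts) < 2:
            continue
        n = len(parts)
        for i in range(len(lowered) - n + 1):
            if lowered[i:i + n] == parts:
                covered.update(range(i, i + n))
    return covered


def deidentify(text, ontology_terms, journal_freq, known_phi, min_freq=1000):
    """Keep words with evidence of being non-PHI; redact all other words.

    journal_freq: hypothetical map of lower-case word -> count in journal articles.
    known_phi: lower-case words known to be PHI (e.g. the patient's name).
    """
    tokens = list(re.finditer(r"\w+", text))
    covered = multiword_matches([m.group() for m in tokens], ontology_terms)

    pieces, last = [], 0
    for i, m in enumerate(tokens):
        word = m.group().lower()
        if word in known_phi:
            keep = False  # known PHI is always removed
        elif i in covered:
            keep = True   # part of a multi-word ontology term
        else:
            keep = journal_freq.get(word, 0) >= min_freq  # common in journal text
        pieces.append(text[last:m.start()])  # keep spacing and punctuation as-is
        pieces.append(m.group() if keep else REDACTED)
        last = m.end()
    pieces.append(text[last:])
    return "".join(pieces)
```

With "Hodgkins lymphoma" as an ontology term and only "has" marked as frequent, the sketch turns "Mr Hodgkins has Hodgkins lymphoma." into "[REDACTED] [REDACTED] has Hodgkins lymphoma.". The name is removed but the disease is kept.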

Ontology-based Discovery of Disease Activity from the Clinical Record

Presentation of a project that used NLP to find evidence of disease activity and its temporal relationship to drug events, in order to classify patients as responders or non-responders for genetic analysis.
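The talk's actual criteria are not recorded in these notes. As a purely hypothetical illustration of using the temporal relationship, here is a simple rule. A patient is a responder if NLP found disease activity before the drug was started but none in a follow-up window afterwards. The `lag_days` and `window_days` parameters and the function name are assumptions chosen for the example.

```python
from datetime import date, timedelta


def classify_response(drug_start, activity_dates, lag_days=90, window_days=365):
    """Hypothetical rule for labelling a patient from dated NLP findings.

    drug_start: date the drug was started.
    activity_dates: dates of notes in which NLP found evidence of disease activity.
    Activity in the first lag_days after the start is ignored, on the
    assumption that the drug needs time to take effect.
    """
    before = [d for d in activity_dates if d < drug_start]
    window_start = drug_start + timedelta(days=lag_days)
    window_end = window_start + timedelta(days=window_days)
    after = [d for d in activity_dates if window_start <= d <= window_end]

    if not before:
        return "insufficient evidence"  # no documented activity to respond to
    return "non-responder" if after else "responder"


if __name__ == "__main__":
    start = date(2010, 1, 1)
    print(classify_response(start, [date(2009, 6, 1)]))  # responder
```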

This talk put forward a method of using three data sets when developing the software:

Annotated training set
Known set - a pre-annotated set that is used repeatedly to test the software, but not to train it.
Unknown random set - a random sample drawn from a larger set, used once for testing. The results of the test are manually assessed after the run.
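Preparing the three sets can be sketched as follows. This assumes the known set is carved out of the annotated documents, and that the unknown set is drawn from a separate, larger pool of unannotated documents. The function name, parameters and fixed seed are illustrative assumptions. In practice a fresh unknown sample would be drawn for each one-off test.

```python
import random


def make_evaluation_sets(annotated, unannotated, known_size, unknown_size, seed=0):
    """Return (training set, known set, unknown random set).

    annotated: documents that already carry manual annotations.
    unannotated: the larger pool the unknown set is drawn from.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    docs = list(annotated)
    rng.shuffle(docs)
    known = docs[:known_size]     # re-used for every test run, never trained on
    training = docs[known_size:]  # used to train the software
    # A random sample from the larger pool, tested once and then reviewed by hand.
    unknown = rng.sample(list(unannotated), unknown_size)
    return training, known, unknown
```

Keeping the known set out of training means repeated scores on it remain comparable between versions of the software. The one-off manual review of the unknown set checks performance on data nobody has tuned against.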
