| 36 | |
| 37 | Presentation showing how UMLS resources can be used with NLP to extract information from free text. |
| 38 | |
| 39 | Information extraction with NLP has two stages: |
| 40 | |
| 41 | 1. Entity Recognition - Identifying important terms within text |
| 42 | 1. Relationship Extraction - Linking entities together |
| 43 | |
| 44 | ==== Entity Recognition ==== |
| 45 | |
| 46 | Three major problems when identifying entities within a text: |
| 47 | |
| 48 | 1. Entities are missed |
| 49 | 1. Entities are partially matched - part of the term is matched but another part is missed, leading to incomplete information or lost context. For example, in the term 'bilateral vestibular' only the second word may be matched. |
| 50 | 1. Ambiguous terms - terms that may have two or more meanings. |
| 51 | |
| 52 | ==== Types of Resources useful for Entity Recognition ==== |
| 53 | |
| 54 | There are several types of resource: |
| 55 | |
| 56 | 1. Lexical resources - lists of terms with variant spellings, derivatives and inflections, each associated with its part of speech. These can be either general or include specialist medical terms. |
| 57 | 1. Ontologies - sets of entities with relationships between them. |
| 58 | 1. Terminology resources - sets of terms and identifiers used to map a term to an ontology. |
| 59 | 1. Hybrid - A mixture of lexical resources and ontologies. They are not strictly speaking ontologies, as the relationships may not always hold (e.g., a child may not always be a part of the parent). They are useful for finding terms, but should not be used for aggregation. |
| 60 | |
| 61 | ==== Lexical Resources ==== |
| 62 | |
| 63 | 1. [[http://lexsrv3.nlm.nih.gov/Specialist/Home/index.html|UMLS Specialist Lexicon]] - Medical and general English |
| 64 | 1. [[http://wordnet.princeton.edu/|WordNet]] - General English |
| 65 | 1. [[http://lexsrv2.nlm.nih.gov/LexSysGroup/Projects/lvg/2012/docs/userDoc/tools/lvg.html|LVG Lexical Variant Generation]] - specialist tool |
| 66 | 1. [[http://www.ebi.ac.uk/Rebholz-srv/BioLexicon/biolexicon.html|BioLexicon]] - EU project. Not as general. Mainly focused on genes. |
| 67 | 1. [[http://pir.georgetown.edu/pirwww/iprolink/biothesaurus.shtml|BioThesaurus]] - Focused on proteins and genes. |
| 68 | 1. [[http://www.nlm.nih.gov/research/umls/rxnorm/|RxNorm]] - Drug specific. |
| 69 | |
| 70 | ==== Ontological Resources ==== |
| 71 | |
| 72 | 1. [[http://semanticnetwork.nlm.nih.gov/|UMLS Semantic Network]] |
| 73 | |
| 74 | ==== Terminology Resources ==== |
| 75 | |
| 76 | 1. [[http://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/index.html|UMLS Metathesaurus]] |
| 77 | * Groups terms from many ontologies |
| 78 | * Produces a graph of all the relationships |
| 79 | * Graph is not acyclic and contains contradictions ''because'' it reproduces its source ontologies exactly. |
| 80 |     * Allows terms to be mapped between standards. |
| 81 | 1. [[http://www.nlm.nih.gov/research/umls/rxnorm/|RxNorm]] |
| 82 |     * Maps between many drug lists. |
| 83 |     * Maps between branded and generic drug names. |
| 84 | 1. [[http://metamap.nlm.nih.gov/|MetaMap]] |
| 85 | * Free with licence agreement |
| 86 |     * Based on the UMLS Metathesaurus. |
| 87 | * Parses text to find terms. |
| 88 | * Used in IBM's Watson tool. |
| 89 |     * Terms can be translated between various standards, including SNOMED. |
| 90 |     * Handles term negation and disambiguation. |
| 91 | 1. [[http://www.nactem.ac.uk/software/termine/|TerMine]] |
| 92 | 1. [[http://www.ebi.ac.uk/webservices/whatizit/info.jsf|WhatIzIt]] |
| 93 | |
| 94 | |
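The partial-match problem described under Entity Recognition can be illustrated with a small sketch. This is not how MetaMap or any other UMLS tool works internally; it is a toy greedy longest-match lookup in Python over a hypothetical two-entry lexicon. The identifiers `C_VESTIBULAR` and `C_BILATERAL_VESTIBULAR`, the function names and the sample sentence are all made up for illustration.

```python
# Toy lexicon mapping token tuples to concept identifiers.
# Entries and identifiers are hypothetical, not taken from UMLS.
LEXICON = {
    ("vestibular",): "C_VESTIBULAR",
    ("bilateral", "vestibular"): "C_BILATERAL_VESTIBULAR",
}
MAX_LEN = max(len(term) for term in LEXICON)


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def match_single_words(tokens):
    """Naive matcher: looks up one token at a time, so a multi-word
    term such as 'bilateral vestibular' is only partially matched."""
    return [(tok,) for tok in tokens if (tok,) in LEXICON]


def match_longest(tokens):
    """Greedy longest match: at each position try the longest
    candidate span first, then progressively shorter ones."""
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in LEXICON:
                matches.append(span)
                i += n  # skip past the matched term
                break
        else:
            i += 1  # no term starts here; move on
    return matches


tokens = tokenize("Patient has bilateral vestibular loss")
print(match_single_words(tokens))  # only 'vestibular' is found
print(match_longest(tokens))       # the full two-word term is found
```

The single-word matcher loses the qualifier 'bilateral', which is the incomplete-context failure described above; trying longer spans first recovers the whole term. Missed entities and ambiguous terms are not addressed by this sketch.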