i2b2 Database Structure
You can get all of the following from the documentation, but here are pertinent things to where we are...
This is a very bare basics. At this stage we should be primarily interested in the observation_fact table and its relationship with the concept_dimension table within the CRC Cell. And then the correspondence with any ontology tables within the Ontology Cell. Interest in the patient_dimension, visit_dimension and other of the tables within the CRC are a little further down the line.
There can be a number of ontology tables within the Ontology Cell. We know which are ontology tables by an entry in the table_access table.
The QueryTool uses the ontology tables within the Ontology Cell to format queries for submission against the tables within the CRC Cell.
+-------------CRC Cell-------------+ +-----------Ontology Cell---------+ | | | | | +===================+ | | +=====================+ | | | concept_dimension |<<<=========>>>| ontology tables |>: | | +=========^=========+ | | +==========^==========+ : | | | | | | : | | | | | | : | | | | | | : | | +=========^=========+ | | +==========^==========+ : | | | observation_fact | | | | table_access | : | | +===V===========V===+ | | +=====================+ : | | | | | | : | | | | | +-----------------------------:---+ | +==========V========+ | | : | | patient_dimension | | | +------------+ : | +===================+ | |<<<<<<<<<<<| QUERY TOOL |<<<<<<<<: | | | +------------+ | +==========V======+ | | | visit_dimension | | | +=================+ | +----------------------------------+
Ontology Cell: DDL for Table ONYX
-- ORACLE -- CREATE TABLE ONYX ( "C_HLEVEL" NUMBER(22,0) NOT NULL, "C_FULLNAME" VARCHAR2(700) NOT NULL, "C_NAME" VARCHAR2(2000) NOT NULL, "C_SYNONYM_CD" CHAR(1) NOT NULL, "C_VISUALATTRIBUTES" CHAR(3) NOT NULL, "C_TOTALNUM" NUMBER(22,0) NULL, "C_BASECODE" VARCHAR2(50) NULL, "C_METADATAXML" CLOB NULL, "C_FACTTABLECOLUMN" VARCHAR2(50) NOT NULL, "C_TABLENAME" VARCHAR2(50) NOT NULL, "C_COLUMNNAME" VARCHAR2(50) NOT NULL, "C_COLUMNDATATYPE" VARCHAR2(50) NOT NULL, "C_OPERATOR" VARCHAR2(10) NOT NULL, "C_DIMCODE" VARCHAR2(700) NOT NULL, "C_COMMENT" CLOB NULL, "C_TOOLTIP" VARCHAR2(900) NULL, "UPDATE_DATE" DATE NOT NULL, "DOWNLOAD_DATE" DATE NULL, "IMPORT_DATE" DATE NULL, "SOURCESYSTEM_CD" VARCHAR2(50) NULL, "VALUETYPE_CD" VARCHAR2(50) NULL ) ;
These are worth highlighting...
C_HLEVEL: hierarchical level in the tree. 0,1,2 etc
C_FULLNAME: full path within the tree.
C_NAME: in the demo system this is the last part of the path within the tree. Don't understand the size implications between C_FULLNAME and C_NAME.
C_SYNONYM_CD: Y or N regarding a synonym
C_VISUALATTRIBUTES: eg 'LA' would be an active leaf node...
- 1st character: F = Folder C = Container M = Multiple L = Leaf
- 2nd character: A = Active I = Inactive H = Hidden
- 3rd character: E = editable
There is a point discussing the M and H settings on i2b2OntologyCRC-1.
C_TOTALNUM: Not used, but suspect can be used in the latest version (for performance reasons?).
C_BASECODE: the term that describes the ontological concept.
C_METADATAXML: an optional field to store extra information about the concept in xml format. see i2b2 Ontology c_metadataxml Column.
C_FACTTABLECOLUMN: name of the column in the CRC cell observation_fact table that holds the concept code.
C_TABLENAME: name of the lookup table within the CRC cell that holds the concept code and concept path. In the top diagram this would be concept_dimension.
C_COLUMNNAME: the name of the column within the table described by C_TABLENAME that holds the concept path. In the top diagram this would be the column within the concept_dimension table, which in our case is called CONCEPT_PATH. See below.
C_COLUMNDATATYPE: either ‘T’ for text or ‘N’ for numeric and describes the datatype of the concept. Don't understand this fully. Jeff has questions about this.
C_OPERATOR: the SQL operator used in the WHERE clause of the SQL query. It is usually either ‘LIKE’ or ‘=’.
C_DIMCODE: the path that lives in the dimension table and maps to the concept path.
CRC Cell: DDL for Table concept_dimension
The following sentences are from the design document...
The concept_dimension table contains one row for each concept. Possible concept types are diagnoses, procedures, medications and lab tests. The structure of the table gives enough flexibility to store virtually any concept type, such as demographics and genetics data.
-- Oracle -- CREATE TABLE CONCEPT_DIMENSION ( CONCEPT_CD VARCHAR2(50) NOT NULL, CONCEPT_PATH VARCHAR2(700) NOT NULL, NAME_CHAR VARCHAR2(2000) NULL, CONCEPT_BLOB CLOB NULL, UPDATE_DATE DATE NULL, DOWNLOAD_DATE DATE NULL, IMPORT_DATE DATE NULL, SOURCESYSTEM_CD VARCHAR2(50) NULL, UPLOAD_ID NUMBER(38,0) NULL, CONSTRAINT CONCEPT_DIMENSION_PK PRIMARY KEY(CONCEPT_PATH) ) ;
The design document also states that the concept_dimension table has three required columns:
- concept_path: path that delineates the concept’s hierarchy.
- concept_cd: code that represents the diagnosis, procedure, or any other coded value.
- name_char: name of the concept.
CRC Cell: DDL for Table observation_fact
The fact table of the i2b2 star schema: represents the intersection of the dimension tables. Each row describes one observation about a patient made during a visit. Most queries in the i2b2 database require joining the observation_fact table with one or more dimension tables together.
-- ORACLE -- CREATE TABLE OBSERVATION_FACT ( ENCOUNTER_NUM NUMBER(38,0) NOT NULL, PATIENT_NUM NUMBER(38,0) NOT NULL, CONCEPT_CD VARCHAR2(50) NOT NULL, PROVIDER_ID VARCHAR2(50) NOT NULL, START_DATE DATE NOT NULL, MODIFIER_CD VARCHAR2(100) NOT NULL, VALTYPE_CD VARCHAR2(50) NULL, TVAL_CHAR VARCHAR2(255) NULL, NVAL_NUM NUMBER(18,5) NULL, VALUEFLAG_CD VARCHAR2(50) NULL, QUANTITY_NUM NUMBER(18,5) NULL, INSTANCE_NUM NUMBER(18,0) NULL, UNITS_CD VARCHAR2(50) NULL, END_DATE DATE NULL, LOCATION_CD VARCHAR2(50) NULL, CONFIDENCE_NUM NUMBER(18,5) NULL, OBSERVATION_BLOB CLOB NULL, UPDATE_DATE DATE NULL, DOWNLOAD_DATE DATE NULL, IMPORT_DATE DATE NULL, SOURCESYSTEM_CD VARCHAR2(50) NULL, UPLOAD_ID NUMBER(38,0) NULL, CONSTRAINT OBSERVATION_FACT_PK PRIMARY KEY(ENCOUNTER_NUM,CONCEPT_CD,PROVIDER_ID,START_DATE,MODIFIER_CD) ) ;
Eventually we will have detailed knowledge and understanding of all of these. But for the moment, here are some details:
Columns UPDATE_DATE, DOWNLOAD_DATE, IMPORT_DATE, SOURCESYSTEM_CD and UPLOAD_ID are administrative/info columns only. Ignore these.
The following are summary details of the different columns. For further details see the CRC design document on the Documents page.
There are important questions to ask and observations to make here.
|Key||Column Name||Column Definition||Nullable (default=YES)|
|PK||encounter_num||Encoded i2b2 patient visit number||NO|
|patient_num||Encoded i2b2 patient number||NO|
|PK||concept_cd||Code for observation of interest||NO|
|PK||provider_id||Practitioner id or provider id||NO|
|PK||start_date||Starting date-time of observation (mm/dd/yyyy)||NO|
|modifier_cd|| Code for modifier of interest (i.e. “ROUTE”,” DOSE”)|
note that value columns are often used to hold the amounts such as “100” (mg) or “PO”
|instance_num|| Encoded instance number that allows more that one modifier to be provided for each concept_cd.|
Each row will have a different modifier_cd but a similar instance_num.
|valType_cd|| Format of the concept|
N = Numeric
T = Text (enums/short messages)
B = Raw Text (notes/reports)
NLP = NLP result text
|tval_char|| Used in conjunction with valType_cd = “T” or “N”... |
When valtType_cd = “T”: Stores the text value
When valType_cd = “N”:
E = Equals
NE = Not equal
L = Less Than
LE = Less than and Equal to
G = Greater Than
GE = Greater than and Equal to
|nval_num||Used in conjunction with valType_cd = “N” to store a numerical value|
|valueFlag_cd|| Used in conjunction with valType_cd = “B”, “NLP”, “N”, or “T”... |
When valType_cd = “B” or “NLP” it is used to indicate whether or not the blob field is encrypted
( X = Encrypted text in blob field )
When valType_cd = “N” or “T” it is used to flag certain outlying or abnormal values
( H = High, L = Low, A = Abnormal )
|quantity_num||Quantity of nval|
|units_cd||Units of measurement of nval|
|end_date||The end date-time for the observation|
|location_cd||A location code, such as for a clinic|
|confidence_num||Assessment of accuracy of data|
|observation_blob||Holds any raw or miscellaneous data that exists, often encrypted PHI|