i2b2 Ontology and CRC Discussion: 1
Questions asked and points to ponder. This is a miscellany of points. Access to the demo database is useful here...
Note about missing values
Just a point to be aware of...
There seems to be a convention within i2b2 that missing values can be denoted by the @ sign as a value. That is, NULL is not the only way to show missing values. The @ sign can obviously thereby be used where columns are defined as NOT NULL, including primary key columns. I'm not certain whether the choice of @ sign is configurable or not, and where and when it cannot be used.
Ontology Cell tables compared to CRC concept_dimension table
There are three ontology tables within the demo Ontology cell: birn, i2b2 and custom_meta.
When I do a select count(*) from <table_name> where c_visualattributes like '%LA%' I get the following figures:
- birn: 37
- custom_meta: 0
- i2b2: 93807
When I do a "select count(*) from concept_dimension" within the CRC cell I get a count of 73590.
The totals are 93844 as opposed to 73590. Why is there a difference and what does it mean?
In an Ontology table, what does c_columndatatype really mean?
The docs state: "either ‘T’ for text or ‘N’ for numeric and describes the datatype of the concept". Yet the length of the column is varchar 50.
Within the demo project:
- select count(*) from i2b2 where c_columndatatype like '%N%' ===> returns 0
- select count(*) from i2b2 where c_columndatatype like '%T%' ===> returns 134762
- select count(*) from i2b2 ===> returns 134762
Yet within the query tool (from the Workbench) I can select terms from a Labtest and constrain by value. This is undoubtedly possible because of the optional metadataxml column. But even so, what does c_columndatatype really mean?
By comparison with the observation_fact table in the CRC cell:
- select count(*) from observation_fact where valtype_cd like '%T%' ===> returns 5055
- select count(*) from observation_fact where valtype_cd like '%N%' ===> returns 23731
Experiment conducted (Jeff):
Went into the database and chose to update a column in the Onyx metadata table for the Briccs project...
update onyx set c_COLUMNDATATYPE = 'N' where c_name = 'history_AF_onset' ;
Went into the workbench and refreshed the ontology tree. Dragged the history_AF_onset leaf into a group.
It would not allow me to set a value ('Set value...' was greyed out / disabled).
Being neurotic, I then tried:
update onyx set c_operator = '=' where c_name = 'history_AF_onset'
Restarted the workbench just to be sure. No change!
If you want within the Query Tool to be able set a value for a term in a query, you obviously need to do more than this in the ontology tree.
Facts constrained by enumerated concepts
Facts can be constrained by simply pointing at a concept row within the concept_dimension table with very little else alongside it. For example, a numeric value like AGE can be divided into an enumeration of concepts, say one for each age from 0 to 114, as is the demo system. Here is a selection based upon this:
select * from observation_fact where concept_cd like 'DEM|AGE:50' ;
This retrieved 3 rows of patients aged 50 from the fact table:
I'm in two minds about this. It is a neat way of arranging data as long as values (whether text or numeric) have a reasonably discrete range. It seems this is the reason why lab test results etc do not follow this paradigm for their numeric values.
Constructing an Ontology Tree within the Workbench
Provided you have a root of an ontology tree that is not locked, it is possible to use the edit panel to construct a whole ontology tree within the Workbench! This is a good way to explore the possible settings within an ontology table, especially the c_metadataxml column. Unfortunately, the latter looks definitely aimed at lab results. I'll create some obviously named metadata roots so you can experiment in the demo project.
Family History and Personal History
Just in case you haven't drilled down the demo project's ontology tree very far, here are two interesting branches...
- Ontology/Diagnoses/V-codes/Family History
- Ontology/Diagnoses/V-codes/Personal History
A Principle Worth Stating
This is as much for my benefit as for anyone else...
The only metadata available to a user for forming a query is solely contained within the ontology tables, ie: that available by browsing an ontology tree.
Although there is metadata stored within the observation_fact table itself (and we have to get this correct when loading the data), this particular metadata is only used when executing a query. It is not available at query forming time.
The C_METADATAXML Column
I cannot see any other way of providing metadata to allow values to be entered by a user at query forming time than by providing this optional metadata.
It bears re-reading the above a couple of times and taking in its implications.
For every concept with dynamic values (In the questionnaire think of dates or quantities as possible candidates):
- Each concept (and maybe its ancestors in the tree) will need providing with an XML file, albeit held within a database column.
- That file could be used on data loading to help format the relevant observation_fact. (I think this may be standard practice)
- The presence of this metadata triggers a dialogue when query building which is clearly aimed at lab test values. This is hard coded within the workbench.
On the whole however, I believe we would be wise to limit dynamic valued concepts to labtest results or similar...
Is there a method to this madness?
It looks as if the i2b2 approach is a series of design compromises made along the way. The following looks as if this is the way it has been argued...
- Any fact that can present an infinite range of values can only be supplied by a lab test (or something similar).
- Any non-labtest fact should be presented as an enumeration of values within the ontology tree.
Labtests are very varied, some providing a reading from an infinite range (say 123.567), some providing a discrete result from amongst an enumerated list. There are problems regarding units and standards. So we better hive this off completely within a dialogue separate from the ontology tree, and provide another way of customizing the choice (c_metadataxml).
Dates in the observation_fact Table
From the design doc: "Each row describes one observation about a patient made during a visit."
There is a START_DATE and an END_DATE for each observation. The start date is mandatory. The end date is not mandatory. The end date can be the same as the start date.
It is possible to constrain queries within the query tool using a date or date range for a group: there is a data button at the top left hand side of each group panel.
When you encounter a date in the questionnaire, I have in mind that this is where the date should go....
Interventions during this clinical episode. Enter the date at which each intervention was performed for the first time during the current episode of care. CABG Coronary Artery Bypass Graft Valve Surgery TAVI - Transcatheter Aortic Valve Implantation PPCI - Primary Percutaneous Coronary Intervention Other PCI Pacemaker insertion ICD - Implantable Cardioverter Defibrillator LVAD - Left Ventricular Assist Device Thrombolysis Electrophysiology (EP) / Radiofrequency (RF) Ablation Coronary Angiography
MA and LH settings for c_visualattributes in an ontology table
I was having difficulty making sense of some entries in an ontology table within the Ontology cell and their relationship with the patient_dimension table within the CRC cell. And discovered this.
Take the religion "Jewish". In the ontology display this looks like a bottom leaf. It isn't. The following three rows from the ontology table i2b2 (some columns excluded) are relevant:
The 'LH' setting flags that there are hidden leaves. The 'MA' setting flags that this is not a bottom leaf and has hidden leaves below it. The upshot is that there are two codes relevant to being "Jewish": "DEM|RELIGION:jewish" and "DEM|RELIGION:jh". What you see in the Ontology tree is the 'MA' entry.
The code_lookup, patient_dimension and observation_fact tables within the CRC cell
This is partly related to the previous point, so also concerns ontology entries! And I'm finding this quite confusing.
There are 16 people in the demo system who are Jewish. I won't display all rows...
The query "select * from observation_fact where concept_cd like 'DEM|RELIGION:j%'" returns 16 people (some columns and rows excluded):
The query "select * from patient_dimension where religion_cd like '%j%'" returns 16 people (some columns and rows excluded):
The query "select * from code_lookup where code_cd like 'DEM|RELIGION:j%'" returns 6 rows, only two of which apply to being Jewish. The other 4 concern Jehova's witnesses (some columns and rows excluded):
I think this makes sense, but please differ if you find a better explanation.
First of all it's interesting to note that patient information can/is duplicated across patient_dimension and observation_fact tables. My reading of the situation is that the entry 'DEM|RELIGION:jewish' in the fact table is identical with the 'jewish' entry in the patient_dimension table. The patient_dimension table has the value in column named RELIGION_CD which in effect replaces the name-space 'DEM|RELIGION:' that prefixes the value in the observation_fact table. So I would argue these are identical. The code_lookup table in column CODE_CD is using the full name-space qualified code. For human readability (for reports etc) the value to use is in the NAME_CHAR column.
It appears the code_lookup table is just what it says. It allows data to be stored in coded form but displayed (when required) in human readable form.
The columns within the patient_dimension table are optional columns. Beware of this omission of a name space within optional columns. I'm not completely sure of all its ramifications.