Version 21 (modified by 14 years ago) ( diff ) | ,
---|
i2b2 Ontology and CRC Discussion: 1
Questions asked and points to ponder. Access to the demo database is useful here...
Note about missing values
Just a point to be aware of...
There seems to be a convention within i2b2 that missing values can be denoted by the @ sign as a value. That is, NULL is not the only way to show missing values. The @ sign can obviously thereby be used where columns are defined as NOT NULL, including primary key columns. I'm not certain whether the choice of @ sign is configurable or not, and where and when it cannot be used.
Ontology Cell tables compared to CRC concept_dimension table
There are three ontology tables within the demo Ontology cell: birn, i2b2 and custom_meta.
When I do a select count(*) from <table_name> where c_visualattributes like '%LA%' I get the following figures:
- birn: 37
- custom_meta: 0
- i2b2: 93807
When I do a "select count(*) from concept_dimension" within the CRC cell I get a count of 73590.
The totals are 93844 as opposed to 73590. Why is there a difference and what does it mean?
In an Ontology table, what does c_columndatatype really mean?
The docs state: "either ‘T’ for text or ‘N’ for numeric and describes the datatype of the concept". Yet the length of the column is varchar 50.
Within the demo project:
- select count(*) from i2b2 where c_columndatatype like '%N%' ===> returns 0
- select count(*) from i2b2 where c_columndatatype like '%T%' ===> returns 134762
- select count(*) from i2b2 ===> returns 134762
Yet within the query tool (from the Workbench) I can select terms from a Labtest and constrain by value. This is undoubtedly possible because of the optional metadataxml column. But even so, what does c_columndatatype really mean?
By comparison with the observation_fact table in the CRC cell:
- select count(*) from observation_fact where valtype_cd like '%T%' ===> returns 5055
- select count(*) from observation_fact where valtype_cd like '%N%' ===> returns 23731
Experiment conducted (Jeff):
Went into the database and chose to update a column in the Onyx metadata table for the Briccs project...
update onyx set c_COLUMNDATATYPE = 'N' where c_name = 'history_AF_onset' ;
Went into the workbench and refreshed the ontology tree. Dragged the history_AF_onset leaf into a group.
It would not allow me to set a value ('Set value...' was greyed out / disabled).
Being neurotic, I then tried:
update onyx set c_operator = '=' where c_name = 'history_AF_onset'
Restarted the workbench just to be sure. No change!
If you want to set a value for a term in a query, you obviously need to do more than this.
Facts constrained by enumerated concepts
Facts can be constrained by simply pointing at a concept row within the concept_dimension table with very little else alongside it. For example, a numeric value like AGE can be divided into an enumeration of concepts, say one for each age from 0 to 114, as is the demo system. Here is a selection based upon this:
select * from observation_fact where concept_cd like 'DEM|AGE:50' ;
This retrieved 3 rows of patients aged 50 from the fact table:
ENCOUNTER_NUM | PATIENT_NUM | CONCEPT_CD | PROVIDER_ID | START_DATE | MODIFIER_CD | VALTYPE_CD | TVAL_CHAR | NVAL_NUM | VALUEFLAG_CD | QUANTITY_NUM | INSTANCE_NUM | UNITS_CD | END_DATE | LOCATION_CD | CONFIDENCE_NUM | OBSERVATION_BLOB | UPDATE_DATE | DOWNLOAD_DATE | IMPORT_DATE | SOURCESYSTEM_CD | UPLOAD_ID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2005000014 | 1000000014 | DEM|AGE:50 | @ | 22-MAY-06 | @ | @ | - | - | - | - | - | @ | 25-APR-07 | @ | - | - | 25-APR-07 | 25-APR-07 | 25-APR-07 | DEMOGRAPH|DEMO | - |
2005000082 | 1000000082 | DEM|AGE:50 | @ | 25-MAY-06 | @ | @ | - | - | - | - | - | @ | 25-APR-07 | @ | - | - | 25-APR-07 | 25-APR-07 | 25-APR-07 | DEMOGRAPH|DEMO | - |
2005000095 | 1000000095 | DEM|AGE:50 | @ | 05-MAY-07 | @ | @ | - | - | - | - | - | @ | 25-APR-07 | @ | - | - | 25-APR-07 | 25-APR-07 | 25-APR-07 | DEMOGRAPH|DEMO | - |
I'm in two minds about this. It is a neat way of arranging data as long as values (whether text or numeric) have a reasonably discrete range. It seems this is the reason why lab test results etc do not follow this paradigm for their numeric values.