Version 36 (modified by jeff.lusted, 14 years ago)

i2b2 Ontology and CRC Discussion: 1

Questions asked and points to ponder. Access to the demo database is useful here...

Note about missing values

Just a point to be aware of...
There seems to be a convention within i2b2 that a missing value can be denoted by the @ sign, so NULL is not the only way to show that a value is missing. This means @ can be used in columns defined as NOT NULL, including primary key columns. I'm not certain whether the choice of the @ sign is configurable, nor where and when it cannot be used.
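
As a minimal sketch of what this means for any code that reads i2b2 rows, the helper below treats both NULL and @ as missing. The name is_missing and the constant are hypothetical, and the idea that @ is the only placeholder in use is an assumption drawn from the demo data, not from the i2b2 documentation.

```python
# Hypothetical helper: treat both NULL (None in Python) and the '@'
# placeholder as "no value". That '@' is the only placeholder in use
# is an assumption based on the demo data.
MISSING_PLACEHOLDER = "@"


def is_missing(value):
    """Return True when a column value should be read as missing."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip() == MISSING_PLACEHOLDER


# A made-up, partial fact row in the style of the demo database.
row = {
    "concept_cd": "DEM|AGE:50",
    "provider_id": "@",
    "modifier_cd": "@",
    "tval_char": None,
}
missing_columns = sorted(name for name, value in row.items() if is_missing(value))
print(missing_columns)
```

Checking for both forms in one place keeps the rest of the loading or reporting code from having to know which columns are NOT NULL.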

Ontology Cell tables compared to CRC concept_dimension table

There are three ontology tables within the demo Ontology cell: birn, i2b2 and custom_meta.

When I do a select count(*) from <table_name> where c_visualattributes like '%LA%' I get the following figures:

  • birn: 37
  • custom_meta: 0
  • i2b2: 93807

When I do a "select count(*) from concept_dimension" within the CRC cell I get a count of 73590.

The ontology tables total 93844 (37 + 0 + 93807) as opposed to 73590 in concept_dimension. Why is there a difference and what does it mean?
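
One way to start investigating the gap is to check two candidate explanations: several ontology leaves sharing one code, and leaves whose code has no row in concept_dimension. The sketch below uses Python's sqlite3 with tiny invented tables so that it runs anywhere. It assumes that c_basecode in an ontology table corresponds to concept_cd in concept_dimension, which should be checked against the demo schema, and neither explanation is confirmed for the demo data.

```python
import sqlite3

# Toy stand-ins for the demo tables; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE i2b2 (c_name TEXT, c_basecode TEXT, c_visualattributes TEXT);
    CREATE TABLE concept_dimension (concept_cd TEXT);
    INSERT INTO i2b2 VALUES
        ('Age',            NULL,         'FA '),
        ('50 years old',   'DEM|AGE:50', 'LA '),
        ('50 (duplicate)', 'DEM|AGE:50', 'LA '),
        ('51 years old',   'DEM|AGE:51', 'LA ');
    INSERT INTO concept_dimension VALUES ('DEM|AGE:50');
""")

# Candidate 1: more leaf rows than distinct codes (shared codes).
leaf_rows, distinct_codes = conn.execute("""
    SELECT COUNT(*), COUNT(DISTINCT c_basecode)
    FROM i2b2
    WHERE c_visualattributes LIKE '%LA%'
""").fetchone()

# Candidate 2: leaf codes with no matching concept_dimension row (anti-join).
unmatched = [row[0] for row in conn.execute("""
    SELECT DISTINCT o.c_basecode
    FROM i2b2 o
    LEFT JOIN concept_dimension c ON c.concept_cd = o.c_basecode
    WHERE o.c_visualattributes LIKE '%LA%' AND c.concept_cd IS NULL
    ORDER BY o.c_basecode
""")]

print(leaf_rows, distinct_codes, unmatched)
```

The same two SELECT statements could be run directly against the demo Ontology and CRC schemas, given a connection that can see both.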

In an Ontology table, what does c_columndatatype really mean?

The docs state: "either ‘T’ for text or ‘N’ for numeric and describes the datatype of the concept". Yet the column is defined as varchar(50).

Within the demo project:

  • select count(*) from i2b2 where c_columndatatype like '%N%' ===> returns 0
  • select count(*) from i2b2 where c_columndatatype like '%T%' ===> returns 134762
  • select count(*) from i2b2 ===> returns 134762

Yet within the query tool (from the Workbench) I can select terms from a lab test and constrain by value. This is undoubtedly possible because of the optional c_metadataxml column. But even so, what does c_columndatatype really mean?

By comparison with the observation_fact table in the CRC cell:

  • select count(*) from observation_fact where valtype_cd like '%T%' ===> returns 5055
  • select count(*) from observation_fact where valtype_cd like '%N%' ===> returns 23731
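
The LIKE probes above only say whether a letter appears somewhere in the value. A GROUP BY shows every distinct value, including @ and NULL, in one pass. The sketch below runs that against toy tables in Python's sqlite3; the rows are invented and the helper name value_distribution is hypothetical.

```python
import sqlite3

# Toy stand-ins for the demo tables; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE i2b2 (c_name TEXT, c_columndatatype TEXT, c_metadataxml TEXT);
    CREATE TABLE observation_fact (concept_cd TEXT, valtype_cd TEXT);
    INSERT INTO i2b2 VALUES
        ('Female',        'T', NULL),
        ('50 years old',  'T', NULL),
        ('Some lab test', 'T', '<ValueMetadata/>');
    INSERT INTO observation_fact VALUES
        ('LAB:1', 'N'), ('LAB:1', 'N'), ('LAB:2', 'T'), ('DEM|AGE:50', '@');
""")


def value_distribution(conn, table, column):
    """Count rows per distinct value of one column.

    The table and column names are trusted constants in this sketch;
    they must never come from user input.
    """
    sql = f"SELECT {column}, COUNT(*) FROM {table} GROUP BY {column} ORDER BY {column}"
    return conn.execute(sql).fetchall()


ontology_types = value_distribution(conn, "i2b2", "c_columndatatype")
fact_types = value_distribution(conn, "observation_fact", "valtype_cd")
print(ontology_types)
print(fact_types)
```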

Experiment conducted (Jeff):
Went into the database and chose to update a column in the Onyx metadata table for the Briccs project...
update onyx set c_columndatatype = 'N' where c_name = 'history_AF_onset';
Went into the workbench and refreshed the ontology tree. Dragged the history_AF_onset leaf into a group.
It would not allow me to set a value ('Set value...' was greyed out / disabled).
Being neurotic, I then tried:
update onyx set c_operator = '=' where c_name = 'history_AF_onset';
Restarted the workbench just to be sure. No change!

If you want to be able to set a value for a term in a query within the Query Tool, you obviously need to do more than this in the ontology tree.


Facts constrained by enumerated concepts

Facts can be constrained by simply pointing at a concept row within the concept_dimension table with very little else alongside it. For example, a numeric value like AGE can be divided into an enumeration of concepts, say one for each age from 0 to 114, as in the demo system. Here is a selection based upon this:

select * from observation_fact where concept_cd like 'DEM|AGE:50' ;

This retrieved 3 rows of patients aged 50 from the fact table:

ENCOUNTER_NUM PATIENT_NUM CONCEPT_CD PROVIDER_ID START_DATE MODIFIER_CD VALTYPE_CD TVAL_CHAR NVAL_NUM VALUEFLAG_CD QUANTITY_NUM INSTANCE_NUM UNITS_CD END_DATE LOCATION_CD CONFIDENCE_NUM OBSERVATION_BLOB UPDATE_DATE DOWNLOAD_DATE IMPORT_DATE SOURCESYSTEM_CD UPLOAD_ID
2005000014 1000000014 DEM|AGE:50 @ 22-MAY-06 @ @ - - - - - @ 25-APR-07 @ - - 25-APR-07 25-APR-07 25-APR-07 DEMOGRAPH|DEMO -
2005000082 1000000082 DEM|AGE:50 @ 25-MAY-06 @ @ - - - - - @ 25-APR-07 @ - - 25-APR-07 25-APR-07 25-APR-07 DEMOGRAPH|DEMO -
2005000095 1000000095 DEM|AGE:50 @ 05-MAY-07 @ @ - - - - - @ 25-APR-07 @ - - 25-APR-07 25-APR-07 25-APR-07 DEMOGRAPH|DEMO -

I'm in two minds about this. It is a neat way of arranging data as long as values (whether text or numeric) have a reasonably discrete range. This is presumably why lab test results and the like do not follow this paradigm for their numeric values.
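
To make the enumeration idea concrete, here is a minimal sketch that generates one concept code per age in the DEM|AGE:n style seen above, and turns an age range into the list of codes a query would have to match. The function name enumerated_concepts is hypothetical, and the code format is inferred from the demo rows only.

```python
def enumerated_concepts(prefix, values):
    """Build one concept code per discrete value, e.g. 'DEM|AGE:50'."""
    return [f"{prefix}:{value}" for value in values]


# One concept for each age from 0 to 114, as in the demo system.
age_concepts = enumerated_concepts("DEM|AGE", range(0, 115))

# A range constraint ("aged 50 to 52") becomes a list of codes, not a
# numeric comparison on nval_num.
ages_50_to_52 = enumerated_concepts("DEM|AGE", range(50, 53))
placeholders = ", ".join("?" for _ in ages_50_to_52)
range_query = f"select * from observation_fact where concept_cd in ({placeholders})"

print(len(age_concepts), ages_50_to_52)
print(range_query)
```

This shows why the approach only suits discrete ranges: 115 ages are manageable, whereas a continuous lab value would need an unbounded number of concepts.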

Constructing an Ontology Tree within the Workbench

Provided you have a root of an ontology tree that is not locked, it is possible to use the edit panel to construct a whole ontology tree within the Workbench! This is a good way to explore the possible settings within an ontology table, especially the c_metadataxml column. Unfortunately, the latter appears to be aimed squarely at lab results. I'll create some obviously named metadata roots so you can experiment in the demo project.

Family History and Personal History

Just in case you haven't drilled down the demo project's ontology tree very far, here are two interesting branches...

  • Ontology/Diagnoses/V-codes/Family History
  • Ontology/Diagnoses/V-codes/Personal History

A Principle Worth Stating

This is as much for my benefit as for anyone else...

The only metadata available to a user when forming a query is that contained within the ontology tables, i.e. what is visible by browsing an ontology tree.

Although there is metadata stored within the observation_fact table itself (and we have to get this correct when loading the data), this particular metadata is only used when executing a query. It is not available at query forming time.

The C_METADATAXML Column

I cannot see any way of letting a user enter values at query forming time other than by populating this optional column.

It bears re-reading the above a couple of times and taking in its implications.

For every concept with dynamic values (in the questionnaire, think of dates or quantities as possible candidates):

  • Each concept (and maybe its ancestors in the tree) will need to be provided with an XML document, albeit one held within a database column.
  • That file could be used on data loading to help format the relevant observation_fact. (I think this may be standard practice)
  • The presence of this metadata triggers a dialogue when query building which is clearly aimed at lab test values. This is hard coded within the workbench.
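
As a rough sketch of how a loader might read such a document, the code below parses a small XML string with Python's standard library. The element names (ValueMetadata, DataType, Oktousevalues, UnitValues/NormalUnits) are assumptions modelled on the lab test entries and should be checked against real c_metadataxml values in the demo project. The sample content, including the units, is made up, and the function name summarise_value_metadata is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample; element names are assumptions to verify against the
# c_metadataxml values held in the demo ontology tables.
SAMPLE_METADATAXML = """<ValueMetadata>
  <TestName>history_AF_onset</TestName>
  <DataType>PosFloat</DataType>
  <Oktousevalues>Y</Oktousevalues>
  <UnitValues>
    <NormalUnits>years</NormalUnits>
  </UnitValues>
</ValueMetadata>"""


def summarise_value_metadata(xml_text):
    """Return the few fields a loader might need, or None if the column is empty."""
    if not xml_text or not xml_text.strip():
        return None
    root = ET.fromstring(xml_text)
    return {
        "data_type": root.findtext("DataType"),
        "ok_to_use_values": root.findtext("Oktousevalues") == "Y",
        "units": root.findtext("UnitValues/NormalUnits"),
    }


summary = summarise_value_metadata(SAMPLE_METADATAXML)
print(summary)
```

A concept whose c_metadataxml is NULL yields None, which matches the observation that 'Set value...' stays disabled for such terms.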

On the whole, however, I believe we would be wise to limit dynamic-valued concepts to lab test results or similar...

Is there a method to this madness?

I think there is. <<<To be continued>>>
