== i2b2 Ontology and CRC Discussion: 1 ==
Questions asked and points to ponder. Access to the demo database is useful here...

=== Note about missing values ===
Just a point to be aware of...[[br]]
There seems to be a convention within i2b2 that a missing value can be denoted by the @ sign '''''as a value'''''. That is, NULL is not the only way to show a missing value. The @ sign can therefore be used in columns defined as NOT NULL, including primary key columns. I'm not certain whether the choice of the @ sign is configurable, nor where and when it cannot be used.

=== Ontology Cell tables compared to CRC concept_dimension table ===
There are three ontology tables within the demo Ontology cell: birn, i2b2 and custom_meta.[[br]]
When I do a select count(*) on each of these tables with the condition c_visualattributes like '%LA%', I get the following figures:
 * birn: 37
 * custom_meta: 0
 * i2b2: 93807

When I do a "select count(*) from concept_dimension" within the CRC cell I get a count of 73590.[[br]]
That is a total of 93844 ontology rows as opposed to 73590 concept_dimension rows. Why is there a difference and what does it mean?

=== In an Ontology table, what does c_columndatatype really mean? ===
The docs state: "either ‘T’ for text or ‘N’ for numeric and describes the datatype of the concept". Yet the column is defined as varchar(50), far wider than a single-letter code needs.[[br]]
Within the demo project:
 * select count(*) from i2b2 where c_columndatatype like '%N%' ===> returns 0
 * select count(*) from i2b2 where c_columndatatype like '%T%' ===> returns 134762
 * select count(*) from i2b2 ===> returns 134762

Yet within the query tool (from the Workbench) I can select terms from a Labtest and constrain by value. This is undoubtedly possible because of the optional c_metadataxml column.
But even so, what '''''does''''' c_columndatatype really mean?[[br]]
By comparison with the observation_fact table in the CRC cell:
 * select count(*) from observation_fact where valtype_cd like '%T%' ===> returns 5055
 * select count(*) from observation_fact where valtype_cd like '%N%' ===> returns 23731

----
__Experiment conducted (Jeff)__: [[br]]
I went into the database and updated a column in the Onyx metadata table for the Briccs project...[[BR]]
update onyx set c_COLUMNDATATYPE = 'N' where c_name = 'history_AF_onset' ; [[br]]
I then went into the workbench, refreshed the ontology tree and dragged the history_AF_onset leaf into a group. [[br]]
It would not allow me to set a value ('Set value...' was greyed out / disabled). [[br]]
Being neurotic, I then tried: [[br]]
update onyx set c_operator = '=' where c_name = 'history_AF_onset' [[br]]
I restarted the workbench just to be sure. No change! [[br]]
If you want to be able to set a value for a term in a query within the Query Tool, you obviously need to do more than this in the ontology tree.

----
=== Facts constrained by enumerated concepts ===
Facts can be constrained by simply pointing at a concept row within the concept_dimension table, with very little else alongside it. For example, a numeric value like AGE can be divided into an enumeration of concepts, say one for each age from 0 to 114, as in the demo system.
Here is a selection based upon this: [[br]]
select * from observation_fact where concept_cd like 'DEM|AGE:50' ;[[br]]
This retrieved 3 rows of patients aged 50 from the fact table:

||= ENCOUNTER_NUM =||= PATIENT_NUM =||= CONCEPT_CD =||= PROVIDER_ID =||= START_DATE =||= MODIFIER_CD =||= VALTYPE_CD =||= TVAL_CHAR =||= NVAL_NUM =||= VALUEFLAG_CD =||= QUANTITY_NUM =||= INSTANCE_NUM =||= UNITS_CD =||= END_DATE =||= LOCATION_CD =||= CONFIDENCE_NUM =||= OBSERVATION_BLOB =||= UPDATE_DATE =||= DOWNLOAD_DATE =||= IMPORT_DATE =||= SOURCESYSTEM_CD =||= UPLOAD_ID =||
|| 2005000014 || 1000000014 || DEM|AGE:50 || @ || 22-MAY-06 || @ || @ || - || - || - || - || - || @ || 25-APR-07 || @ || - || - || 25-APR-07 || 25-APR-07 || 25-APR-07 || DEMOGRAPH|DEMO || - ||
|| 2005000082 || 1000000082 || DEM|AGE:50 || @ || 25-MAY-06 || @ || @ || - || - || - || - || - || @ || 25-APR-07 || @ || - || - || 25-APR-07 || 25-APR-07 || 25-APR-07 || DEMOGRAPH|DEMO || - ||
|| 2005000095 || 1000000095 || DEM|AGE:50 || @ || 05-MAY-07 || @ || @ || - || - || - || - || - || @ || 25-APR-07 || @ || - || - || 25-APR-07 || 25-APR-07 || 25-APR-07 || DEMOGRAPH|DEMO || - ||

I'm in two minds about this. It is a neat way of arranging data as long as the values (whether text or numeric) fall within a reasonably small, discrete range. This seems to be the reason why lab test results etc '''''do not''''' follow this paradigm for their numeric values.

=== Constructing an Ontology Tree within the Workbench ===
Provided you have a root of an ontology tree that is not locked, it is possible to use the edit panel to construct a whole ontology tree within the Workbench! This is a good way to explore the possible settings within an ontology table, especially the c_metadataxml column. Unfortunately, the latter '''''definitely''''' looks aimed at lab results. I'll create some obviously named metadata roots so you can experiment in the demo project.
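As a rough illustration of the enumerated-concept approach described under "Facts constrained by enumerated concepts", the sketch below uses Python's built-in sqlite3 module with a cut-down, hypothetical observation_fact table (only three of the real columns, and invented rows rather than demo data). It suggests that an exact age is a simple match on concept_cd, whereas an age range has to be spelled out as a list of concept codes. The function names are made up for this example.

```python
import sqlite3

# Cut-down stand-in for observation_fact: only three of the real columns,
# and the rows are invented for illustration (not taken from the demo data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation_fact ("
    " encounter_num INTEGER NOT NULL,"
    " patient_num INTEGER NOT NULL,"
    " concept_cd TEXT NOT NULL)"
)
rows = [
    (1, 101, "DEM|AGE:49"),
    (2, 102, "DEM|AGE:50"),
    (3, 103, "DEM|AGE:50"),
    (4, 104, "DEM|AGE:57"),
    (5, 105, "DEM|AGE:61"),
]
conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)", rows)


def patients_with_age(conn, age):
    """Exact age: a plain equality match on the enumerated concept code."""
    cur = conn.execute(
        "SELECT patient_num FROM observation_fact"
        " WHERE concept_cd = ? ORDER BY patient_num",
        (f"DEM|AGE:{age}",),
    )
    return [row[0] for row in cur]


def patients_in_age_range(conn, low, high):
    """Age range: every concept code in the range has to be listed."""
    codes = [f"DEM|AGE:{age}" for age in range(low, high + 1)]
    placeholders = ", ".join("?" for _ in codes)
    cur = conn.execute(
        "SELECT patient_num FROM observation_fact"
        f" WHERE concept_cd IN ({placeholders}) ORDER BY patient_num",
        codes,
    )
    return [row[0] for row in cur]


aged_50 = patients_with_age(conn, 50)
in_fifties = patients_in_age_range(conn, 50, 59)
print(aged_50)
print(in_fifties)
```

The range query only stays manageable because the list of codes is short, which fits the point above about needing a reasonably small, discrete range.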
=== Family History and Personal History ===
Just in case you haven't drilled down the demo project's ontology tree very far, here are two interesting branches...
 * !Ontology/Diagnoses/V-codes/Family History
 * !Ontology/Diagnoses/V-codes/Personal History

=== A Principle Worth Stating ===
This is as much for my benefit as for anyone else...[[br]]
**The __only__ metadata available to a user when forming a query is that contained within the ontology tables, i.e. that available by browsing an ontology tree.** [[br]]
Although there is metadata stored within the observation_fact table itself (and we have to get this correct when loading the data), this particular metadata is only used when executing a query. It is '''''not''''' available at query forming time.

=== The C_METADATAXML Column ===
I cannot see any way of providing the metadata that allows a user to enter values at query forming time other than through this optional column.[[br]]
It bears re-reading the above a couple of times and taking in its implications. [[br]]
For every concept with dynamic values (in the questionnaire, think of dates or quantities as possible candidates):
 * Each concept (and maybe its ancestors in the tree) will need to be provided with an XML document, albeit one held within a database column.
 * That document could be used on data loading to help format the relevant observation_fact. (I think this may be standard practice.)
 * The presence of this metadata triggers a dialogue when query building which is clearly aimed at lab test values. This is hard coded within the workbench.

On the whole, however, I believe we would be wise to limit dynamic valued concepts to labtest results or similar...

=== Is there a method to this madness? ===
It looks as if the i2b2 approach is a series of design compromises made along the way.
The reasoning appears to have run as follows...[[br]]
'''First decision''':
 * Any fact that can present an infinite range of values can only be supplied by a lab test (or something similar).
 * Any non-labtest fact should be presented as an enumeration of values within the ontology tree.

'''Second decision''': [[br]]
Labtests are very varied, some providing a reading from an infinite range (say 123.567), some providing a discrete result from amongst an enumerated list. There are problems regarding units and standards. So we had better hive this off completely into a dialogue separate from the ontology tree, and provide another way of customizing the choice (c_metadataxml). [[br]]
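The contrast between the two decisions can be sketched with Python's built-in sqlite3 module and a hypothetical, cut-down observation_fact. Everything here is illustrative: the concept code LAB|HYPOTHETICAL:GLUCOSE, the rows and the function names are invented, and the assumption that a numeric lab fact carries valtype_cd = 'N' with its reading in nval_num is inferred from the column names and the valtype_cd counts above, not confirmed against the i2b2 documentation.

```python
import sqlite3

# Hypothetical, cut-down observation_fact; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation_fact ("
    " patient_num INTEGER NOT NULL,"
    " concept_cd TEXT NOT NULL,"
    " valtype_cd TEXT NOT NULL,"
    " nval_num REAL)"
)
rows = [
    # First decision: the concept code itself carries the value,
    # so valtype_cd holds the '@' placeholder and nval_num is empty.
    (201, "DEM|AGE:50", "@", None),
    (202, "DEM|AGE:72", "@", None),
    # Second decision (assumed layout): one concept for the test,
    # with the reading stored alongside it in nval_num.
    (201, "LAB|HYPOTHETICAL:GLUCOSE", "N", 123.567),
    (202, "LAB|HYPOTHETICAL:GLUCOSE", "N", 88.0),
    (203, "LAB|HYPOTHETICAL:GLUCOSE", "N", 140.25),
]
conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?, ?)", rows)


def constrain_by_concept(conn, concept_cd):
    """Enumerated fact: picking the concept is the whole constraint."""
    cur = conn.execute(
        "SELECT patient_num FROM observation_fact"
        " WHERE concept_cd = ? ORDER BY patient_num",
        (concept_cd,),
    )
    return [row[0] for row in cur]


def constrain_by_value(conn, concept_cd, minimum):
    """Valued fact: the concept plus a threshold supplied by the user."""
    cur = conn.execute(
        "SELECT patient_num FROM observation_fact"
        " WHERE concept_cd = ? AND valtype_cd = 'N' AND nval_num >= ?"
        " ORDER BY patient_num",
        (concept_cd, minimum),
    )
    return [row[0] for row in cur]


enumerated = constrain_by_concept(conn, "DEM|AGE:50")
above_120 = constrain_by_value(conn, "LAB|HYPOTHETICAL:GLUCOSE", 120.0)
print(enumerated)
print(above_120)
```

In the second query the threshold (120.0 here) is exactly the piece of information a user would have to type in at query forming time, which is why some extra metadata is needed to prompt for it.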