== i2b2 Ontology and CRC Discussion: 1 ==
Questions asked and points to ponder. Access to the demo database is useful here...

=== Ontology Cell tables compared to CRC concept_dimension table ===
There are three ontology tables within the demo Ontology cell: birn, i2b2 and custom_meta.[[br]] 

When I do a select count(*) from <table_name> where c_visualattributes like '%LA%' I get the following figures:
   * birn: 37
   * custom_meta: 0
   * i2b2: 93807

When I do a "select count(*) from concept_dimension" within the CRC cell I get a count of 73590.[[br]]

The totals are 93844 as opposed to 73590. Why is there a difference and what does it mean?

=== In an Ontology table, what does c_columndatatype really mean? ===
The docs state: "either ‘T’ for text or ‘N’ for numeric and describes the datatype of the concept". Yet the length of the column is varchar 50.[[br]]

Within the demo project: 
   * select count(*) from i2b2 where c_columndatatype like '%N%' ===> returns 0
   * select count(*) from i2b2 where c_columndatatype like '%T%' ===> returns 134762
   * select count(*) from i2b2 ===> returns 134762

Yet within the query tool (from the Workbench) I can select terms from a Labtest and constrain by value. This is undoubtedly possible because of the optional metadataxml column. But even so, what '''''does''''' c_columndatatype really mean?[[br]]

=== Facts constrained by enumerated concepts ===
Facts can be constrained by simply pointing at a concept row within the concept_dimension table with very little else alongside it. For example, a numeric value like AGE can be divided into an enumeration of concepts, say one for each age from 0 to 114, as is the demo system. Here is a selection based upon this:
 select * from observation_fact where concept_cd like 'DEM|AGE:50' ;[[br]]

This retrieved 3 rows of patients aged 50 from the fact table: 
||= ENCOUNTER_NUM =||= PATIENT_NUM =||= CONCEPT_CD =||= PROVIDER_ID =||= START_DATE =||= MODIFIER_CD =||= VALTYPE_CD =||= TVAL_CHAR =||= NVAL_NUM =||= VALUEFLAG_CD =||= QUANTITY_NUM =||= INSTANCE_NUM =||= UNITS_CD =||= END_DATE =||= LOCATION_CD =||= CONFIDENCE_NUM =||= OBSERVATION_BLOB =||= UPDATE_DATE =||= DOWNLOAD_DATE =||= IMPORT_DATE =||= SOURCESYSTEM_CD =||= UPLOAD_ID =||
|| 2005000014 ||  1000000014 || DEM|AGE:50 || @ || 22-MAY-06 || @ || @ || -  || -  || -  || -  || -  || @ || 25-APR-07 || @ || -  || -  || 25-APR-07 || 25-APR-07 || 25-APR-07 || DEMOGRAPH|DEMO || - ||
|| 2005000082 || 1000000082 || DEM|AGE:50 || @ || 25-MAY-06 || @ || @ || -  || -  || -  || -  || -  || @ || 25-APR-07 || @ || -  || -  || 25-APR-07 || 25-APR-07 || 25-APR-07 || DEMOGRAPH|DEMO || - ||
|| 2005000095 || 1000000095 || DEM|AGE:50 || @ || 05-MAY-07 || @ || @ || -  || -  || -  || -  || -  || @ || 25-APR-07 || @ || -  || -  || 25-APR-07 || 25-APR-07 || 25-APR-07 || DEMOGRAPH|DEMO || - ||

I'm in two minds about this. It is a neat way of arranging data as long as values (text or numeric) have a reasonable discrete range. It seems this is the reason why labtests etc '''''do not''''' follow this paradigm for their numeric values.