| 2 | This page is about deriving a first ontology from Onyx, in order to trial importing an ontology and then the corresponding data into i2b2. |
| 3 | |
| 4 | ---- |
| 5 | '''Nick's comments from meeting with Jason, Jeff and Dave''' |
| 6 | {{{ |
| 7 | Working from variables.xml in MedicalHistoryInterviewQuestionnaire |
| 8 | |
| 9 | Top level of xml is the questionnaire - nothing pulled from export xml to populate this. |
| 10 | |
| 11 | Second level is taken from the stage name - all variables have a stage name attribute in the export xml. |
| 12 | |
| 13 | Third level is section name - all variables have a section name attribute. |
| 14 | |
| 15 | Fourth level is questionName - all variables have a questionName attribute. |
| 16 | |
| 17 | Fifth level is name of variable - defined in the <variable name=""> section of the export xml (or possibly constructed from variable name and category name, see below). Note that in some cases this will be the same as the fourth level. If this isn't a good idea, then the name will need to be extended by means of an additional string (maybe '.Question'?) |
| 18 | |
| 19 | <label> is derived from the variable's attribute "label" |
| 20 | |
| 21 | <type> is derived from the <variable valueType=""> declaration. |
| 22 | |
| 23 | * * * |
| 24 | |
| 25 | I've done the high BP questions (Do you.., When did you..., Have you received...) and then skipped all the other conditions as the question structure is essentially the same, albeit some with multiple categories for (e.g.) type of diabetes, treatment of diabetes, or with multiple question sets for multiple event conditions (MI, etc). |
| 26 | |
| 27 | * * * |
| 28 | |
| 29 | Discussion |
| 30 | |
| 31 | This is messy. Pulling the data only from each <variable></variable> element definition makes sense, but the 'category' variables (Y,N,PNA,DK,etc) would then have 'labels' of 'Y', 'N' and so on - put hundreds of those into i2b2 and you've got a very confusing ontology. |
| 32 | |
| 33 | Jeff's idea of using the <category> child elements of the original <variable> elements might therefore be better, but it requires a more complex filter. Where a variable has category child elements, the filter needs to construct additional ontology entries from those child elements (combining the variable label and the category label) and either ignore the following variables with the same root variable name or possibly add detail from them. Where the variable doesn't have category child elements, the filter must behave differently. |
| 34 | |
| 35 | Thinking further, it would be better if the primary question <label> element related to the fourth level, not the fifth. Thus in the hierarchy there would be the primary question label, with 1 or more variables underneath it. |
| 36 | |
| 37 | NOTE a GOTCHA with questions that generate an integer value: the boolean variable is named e.g. part_hist_highbp_onset_cat but the integer value associated with it is just part_hist_highbp_onset. |
| 38 | |
| 39 | Definitely do not need: page attribute, required attribute, condition attribute, validation attribute. I don't think we need to bring over either the category 'code' attribute or the 'missing' attribute, but maybe we do? Also don't think we need the exclusiveChoiceCategoryVariable attribute, as the ontology just needs to provide all possible variables for all possible participants. |
| 40 | |
| 41 | Does the stage attribute of a variable ALWAYS match its questionnaire attribute? Looks like it does. |
| 42 | |
| 43 | What is the 'script' attribute for? |
| 44 | |
| 45 | Does the ontology need to include the category variable 'code' attributes? Depends on the structure of the participant answer files. Looks like the codes aren't even mentioned in the answer files, so ignore them for now. |
| 46 | }}} |
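
The five-level hierarchy and the category-aware filter discussed in the notes above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the agreed filter. The sample XML is a hypothetical, simplified stand-in for variables.xml: the nested `<attributes>`/`<attribute name="...">` layout, the attribute name `section`, the section value, the labels, and the `part_hist_highbp` / `part_hist_highbp.Y` variable names are all made up for the example (only `part_hist_highbp_onset`, `questionName`, `valueType` and the questionnaire name come from the notes). The helper names `attr` and `ontology_entries` are likewise invented here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for variables.xml. The real export layout may differ.
SAMPLE_XML = """\
<variables>
  <variable name="part_hist_highbp" valueType="text">
    <attributes>
      <attribute name="label">High BP (sample label)</attribute>
      <attribute name="stage">MedicalHistoryInterviewQuestionnaire</attribute>
      <attribute name="section">SampleSection</attribute>
      <attribute name="questionName">part_hist_highbp</attribute>
    </attributes>
    <categories>
      <category name="Y">
        <attributes><attribute name="label">Yes</attribute></attributes>
      </category>
      <category name="N">
        <attributes><attribute name="label">No</attribute></attributes>
      </category>
    </categories>
  </variable>
  <variable name="part_hist_highbp.Y" valueType="boolean">
    <attributes>
      <attribute name="label">Y</attribute>
      <attribute name="stage">MedicalHistoryInterviewQuestionnaire</attribute>
      <attribute name="section">SampleSection</attribute>
      <attribute name="questionName">part_hist_highbp</attribute>
    </attributes>
  </variable>
  <variable name="part_hist_highbp_onset" valueType="integer">
    <attributes>
      <attribute name="label">High BP onset (sample label)</attribute>
      <attribute name="stage">MedicalHistoryInterviewQuestionnaire</attribute>
      <attribute name="section">SampleSection</attribute>
      <attribute name="questionName">part_hist_highbp_onset</attribute>
    </attributes>
  </variable>
</variables>
"""


def attr(element, name):
    """Return the text of the element's own <attribute name=...> child, or None."""
    for a in element.findall("./attributes/attribute"):
        if a.get("name") == name:
            return (a.text or "").strip()
    return None


def ontology_entries(xml_text, questionnaire):
    """Build one ontology entry (path, label, type) per leaf of the hierarchy."""
    root = ET.fromstring(xml_text)
    entries = []
    expanded = set()  # leaf names already produced from <category> children
    for var in root.findall("variable"):
        name = var.get("name")
        if name in expanded:
            # A following category variable whose entry was already built
            # from its parent's <category> element, so it is ignored here.
            continue
        # Levels 1-4: questionnaire (supplied by hand), stage, section, questionName.
        parents = (questionnaire, attr(var, "stage"),
                   attr(var, "section"), attr(var, "questionName"))
        label = attr(var, "label")
        value_type = var.get("valueType")
        categories = var.findall("./categories/category")
        if not categories:
            # Level 5 is simply the variable name.
            entries.append({"path": parents + (name,),
                            "label": label, "type": value_type})
            continue
        for cat in categories:
            # Level 5 is built from variable name plus category name, and the
            # label combines the variable label with the category label.
            leaf = f"{name}.{cat.get('name')}"
            cat_label = attr(cat, "label") or cat.get("name")
            entries.append({"path": parents + (leaf,),
                            "label": f"{label} - {cat_label}",
                            "type": value_type})  # parent's valueType kept as-is
            expanded.add(leaf)
    return entries


if __name__ == "__main__":
    for entry in ontology_entries(SAMPLE_XML, "MedicalHistoryInterviewQuestionnaire"):
        print("/".join(entry["path"]), "|", entry["label"], "|", entry["type"])
```

In this sketch the bare 'Y' label of the category variable never reaches the ontology, because the entry is built from the parent's `<category>` element instead. Skipping by name relies on the category variables appearing after their parent in the file, which would need checking against a real export.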