
Version 7 (modified by jeff.lusted, 13 years ago)


Deriving an Initial Ontology from an Onyx Export

These notes describe deriving a first ontology from an Onyx export, in order to trial importing an ontology and its subsequent data into i2b2.


Nick's comments from meeting with Jason, Jeff and Dave

Working from variables.xml in MedicalHistoryInterviewQuestionnaire

Top level of xml is the questionnaire - nothing pulled from export xml to populate this.

Second level is taken from the stage name - all variables have a stage name attribute in the export xml.

Third level is section name - all variables have a section name attribute.

Fourth level is questionName - all variables have a questionName attribute.

Fifth level is the name of the variable - taken from the name attribute of the <variable name=""> element in the export xml (or possibly constructed from variable name and category name, see below). Note that in some cases this will be the same as the fourth level. If that turns out to be a problem, the name will need to be extended with an additional string (maybe '.Question'?).

<label> is derived from the variable's attribute "label"

<type> is derived from the <variable valueType=""> declaration.
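The level mapping above can be sketched in Python as follows. This is only an illustration: the sample assumes stage, section and questionName are plain XML attributes on a flattened <variable> element, which may not match how the real export stores them, and the function names and the top-level string are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample. The real Onyx export may hold
# stage/section/questionName in a different form.
SAMPLE = """
<variables>
  <variable name="part_hist_highbp" valueType="text"
            stage="MedicalHistoryInterviewQuestionnaire" section="MAIN"
            questionName="part_hist_highbp"
            label="Have you ever suffered from high blood pressure?"/>
</variables>
"""

def ontology_path(variable, top_level="briccs_questionnaire"):
    """Return the five hierarchy levels for one <variable> element."""
    return [
        top_level,                     # level 1: fixed, not taken from the export
        variable.get("stage"),         # level 2: stage name
        variable.get("section"),       # level 3: section name
        variable.get("questionName"),  # level 4: question name
        variable.get("name"),          # level 5: variable name
    ]

def ontology_entry(variable):
    """Collect the path plus the <label> and <type> values for one variable."""
    return {
        "path": ontology_path(variable),
        "label": variable.get("label", ""),
        "type": variable.get("valueType"),
    }

root = ET.fromstring(SAMPLE)
entries = [ontology_entry(v) for v in root.iter("variable")]
```

Note that for this sample levels four and five are identical, which is the case flagged above.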

* * * 

I've done the high BP questions (Do you.., When did you..., Have you received...) and then skipped all the other conditions as the question structure is essentially the same, albeit some with multiple categories for (e.g.) type of diabetes, treatment of diabetes, or with multiple question sets for multiple event conditions (MI, etc).

* * *

Discussion

This is messy. Pulling the data only from each <variable></variable> element definition makes sense, but the 'category' variables (Y,N,PNA,DK,etc) would then have 'labels' of 'Y', 'N' and so on - put hundreds of those into i2b2 and you've got a very confusing ontology.

Jeff's idea of using the <category> child elements of the original <variable> elements might therefore be better, but it requires a more complex filter. Where a variable has category child elements, the filter needs to construct additional ontology entries from those child elements (combining the variable label and the category label) and either ignore the following variables with the same root variable name or possibly add detail from them. Where a variable has no category child elements, the filter must behave differently.
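A minimal sketch of such a filter, in Python, might look like this. The sample XML is a hypothetical simplification (attributes on <variable> and <category>, with the flat category variables following their parent), and build_entries is an illustrative name, not existing code. The "add detail from them" option is not covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample of an export with category children.
SAMPLE = """
<variables>
  <variable name="part_hist_highbp" valueType="text"
            label="Have you ever suffered from high blood pressure?">
    <category name="N" label="No"/>
    <category name="Y" label="Yes"/>
  </variable>
  <variable name="part_hist_highbp.N" valueType="boolean" label="N"/>
  <variable name="part_hist_highbp.Y" valueType="boolean" label="Y"/>
  <variable name="part_hist_highbp_onset" valueType="integer" label=""/>
</variables>
"""

def build_entries(root):
    """Return (name, label, type) tuples, expanding <category> children."""
    entries = []
    derived = set()  # names already generated from <category> children
    for var in root.iter("variable"):
        name = var.get("name")
        if name in derived:
            # Skip the flat variable labelled 'Y', 'N', ...: the entry built
            # from the category child already covers it.
            continue
        entries.append((name, var.get("label", ""), var.get("valueType")))
        for cat in var.findall("category"):
            child = f"{name}.{cat.get('name')}"
            # Combine the variable label and the category label.
            label = f"{var.get('label', '')}: {cat.get('label', '')}"
            entries.append((child, label, "boolean"))
            derived.add(child)
    return entries

entries = build_entries(ET.fromstring(SAMPLE))
```

Variables without category children (here part_hist_highbp_onset) simply pass through unchanged, which is the "behave differently" branch.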

Thinking further, it would be better if the primary question <label> element related to the fourth level, not the fifth. Thus in the hierarchy there would be the primary question label, with 1 or more variables underneath it.

NOTE a GOTCHA with questions that generate an integer value: the boolean variable is named e.g. part_hist_highbp_onset_cat but the integer value associated with it is just part_hist_highbp_onset.
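One way to pair the two names is to strip the suffix, as in the Python sketch below. It assumes every such question follows the same _cat convention as the one example above, which has not been checked across the whole export. The function name is hypothetical.

```python
def value_variable_name(category_variable):
    """Map e.g. 'part_hist_highbp_onset_cat' to 'part_hist_highbp_onset'.

    Assumes the '_cat' suffix convention seen in the single example;
    names without the suffix are returned unchanged.
    """
    suffix = "_cat"
    if category_variable.endswith(suffix):
        return category_variable[: -len(suffix)]
    return category_variable
```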

Definitely do not need: page attribute, required attribute, condition attribute, validation attribute. I don't think we need to bring over either the category 'code' attribute or the 'missing' attribute, but maybe we do? Also don't think we need the exclusiveChoiceCategoryVariable attribute, as the ontology just needs to provide all possible variables for all possible participants.

Does the stage attribute of a variable ALWAYS match its questionnaire attribute? Looks like it does.

What is the 'script' attribute for?

Does the ontology need to include the category variable 'code' attributes? Depends on the structure of the participant answer files. Looks like the codes aren't even mentioned in the answer files, so ignore them for now.

This is Nick's original example:

<briccs_questionnaire>
	<MedicalHistoryInterviewQuestionnaire>
		<MAIN>
			<part_hist_highbp>
				<label>Have you ever suffered from high blood pressure?</label>

				<part_hist_highbp>
					<type>text</type>
					<label></label>
				</part_hist_highbp>

				<part_hist_highbp.N>
					<label>No</label>
					<type>boolean</type>
				</part_hist_highbp.N>

				<part_hist_highbp.Y>
					<label>Yes</label>
					<type>boolean</type>
				</part_hist_highbp.Y>

				<part_hist_highbp.PNA>
					<label>Prefer not to answer</label>
					<type>boolean</type>
				</part_hist_highbp.PNA>

				<part_hist_highbp.DK>
					<label>Don't know</label>
					<type>boolean</type>
				</part_hist_highbp.DK>

				<part_hist_highbp.comment>
					<label>Comment</label>
					<type>text</type>
				</part_hist_highbp.comment>

			</part_hist_highbp>

			<part_hist_highbp_onset_cat>
				<label>When did you first suffer from high blood pressure?</label>

				<part_hist_highbp_onset_cat>
					<type>text</type>
					<label></label>
				</part_hist_highbp_onset_cat>

				<part_hist_highbp_onset_cat.YEAR>
					<label>Year</label>
					<type>boolean</type>
				</part_hist_highbp_onset_cat.YEAR>

				<part_hist_highbp_onset>
					<label></label>
					<type>integer</type>
				</part_hist_highbp_onset>

				<part_hist_highbp_onset_cat.PNA>
					<label>Prefer not to answer</label>
					<type>boolean</type>
				</part_hist_highbp_onset_cat.PNA>

				<part_hist_highbp_onset_cat.DK>
					<label>Don't know</label>
					<type>boolean</type>
				</part_hist_highbp_onset_cat.DK>

				<part_hist_highbp_onset_cat.comment>
					<label>Comment</label>
					<type>text</type>
				</part_hist_highbp_onset_cat.comment>

			</part_hist_highbp_onset_cat>

			<part_hist_highbp_treat>
				<label>Have you received treatment for your high blood pressure?</label>

				<part_hist_highbp_treat>
					<type>text</type>
					<label></label>
				</part_hist_highbp_treat>

				<part_hist_highbp_treat.N>
					<label>No</label>
					<type>boolean</type>
				</part_hist_highbp_treat.N>

				<part_hist_highbp_treat.Y>
					<label>Yes</label>
					<type>boolean</type>
				</part_hist_highbp_treat.Y>

				<part_hist_highbp_treat.PNA>
					<label>Prefer not to answer</label>
					<type>boolean</type>
				</part_hist_highbp_treat.PNA>

				<part_hist_highbp_treat.DK>
					<label>Don't know</label>
					<type>boolean</type>
				</part_hist_highbp_treat.DK>

				<part_hist_highbp_treat.comment>
					<label>Comment</label>
					<type>text</type>
				</part_hist_highbp_treat.comment>

			</part_hist_highbp_treat>

		</MAIN>

	</MedicalHistoryInterviewQuestionnaire>
</briccs_questionnaire>

Jeff's Revision of the above into a no-attributes style xml:

<?xml version="1.0" encoding="UTF-8"?>
<source xmlns="http://briccs.org.uk/xml/v1.0/oi" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
    <name>briccs</name>
    
    <stage>
    	<name>MedicalHistoryInterviewQuestionnaire</name>
    	
    	<section>
    		<name>MAIN</name>
    		
			<question>
				<name>part_hist_highbp</name>			
				<label>Have you ever suffered from high blood pressure?</label>
				<variable>
					<name>part_hist_highbp</name>
					<label></label>
					<type>text</type>
					<variable>
						<name>N</name>
						<label>No</label>
						<type>boolean</type>
					</variable>
					<variable>
						<name>Y</name>
						<label>Yes</label>
						<type>boolean</type>
					</variable>
					<variable>
						<name>PNA</name>
						<label>Prefer not to answer</label>
						<type>boolean</type>
					</variable>
					<variable>
						<name>DK</name>
						<label>Don't know</label>
						<type>boolean</type>
					</variable>
					<variable>
						<name>comment</name>
						<label>Comment</label>
						<type>text</type>
					</variable>
				</variable>
			</question>

			<question>
				<name>part_hist_highbp_onset_cat</name>
				<label>When did you first suffer from high blood pressure?</label>
				<variable>
					<name>part_hist_highbp_onset_cat</name>
					<label></label>
					<type>text</type>
					<variable>
						<name>YEAR</name>
						<label>Year</label>
						<type>boolean</type>
					</variable>
					<variable>
						<name>PNA</name>
						<label>Prefer not to answer</label>
						<type>boolean</type>
					</variable>
					<variable>
						<name>DK</name>
						<label>Don't know</label>
						<type>boolean</type>
					</variable>
					<variable>
						<name>comment</name>
						<label>Comment</label>
						<type>text</type>
					</variable>					
				</variable>
				<variable>
					<name>part_hist_highbp_onset</name>
					<label></label>
					<type>integer</type>
				</variable>
			</question>
			
		</section>
		
	</stage>
	
</source>
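To show how the no-attributes style could be consumed, here is a small Python sketch that walks the nested elements and emits one row per variable. The sample is a cut-down copy of the revision above. The backslash-separated path and the walk function are illustrative assumptions, not a settled import format.

```python
import xml.etree.ElementTree as ET

NS = {"oi": "http://briccs.org.uk/xml/v1.0/oi"}
CONTAINERS = {"stage", "section", "question", "variable"}

# Cut-down copy of the revised xml above.
SAMPLE = """
<source xmlns="http://briccs.org.uk/xml/v1.0/oi">
  <name>briccs</name>
  <stage>
    <name>MedicalHistoryInterviewQuestionnaire</name>
    <section>
      <name>MAIN</name>
      <question>
        <name>part_hist_highbp</name>
        <label>Have you ever suffered from high blood pressure?</label>
        <variable>
          <name>part_hist_highbp</name>
          <label></label>
          <type>text</type>
          <variable>
            <name>N</name>
            <label>No</label>
            <type>boolean</type>
          </variable>
        </variable>
      </question>
    </section>
  </stage>
</source>
"""

def walk(element, prefix):
    """Yield (path, label, type) for every element that carries a <type>."""
    name = element.findtext("oi:name", default="", namespaces=NS)
    path = prefix + [name]
    value_type = element.findtext("oi:type", namespaces=NS)
    if value_type is not None:  # only variables have a <type> child
        label = element.findtext("oi:label", default="", namespaces=NS)
        yield "\\".join(path), label, value_type
    for child in element:
        local_name = child.tag.rsplit("}", 1)[-1]  # drop the namespace part
        if local_name in CONTAINERS:
            yield from walk(child, path)

rows = list(walk(ET.fromstring(SAMPLE), []))
```

Because the walker only relies on <name> and <type>, it does not care how many levels sit between the source and a variable.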

Jeff's comments after initial work on the above ideas:

Learned some obvious lessons that require a revision of the approach.
Basically, not all variables are the result of a question, and this comes in various guises:

  1. Even in a Questionnaire stage, there are some variables not attached to any question, eg: QuestionnaireRun.version and other sorts of metric-like data.
  2. Some stages are not questionnaires (and therefore contain no questions), eg: the sample stages.
  3. Some are not even stages: see the Participant file within the export. The Participant file contains participant data retrieved via the PMI plus "other" admin type data, eg: Admin.Participant.birthDate and Admin.Action.user.
