Changes between Version 11 and Version 12 of OnyxExportOntology


Timestamp:
01/28/11 09:44:16 (13 years ago)
Author:
jeff.lusted
Comment:

--

  • OnyxExportOntology

    v11 v12  
    326326'''Feedback and intermediate results from Jeff'''
    327327
     328Jason, I'd hold off doing a lot of work on the style sheet approach at this stage. I see style sheets coming into their own in the next stage, where we have to process the intermediate xml into SQL insert commands. But I could be wrong.
     329
     330The input is sourced from the export zip file which is attached to page [[Onyx Export and Purge]]. If you unzip this file you will find a directory structure for each "part" of the BRICCS questionnaire. Each directory has a variables.xml file plus a collection of data files for each participant exported from Onyx. It's the collection of variables.xml files which contains the metadata for each "part".
     331[[BR]]
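As a rough illustration of the layout just described, the following Python sketch groups the entries of an export zip by top-level directory and separates each part's variables.xml from its participant data files. The function name `inventory_export` and the sample part names are hypothetical, and the sketch assumes each part sits in its own top-level directory, which may not match a real export exactly.

```python
import io
import zipfile
from collections import defaultdict


def inventory_export(zip_source):
    """Group the entries of an export zip by part (top-level directory).

    Returns {part: {"variables": path or None, "data": [paths]}}.
    Assumes one top-level directory per part; this is an assumption,
    not something verified against a real Onyx export.
    """
    parts = defaultdict(lambda: {"variables": None, "data": []})
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if name.endswith("/") or "/" not in name:
                continue  # skip directory entries and top-level files
            part, _, filename = name.partition("/")
            if filename == "variables.xml":
                parts[part]["variables"] = name
            elif filename.endswith(".xml"):
                parts[part]["data"].append(name)
    return dict(parts)


if __name__ == "__main__":
    # Build a tiny in-memory zip with made-up part names for demonstration.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("PartA/variables.xml", "<variables/>")
        zf.writestr("PartA/0000001.xml", "<valueSet/>")
        zf.writestr("PartB/variables.xml", "<variables/>")
    buf.seek(0)
    for part, entry in sorted(inventory_export(buf).items()):
        print(part, entry["variables"], len(entry["data"]))
```

An inventory like this could be a starting point for feeding each part's variables.xml into whatever produces the intermediate ontology.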
     332We obviously need to explore this in conjunction with trying to ascertain what the final input into i2b2 will be like. I expect a few iterations before we get anywhere near what is required. At the moment, I see producing an intermediate form of ontology from Onyx as a first step. This intermediate stage (probably a number of xml files for each Onyx part) will be used to produce SQL for the Ontology Cell and the ontology dimension table within the CRC Cell (the data mart). I believe the intermediate ontology might then be used to drive an intermediate format for the participant data files (the 0000001.xml type files within the export zip). We can then use whatever comes out of that process to produce CSV files for import into i2b2 (fact table and the patient table). The latter is managed from within the i2b2 workbench.
     333{{{
     334ONYX Export File
     335 |
     336 +-->variables.xml---(program?)--->intermediate ontology +---(XSLT)--> SQL inserts into Ontology Cell tables
     337 |                                          |            |
     338 |                                          |            +---(XSLT)--> SQL inserts into CRC ontology_dimension table
     339 |                                          |
     340 |                                          V
     341 +-->nnnnnnnnn.xml---(program?)--->intermediate data---------(XSLT)--> CSV file for import into CRC fact and patient tables
     342}}}
     343
     344I'm agnostic as far as techniques are concerned (the bits in brackets). But I see XSLT as being admirably suited to doing the grunt work of producing SQL inserts and CSV files. The inserts into the Ontology Cell for the demo data system were held in a file exceeding 250M in size, and I suspect even the first stab at Onyx will produce something relatively large. I've produced a first cut at processing a set of variables.xml files into an intermediate ontology and will attach a complete set corresponding to my example export zip file. I've struggled with aspects of trying to get a relatively systematized view from the collection of variables files, the idea being that one xsd file covers the whole structure, whichever variables file is chosen. There are some complex convolutions to derive variables and their grouping into different structures, which I think are easier to explore programmatically, at least for the moment.[[BR]]
     345
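To make the "intermediate ontology to SQL inserts" step more concrete, here is a minimal Python sketch. It is only meant to show the shape of the transformation: Python stands in for the XSLT step, the `<ontology>`/`<variable>` format is invented for the example (the real intermediate format is still to be decided), and the table name `ONTOLOGY_PLACEHOLDER` and its columns are placeholders rather than the actual i2b2 schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate ontology; element and attribute names are
# assumptions made for this example only.
SAMPLE = """\
<ontology part="PartA">
  <variable name="age" path="/PartA/age" label="Participant's age"/>
  <variable name="sex" path="/PartA/sex" label="Sex"/>
</ontology>
"""


def sql_quote(value):
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def ontology_to_inserts(xml_text, table="ONTOLOGY_PLACEHOLDER"):
    """Yield one INSERT statement per <variable> element."""
    root = ET.fromstring(xml_text)
    for var in root.iter("variable"):
        values = ", ".join(
            sql_quote(var.get(attr, "")) for attr in ("path", "name", "label")
        )
        yield f"INSERT INTO {table} (path, name, label) VALUES ({values});"


if __name__ == "__main__":
    for statement in ontology_to_inserts(SAMPLE):
        print(statement)
```

Because the function is a generator, statements can be written out one at a time, which may matter if the resulting file turns out to be as large as the demo data inserts.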
     346The project I've started is currently in SVN within my sandbox area: onyx-to-i2b2. I'm uncertain about the structure of the project, and how it should eventually look, which is why it is sandboxed for the time being. When we have a better idea, code, examples, xslt, everything should be moved into another area of SVN and mavenized. There might need to be another project within the admin area which holds scripts for exporting from Onyx and readying for import into i2b2.
     347
     348=== Comments On Structures So Far ===
     349
     350