Changes between Initial Version and Version 1 of i2b2 Onyx Importer


Timestamp:
04/10/12 11:33:43 (13 years ago)
Author:
Nick Holden

'''i2b2 - importing data from Onyx'''

Assuming Onyx exports incrementally, the first import will be the most time-consuming, because it is the one that has to go through the complete process. The slowest part is loading the metadata. Once that is underway, I would start on the second and subsequent Onyx export files while you are waiting for the metadata upload to complete.

From the README in /usr/local/i2b2-procedures-1.0-SNAPSHOT-development:

QUICK START.
============
1. Unzip this package into a convenient place on a server hosting an i2b2 domain.
2. Set the I2B2_PROCEDURES_HOME environment variable and export it:
   # export I2B2_PROCEDURES_HOME=/usr/local/i2b2-procedures-1.0-SNAPSHOT-development
3. Ensure the following environment variables are also set and exported:
   (but these can be set within one of the config files, e.g. config/defaults.sh)
     JAVA_HOME
     ANT_HOME
     JBOSS_HOME
4. If you wish to run procedures from any current directory,
   run the build-symlinks script in the bin/utility directory
   and add the bin/symlinks to your path.
   But go steady; this could wait until later.
5. Review configuration settings within the config directory.
   Basically three files:
   config.properties
   defaults.sh
   log4j.properties
6. The order of completion (by directories) of procedures:
   i)   data-prep          (regular)
   ii)  project-install    (once)
        NB: The job step update-datasources.sh tries to recover if it fails.
        However, it is good practice to check the JBoss data source files
        for correctness before rerunning this step.
   iii) meta-upload        (once and then whenever required)
   iv)  participant-upload (regular)
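As a quick sanity check for steps 2 and 3, a small shell function along these lines could report which of those variables are missing from the current shell. This is only a sketch: check_env is a made-up helper, not part of the i2b2-procedures package, and a variable reported as missing here may still be set in config/defaults.sh, as the README allows.

```shell
#!/bin/sh
# check_env is a hypothetical helper, not part of the i2b2-procedures package.
# It prints the name of every listed variable that is unset or empty
# and returns non-zero if at least one is missing.
check_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is held in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}

# The variables named in steps 2 and 3 of the README.
check_env I2B2_PROCEDURES_HOME JAVA_HOME ANT_HOME JBOSS_HOME \
    || echo "Some variables are not set in this shell (see names above)." >&2
```

If nothing is printed, all four variables are visible to the shell you are about to run the procedures from.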

Notes from Jeff:

Note that there is a parameter in the defaults.sh file:
# Max number of participants to be folded into one PDO xml file:
BATCH_SIZE=50

If an export contains more participants than this, no matter: the process simply creates more than one PDO file. Alternatively, you can raise BATCH_SIZE to ensure just one file, but this increases memory usage. The PDO file has a particular naming convention, as illustrated below:

onyx-4-20111101-111556704-TEST-DATA-ONLY-pdo.xml

Note the 4 after "onyx-". It indicates how many participants are included in this file. It might help when you need to indicate the pid and eid for the next export!

"TEST-DATA-ONLY" is there only when executing A-onyx2pdo-testdata.sh as opposed to A-onyx2pdo.sh. The rest is the date and time.
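If you want to pull that participant count out of a file name in a script, something like the following should do. pdo_participant_count is a hypothetical helper of mine, and it assumes the name always starts with "onyx-" followed by the count, as in the example above.

```shell
#!/bin/sh
# pdo_participant_count is a hypothetical helper, not part of the package.
# Assumption: PDO file names always look like onyx-<count>-<date>-<time>-...-pdo.xml
pdo_participant_count() {
    name=$(basename "$1")
    rest=${name#onyx-}                    # drop the leading "onyx-"
    [ "$rest" != "$name" ] || return 1    # name did not start with "onyx-"
    count=${rest%%-*}                     # keep everything before the next "-"
    case $count in
        ''|*[!0-9]*) return 1 ;;          # not a plain number
    esac
    echo "$count"
}

pdo_participant_count onyx-4-20111101-111556704-TEST-DATA-ONLY-pdo.xml   # prints 4
```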

A-onyx2pdo-testdata.sh mangles dates and does no mapping for the s-number.
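For planning purposes, the number of PDO files to expect from an export can be estimated from BATCH_SIZE. The sketch below assumes each file is filled up to BATCH_SIZE before the next one is started, which the notes above imply but do not state outright; pdo_file_count is a made-up name.

```shell
#!/bin/sh
# pdo_file_count is a hypothetical helper, not part of the package.
# Assumption: participants are split into consecutive batches of at most
# BATCH_SIZE, so the file count is participants / batch size, rounded up.
pdo_file_count() {
    participants=$1
    batch_size=$2
    echo $(( (participants + batch_size - 1) / batch_size ))
}

pdo_file_count 120 50   # prints 3
```

With the default BATCH_SIZE=50, the four-participant example above fits in a single file.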

First export:
==========
data-prep:
   1-namespace-update.sh
   2-clean-onyx-variables.sh
   3-onyx2metadata.sh
   5-refine-metadata.sh
   6-xslt-refined-2ontcell.sh
   7-xslt-refined-2ontdim.sh
   8-xslt-refined-enum2ontcell.sh
   9-xslt-refined-enum2ontdim.sh
   A-onyx2pdo.sh    or   A-onyx2pdo-testdata.sh  (make sure you record the pid and eid ranges)
   B-xslt-pdo2crc.sh

project-install:
   1-project-install.sh
   2-update-datasources.sh

metadata-upload:
   metadata-upload-sql.sh         (Once this is underway, begin on the second Onyx export file)

participant-upload:
   participant-upload-sql.sh       (Good idea to make sure this is the first one triggered if working in parallel)
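If you would rather not launch each step by hand, the sequence above can be driven by a small wrapper that stops at the first failure. This is a sketch under assumptions: run_steps is a made-up function, the directory passed in the usage example is a guess at the package layout, and the scripts are run without arguments, which may not match their real usage.

```shell
#!/bin/sh
# run_steps is a hypothetical helper, not part of the package.
# It runs the named scripts from one directory, in the order given,
# and stops at the first script that exits with a non-zero status.
run_steps() {
    dir=$1
    shift
    for step in "$@"; do
        echo "running $dir/$step"
        "$dir/$step" || { echo "FAILED: $dir/$step" >&2; return 1; }
    done
}

# Hypothetical usage for the first three data-prep steps.
# Assumption: the scripts live under $I2B2_PROCEDURES_HOME/bin/data-prep
# and need no arguments; check the package for the real layout and usage.
# run_steps "$I2B2_PROCEDURES_HOME/bin/data-prep" \
#     1-namespace-update.sh 2-clean-onyx-variables.sh 3-onyx2metadata.sh
```

Stopping on the first failure matters here because later steps work on the output of earlier ones.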

Second and Subsequent Export Files:
==============================
data-prep:
   1-namespace-update.sh
   2-clean-onyx-variables.sh
   3-onyx2metadata.sh
   5-refine-metadata.sh
   A-onyx2pdo.sh    or   A-onyx2pdo-testdata.sh    (make sure you record the pid and eid ranges)
   B-xslt-pdo2crc.sh

participant-upload:
   participant-upload-sql.sh      (DON'T start this until you know the metadata-upload for the first export has completed successfully)
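When both exports are handled from the same shell session, the "don't start until the metadata upload has completed" rule can be enforced with the shell's wait builtin. The sketch below is an illustration only: run_after is a made-up function, the script invocations in the usage comment are assumptions, and a zero exit status is not proof that the upload is correct, so still check the logs.

```shell
#!/bin/sh
# run_after is a hypothetical helper, not part of the package.
# It waits for a background job started from this same shell and runs
# the follow-up command only if that job exited with status 0.
run_after() {
    pid=$1
    shift
    if wait "$pid"; then
        "$@"
    else
        echo "background job $pid failed; not running: $*" >&2
        return 1
    fi
}

# Hypothetical usage (script locations and arguments are assumptions):
# metadata-upload-sql.sh &          # metadata upload for the first export
# meta_pid=$!
# ...data-prep steps for the second export go here...
# run_after "$meta_pid" participant-upload-sql.sh
```

Note that wait only works for jobs started by the same shell, so this does not help if the metadata upload was launched from a different terminal.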

Naming
======
It's entirely up to you how you name the jobs for each of these: whatever is convenient.