Changes between Version 3 and Version 4 of i2b2 Onyx Importer


Timestamp: 11/14/12 17:06:55
Author: Nick Holden
= i2b2 - importing data from Onyx =

Assuming incremental export processes by Onyx, the first import is going to be the most time-consuming: you need to go through the complete process for it, and the slowest step is loading the metadata. Once that is underway, I would start on the second and subsequent Onyx export files whilst you're waiting for the metadata upload to complete.
     
From the README in /usr/local/i2b2-procedures-1.0-SNAPSHOT-development:

== QUICK START ==

Assuming you are already root ('sudo su -' if not)...

     
5  Review configuration settings within the config directory.
   Basically three files:
   config.properties - DATABASE CONNECTION SETTINGS GO IN HERE
   defaults.sh
   log4j.properties
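The README doesn't reproduce the file contents, but as a rough illustration of the kind of database connection settings config.properties carries (the key names below are hypothetical placeholders, not the shipped ones; check the actual file for the real keys and for your database's driver and URL):

```properties
# Hypothetical sketch only: the real key names are defined by the shipped
# config.properties under /usr/local/i2b2-procedures-1.0-SNAPSHOT-development/config
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://localhost:5432/i2b2
db.username=i2b2demodata
db.password=CHANGE_ME
```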
     
=== Notes from Jeff ===

Note that there is a parameter in the Defaults.sh file:
     
   participant-upload-sql.sh      (DON'T start this until you know the metadata-upload for the first export has completed successfully)

=== Naming ===

It's entirely up to you how you name the jobs for each of these: whatever is convenient.

== Deleting and re-importing data ==

When deleting and re-importing Onyx data (which includes the data for creating the patients in the i2b2 patient dimension), it is important to consider the data held against the other ontologies in i2b2. Either everything has to be deleted, which implies not having pathology, PATS or subsequent third-party data until the next time those routines load to i2b2, or the Onyx data has to be selectively deleted without damaging the data held against the other ontologies, AND the patients need to be reloaded in the same order, to ensure that patients are allocated to the correct i2b2 identifiers.
=== Database preparation for a complete reload of Onyx data ===

Assuming all metadata is unchanged.

Need to be sure that the patients are mapped consistently in the re-loading process. General approach is to delete all the data (patient dimension, observation facts, visit dimension) but not touch ANY metadata.


metadata database - NO CHANGES

work database - NO CHANGES

hive - NO CHANGES

pm - NO CHANGES


data DATABASE TABLES:

ARCHIVE_OBSERVATION_FACT
- Currently empty. Leave.

CODE_LOOKUP
- Currently empty. Leave.

concept_dimension
- Pointers to ontology. Leave alone because ontology is not changing.

DATAMART_REPORT
- Currently empty. Leave.

Encounter_Mapping
- Currently empty. Leave.

Observation_Fact
- All observations included, from all ontologies. Selective delete.
Need to delete where Concept_Cd LIKE 'CBO:%'
19770 rows deleted.
ALSO THERE ARE FOUR ENTRIES WHERE Concept_Cd IS EMPTY. WHY? Delete them also.
4 rows deleted.

Patient_Dimension
- All patients. Need to delete all.
110 rows deleted.

Patient_Mapping
- All patients. Need to delete all.
330 rows deleted.

Provider_Dimension
- Currently empty. Leave.

QT_*
- Related to queries. Leave.

SET_TYPE
- Index of 'set' types. Leave.

SET_UPLOAD_STATUS
- Currently empty. Leave.

SOURCE_MASTER
- Currently empty. Leave.

UPLOAD_STATUS
- Currently empty. Leave.

Visit_Dimension
- Only Onyx data is loaded against the visit dimension, one row per patient. It is built by the import process. DELETE ALL.
110 rows deleted.
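The table-by-table deletes above can be sketched as SQL. This is a sketch under assumptions, not the project's own script: the table and column names are as given in the notes, the output file name onyx-reload-deletes.sql is my invention, and the empty-Concept_Cd test may need adjusting for your database (Oracle, for instance, treats '' as NULL). The script only writes the statements out for review; run them with your usual SQL client against the data database once you're happy.

```shell
#!/bin/sh
# Sketch: collect the Onyx-reload DELETE statements into one reviewable file.
# Mirrors the table-by-table notes above; does NOT touch any database itself.
cat > onyx-reload-deletes.sql <<'EOF'
-- Observation_Fact: selective delete, Onyx facts only
DELETE FROM Observation_Fact WHERE Concept_Cd LIKE 'CBO:%';
-- ...plus the four stray rows with an empty Concept_Cd
DELETE FROM Observation_Fact WHERE Concept_Cd IS NULL OR Concept_Cd = '';
-- Patient_Dimension, Patient_Mapping, Visit_Dimension: delete everything
DELETE FROM Patient_Dimension;
DELETE FROM Patient_Mapping;
DELETE FROM Visit_Dimension;
EOF
echo "Wrote $(grep -c '^DELETE' onyx-reload-deletes.sql) DELETE statements to onyx-reload-deletes.sql"
# prints: Wrote 5 DELETE statements to onyx-reload-deletes.sql
```

Keeping the statements in a file rather than piping them straight into a client makes it easy to eyeball the row counts afterwards against the figures recorded above.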
Looking to check which Onyx export file(s) to use.

procedures-1.0-SNAPSHOT-development-old alpha used test-BRICCS-20111021090857.zip, generated 8 patients.

procedures-1.0-SNAPSHOT-development alpha used live-BRICCS-20111031180021.zip, generated 53 patients.

procedures-1.0-SNAPSHOT-development beta used live-BRICCS-20111031184824.zip, generated 57 patients.

procedures-development-trac92 used BRICCS-20110106095220.zip, generated 4 patients.

procedures-trac108-SNAPSHOT_development ws-test used live-BRICCS-20111031184824.zip, generated 57 patients.

Do two jobs: alpha and beta.

alpha: use /home/nick/onyxexports/live-BRICCS-20111031180021.zip, generate 53 patients.

beta: use /home/nick/onyxexports/live-BRICCS-20111031184824.zip, generate 57 patients.

This kind of worked, except that the TEST i2b2 on uhlbriccsapp02.xuhl-tr.nhs.uk had 110 patients loaded against pids 2 to 111, and the above process re-loaded them against pids 1 to 110. Oops. But either the other data (PATS and pathology) will be re-loaded against the new pids overnight, OR I can re-run this process tomorrow with pid=2 and eid=2 for the first batch.

= NOTE: PROCESS WILL BE DIFFERENT IF THE ONTOLOGY ITSELF IS CHANGED IN ANY WAY. =