Changes between Version 3 and Version 4 of i2b2 Onyx Importer


Timestamp: 11/14/12 17:06:55
Author: Nick Holden
= i2b2 - importing data from Onyx =

Assuming incremental export processes by Onyx, the first import is going to be the most time-consuming: you need to go through the complete process for it, and the slowest step is loading the metadata. Once that is underway, I would start on the second and subsequent Onyx export files whilst you're waiting for the metadata upload to complete.
     
From the README in /usr/local/i2b2-procedures-1.0-SNAPSHOT-development:

== QUICK START ==

Assuming you are already root ('sudo su -' if not)...

     
5  Review configuration settings within the config directory.
   Basically three files:
   config.properties - DATABASE CONNECTION SETTINGS GO IN HERE
   defaults.sh
   log4j.properties
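The README doesn't reproduce the file contents, but as a rough illustration of the kind of database connection settings config.properties carries (the key names below are hypothetical placeholders, not the shipped ones; check the actual file for the real keys and for your database's driver and URL):

```properties
# Hypothetical sketch only: the real key names are defined by the shipped
# config.properties under /usr/local/i2b2-procedures-1.0-SNAPSHOT-development/config
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://localhost:5432/i2b2
db.username=i2b2demodata
db.password=CHANGE_ME
```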
     
=== Notes from Jeff ===

Note that there is a parameter in the Defaults.sh file:
     
   participant-upload-sql.sh      (DON'T start this until you know the metadata-upload for the first export has completed successfully)

=== Naming ===

It's entirely up to you how you name the jobs for each of these: whatever is convenient.

== Deleting and re-importing data ==

When deleting and re-importing Onyx data (which includes the data for creating the patients in the i2b2 patient dimension), it is important to consider the data held against the other ontologies in i2b2. Either everything has to be deleted, which implies not having pathology, PATS or subsequent third-party data until the next time those routines load to i2b2, or the Onyx data has to be selectively deleted without damaging the data held against the other ontologies, AND the patients need to be reloaded in the same order, to ensure that patients are allocated to the correct i2b2 identifiers.
=== Database preparation for a complete reload of Onyx data ===

Assuming all metadata is unchanged.

Need to be sure that the patients are mapped consistently in the re-loading process. General approach is to delete all the data (patient dimension, observation facts, visit dimension) but not touch ANY metadata.


metadata database - NO CHANGES

work database - NO CHANGES

hive - NO CHANGES

pm - NO CHANGES


data DATABASE TABLES:

ARCHIVE_OBSERVATION_FACT
- Currently empty. Leave.

CODE_LOOKUP
- Currently empty. Leave.

concept_dimension
- Pointers to ontology. Leave alone because ontology is not changing.

DATAMART_REPORT
- Currently empty. Leave.

Encounter_Mapping
- Currently empty. Leave.

Observation_Fact
- All observations included, from all ontologies. Selective delete.
Need to delete where Concept_Cd LIKE 'CBO:%'
19770 rows deleted.
ALSO THERE ARE FOUR ENTRIES WHERE Concept_Cd IS EMPTY. WHY? Delete them also.
4 rows deleted.

Patient_Dimension
- All patients. Need to delete all.
110 rows deleted.

Patient_Mapping
- All patients. Need to delete all.
330 rows deleted.

Provider_Dimension
- Currently empty. Leave.

QT_*
- Related to queries. Leave.

SET_TYPE
- Index of 'set' types. Leave.

SET_UPLOAD_STATUS
- Currently empty. Leave.

SOURCE_MASTER
- Currently empty. Leave.

UPLOAD_STATUS
- Currently empty. Leave.

Visit_Dimension
- Only Onyx data is loaded against the visit dimension, one row per patient. It is built by the import process. DELETE ALL.
110 rows deleted.
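The table-by-table deletes above can be sketched as SQL. This is a sketch under assumptions, not the project's own script: the table and column names are as given in the notes, the output file name onyx-reload-deletes.sql is my invention, and the empty-Concept_Cd test may need adjusting for your database (Oracle, for instance, treats '' as NULL). The script only writes the statements out for review; run them with your usual SQL client against the data database once you're happy.

```shell
#!/bin/sh
# Sketch: collect the Onyx-reload DELETE statements into one reviewable file.
# Mirrors the table-by-table notes above; does NOT touch any database itself.
cat > onyx-reload-deletes.sql <<'EOF'
-- Observation_Fact: selective delete, Onyx facts only
DELETE FROM Observation_Fact WHERE Concept_Cd LIKE 'CBO:%';
-- ...plus the four stray rows with an empty Concept_Cd
DELETE FROM Observation_Fact WHERE Concept_Cd IS NULL OR Concept_Cd = '';
-- Patient_Dimension, Patient_Mapping, Visit_Dimension: delete everything
DELETE FROM Patient_Dimension;
DELETE FROM Patient_Mapping;
DELETE FROM Visit_Dimension;
EOF
echo "Wrote $(grep -c '^DELETE' onyx-reload-deletes.sql) DELETE statements to onyx-reload-deletes.sql"
# prints: Wrote 5 DELETE statements to onyx-reload-deletes.sql
```

Keeping the statements in a file rather than piping them straight into a client makes it easy to eyeball the row counts afterwards against the figures recorded above.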
Looking to check which Onyx export file(s) to use.

procedures-1.0-SNAPSHOT-development-old alpha used test-BRICCS-20111021090857.zip, generated 8 patients.

procedures-1.0-SNAPSHOT-development alpha used live-BRICCS-20111031180021.zip, generated 53 patients.

procedures-1.0-SNAPSHOT-development beta used live-BRICCS-20111031184824.zip, generated 57 patients.

procedures-development-trac92 used BRICCS-20110106095220.zip, generated 4 patients.

procedures-trac108-SNAPSHOT_development ws-test used live-BRICCS-20111031184824.zip, generated 57 patients.

Do two jobs: alpha and beta.

alpha: use /home/nick/onyxexports/live-BRICCS-20111031180021.zip, generate 53 patients.

beta: use /home/nick/onyxexports/live-BRICCS-20111031184824.zip, generate 57 patients.

This kind of worked, except that the TEST i2b2 on uhlbriccsapp02.xuhl-tr.nhs.uk had 110 patients loaded against pids 2 to 111, and the above process re-loaded them against pids 1 to 110. Oops. But either the other data (PATS and pathology) will be re-loaded against the new pids overnight, OR I can re-run this process tomorrow with pid=2 and eid=2 for the first batch.

= NOTE: PROCESS WILL BE DIFFERENT IF THE ONTOLOGY ITSELF IS CHANGED IN ANY WAY. =