wiki:i2b2 Data Import Pathology Procedure v2

Version 10 (modified by Richard Bramley, 12 years ago) ( diff )

--

Back

Version 2 of the i2b2 pathology data integration load will also use SSIS, but will address the following issues with version 1.

  1. Address and document the correct procedure for dealing with duplicates, by getting Paul Smalley to define which items are true duplicates and may be discarded and identifying a means of distinguish between duplicates where both should be loaded into i2b2.
  2. Improve the error handling and reporting.
  3. Move the SSIS packages onto the BRICCS SQL Server so that we can control their running and updating.
  4. Be configurable so that the source and destination databases can be changed more easily.

Implementation

Version 2 of the i2b2 pathology data integration load is written as the SQL Server Integration Services (SSIS) packages i2b2ImportPathologyObservations.dtsx and i2b2BuildPathologyOntoloy.dtsx. The mapping, ontology and logging data is stored in the i2b2ClinDataIntegration database on the UHLSQLBRICCSDB\UHLBRICCSDB server.

i2b2ClinDataIntegration Database

The database contains the following tables used by the two pathology integration packages:

Table Description
PathologyHierarchy The hierarchy nodes used to build the tree structure in the pathology ontology
PathologyCodes The relevant LOINC codes used in the pathology ontology, linked to their place in the hierarchy
PathologyMapping The mapping of LOINC codes to iLab codes. Each LOINC code can be mapped to more than one iLab code, but each iLab Code can only be mapped to one LOINC code.
ETLHistory Log file showing when each job was last run for each destination database.
Failed_Observation_Fact Log file showing Observation_Fact records that could not be loaded into the Observation_Fact table, when they were loaded and what the error message was.

It also contains the following views

View Description
PathologyHierarchyRecursed Recurses the PathologyHierarchy table to produce a tree structure

i2b2BuildPathologyOntoloy SSIS Package

The package contains the following steps:

  1. Recreate Ontology in the i2b2 meta database.
    1. Delete existing ontology records in the PATHOLOGY table.
    2. Insert the new ontology hierarchy records created from the PathologyCodes table and PathologyHierarchyRecursed view in the i2b2ClinDataIntegration database into the PATHOLOGY table of the i2b2 meta database.
  2. Recreate the concept codes in the i2b2 data database.
    1. Delete existing concept records from the concept_dimension table in i2b2 data database.
    2. Insert the new concept records created from the PathologyCodes table and PathologyHierarchyRecursed view in the i2b2ClinDataIntegration database into the concept_dimension table of the i2b2 meta database.

i2b2ImportPathologyObservations SSIS Package

Before running the package checks the contents of the ETLHistory table in the i2b2ClinDataIntegration database and the results of the web service at address http://uhlidtapps.xuhl-tr.nhs.uk:1106/Service.asmx/outputXMLStatusInfo?systemlist to see if the data warehouse is up to date and has been updated since the last time the data was copied.

If the data warehouse is ready and does contain new data, the package executes the following steps:

  1. Create temporary tables for Patient and LOINC mappings in the DWBRICCS database and copy the mappings into them.
  2. Delete the existing pathology observations from the Observation_Fact table of the i2b2 data database.
  3. Create new pathology records in the Observation_Fact table of the i2b2 data database.
    1. Records are created by joining data tables in the data warehouse DWPATH database, with the mapping temporary tables created earlier in the job.
    2. Rows that produce an error are inserted into the Failed_Observation_Fact table in the i2b2ClinDataIntegration database.
  4. Drop temporary mapping tables.
  5. Log the job completion to the ETLHistory table in the i2b2ClinDataIntegration database.

Package Scheduling

  • The packages are run as SQL Server Agent jobs on the UHLSQLBRICCSDB\UHLBRICCSDB server.
  • The connection strings are overridden as command line arguments to point the job to the test databases as opposed to the development databases.
  • During the test phase, the jobs have not been set to run automatically.
Note: See TracWiki for help on using the wiki.