wiki:i2b2 - UHL clinical informatics system integration PATS v1

Version 1 (modified by Richard Bramley, 11 years ago) ( diff )

--

back

Version 1 of the i2b3 PATS data integration load uses a job on the UHL Data warehouse server to move data from the DWPATS database on the UHL data warehouse server to i2b2 database on the UHLSQLBRICCSDB\UHLBRICCSDB server, using the DWBRICCS database on the UHL data warehouse server as an intermediate.

Procedure

The job on UHL Data warehouse server contains 3 steps.

1. Update I2B2 Patients

Runs the stored procedure USP_DWH_IMPORT_BRICCS_PATIENTS in the DWBRICCS database on the UHL data warehouse, which copies patients from the destination database on the I2B2 database server to the DWBRICCS database using a linked SQL server.

2. Delete PATS Observations SSIS package

Steps:

  1. Deletes all records from the PATS_OBSERVATION_FACT staging table in the DWBRICCS database on the UHL data warehouse.
  2. Deletes PATS observations from the Observation_Fact table in the production i2b2 database with a concept code begining with 'PTS'.

3. Load_PATS_Observations SSIS package

This package performs a separate import for each field in the Data Dictionary table in the DWBRICCS database into the PATS_OBSERVATION_FACT staging table. There are two mechanisms depending on whether the field is a number or text. The complete contents of the staging table are then moved en masse into the Observation_Fact in the production i2b2 database.

This package actively ignores all records that fail when they are imported into the PATS_OBSERVATION_FACT staging table. This results in two types of records being ignored:

  1. Records for which the start date is NULL.
  2. Duplicate records.

NULL Start Dates

An invetsigation showed 56 records in which the start date was NULL. All of these records were from one record on the EPS_EPSINIVW table. This might be an error or there may be a valid reason for this.

The field DateOfAdmission is used as the start date for the EPS_EPSINIVW table. Alternative fields for start date on this table are date_timeOfProcedure and procedureStartTime. Both of these, however, also have records where the date is NULL. All three fields could be coalesced to reduce the risk of an error.

Duplicate Processing

Version one of the data load identifies some records as being duplicates because they have the same patient, sample collection datetime and concept code. When a duplicate is identified it discards the most recent record. This is probably not correct for several reasons:

  1. If there are more that two duplicates, it only discards one record and so there will still be a duplicate.
  2. Common sense and reason 1 suggest that it should be keeping the most recent record.
  3. There may be a better way to identify which record is correct. For example, if the result has been suppressed (result suppression will not solely solve the problem).
  4. Both records may be valid.

Paul Smalley has agreed to look at the duplicate records to find out the reasons for the duplication.

Note: See TracWiki for help on using the wiki.