Version 11 (modified by 9 years ago) ( diff ) | ,
---|
i2b2 - UHL clinical informatics system integration PATS v1
Version 1 of the i2b3 PATS data integration load uses a job on the UHL Data warehouse server to move data from the DWPATS database on the UHL data warehouse server to i2b2 database on the UHLSQLBRICCSDB\UHLBRICCSDB server, using the DWBRICCS database on the UHL data warehouse server as an intermediate.
Procedure
The job on UHL Data warehouse server contains 3 steps.
1. Update I2B2 Patients
Runs the stored procedure USP_DWH_IMPORT_BRICCS_PATIENTS in the DWBRICCS database on the UHL data warehouse, which copies patients from the destination database on the I2B2 database server to the DWBRICCS database using a linked SQL server.
2. Delete PATS Observations SSIS package
Steps:
- Deletes all records from the PATS_OBSERVATION_FACT staging table in the DWBRICCS database on the UHL data warehouse.
- Deletes PATS observations from the Observation_Fact table in the production i2b2 database with a concept code begining with 'PTS'.
3. Load_PATS_Observations SSIS package
This package performs a separate import for each field in the Data Dictionary table in the DWBRICCS database into the PATS_OBSERVATION_FACT staging table. There are two mechanisms depending on whether the field is a number or text. The complete contents of the staging table are then moved en masse into the Observation_Fact in the production i2b2 database.
This package actively ignores all records that fail when they are imported into the PATS_OBSERVATION_FACT staging table. This results in two types of records being ignored:
- Records for which the start date is NULL.
- Duplicate records.
Appropriateness of Columns
While invesigating a diference between data loaded onto the live site and the parallel run site, I noticed that for the table GGHPCI_FLWGGHPCIVIEW (which may not be useful anyway - see below) lots of fields were being loaded that looked irrelevant. For example: EntryID, FlwId, DateRecordCreated, DateTimeRecordLastChanged, EnteredByTag, etc.
ACTION (RB): Check all tables to see if fields being extracted look like they're irrelevant. Then check with Nick.
NULL Start Dates
An invetsigation showed 56 records in which the start date was NULL. All of these records were from one record on the EPS_EPSINIVW table. This might be an error or there may be a valid reason for this.
The field DateOfAdmission is used as the start date for the EPS_EPSINIVW table. Alternative fields for start date on this table are date_timeOfProcedure and procedureStartTime. Both of these, however, also have records where the date is NULL. All three fields could be coalesced to reduce the risk of an error.
ACTION (RB): Ask Claire Hutton if she deals with this data set. If she does, could she check that the start date is NULL in the PATS system. If it is, is there a valid reason for this?
ACTION (RB): As per duplicate, Nick says use COALESCE(ProcedureStartTime, Date_timeOfProcedure)
Duplicate Records
Records are duplicate if they have duplicate values for all of the fields Encounter Number, Concept Code, Provider ID, Start Date and Modifier ID, of which Provider ID and Modifier ID are not used. The key field is therefore start date, which has the following source for the records on which a duplicate was found:
Table | No. Duplicates | Start Date Field |
CATH_CATHINITIALVW | 5 | DateOfCatheter |
CS2003F_PostProcedure | 1 | DateOfDischargeOrDeath |
CS2010F_PostProcedure | 1 | DateOfDischargeOfDeath |
EPS_EPSINIVW | 1 | DateOfAdmission |
GGHPCI_FLWGGHPCIVIEW | 169 | DateOfFollowup |
GGHPCI_INITIALGGHPCIVIEW | 2 | DateOfProcedureCCAD |
Duplicate CATH_CATHINITIALVW investigation
Reasons:
- Procedure abandoned
- 2nd record has very little data
- Different procedure site
Duplicate CS2003F_PostProcedure investigation
Reason:
- 2 procedures. One of which has very little data.
DateOfDischargeOrDeath is the only relevant date against the record. However, there is a days post op field that is different for both records which could be used to calculate a date. However, this could still produce duplicates.
Duplicate CS2010F_PostProcedure investigation
Same reason and patient as for CS2003F_PostProcedure.
ACTION (RB): Why is the data for this procedure in both CS2003 and CS2010? It was thought that this data was split chronologically between these two tables. I.e, 2003-2010 in CS2003, 2010- in CS2010.
ACTION (RB): Is there any way to get the true date of procedure? Ask Claire if she knows?
Duplicate EPS_EPSINIVW investication
Reason:
- Field 'IsThisARe_Do' is set to yes.
The field DateOfAdmission is used as the start date for the EPS_EPSINIVW table. Alternative fields for start date on this table are date_timeOfProcedure and procedureStartTime. Both of these, however, also have records where the date is NULL. All three fields could be coalesced to reduce the risk of an error.
ACTION (RB): Nick says use COALESCE(ProcedureStartTime, Date_timeOfProcedure)'
Duplicate GGHPCI_FLWGGHPCIVIEW investigation
- Status changed
This looks like a table where they can go in and change the status of a previous procedure.
The dates look like they might be from an import, so this might explain it?
ACTION (RB): Check with Claire what is being monitored in this table. If nothing important, Nick is inclined to stop importing it.
Duplicate GGHPCI_INITIALGGHPCIVIEW investigation
- More than one procedure on the same day for both duplicates.
DateAndTimeOfOperation seems to be unique.
ACTION (RB): Check with Claire that DateAndTimeOfOperation means what it sounds like.