wiki:i2b2 Data Import PATS Mapping v1

Version 53 (modified by Richard Bramley, 12 years ago) ( diff )

--

Back

The PATS mappings are split into text and numeric fields.

Text Fields

Text fields are mapped to the i2b2 Observation Fact table like this

i2b2 FieldSource
Encounter_Numi2b2 Patient ID
Patient_Numi2b2 PatientID
Concept_CdEnum lookup value (see below)
Provider_Id"@"
Start_DateThe source table observation date field (see below)
Modifier_Cd"@"
ValType_Cd"T" (text)
Units_Cd"@"
Location_Cd"@"
Update_DateThe current date using the GETDATE() Function
SourceSystem_Cd"BRICCS"
Upload_Id"1"

SOMETIME (RB): Convert Update_Date to use a date parameter. It's just a little quicker.

SOMETIME (RB): Should the SourceSystem_Cd not be "PATS"? Or maybe "MINAP" or whatever.

SOMETIME (RB): The Upload_Id is presumably always one because everything is reloaded each time. Might be nice to change this to something more meaningful.

Enum Lookup Value

The Concept Code to use is found in the table DATA_DICTIONARY_ENUM_VALUES in the DWBRICCS database on UHLDWH server. This table contains records for concept code keyed on source table name, column name and column value. NULL column values may be mapped to the field '(empty)'.

ACTION (RB): Check if all possible values of column value contain a mapping in the DATA_DICTIONARY_ENUM_VALUES table.

Source Table Observation Date Field

For each source table in the there is a single field that is used as the start date for all columns from that field. The observation date field to use for each table is stored in the pats_tables table in the DWBRICS on the UHLDWH server.

Numeric Fields

Numeric fields are mapped to the i2b2 Observation Fact table like this

i2b2 FieldSource
Encounter_Numi2b2 Patient ID
Patient_Numi2b2 PatientID
Concept_Cd"PTS:{TABLE_NAME}_{COLUMN_ID}"
Provider_Id"@"
Start_DateThe source table observation date field (see above)
Modifier_Cd"@"
ValType_Cd"N" (numeric)
TVal_Char"E" (see below)
Units_CdValue calculated by ufn_get_units function (see below)
Location_Cd"@"
Update_DateThe current date using the GETDATE() Function
SourceSystem_Cd"BRICCS"
Upload_Id"1"

TVal_Char

The TVal_Char describes the meaning of the value contained in the NVal_Num field.

For all values currently the value of TVal_Char is set to "E", which means equals. That is, the value in the NVal_Num field is the result of the test. All possible values of TVal_Char are:

ValueMeaning
EEquals
NENot Equal
LLess Than
LELess Than or Equals
GGreater Than
GEGreater Than or Equals

From a casual look at the data, I think that a value of less than X, or greater than Y is sometimes used for Pathology data.

ufn_get_units Function

The function ufn_get_units in the DWBRICCS database is called with parameters of table_name and field_name to determine the units.

ACTION (RB): Work out what this function actually does. The function just selects the units column from the Data_Dictionary table in the DWBRICCS database. If the result is an empty string, it returns 'units' instead. Don't know why it would do that.

ACTION (RB): Rewrite numeric data extract to do the same, but with a join. Or it could be moved into the query that copies all the numeric fields into the recordset for looping with the for each. Either really.

Tables

The PATS i2b2 data integration load maps source table types to either Text or Numeric. All other types are ignored. The mapping is as follows.

Source Typei2b2 Type
VARCHARText
INT or FLOATNumeric
All other typesNOT LOADED

In addition, all fields with certain names are also ignored.

Ignored Field Names

  • PatientId
  • EntryId
  • PreviousEntry
  • NextEntry

Comments about data:

COMMENT (RB): Which fields should be removed?

COMMENT (RB): All date fields are ignored, but could they be converted into numeric fields. For example, length of stay, time between stays, etc. This may already be being done. Nicks says that generally this information is not important for cohorting.

COMMENT (RB): There are quite a few denormalised fields. For example: drug1, drug2 - could these be normalised into one concept code. Or: readmittedDate1, readmittedDate2 - could these be different encounters? Are these already being recorded?

COMMENT (RB): Some dates are recorded as varchars and so presumably are being imported

COMMENT (RB): Are some text fields also being ignored because they don't have entries in the enum data dictionary? Yes they are ignored.

ACTION (RB): Rewrite extracting of varchar fields to check that there are any entries in the ENUM Data Dictionary for that field. This should reduce the number of queries that need to be run in the for each and speed things up quite a bit.

Registries

Myocardial Ischaemia National Audit Project (MINAP)

Minap data is contained within the following mapped tables in the data warehouse

And the following unmapped tables

These tables should be combined into one mapping: Combined Minap

Cardiac Surgery? (CS)

This information is recorded at 4 times: pre-procedure, post-procedure, sub-procedure and follow up. There are several candidate tables for each of these.

Pre-Procedure

Sub-Procedure

Post-Procedure

Follow Up

CATH_CATHINITIALVW

EPS_EPSINIVW

GGHPACEICD_GGHPACEICDINIALVW

GGHPCI_FLWGGHPCIVIEW

GGHPCI_INITIALGGHPCIVIEW

VERTED_VERTEDINIALVW

Note: See TracWiki for help on using the wiki.