wiki:LEGACY - ProcessingPidSet

Version 8 (modified by jeff.lusted, 13 years ago)

Processing the Patient Identifier Set during a Load.

This is a work in progress.

Cardinality, with some examples.

Example One

The following pid is invalid: every pid must have a patient_id, and this one contains only patient_map_id elements.

<pid>
   <patient_map_id source="BRICCS">BPt00000040</patient_map_id>
   <patient_map_id source="UHLT">Snnnnnnn</patient_map_id>
</pid>

Example Two

This is acceptable:

<pid>
   <patient_id source="BRICCS">BPt00000040</patient_id>
   <patient_map_id source="UHLT">Snnnnnnn</patient_map_id>
</pid>
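
The rule behind Examples One and Two (every pid must have a patient_id) could be checked before loading. The following is only a minimal sketch in Python: the function name has_patient_id is hypothetical, and it assumes "must have" means at least one patient_id child, which these notes do not state explicitly.

```python
import xml.etree.ElementTree as ET


def has_patient_id(pid_xml: str) -> bool:
    """Return True if the <pid> element has at least one <patient_id> child.

    Assumption: "at least one" is the intended rule; the notes only say
    that every pid must have a patient_id.
    """
    pid = ET.fromstring(pid_xml)
    return pid.find("patient_id") is not None


# Example One from this page: only patient_map_id children.
EXAMPLE_ONE = """
<pid>
   <patient_map_id source="BRICCS">BPt00000040</patient_map_id>
   <patient_map_id source="UHLT">Snnnnnnn</patient_map_id>
</pid>
"""

# Example Two from this page: one patient_id plus one patient_map_id.
EXAMPLE_TWO = """
<pid>
   <patient_id source="BRICCS">BPt00000040</patient_id>
   <patient_map_id source="UHLT">Snnnnnnn</patient_map_id>
</pid>
"""

print(has_patient_id(EXAMPLE_ONE))  # False
print(has_patient_id(EXAMPLE_TWO))  # True
```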

Example Three

The following example is also acceptable, but it implies one of two things:

  • the participant is already within the CRC and we know their internal identifier (= 2), or
  • the participant is new and we are ourselves assigning a new i2b2 internal identifier (= 2) for them.

Both situations can be avoided by adopting the approach of Example Two above and omitting HIVE as a source.

<pid>
   <patient_id source="HIVE">2</patient_id>
   <patient_map_id source="BRICCS">BPt00000040</patient_map_id>
   <patient_map_id source="UHLT">Snnnnnnn</patient_map_id>
</pid>

Comment

As far as I can tell, each row in the temporary table covers one patient_id / patient_map_id combination. So:

  • Example Two would give rise to one row.
  • Example Three would give rise to two rows.
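
The row counts above can be reproduced with a small sketch. This is an illustration of the reading given in this comment, not the actual loader: the PidRow type, its field names and the pid_to_rows function are hypothetical, and the real temporary table's column names may differ.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class PidRow(NamedTuple):
    # Hypothetical column names; the real temporary table may differ.
    patient_id: str
    patient_id_source: str
    patient_map_id: str
    patient_map_id_source: str


def pid_to_rows(pid_xml: str) -> List[PidRow]:
    """Emit one row per patient_id / patient_map_id combination in a <pid>."""
    pid = ET.fromstring(pid_xml)
    rows = []
    for p in pid.findall("patient_id"):
        for m in pid.findall("patient_map_id"):
            rows.append(PidRow(p.text, p.get("source"), m.text, m.get("source")))
    return rows


EXAMPLE_TWO = """
<pid>
   <patient_id source="BRICCS">BPt00000040</patient_id>
   <patient_map_id source="UHLT">Snnnnnnn</patient_map_id>
</pid>
"""

EXAMPLE_THREE = """
<pid>
   <patient_id source="HIVE">2</patient_id>
   <patient_map_id source="BRICCS">BPt00000040</patient_map_id>
   <patient_map_id source="UHLT">Snnnnnnn</patient_map_id>
</pid>
"""

print(len(pid_to_rows(EXAMPLE_TWO)))    # 1
print(len(pid_to_rows(EXAMPLE_THREE)))  # 2
```

Under this reading a pid with no patient_map_id children would yield no rows at all; whether that matches the real load is not established here.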

First Stage: Eliminate Duplicates.

Any "duplicates" are eliminated from the temporary table. A row is a duplicate if another row matches it on all four of:

  1. patient_id
  2. patient_id source
  3. patient_map_id
  4. patient_map_id source
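
The four-column match can be sketched as follows. This is an in-memory illustration only: the real first stage operates on the temporary table, and the function name eliminate_duplicates and the tuple layout (patient_id, patient_id source, patient_map_id, patient_map_id source) are assumptions made for the example.

```python
from typing import List, Tuple

# Assumed layout: (patient_id, patient_id_source, patient_map_id, patient_map_id_source)
Row = Tuple[str, str, str, str]


def eliminate_duplicates(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each four-column combination, in order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


rows = [
    ("BPt00000040", "BRICCS", "Snnnnnnn", "UHLT"),
    ("BPt00000040", "BRICCS", "Snnnnnnn", "UHLT"),  # exact duplicate, dropped
    ("2", "HIVE", "Snnnnnnn", "UHLT"),              # differs on patient_id, kept
]

print(len(eliminate_duplicates(rows)))  # 2
```

Rows that differ in any one of the four values, such as the same patient_map_id under a different patient_id source, are not treated as duplicates.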

Second Stage: Process HIVE as a Source.

Third Stage: Not using HIVE as a Source.
