Processing the Patient Identifier Set during a Load.
This is work in progress.
Cardinality, with some examples.
Example One
First of all, the following is wrong. Every pid must have a patient_id, and this one has only patient_map_id elements.
<pid>
  <patient_map_id source="BRICCS">BPt00000040</patient_map_id>
  <patient_map_id source="UHLT">Snnnnnnn</patient_map_id>
</pid>
Example Two
This is acceptable:
<pid>
  <patient_id source="BRICCS">BPt00000040</patient_id>
  <patient_map_id source="UHLT">Snnnnnnn</patient_map_id>
</pid>
Example Three
The following example is also acceptable, but it implies one of two things:
- the participant is already within the CRC, and we know their internal identifier (= 2); or
- this is a new participant, and we are ourselves assigning them a new i2b2 internal identifier (= 2).
Both situations can be avoided by adopting the approach of Example Two above and omitting HIVE as a source.
<pid>
  <patient_id source="HIVE">2</patient_id>
  <patient_map_id source="BRICCS">BPt00000040</patient_map_id>
  <patient_map_id source="UHLT">Snnnnnnn</patient_map_id>
</pid>
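The cardinality rule illustrated by the three examples can be checked mechanically. The following is a minimal sketch in Python, assuming each pid is available as an XML string. The function name has_patient_id and the EXAMPLE_* constants are hypothetical and are not part of any i2b2 loader; the sketch tests only the rule stated above, that every pid must have a patient_id.

```python
import xml.etree.ElementTree as ET

# The three pid examples from this page, as XML strings.
EXAMPLE_ONE = (
    '<pid>'
    '<patient_map_id source="BRICCS">BPt00000040</patient_map_id>'
    '<patient_map_id source="UHLT">Snnnnnnn</patient_map_id>'
    '</pid>'
)
EXAMPLE_TWO = (
    '<pid>'
    '<patient_id source="BRICCS">BPt00000040</patient_id>'
    '<patient_map_id source="UHLT">Snnnnnnn</patient_map_id>'
    '</pid>'
)
EXAMPLE_THREE = (
    '<pid>'
    '<patient_id source="HIVE">2</patient_id>'
    '<patient_map_id source="BRICCS">BPt00000040</patient_map_id>'
    '<patient_map_id source="UHLT">Snnnnnnn</patient_map_id>'
    '</pid>'
)


def has_patient_id(pid_xml: str) -> bool:
    """Return True if the <pid> element has a <patient_id> child.

    Hypothetical helper: it checks only the rule stated on this page.
    """
    pid = ET.fromstring(pid_xml)
    return pid.find("patient_id") is not None


if __name__ == "__main__":
    for name, xml_text in [("One", EXAMPLE_ONE),
                           ("Two", EXAMPLE_TWO),
                           ("Three", EXAMPLE_THREE)]:
        print("Example", name, "valid:", has_patient_id(xml_text))
```

Example One fails the check; Examples Two and Three pass it.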
Comment
As far as I can tell, a row in the temporary table covers a patient_id / patient_map_id combination. So:
- Example Two would give rise to one row.
- Example Three would give rise to two rows.
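The row counts above can be reproduced with a small sketch. It assumes, as the comment does, that each temporary-table row pairs the patient_id of a pid with one of its patient_map_id elements. The function name pid_to_rows and the four-field tuple layout are hypothetical; the real temporary table may hold further columns.

```python
import xml.etree.ElementTree as ET

EXAMPLE_TWO = (
    '<pid>'
    '<patient_id source="BRICCS">BPt00000040</patient_id>'
    '<patient_map_id source="UHLT">Snnnnnnn</patient_map_id>'
    '</pid>'
)
EXAMPLE_THREE = (
    '<pid>'
    '<patient_id source="HIVE">2</patient_id>'
    '<patient_map_id source="BRICCS">BPt00000040</patient_map_id>'
    '<patient_map_id source="UHLT">Snnnnnnn</patient_map_id>'
    '</pid>'
)


def pid_to_rows(pid_xml: str) -> list:
    """Expand one <pid> into (patient_id, patient_id source,
    patient_map_id, patient_map_id source) tuples.

    Assumption: one row per patient_id / patient_map_id combination.
    """
    pid = ET.fromstring(pid_xml)
    patient = pid.find("patient_id")
    if patient is None:
        raise ValueError("pid has no patient_id")
    rows = []
    for mapped in pid.findall("patient_map_id"):
        rows.append((patient.text, patient.get("source"),
                     mapped.text, mapped.get("source")))
    return rows


if __name__ == "__main__":
    print(len(pid_to_rows(EXAMPLE_TWO)), "row(s) for Example Two")
    print(len(pid_to_rows(EXAMPLE_THREE)), "row(s) for Example Three")
```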
First Stage: Eliminate Duplicates.
Any "duplicates" are eliminated from the temporary table. A row is a duplicate if another row matches it on all four of:
- patient_id
- patient_id source
- patient_map_id
- patient_map_id source
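The duplicate rule can be sketched in memory as follows. This is an illustration only: the actual load presumably performs this step against the temporary table itself. The function name eliminate_duplicates and the tuple layout (patient_id, patient_id source, patient_map_id, patient_map_id source) are hypothetical, and keeping the first occurrence of each row is an assumption.

```python
def eliminate_duplicates(rows):
    """Return rows with repeats removed, keeping the first occurrence.

    Two rows are duplicates when all four fields match:
    (patient_id, patient_id source, patient_map_id, patient_map_id source).
    """
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Sample rows: the first and third match on all four fields.
SAMPLE_ROWS = [
    ("BPt00000040", "BRICCS", "Snnnnnnn", "UHLT"),
    ("2", "HIVE", "BPt00000040", "BRICCS"),
    ("BPt00000040", "BRICCS", "Snnnnnnn", "UHLT"),
]

if __name__ == "__main__":
    for row in eliminate_duplicates(SAMPLE_ROWS):
        print(row)
```

Rows that differ in any one of the four fields, for example the same patient_map_id under a different source, are kept as distinct.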
Second Stage: Process HIVE as a Source.
Third Stage: Not using HIVE as a Source.
Last modified on 12/09/15 19:44:00