== Processing the Patient Identifier Set during a Load. ==
This is work in progress.
=== Cardinality, with some examples. ===
==== __Example One__ ====
First of all, the following is wrong: every pid must have a patient_id.
{{{
BPt00000040
Snnnnnnn
}}}
==== __Example Two__ ====
This is acceptable:
{{{
BPt00000040
Snnnnnnn
}}}
==== __Example Three__ ====
The following example is also acceptable, but implies that either:
 * the participant is already within the CRC (and we know their internal identifier is 2), or
 * this is a new participant and we are ourselves assigning a new i2b2 internal identifier (2) for them.
Both situations are ones we can avoid by adopting the approach of Example Two above, and omitting the HIVE as a source.
{{{
2
BPt00000040
Snnnnnnn
}}}
==== __Comment__ ====
As far as I can tell, a row in the temporary table covers a patient_id / patient_map_id combination. So:
* Example Two would give rise to one row.
* Example Three would give rise to two rows.
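
The row counts above can be illustrated with a minimal Python sketch. It assumes that in Example Two `BPt00000040` is the patient_id and `Snnnnnnn` its single patient_map_id, and that in Example Three `2` is the patient_id (source HIVE) with the other two values as patient_map_ids. The names `MapRow`, `rows_for_pid`, `SOURCE_A` and `SOURCE_B` are hypothetical placeholders, not part of i2b2.

```python
from typing import List, NamedTuple, Tuple


class MapRow(NamedTuple):
    """One temporary-table row (hypothetical column names)."""
    patient_id: str
    patient_id_source: str
    patient_map_id: str
    patient_map_id_source: str


def rows_for_pid(patient_id: str,
                 patient_id_source: str,
                 map_ids: List[Tuple[str, str]]) -> List[MapRow]:
    """Return one row per patient_id / patient_map_id combination."""
    return [MapRow(patient_id, patient_id_source, map_id, map_source)
            for map_id, map_source in map_ids]


# Example Two: one patient_id with one patient_map_id.
# SOURCE_A and SOURCE_B are placeholder source names.
example_two = rows_for_pid("BPt00000040", "SOURCE_A",
                           [("Snnnnnnn", "SOURCE_B")])

# Example Three: HIVE patient_id 2 with two patient_map_ids.
example_three = rows_for_pid("2", "HIVE",
                             [("BPt00000040", "SOURCE_A"),
                              ("Snnnnnnn", "SOURCE_B")])

print(len(example_two), len(example_three))
```

Under these assumptions Example Two yields one row and Example Three yields two, matching the comment above.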
=== First Stage: Eliminate Duplicates. ===
Any "duplicates" are eliminated from the temporary table.
A row is a duplicate if another row matches it on all of:
1. patient_id
1. patient_id source
1. patient_map_id
1. patient_map_id source
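
As a rough illustration of the matching rule, the following Python sketch removes duplicates from a list of rows held in memory. The dictionary keys and the function name `eliminate_duplicates` are hypothetical, and keeping the first occurrence of each duplicate is an assumption; the actual load works on a database temporary table and may choose differently.

```python
from typing import Dict, List

# The four columns on which two rows must match to count as duplicates.
# Key names are hypothetical stand-ins for the temporary table's columns.
KEY_COLUMNS = (
    "patient_id",
    "patient_id_source",
    "patient_map_id",
    "patient_map_id_source",
)


def eliminate_duplicates(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first row for each distinct combination of the key columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[col] for col in KEY_COLUMNS)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Placeholder data: the first two rows match on all four columns.
rows = [
    {"patient_id": "BPt00000040", "patient_id_source": "SOURCE_A",
     "patient_map_id": "Snnnnnnn", "patient_map_id_source": "SOURCE_B"},
    {"patient_id": "BPt00000040", "patient_id_source": "SOURCE_A",
     "patient_map_id": "Snnnnnnn", "patient_map_id_source": "SOURCE_B"},
    {"patient_id": "BPt00000040", "patient_id_source": "SOURCE_A",
     "patient_map_id": "Snnnnnnn", "patient_map_id_source": "SOURCE_C"},
]
deduplicated = eliminate_duplicates(rows)
print(len(deduplicated))
```

The third row survives because it differs in patient_map_id source, so it does not match on all four columns.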
=== Second Stage: Process HIVE as a Source. ===
=== Third Stage: Not using HIVE as a Source. ===