Importing a PDO - the algorithms examined in detail
This is work in progress.
The context of this discussion is what happens to a PDO file when you ask the CRC loader to process it through one of i2b2's web services. So much happens during a load that shedding light on the detail is important for understanding the process and, by extension, for knowing how to format a PDO suitable for importing in the first place.
Much of this detail can be gleaned from a careful reading of the CRC documentation, particularly the parts of the CRC Design and CRC Messaging PDFs that cover the import use cases. But it is not easy to follow, so the details here have been supplemented by reading the CRC loader code.
The PDO (Patient Data Object)
The outline structure of a PDO matches the star schema of the data mart and is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<pdo:patient_data xmlns:pdo="http://www.i2b2.org/xsd/hive/pdo/1.1/pdo">
    <!-- patient identifier set -->
    <pdo:pid_set>
        <!-- Identifies a patient in a source system -->
        <pid>... details here ...</pid>
    </pdo:pid_set>
    <!-- event identifier set -->
    <pdo:eid_set>
        <!-- Identifies an event/occurrence/visit in a source system -->
        <eid>... details here ...</eid>
    </pdo:eid_set>
    <pdo:patient_set>
        <!-- Basic patient details -->
        <patient>... details here ...</patient>
    </pdo:patient_set>
    <pdo:event_set>
        <!-- Basic event details -->
        <event>... details here ...</event>
    </pdo:event_set>
    <pdo:concept_set>
        <!-- Basic details of one concept -->
        <concept>... details here ...</concept>
    </pdo:concept_set>
    <pdo:observer_set>
        <!-- Basic observer/provider details -->
        <observer>... details here ...</observer>
    </pdo:observer_set>
    <pdo:observation_set>
        <!-- A single fact concerning one patient -->
        <observation>... details here ...</observation>
    </pdo:observation_set>
</pdo:patient_data>
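Before sending a PDO to the loader it can be handy to check which sets a file actually contains. The following is a minimal sketch using only Python's standard library; the function name `sets_present` and the `SAMPLE` document are illustrative assumptions and are not part of i2b2.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the skeleton above.
PDO_NS = "http://www.i2b2.org/xsd/hive/pdo/1.1/pdo"


def sets_present(pdo_xml):
    """Return the local names of the top-level sets, in document order.

    Hypothetical helper for inspecting a PDO; it is not loader code.
    """
    root = ET.fromstring(pdo_xml)
    prefix = "{" + PDO_NS + "}"
    names = []
    for child in root:
        # ElementTree spells qualified tags as "{namespace}local_name".
        if child.tag.startswith(prefix):
            names.append(child.tag[len(prefix):])
    return names


# A cut-down PDO holding only two of the seven sets (made up for this example).
SAMPLE = """<pdo:patient_data xmlns:pdo="http://www.i2b2.org/xsd/hive/pdo/1.1/pdo">
    <pdo:pid_set><pid>... details here ...</pid></pdo:pid_set>
    <pdo:observation_set><observation>... details here ...</observation></pdo:observation_set>
</pdo:patient_data>"""

print(sets_present(SAMPLE))  # ['pid_set', 'observation_set']
```

The check looks only at the direct children of the root element, so it says nothing about whether the content of each set is valid.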
Order of Processing
The loader processes the PDO in the order shown in the XML skeleton above, i.e.:
- pid_set
- eid_set
- patient_set
- event_set
- concept_set
- observer_set
- observation_set
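The fixed ordering can be sketched as a simple filter over the list above. This is only an illustration of the ordering rule, not the loader's own code; the names `PROCESSING_ORDER` and `in_processing_order` are assumptions made for the example.

```python
# The seven sets, in the order the loader is described as processing them.
PROCESSING_ORDER = [
    "pid_set",
    "eid_set",
    "patient_set",
    "event_set",
    "concept_set",
    "observer_set",
    "observation_set",
]


def in_processing_order(present):
    """Arrange the given set names in the fixed processing order.

    Names that are not one of the seven sets are ignored in this sketch.
    """
    wanted = set(present)
    return [name for name in PROCESSING_ORDER if name in wanted]


# The order in which the sets are listed by the caller does not matter.
print(in_processing_order(["observation_set", "patient_set", "pid_set"]))
# ['pid_set', 'patient_set', 'observation_set']
```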
Detailed processing will be reviewed later. For the moment, I think two observations need to be made:
- I believe that processing the pid_set and eid_set first in the workflow lets the loader control the assignment of i2b2 internal identifiers to patients and to events. That is, there is no need to manufacture i2b2 internal ids and place them in the PDO beforehand. The source and source identifiers (e.g. BRICCS participant id and/or s-number) can be used as patient identifiers, and the loader will take care of assigning internal ids and mapping them to the external source ids. This is a big gain: the process is transactional, it is database independent, and there are no concurrency problems (multiple processes doing the same thing at the same time). However, although we know what a participant is, we are still somewhat in the dark concerning events: what is an event in terms of a source system?
- All seven sets, or any subset of them, can be supplied. Even if all seven are supplied, the loader message itself (the web service message that triggers the load process) contains control data that can specify which of the sets present should be processed. Processing always follows the above order, even if there are gaps.
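The idea behind the first observation, mapping a (source, source id) pair to an internal id that is assigned on first sight, can be illustrated with a small in-memory sketch. This is not how the loader is implemented: the real mapping lives in the database and is handled transactionally, whereas the hypothetical `IdMapper` class below has none of those guarantees. The source ids used are made-up placeholders.

```python
import itertools


class IdMapper:
    """Assign internal ids to (source, source_id) pairs on first sight.

    Illustrative only: in-memory, single-process, no persistence.
    """

    def __init__(self, start=1):
        self._next_id = itertools.count(start)
        self._ids = {}

    def internal_id(self, source, source_id):
        key = (source, source_id)
        if key not in self._ids:
            # First time this external identifier is seen: allocate a new id.
            self._ids[key] = next(self._next_id)
        return self._ids[key]


patients = IdMapper()
first = patients.internal_id("BRICCS", "participant-A")   # placeholder source ids
second = patients.internal_id("BRICCS", "participant-B")

# Asking again for a known external id returns the id already assigned.
assert patients.internal_id("BRICCS", "participant-A") == first
assert first != second
```

The point of the sketch is that the caller only ever supplies external identifiers; the internal ids are an output of the mapping, never an input.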