Version 19 (modified by jeff.lusted, 13 years ago)

Onyx Export and Purge

Sources of Info

The Onyx User Guide has a useful Chapter 12 "Topics for System Administrators" with details of the export and purge functions.

Other useful links to the obiba wiki

Overview

Exporting data from Onyx means reading data from the Onyx database and writing it to one or more export destinations. Exporting does not delete any data from the Onyx database. Deleting data from the database is done by the purge function. An export destination is a compressed zip file. Participant data and experimental conditions data can be exported. Configuration of data export is done entirely in configuration files, not through the Onyx user interface. Some things that can be configured:

  • Which data is selected for export
  • Directory to which export files are written
  • How many export destinations are defined

System administrators trigger an export via the Onyx web interface. The configuration file controls everything else. It is NOT possible from within the web interface to choose which participants will be exported. Any selective export can only be tailored via the configuration file.

Purging data means deleting data from the Onyx database. Only participant data can be purged, not experimental conditions data. Configuration of data purging is done entirely in configuration files, not through the Onyx user interface. As with data export, only a system administrator can execute a purge, by triggering a function from within the user interface.

Sample Config Export File and resulting Export Zip File

The following attachments represent an export of only four participants. Even so, the zip file contains a lot of XML. It's worth opening it and pondering how we might approach this. Virtually everything about a participant and the interview process is captured. How much of this do we want in i2b2?

Export destinations file
Resulting export zip file
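To get a feel for what an export contains before opening individual files, a short script can list the XML entries in an export zip together with their sizes. This is a minimal sketch using only Python's standard zipfile module. It assumes nothing about the internal layout of the zip beyond the XML entries having an .xml extension, and the function name list_xml_entries is hypothetical.

```python
import zipfile


def list_xml_entries(zip_path):
    """Return (entry name, uncompressed size in bytes) for every .xml entry
    in the export zip, in the order they are stored in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if info.filename.lower().endswith(".xml")
        ]
```

For example, calling list_xml_entries on a downloaded export zip (any local path will do) and sorting the result by size makes it easy to pick out the largest participant files for manual inspection.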

How much do we want to export and purge?

It looks as if the attached export config file results in almost all data being exported for those participants whose interview status is closed, completed or cancelled.

Some aspects are excluded which I (Jeff) do not fully comprehend:

  1. Notably, an exclusion based on the variable 'Participants:Admin.Interview.exportLog.destination'.
  2. Some aspects of the Consent table are not exported.

How much we purge is an open question. Remember that purging may affect the reporting tool. I mention this here because I believe the default purge config file will delete almost all participant data that does not have an open status.

On the whole it seems sensible to export as much as we can and then archive the export files, i.e. retain them forever. Given the idea of retaining them in perpetuity, we may wish to consider encryption.

Why export everything? Because it gives us more than one bite of the cherry for the import into i2b2 (or any other piece of software). The detail shown in the export file is quite daunting. It is conceivable that if we filtered during the export we might get this wrong, or change our minds later.

Filtering the exported data

All of the exported data is in XML format. Given the large amount of detail exported, we need some way of marshalling this into a somewhat simpler form prior to organizing it for import into i2b2.

The idea is to come up with a programmable process (an automated process) that will act as a first filter which can be applied to all exports.

Whatever process we come up with is likely to have a number of steps, and we are unlikely to get it right the first time. It will involve manual inspection of example files from within an export zip file, together with some programming, to decide what data can be eliminated.
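As a starting point for an automated first filter, a script could drop every variableValue element whose variable name starts with one of a list of excluded prefixes. This is a minimal sketch in Python using the standard xml.etree.ElementTree module. It assumes the participant files are plain XML, with no XML namespace, and that variables appear as variableValue elements carrying a variable attribute, as in the extracts on this page. The function name filter_variable_values and the EXCLUDED_PREFIXES list are hypothetical; the single prefix shown only illustrates the kind of candidate manual inspection might settle on.

```python
import xml.etree.ElementTree as ET

# Hypothetical exclusion list: variable-name prefixes that manual inspection
# might judge to be of no use in i2b2. Illustrative only, not a decision.
EXCLUDED_PREFIXES = ("Admin.Action.",)


def filter_variable_values(xml_text, excluded_prefixes=EXCLUDED_PREFIXES):
    """Remove every <variableValue> whose 'variable' attribute starts with one
    of excluded_prefixes. Returns the filtered XML and the names removed."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so record (parent, child) pairs
    # first and remove afterwards, rather than mutating while iterating.
    doomed = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if child.tag == "variableValue"
        and child.get("variable", "").startswith(excluded_prefixes)
    ]
    for parent, child in doomed:
        parent.remove(child)
    removed = [child.get("variable") for _, child in doomed]
    return ET.tostring(root, encoding="unicode"), removed
```

Keeping the exclusions as a plain list of prefixes means each round of manual inspection only changes data, not code, and the returned names give a record of what each run removed.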

The manual inspection is the thinking bit. Don't jump to conclusions on first inspection.

For instance, this is an extract from a Participant's file...

  <variableValue variable="Admin.Action.fromState">
    <value class="sequence" valueType="text" size="40">
      <value valueType="text" order="0"/>
      <value valueType="text" order="1">Ready</value>
      <value valueType="text" order="2">InProgress</value>
      <value valueType="text" order="3">Ready</value>
      <value valueType="text" order="4">InProgress</value>
     
... similar lines removed ...

      <value valueType="text" order="36">Interrupted</value>
      <value valueType="text" order="37">InProgress</value>
      <value valueType="text" order="38">Ready</value>
      <value valueType="text" order="39">InProgress</value>
    </value>
  </variableValue>

It's probable in my judgement that this could be filtered out. But what about:

  <variableValue variable="Admin.StageInstance.user">
    <value class="sequence" valueType="text" size="14">
      <value valueType="text" order="0">JeffLusted</value>
      <value valueType="text" order="1">JeffLusted</value>
      <value valueType="text" order="2">JeffLusted</value>

... similar lines removed ...

      <value valueType="text" order="11">JeffLusted</value>
      <value valueType="text" order="12">JeffLusted</value>
      <value valueType="text" order="13">JeffLusted</value>
    </value>
  </variableValue>
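Both extracts are sequences in which the same few values repeat many times. To support the manual inspection, a script could summarize each sequence-valued variable as its declared size plus a count of each distinct value, so that a variable such as Admin.StageInstance.user collapses to a single line. This sketch assumes the structure seen in the extracts: a variableValue wraps one value element with class="sequence" and a size attribute, whose children are the individual values. The function names summarize_sequences and print_summary are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_sequences(xml_text):
    """Map each sequence-valued variable to (declared size, Counter of values).
    An empty <value/> element is counted as the empty string."""
    root = ET.fromstring(xml_text)
    summary = {}
    for vv in root.iter("variableValue"):
        seq = vv.find("value")
        if seq is None or seq.get("class") != "sequence":
            continue  # only sequence-valued variables are summarized
        items = seq.findall("value")
        counts = Counter(item.text or "" for item in items)
        summary[vv.get("variable")] = (int(seq.get("size", len(items))), counts)
    return summary


def print_summary(summary):
    """Print one line per variable, most frequent values first."""
    for name, (size, counts) in summary.items():
        distinct = ", ".join(f"{v or '<empty>'} x{n}" for v, n in counts.most_common())
        print(f"{name} (size {size}): {distinct}")
```

A summary like this does not decide anything by itself, but it makes it quicker to see which variables are mostly repetition and deserve a closer look.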

Experiment with Exclusion at the Entities Level

I altered the export-destinations.xml file so that the type="EXCLUDE" scripts were commented out throughout. The following is the first instance:

    <valueset entityType="Participant" valueTable="Participants">
      <entities>
        <excludeAll />
        <script type="INCLUDE">
          <javascript><![CDATA[// Include any ValueSet that has 'CLOSED' or 'COMPLETED' or 'CANCELLED' as a value for the 'Participant.Interview.Status' variable
          $('Participants:Admin.Interview.status').any('CLOSED','COMPLETED','CANCELLED')]]></javascript>
        </script>
        <!-- script type="EXCLUDE">
          <javascript><![CDATA[$('Participants:Admin.Interview.exportLog.destination').any('BRICCS.Participants')]]></javascript>
        </script -->
      </entities>
    </valueset>

I then ran an export. It produced a zip file containing all the participants with a completed status on my test system, even those that had previously been exported. The conclusion is that the exclude condition above (when not commented out) is what prevents participants from being exported more than once. As an aside, there are no participants on my test system with a closed or cancelled status.
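The selection logic in the configuration above, as we understand it, can be modelled in a few lines: start with no participants selected (excludeAll), include those whose interview status is CLOSED, COMPLETED or CANCELLED, then exclude those whose export log already records the BRICCS.Participants destination. This Python sketch is a model for reasoning about the filter, not Onyx code. The Participant class, the function names and the reading of .any() as "any recorded value matches any of the arguments" are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    # Hypothetical in-memory stand-in for one entry of the Participants table.
    entity_id: str
    interview_status: list[str]  # values of Admin.Interview.status
    # values of Admin.Interview.exportLog.destination, one per earlier export
    export_destinations: list[str] = field(default_factory=list)


def any_of(values, *candidates):
    """Model of the script's .any(...): true if any value equals any candidate."""
    return any(v in candidates for v in values)


def select_for_export(participants, use_exclude=True):
    """Return the ids the entity filter would select, under our reading of it."""
    selected = []  # <excludeAll/>: start with nothing selected
    for p in participants:
        # INCLUDE script: interview status is CLOSED, COMPLETED or CANCELLED
        include = any_of(p.interview_status, "CLOSED", "COMPLETED", "CANCELLED")
        # EXCLUDE script: already exported to the BRICCS.Participants destination
        exclude = use_exclude and any_of(p.export_destinations, "BRICCS.Participants")
        if include and not exclude:
            selected.append(p.entity_id)
    return selected
```

With use_exclude=False the model selects previously exported participants again, which matches the behaviour observed in the experiment above.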

Attachments (2)
