== Onyx Export and Purge ==
=== Sources of Info ===
The Onyx User Guide has a useful Chapter 12 "Topics for System Administrators" with details of the export and purge functions.
* The User Guide can be found here http://wiki.obiba.org/confluence/display/ONYX/Onyx+User+Guide
Other useful links on the obiba wiki:
* Configuring export and purge http://wiki.obiba.org/confluence/display/ONYX16x/Configuring+Data+Export+and+Purge
* Data Exportation from Onyx. Interesting, but I don't know how up to date it is: http://wiki.obiba.org/confluence/display/ONYX/Data+Exportation+from+Onyx
* Onyx Variables: http://wiki.obiba.org/confluence/display/ONYX/Onyx+Variables
=== Overview ===
'''Exporting data''' from Onyx means reading data from the Onyx database and writing it to one or more export destinations. Exporting does not delete any data from the Onyx database. Deleting data from the database is done by the purge function. An export destination is a compressed zip file. Participant data and experimental conditions data can be exported. Configuration of data export is done entirely in configuration files, not through the Onyx user interface. Some things that can be configured:
* Which data is selected for export
* Directory to which export files are written
* How many export destinations are defined
System administrators trigger an export via the Onyx web interface. The configuration file controls everything else. It is __NOT__ possible from within the web interface to choose which participants will be exported. Any selective export can only be tailored via the configuration file.
'''Purging data''' means deleting data from the Onyx database. Only participant data can be purged, not experimental conditions data.
Configuration of data purging is done entirely in configuration files, not through the Onyx user interface. As with data export, only a system administrator can execute a purge, which is triggered from within the user interface.
=== Sample Config Export File and resulting Export Zip File ===
The following files represent an export of only four participants. The zip file contains a lot of XML. It is worth opening it and pondering how we might approach this. Virtually everything regarding a participant and the interview process is captured. How much of this do we want in i2b2?[[BR]]
[raw-attachment:export-destinations.xml Export destinations file][[BR]]
[raw-attachment:BRICCS-20110106095220.zip Resulting export zip file]
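
As a first step in pondering the contents, a small script can list what is inside an export zip file without unpacking it. The following is a minimal sketch only: the function names are my own, and it assumes nothing about the Onyx layout beyond the export being a zip file containing XML entries.

```python
import zipfile


def summarize_export(zip_path):
    """Return (entry name, uncompressed size in bytes) for each XML entry in the zip."""
    with zipfile.ZipFile(zip_path) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if info.filename.lower().endswith(".xml")
        ]


def total_uncompressed_bytes(summary):
    """Sum the uncompressed sizes from a summarize_export() result."""
    return sum(size for _, size in summary)
```

Calling `summarize_export("BRICCS-20110106095220.zip")` on the attachment above would give a quick feel for how many XML files there are and which ones are the largest.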
=== How much do we want to export and purge? ===
It looks as if the attached export config file results in almost all data being exported for those participants whose interview status is closed, completed or cancelled.[[BR]]
Some aspects are excluded for reasons which I (Jeff) do not fully understand, notably aspects of the Consent table.
How much we purge is an open question. Remember that purging may affect the reporting tool. I mention this here because I believe the purge config file that we have as a default will result in the deletion of almost all data for participants that do not have an open status.[[BR]]
On the whole it seems sensible to export as much as we can for each participant and then archive the export files, i.e. retain them forever. Do we wish to consider encryption, given the idea of retaining them in perpetuity? I hope the answer to this is "No"; it's simply more work.[[BR]]
Why export everything for each participant? Because it gives us more than one bite of the cherry for the import into i2b2 (or any other piece of software). The detail shown in the export file is quite daunting. It is conceivable that if we filtered the export we might get this wrong, or change our minds later. Keeping track of what has and has not been exported could be a nightmare.
=== Filtering the exported data ===
All of the exported data is in XML format. Given the large amount of detail exported, we need some way of marshalling this into a somewhat simpler form '''''prior''''' to organizing it for import into i2b2.[[BR]]
The idea is to come up with a programmable process (an automated process) that will act as a first filter which can be applied to all exports.[[BR]]
Whatever process we come up with, it is likely to have a number of steps, and we are unlikely to get it correct first time. It will involve manual inspection of example files from within an export zip file, together with some programming, to decide what data can be '''''eliminated'''''.[[BR]]
(Since the above was written, the i2b2-procedures project within SVN has brought together a workflow of job steps and programmes which form a prototype of this process. The latest is on branch svn+ssh://svn.briccs.org.uk/var/local/briccs/svn/repo/i2b2/branches/i2b2-procedures-jl-trac92 and is still being worked upon.)
The manual inspection is the thinking bit. Don't jump to conclusions on first inspection.[[BR]]
For instance, this is an extract from a Participant's file...
{{{
Ready
InProgress
Ready
InProgress
... similar lines removed ...
Interrupted
InProgress
Ready
InProgress
}}}
In my judgement this could probably be filtered out. But what about:
{{{
JeffLusted
JeffLusted
JeffLusted
... similar lines removed ...
JeffLusted
JeffLusted
JeffLusted
}}}
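
To support the manual inspection, a script could profile a participant file and point out elements that repeat many times with the same value, like the two extracts above. This is only a sketch under assumptions: the element names `user` and `stage` in the usage note are hypothetical stand-ins (the real Onyx element names are not shown here), and the result is a list of candidates to look at, not a decision about what to eliminate.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def profile_xml(xml_text):
    """Count occurrences of each element tag and collect its distinct text values."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    values = defaultdict(set)
    for elem in root.iter():
        counts[elem.tag] += 1
        text = (elem.text or "").strip()
        if text:
            values[elem.tag].add(text)
    return counts, values


def low_variety_tags(counts, values, min_count=3):
    """Tags occurring at least min_count times that only ever carry one distinct value."""
    return sorted(
        tag
        for tag, n in counts.items()
        if n >= min_count and len(values.get(tag, ())) == 1
    )
```

For a file in which a hypothetical `user` element always contains the same name, `low_variety_tags` would report `user`, while a hypothetical `stage` element alternating between Ready and !InProgress would not be reported. Whether a repeated value is noise or something worth keeping (who conducted the interview, say) remains the thinking bit.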
=== Experiments with Exclusion ===
I altered the export-destinations.xml file so that the type="EXCLUDE" scripts were commented out throughout the file. The following is just the first instance of this:
{{{
}}}
I then ran an export. The export produced a zip file containing all the participants that had a completed status on my test system, even those that had been previously exported. So the conclusion is that the exclude condition above ensures that duplication of exported participants does __NOT__ take place. As an aside, there are no participants on my test system with a closed or cancelled status.
[[BR]][[BR]]
I assume that there are a number of ways of achieving the same result using !JavaScript. On the Obiba web site, the equivalent of the above is given by:
{{{
...
}}}
The export-destinations.xml configuration file that shipped with the Briccs questionnaire has the following:
{{{
}}}
I conducted another experiment whereby the above variables section was commented out, and then ran an export. I could detect no difference in the export zip files comparing before and after the change. But note:
* This is only with very limited test data
* I'm not sure how many valueTables are associated with consent. Are there separate valueTables for Consent, !ManualConsentQuestionnaire and !VerbalConsentQuestionnaire? If so the latter two have been omitted from our configuration file.
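
A before-and-after comparison like this could be made less error-prone with a small script that compares two export zip files entry by entry. The following is a sketch, not what I actually ran: the function names are my own, and it compares raw bytes, so if Onyx writes timestamps or similar into the XML, entries may show as changed for reasons unrelated to the configuration.

```python
import hashlib
import zipfile


def entry_digests(zip_path):
    """Map each file entry name in the zip to the SHA-256 digest of its contents."""
    with zipfile.ZipFile(zip_path) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info.filename)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def compare_exports(before_path, after_path):
    """Report entries present in only one zip, and entries whose contents differ."""
    before = entry_digests(before_path)
    after = entry_digests(after_path)
    common = before.keys() & after.keys()
    return {
        "only_before": sorted(before.keys() - after.keys()),
        "only_after": sorted(after.keys() - before.keys()),
        "changed": sorted(name for name in common if before[name] != after[name]),
    }
```

If all three lists come back empty, the two exports contain the same entries with identical contents.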
=== Some thoughts on the Experiments ===
As per our existing configuration file, it looks as if we are currently set up to export all participants where the Briccs questionnaire has been closed, completed or cancelled. We may want to alter this so only a completed status is taken into account.[[BR]]
On my laptop (dual core, 64bit with 1GB available to the JVM running Tomcat) and with limited test data, an export of 7 participants with a completed status took 1 minute and 24 seconds overall. On average, each participant added about 1MB to the uncompressed contents of the zip file.[[BR]]
On the whole, I would still like to think we could make small exports, say of 50 participants at a time, even though we may have seven hundred or more complete. This would mean experimenting with the !JavaScript conditioning. I think it would make for easier testing and debugging.[[BR]]
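
To get a rough feel for what batches of 50 would mean, the laptop timing above can be extrapolated. This is back-of-envelope arithmetic only, and it assumes (without evidence) that export time and size scale linearly with the number of participants; the function name and the figure of 700 participants are purely illustrative.

```python
import math

# Figures from the laptop test: 7 participants took 1 min 24 s (84 s),
# and each participant added about 1MB uncompressed.
SECONDS_PER_PARTICIPANT = 84 / 7
MB_PER_PARTICIPANT = 1.0


def plan_batches(total_participants, batch_size=50):
    """Estimate batch count, time and size, assuming linear scaling (an assumption)."""
    return {
        "batches": math.ceil(total_participants / batch_size),
        "minutes_per_full_batch": batch_size * SECONDS_PER_PARTICIPANT / 60,
        "mb_per_full_batch": batch_size * MB_PER_PARTICIPANT,
        "total_minutes": total_participants * SECONDS_PER_PARTICIPANT / 60,
    }
```

On these assumptions, `plan_batches(700)` gives 14 batches of about 10 minutes and 50MB uncompressed each, or roughly 140 minutes of export time in total. A production server would presumably behave differently from my laptop.[[BR]]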
Another awkwardness is that once a participant is exported, it is only possible (as far as I can see) to look at a limited amount of their data via the Onyx web interface. So validating what is at the end of the Onyx-to-i2b2 trail by manual inspection will be difficult.