wiki:Initial Incremental Export

Version 11 (modified by Nick Holden, 13 years ago)

In October 2011 we began work exporting data from Onyx for the first time.

Because the data was likely to be incorrect in at least some respects, this is not considered to be the definitive export, and NO PURGE WILL TAKE PLACE.

To manage the load on the server, and to limit file sizes, the plan is to run incremental exports until the exported data is up to date, recognising that this process might need to be repeated later.

The task of splitting the export into increments is managed in the export-destinations.xml configuration file, as described in the Onyx Export and Purge page.

Each incremental export requires its own export-destinations.xml file with inclusion/exclusion criteria tailored to that increment. Each stage in the interview needs to be specifically referenced with a stanza in the export-destinations.xml file. Once the export-destinations.xml file has been edited, it must be installed on the server and tomcat restarted before entering Onyx to begin the export.

Data is exported as a timestamped zip file into a destination directory on the server. Each one will then be copied into a remote directory on the UHL BRICCS share, before beginning the next export.
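As an illustration of the timestamped naming, the sketch below parses the timestamp out of an export file name. It assumes the name has the form prefix, hyphen, 14-digit yyyyMMddHHmmss stamp, then .zip; that pattern is inferred only from the single file name recorded in the notes further down this page (BRICCS-20111031170159.zip), and the function name is hypothetical.

```python
import re
from datetime import datetime

# Assumed pattern: <prefix>-<14 digit timestamp>.zip, inferred from one example name.
EXPORT_NAME = re.compile(r"^(?P<prefix>.+)-(?P<stamp>\d{14})\.zip$")


def export_timestamp(filename):
    """Return the datetime embedded in an export zip file name."""
    match = EXPORT_NAME.match(filename)
    if match is None:
        raise ValueError("not a recognised export file name: %r" % filename)
    return datetime.strptime(match.group("stamp"), "%Y%m%d%H%M%S")


print(export_timestamp("BRICCS-20111031170159.zip"))
```

Sorting export files by this value, rather than by file modification time, should keep the order stable after the files have been copied to another directory.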

Strategy is to conduct exports as follows:

Include only interviews with the status COMPLETED, and exclude any that have previously been exported.

Do NOT encrypt the export file, as it won't be leaving the UHL network.

Each increment then has a time window based on the Recruitment Context stage, specifically the timeStart attribute of its QuestionnaireRun.

First run, limit this to January through to June of the year 2010 (months are zero-based: 0 = January, 5 = June):

        <script type="INCLUDE">
          <javascript>
            <![CDATA[($('Participants:Admin.Interview.status').any('COMPLETED')).and($('RecruitmentContextQuestionnaire:QuestionnaireRun.timeStart').year().trim().any('2010')).and($('RecruitmentContextQuestionnaire:QuestionnaireRun.timeStart').month().trim().any('0', '1', '2', '3', '4', '5'))]]>
          </javascript>
        </script>
        <script type="EXCLUDE">
          <javascript>
            <![CDATA[$('Participants:Admin.Interview.exportLog.destination').any('BRICCS.Participants')]]>
          </javascript>
        </script>

Second run, extend the month limit through to 6 = July

Third run, extend the month limit through to 7 = August

Fourth run, extend the month limit through to 8 = September

Fifth run, extend the month limit through to 9 = October

Sixth run, extend the month limit through to 10 = November

Seventh run, remove the month limit so that any date in 2010 matches

Eighth run, remove the year limit, and set month limit back to 0 = January (should include only January 2011 participants)

Ninth run, extend the month limit through to 1 = February

Tenth run, extend the month limit through to 2 = March

Eleventh run, extend the month limit through to 3 = April

Twelfth run, extend the month limit through to 4 = May

Thirteenth run, extend the month limit through to 5 = June (from here on the month limit may match 2010 interviews again, but anything previously exported is still excluded)

Fourteenth run, extend the month limit through to 6 = July

Fifteenth run, extend the month limit through to 7 = August

Sixteenth run, extend the month limit through to 8 = September

Seventeenth run, extend the month limit through to 9 = October

Eighteenth run, remove both the year and month limits. Should mop up all remaining unexported interviews.
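The schedule above can also be written down as data. The following is a minimal sketch, not part of Onyx or of the process actually used: a hypothetical Python helper that builds the INCLUDE expression for each of the 18 runs from an optional year and an optional last month index, assuming zero-based months as in the list above (6 = July). The names include_expression and RUNS are illustrative only, and the resulting string would still have to be pasted into the CDATA section of the relevant stanza by hand.

```python
STATUS = "$('Participants:Admin.Interview.status').any('COMPLETED')"
TIME_START = "$('RecruitmentContextQuestionnaire:QuestionnaireRun.timeStart')"


def include_expression(year=None, last_month=None):
    """Build the INCLUDE script text for one run.

    year: calendar year to match, or None for no year limit.
    last_month: last zero-based month to match (0 = January),
                or None for no month limit.
    """
    clauses = ["(" + STATUS + ")"]
    if year is not None:
        clauses.append(".and(%s.year().trim().any('%d'))" % (TIME_START, year))
    if last_month is not None:
        months = ", ".join("'%d'" % m for m in range(last_month + 1))
        clauses.append(".and(%s.month().trim().any(%s))" % (TIME_START, months))
    return "".join(clauses)


# (year, last_month) for runs 1 to 18, following the schedule described above.
RUNS = (
    [(2010, m) for m in range(5, 11)]      # runs 1-6: 2010, through June .. November
    + [(2010, None)]                       # run 7: any date in 2010
    + [(None, m) for m in range(0, 10)]    # runs 8-17: no year limit, through January .. October
    + [(None, None)]                       # run 18: no year or month limit
)

for number, (year, last_month) in enumerate(RUNS, start=1):
    print(number, include_expression(year, last_month))
```

For the first run this reproduces the INCLUDE expression shown earlier on this page. The EXCLUDE script is the same for every run, so it is left out of the sketch.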

On completion of the 18 runs, the total number of exported interviews should match the total number of COMPLETED interviews in Onyx.
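This final check is simple arithmetic, sketched below as a hypothetical helper. The per-run counts are assumed to be noted by hand after each export, and the total of COMPLETED interviews is assumed to be read from Onyx separately; neither value is fetched automatically here.

```python
def remaining_to_export(per_run_counts, total_completed):
    """Return how many COMPLETED interviews have not yet been exported.

    per_run_counts: number of interviews exported by each run so far.
    total_completed: total COMPLETED interviews reported by Onyx.
    """
    exported = sum(per_run_counts)
    if exported > total_completed:
        # More exported than completed suggests a counting error or a double export.
        raise ValueError(
            "exported %d interviews but only %d are COMPLETED"
            % (exported, total_completed)
        )
    return total_completed - exported
```

A result of 0 after the eighteenth run would indicate that the totals match.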

Notes on the actual export process

First increment export-destinations.xml version created in svn

First increment export-destinations.xml version copied to data.briccs.org.uk/onyx-config/...

First increment export-destinations.xml version downloaded to the test onyx server and export run successfully.

First increment export-destinations.xml version downloaded to the live onyx server and export run. No errors reported.

First increment export copied to V:\Test Data Export from Live Onyx\BRICCS-20111031170159.zip

26 participants were marked as exported in the Onyx interface. There are 26 xml files in each directory of the export, and 26 entities.
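The file count in each directory could also be checked with a short script rather than by eye. The sketch below is an assumption-laden illustration: it assumes the export zip contains one directory per exported table with one .xml file per participant, which is inferred from the note above rather than from Onyx documentation, and both function names are hypothetical.

```python
import posixpath
import zipfile
from collections import Counter


def xml_files_per_directory(zip_source):
    """Count the .xml entries in each directory inside an export zip."""
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            if name.lower().endswith(".xml"):
                counts[posixpath.dirname(name)] += 1
    return dict(counts)


def check_export(zip_source, expected):
    """Return the directories whose xml file count differs from expected."""
    return {
        directory: count
        for directory, count in xml_files_per_directory(zip_source).items()
        if count != expected
    }
```

Calling check_export on the first increment's zip with an expected count of 26 should return an empty dictionary if every directory holds 26 xml files. Any additional xml file that is not a participant record would show up as a mismatch and would need to be discounted by hand.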

There were 27 participants interviewed in June 2010, but one is still 'in progress', so 26 exported is correct.

Second version created in svn, committed, copied to data.briccs.org.uk, downloaded to live onyx server, tomcat restarted, and export run.

Expected result for July 2010 export: 53 participants. 53 participants exported.

PROBLEM: Exporting to /tmp on the live system is unsafe - the /tmp/tomcat6 folder is destroyed whenever tomcat is restarted. This does not happen on the test server, so the live tomcat needs a different configuration. For the time being, I am copying each of the export files to /home/nick/briccs/onyx-export/ and also remotely, but we need a permanent solution.
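Until there is a permanent solution, the interim copying step could be scripted so that it is not forgotten before a restart. This is a minimal sketch only: the function name is hypothetical, and the export directory in the commented usage line is a guess (the notes mention only /tmp/tomcat6 and a target directory beneath it), so it must be adjusted to the real location.

```python
import shutil
from pathlib import Path


def back_up_exports(export_dir, backup_dir, pattern="*.zip"):
    """Copy export zips that are not yet in backup_dir; return the names copied."""
    export_dir, backup_dir = Path(export_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(export_dir.glob(pattern)):
        target = backup_dir / source.name
        if not target.exists():
            shutil.copy2(source, target)  # copy2 also preserves file timestamps
            copied.append(source.name)
    return copied


# Hypothetical usage; the first path is an assumption, not a confirmed location:
# back_up_exports("/tmp/tomcat6/target", "/home/nick/briccs/onyx-export")
```

Because files already present in the backup directory are skipped, the function can be run before every restart without copying the same export twice.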

Third version created in svn, committed, copied to data.briccs.org.uk, downloaded to live onyx server, tried a different approach (stopping and starting the briccs-onyx app, rather than the whole tomcat server), export run. Expected output: 53 participants. Actual output: 53 participants. And, yes, they are different from the July 53. This approach seems to work.

Fourth version created in svn, committed, copied to data.briccs.org.uk, downloaded to live onyx server, stopped and started the briccs-onyx app, ran the export. Expected output: 57 participants. Actual output: 57.

Fifth version created in svn, committed, copied to data.briccs.org.uk, downloaded to live onyx server, stopped and started the briccs-onyx app, ran the export. Expected output: 46 participants. Actual output: 46.

Sixth version created in svn, committed, copied to data.briccs.org.uk, downloaded to live onyx server, stopped and started the briccs-onyx app, ran the export.

Seventh version created in svn, committed, copied to data.briccs.org.uk, downloaded to live onyx server, stopped and started the briccs-onyx app, ran the export.

PROBLEM: At this point, tomcat failed to restart the briccs-onyx app because of the PermGen error. I shut down the tomcat6 service, restarted it, and recreated the /tmp/tomcat6/target directory. Note that monitoring the server using htop doesn't reveal the problem, as the PermGen space limit is hit long before any system-level impact is visible.

Eighth version created in svn, committed, copied to data.briccs.org.uk, downloaded to live onyx server, stopped and started the tomcat6 service (see above), recreated target directory, ran the export.

Ninth version created in svn, committed, copied to data.briccs.org.uk, downloaded to live onyx server, stopped and started the briccs-onyx app, ran the export.

Tenth version created in svn, committed, copied to data.briccs.org.uk, downloaded to live onyx server, stopped and started the briccs-onyx app, ran the export.

Eleventh version created in svn, committed, copied to data.briccs.org.uk, downloaded to live onyx server, stopped and started the briccs-onyx app, ran the export.
