Version 5 (modified by Richard Bramley, 11 years ago)

CiviCRM GP/Practice Update Process

A proposed process to automatically import and update GP and GP Practice details.

Options

  1. Import the data from the HSCIC website.
  2. Import the data from the UHL data warehouse.

1. HSCIC

This is the preferred method.

The data files are downloadable from the following page: http://systems.hscic.gov.uk/data/ods/datadownloads/gppractice.

New files are released quarterly, and update (amendment) files are released monthly (http://systems.hscic.gov.uk/data/ods/datadownloads/monthamend/index_html).

The data uses a 27-column 'standard' format. The fields for each file are as follows:

  • epraccur.zip (contains epraccur.csv) - GP practice current data
    • Fields:
      • Organisation code
      • Practice Name
      • National Grouping
      • High Level Health Authority
      • Address line 1
      • Address line 2
      • Address line 3
      • Address line 4
      • Address line 5
      • Postcode
      • Open date
      • Close date
      • Status (A = Active, C = Closed, D = Dormant, P = Proposed)
      • Sub-type code (B = Allocated to a parent organisation, Z = Not allocated to a parent organisation)
      • Parent Organisation code (CCG/PCT etc code)
      • Join parent date
      • Left parent date
      • Telephone number
      • Null
      • Null
      • Null
      • Amended record indicator
      • Null
      • Null
      • Null
      • Practice Type (0 = Other, 1 = WIC Practice, 2 = OOH Practice, 3 = WIC + OOH Practice, 4 = GP Practice, 5 = Prison prescribing cost centre)
      • Null
  • ebranchs.zip (contains ebranchs.csv) - Branch surgery data
    • Fields:
      • Organisation code (made up of the surgery code plus three digits - 001, 002, etc - to denote a branch surgery)
      • Branch surgery Name
      • National Grouping
      • High Level Health Authority
      • Address line 1
      • Address line 2
      • Address line 3
      • Address line 4
      • Address line 5
      • Postcode
      • Open date
      • Close date
      • Null
      • Null
      • Parent Organisation code (GP surgery code)
      • Join parent date
      • Left parent date
      • Telephone number
      • Null
      • Null
      • Null
      • Amended record indicator
      • Null
      • Government Office Region Code
      • Null
      • Null
      • Null
  • egpcur.zip (contains egpcur.csv) - GP current data
    • Fields:
      • G code
      • Name (surname space initials)
      • National Grouping
      • High Level Health Authority
      • Address line 1
      • Address line 2
      • Address line 3
      • Address line 4
      • Address line 5
      • Postcode
      • Open date
      • Close date
      • Status (A = Active, C = Closed, P = Proposed)
      • Sub-type code (P = Principal GP / Senior partner, O = Other GP)
      • Parent Organisation code (GP surgery code)
      • Join parent date
      • Left parent date
      • Telephone number
      • Null
      • Null
      • Null
      • Amended record indicator
      • Null
      • Current care organisation
      • Null
      • Null
      • Null
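As an illustration of reading the 27-column layout, here is a minimal sketch for epraccur.csv. It is written in Python for brevity; the module itself would be PHP. The column labels are our own names for the fields listed above, and the sketch assumes the file is comma-separated with no header row. Neither assumption has been checked against a real download.

```python
import csv
import io

# Labels are our own names for the epraccur fields listed above.
# "null_N" marks the unused column at position N.
EPRACCUR_COLUMNS = [
    "organisation_code", "name", "national_grouping",
    "high_level_health_authority",
    "address_1", "address_2", "address_3", "address_4", "address_5",
    "postcode", "open_date", "close_date", "status", "sub_type_code",
    "parent_organisation_code", "join_parent_date", "left_parent_date",
    "telephone", "null_19", "null_20", "null_21",
    "amended_record_indicator", "null_23", "null_24", "null_25",
    "practice_type", "null_27",
]


def parse_rows(text, columns=EPRACCUR_COLUMNS):
    """Yield one dict per CSV row, keyed by column label, dropping unused columns."""
    for row in csv.reader(io.StringIO(text)):
        if len(row) != len(columns):
            # Assumption: every row has exactly 27 columns and there is no header.
            raise ValueError(f"expected {len(columns)} columns, got {len(row)}")
        yield {
            label: value.strip()
            for label, value in zip(columns, row)
            if not label.startswith("null_")
        }
```

The same function could read ebranchs.csv or egpcur.csv by passing a different list of labels as `columns`, since all three files share the 27-column width.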

The monthly update file (egpam.zip -> egpam.csv) combines entries in the GP practice (epraccur) and GP (egpcur) formats above into a single file. Updated branch data is in ebranchsam.csv, which is contained in eamendam.zip each month.

Note that addresses are 'unstructured' other than the postcode. We could replicate the address matching approach we've implemented for the NIHR BioResource module, but (a) beware of the Google Maps API usage limit and (b) the street address for a practice rarely begins with a number, so it is less predictable. A reliable source of address data is becoming a higher priority.

Process

  • Match each primary practice with a record in CiviCRM: update details, and ensure the 'main' address matches or is updated.
  • Match each branch surgery with an address in CiviCRM of type 'other'.
  • Match each GP to a health worker record in CiviCRM, and ensure the relationship links to the correct GP Practice. Include the senior partner / principal GP relationship.

How to deal with archive data? Do we care? For the time being, assume we don't care. What matters to us is a currently-viable record of each practice so that we can construct mailing lists, etc., not a historically accurate record of all changes.
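The matching step could be sketched as a comparison between imported rows and existing records, both keyed by organisation code. This is Python for illustration only, and `plan_changes` is a hypothetical helper, not part of CiviCRM. It assumes that 'currently viable' means Status = A and it simply skips everything else; whether closed practices already held in CiviCRM should be flagged is left open.

```python
def plan_changes(incoming, existing):
    """Compare imported rows with existing records, both keyed by organisation code.

    Returns (to_create, to_update) as lists of organisation codes.
    """
    to_create, to_update = [], []
    for code, row in incoming.items():
        if row.get("status") != "A":
            continue  # assumption: only active records count as 'currently viable'
        if code not in existing:
            to_create.append(code)
        elif any(existing[code].get(key) != value for key, value in row.items()):
            to_update.append(code)  # at least one imported field differs
    return to_create, to_update
```

Records that match on every imported field appear in neither list, so an unchanged practice costs no writes.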

This requires an amendment to the CiviCRM object model for Practice addresses: an additional item of custom data for the 'organisation code', which comprises the practice code plus a three-digit identifier. This will be optional, as it is only relevant for branch surgery addresses, not main addresses.
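To show how a branch organisation code relates to its parent practice, here is a small Python sketch. It assumes a practice code is one letter followed by five digits, so that a branch code is nine characters long; that pattern is our assumption and should be checked against the real files. `split_branch_code` is a hypothetical helper name.

```python
import re

# Assumption: practice code = one letter + five digits; branch adds three digits.
BRANCH_CODE = re.compile(r"^([A-Z]\d{5})(\d{3})$")


def split_branch_code(code):
    """Return (practice_code, branch_suffix); the suffix is None for a main practice."""
    cleaned = code.strip().upper()
    match = BRANCH_CODE.match(cleaned)
    if match:
        return match.group(1), match.group(2)
    return cleaned, None
```

For example, under that assumption `split_branch_code("A12345001")` gives `("A12345", "001")`, while `split_branch_code("A12345")` gives `("A12345", None)`.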

See https://api.drupal.org/api/drupal/modules%21system%21system.api.php/function/hook_cron/7 for details on using hook_cron to schedule the work.

To begin with, we should limit our work to loading in GP surgeries, GPs themselves, and addresses / telephone numbers for GP surgeries and branch surgeries. Links to health authorities and other entities would be possible, but are out of scope for the time being.

Pseudocode

wget monthly GP + practice amendment file (egpam.zip)
compare to last version held
if the same, delete the download
if different, unpack zip file
process csv file (egpam.csv)

wget monthly branch amendment file (eamendam.zip)
compare to last version held
if the same, delete the download
if different, unpack zip file
process csv file (ebranchsam.csv)
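The steps above could be sketched in Python as follows. Rather than keeping the last file and comparing it byte for byte, this sketch stores a SHA-256 digest of the last processed download, which is a substitution on our part. The function names, the `digest_file` location and the `process` callback are all hypothetical, and the download URL is left to the caller because the exact file URLs are not recorded here.

```python
import hashlib
import io
import urllib.request
import zipfile
from pathlib import Path


def fetch(url):
    """Download a file and return its bytes (the 'wget' step)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def read_member(zip_bytes, member):
    """Unpack one CSV from an in-memory zip and return it as text."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(member).decode("utf-8", errors="replace")


def update(url, member, digest_file, process, fetcher=fetch):
    """Download, compare with the last version, and process only if changed.

    Returns False when the download matches the last processed version.
    """
    data = fetcher(url)
    digest = hashlib.sha256(data).hexdigest()
    digest_file = Path(digest_file)
    if digest_file.exists() and digest_file.read_text().strip() == digest:
        return False  # same as last version held: discard the download
    process(read_member(data, member))
    digest_file.write_text(digest)  # recorded only after processing succeeded
    return True
```

The same `update` call would be made twice per run: once for egpam.zip with member egpam.csv, and once for eamendam.zip with member ebranchsam.csv. Writing the digest last means a failed import is retried on the next run.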

Usage of Drupal cron

If we use the Drupal 'easy cron' then all the processing is done within a normal request for a Drupal page. A user could then face a long, unexplained wait for a page because the import is being processed as part of their request.

We could split the task over several requests, either manually or using queues. However, downloading the file could take a long time on its own.
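Splitting the work manually could look like the following Python sketch, which handles a bounded batch of rows per run and keeps a bookmark between runs. It illustrates the idea only: `process_batch`, the `state_file` bookmark and the batch size of 200 are hypothetical, and in the module itself Drupal's own queue support would be the more natural fit.

```python
import json
from pathlib import Path


def process_batch(rows, state_file, handle, batch_size=200):
    """Handle the next batch of rows and remember where to resume.

    Returns True once every row has been handled.
    """
    state_file = Path(state_file)
    offset = 0
    if state_file.exists():
        offset = json.loads(state_file.read_text())["offset"]
    batch = rows[offset:offset + batch_size]
    for row in batch:
        handle(row)
    offset += len(batch)
    if offset >= len(rows):
        state_file.unlink(missing_ok=True)  # finished: clear the bookmark
        return True
    state_file.write_text(json.dumps({"offset": offset}))
    return False
```

Each cron run would call `process_batch` once with the same list of rows, so no single request does more than one batch of work.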

We could therefore use real cron to download the file, or just use real cron to do the whole process.

2. UHL Data Warehouse

The data is stored in the DWREPO_BASE database. The tables are:

  • MF_GP_OCS (GPs)
  • MF_GP_PRACTICE_OCS (Practices)

There are other tables with similar names, but I don't know how they differ from the ones above.

It may be possible to use the SHA codes of the GPs to filter the details for the Leicestershire area.

These tables are recreated from scratch on a weekly basis, but the source from which they are created is the quarterly file above.
