wiki:OnyxDatabaseReporting

Version 19 (modified by Nick Holden, 11 years ago) ( diff )

--

MySQL snippets and other material relating to reporting from the Onyx database.

In the early phase of BRICCS, before i2b2 is deployed, obtaining data on BRICCS participants is possible only from within the Onyx database, via MySQL queries.

There is a web page for reporting from the Onyx database, accessible via password within the UHL network, contact Nick or Vasil for information. In addition, the following is provided as useful material to assist when writing custom MySQL queries for reporting:

Table Structure

mysql> describe participant;
+------------------+--------------+------+-----+---------+----------------+
| Field            | Type         | Null | Key | Default | Extra          |
+------------------+--------------+------+-----+---------+----------------+
| id               | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| barcode          | varchar(250) | YES  |     | NULL    |                |
| birth_date       | date         | YES  |     | NULL    |                |
| enrollment_id    | varchar(255) | YES  |     | NULL    |                |
| first_name       | varchar(250) | NO   |     | NULL    |                |
| gender           | varchar(255) | YES  |     | NULL    |                |
| last_name        | varchar(250) | NO   |     | NULL    |                |
| recruitment_type | varchar(255) | YES  |     | NULL    |                |
| site_no          | varchar(255) | NO   |     | NULL    |                |
+------------------+--------------+------+-----+---------+----------------+
9 rows in set (0.00 sec)

mysql> describe question_answer;
+------------------------------+---------------+------+-----+---------+----------------+
| Field                        | Type          | Null | Key | Default | Extra          |
+------------------------------+---------------+------+-----+---------+----------------+
| id                           | bigint(20)    | NO   | PRI | NULL    | auto_increment |
| active                       | bit(1)        | NO   |     | NULL    |                |
| comment                      | varchar(2000) | YES  |     | NULL    |                |
| question_name                | varchar(255)  | NO   | MUL | NULL    |                |
| questionnaire_participant_id | bigint(20)    | NO   | MUL | NULL    |                |
+------------------------------+---------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)


mysql> describe open_answer;
+-----------------------------+---------------+------+-----+---------+----------------+
| Field                       | Type          | Null | Key | Default | Extra          |
+-----------------------------+---------------+------+-----+---------+----------------+
| id                          | bigint(20)    | NO   | PRI | NULL    | auto_increment |
| data_type                   | varchar(255)  | YES  |     | NULL    |                |
| date_value                  | datetime      | YES  |     | NULL    |                |
| decimal_value               | double        | YES  |     | NULL    |                |
| integer_value               | bigint(20)    | YES  |     | NULL    |                |
| open_answer_definition_name | varchar(255)  | NO   | MUL | NULL    |                |
| text_value                  | varchar(2000) | YES  |     | NULL    |                |
| category_answer_id          | bigint(20)    | NO   | MUL | NULL    |                |
+-----------------------------+---------------+------+-----+---------+----------------+
8 rows in set (0.00 sec)


mysql> describe category_answer;
+---------------------------+--------------+------+-----+---------+----------------+
| Field                     | Type         | Null | Key | Default | Extra          |
+---------------------------+--------------+------+-----+---------+----------------+
| id                        | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| active                    | bit(1)       | NO   |     | NULL    |                |
| category_name             | varchar(255) | NO   | MUL | NULL    |                |
| occurence                 | int(11)      | YES  |     | NULL    |                |
| parent_category_answer_id | bigint(20)   | YES  | MUL | NULL    |                |
| question_answer_id        | bigint(20)   | NO   | MUL | NULL    |                |
+---------------------------+--------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)


mysql> describe questionnaire_participant;
+-----------------------+--------------+------+-----+---------+----------------+
| Field                 | Type         | Null | Key | Default | Extra          |
+-----------------------+--------------+------+-----+---------+----------------+
| id                    | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| locale                | varchar(255) | NO   |     | NULL    |                |
| questionnaire_name    | varchar(255) | NO   |     | NULL    |                |
| questionnaire_version | varchar(255) | NO   |     | NULL    |                |
| resume_page           | varchar(255) | YES  |     | NULL    |                |
| time_end              | datetime     | YES  |     | NULL    |                |
| time_start            | datetime     | NO   |     | NULL    |                |
| participant_id        | bigint(20)   | NO   | MUL | NULL    |                |
| user_id               | bigint(20)   | NO   | MUL | NULL    |                |
+-----------------------+--------------+------+-----+---------+----------------+
9 rows in set (0.00 sec)

MySQL snippets

# Lists primary diagnoses with a count (used in the reporting table):

SELECT category_name, count(category_name)

FROM question_answer, category_answer where question_answer.question_name = 'epi_pridiag'

and category_answer.question_answer_id = question_answer.id group by category_name order by count(category_name) desc

# Lists BPt numbers, grouped and counted by primary diagnosis:

SELECT category_name, barcode, count(*)

FROM participant, questionnaire_participant, question_answer, category_answer WHERE participant.id = questionnaire_participant.participant_id AND question_answer.questionnaire_participant_id = questionnaire_participant.id AND question_answer.question_name = 'epi_pridiag' AND category_answer.question_answer_id = question_answer.id GROUP BY category_name, barcode WITH ROLLUP;

# From the command line, to query for gender and birth date for specific diagnosis, into an outfile:

mysql -h briccsdb -u auditor -p briccs -e " SELECT category_name, birth_date, gender FROM participant, questionnaire_participant, question_answer, category_answer WHERE participant.id = questionnaire_participant.participant_id AND question_answer.questionnaire_participant_id = questionnaire_participant.id AND question_answer.question_name = 'epi_pridiag' AND category_answer.question_answer_id = question_answer.id AND ( category_answer.category_name LIKE '%ACS%' OR category_answer.category_name = 'STEMI' OR category_answer.category_name = 'Stable Angina' ) ; " > mysql-output.txt

# Single diagnosis, count:

SELECT count(participant.id) as 'Number of Participants' FROM participant, questionnaire_participant, question_answer, category_answer WHERE participant.id = questionnaire_participant.participant_id AND question_answer.questionnaire_participant_id = questionnaire_participant.id AND ( question_answer.question_name = 'epi_pridiag' OR question_answer.question_name = 'epi_secdiag' ) AND category_answer.question_answer_id = question_answer.id AND category_answer.category_name = 'Aortic_Stenosis';

# Complex query, using two references to the same tables, to derive those of a certain diagnosis (primary or secondary) where the recruitment context is not 'Study specific':

SELECT count(participant.id) as 'AS not study recruited' FROM participant, questionnaire_participant as diag_questionnaire, questionnaire_participant as context_questionnaire, question_answer as diag_question, category_answer as diag_answer, question_answer as context_question, category_answer as context_answer WHERE participant.id = diag_questionnaire.participant_id AND participant.id = context_questionnaire.participant_id AND diag_questionnaire.id = diag_question.questionnaire_participant_id AND ( diag_question.question_name = 'epi_pridiag' OR diag_question.question_name = 'epi_secdiag' ) AND diag_answer.question_answer_id = diag_question.id AND diag_answer.category_name = 'Aortic_Stenosis' AND context_questionnaire.id = context_question.questionnaire_participant_id AND context_question.question_name = 'epi_type' AND context_answer.question_answer_id = context_question.id AND context_answer.category_name <> 'Study_specific' ;

# Using UNION to produce a list of S numbers and DOBs for participants who are either study specific recruits or have a specific primary or secondary diagnosis (UNION combines two SELECTs with duplicates removed):

mysql -h briccsdb -u auditor -p briccs -e " SELECT enrollment_id as 'Participant', birth_date as 'DOB' FROM participant, questionnaire_participant, question_answer, category_answer WHERE participant.id = questionnaire_participant.participant_id AND question_answer.questionnaire_participant_id = questionnaire_participant.id AND ( question_answer.question_name = 'epi_pridiag' OR question_answer.question_name = 'epi_secdiag' ) AND category_answer.question_answer_id = question_answer.id AND category_answer.category_name = 'Aortic_Stenosis' UNION SELECT enrollment_id as 'Participant', birth_date as 'DOB' FROM participant, questionnaire_participant, question_answer, category_answer WHERE participant.id = questionnaire_participant.participant_id AND question_answer.questionnaire_participant_id = questionnaire_participant.id AND question_answer.question_name = 'epi_type' AND category_answer.question_answer_id = question_answer.id AND category_answer.category_name = 'Study_specific' ; " > mysql-output.txt

# Any one of a number of related primary or secondary diagnoses, aged under 65.

SELECT count(distinct participant.id) as 'Number of Participants' FROM participant, questionnaire_participant, question_answer, category_answer WHERE participant.id = questionnaire_participant.participant_id AND birth_date > '1947-03-01' AND question_answer.questionnaire_participant_id = questionnaire_participant.id AND ( question_answer.question_name = 'epi_pridiag' OR question_answer.question_name = 'epi_secdiag' ) AND category_answer.question_answer_id = question_answer.id AND ( category_answer.category_name = 'Troponin_pos_ACS' OR category_answer.category_name = 'Troponin_neg_ACS' OR category_answer.category_name = 'STEMI' OR category_answer.category_name = 'Stable_Angina' OR category_answer.category_name = 'Stable_CAD' OR category_answer.category_name = 'Ischaemic_Cardiomyopathy' OR category_answer.category_name = 'Acute_Stent_Thrombosis' ) ;

# To find incomplete interviews only for Study specific recruits

SELECT participant.barcode, participant.enrollment_id, participant.last_name, interview.start_date FROM participant, interview, questionnaire_participant, question_answer, category_answer WHERE questionnaire_participant.participant_id = participant.id AND question_answer.questionnaire_participant_id = questionnaire_participant.id AND question_answer.question_name = "epi_type" AND category_answer.question_answer_id = question_answer.id AND category_answer.category_name = "Study_specific" AND interview.status = "IN_PROGRESS" AND interview.participant_id = participant.id ORDER BY start_date ;

# To find duplicates enrollment_id entries:

mysql -h briccsdb -u auditor -p briccs -e "SELECT * FROM participant ptone WHERE EXISTS (SELECT 1 FROM participant pttwo WHERE pttwo.enrollment_id = ptone.enrollment_id LIMIT 1, 1 );"

# So to exclude duplicates and NULL BPt numbers:

mysql -h briccsdb -u auditor -p briccs -e "SELECT * FROM participant ptone WHERE NOT EXISTS (SELECT 1 FROM participant pttwo WHERE pttwo.enrollment_id = ptone.enrollment_id LIMIT 1, 1 ) AND barcode is not null order by enrollment_id;"

# To exclude duplicates, nulls and cancelled interviews and list barcodes (BPt numbers) and enrollment dates:

mysql -h briccsdb -u auditor -p briccs -e "SELECT barcode, start_date, status FROM participant ptone, interview WHERE NOT EXISTS (SELECT 1 FROM participant pttwo WHERE pttwo.enrollment_id = ptone.enrollment_id LIMIT 1, 1 ) AND barcode is not null AND status <> 'CANCELLED' AND ptone.id = interview.participant_id order by start_date;" > briccs_enrollment_to_date.txt

# To get a count of the above:

mysql -h briccsdb -u auditor -p briccs -e "SELECT count(*) FROM participant ptone, interview WHERE NOT EXISTS (SELECT 1 FROM participant pttwo WHERE pttwo.enrollment_id = ptone.enrollment_id LIMIT 1, 1 ) AND barcode is not null AND status <> 'CANCELLED' AND ptone.id = interview.participant_id order by start_date;"

AS study monitoring

To extract all relevant (study specific and by diagnosis) recruits with their recruitment dates, for compiling into a spreadsheet, use:

NORTHAMPTON

mysql -h briccsdb -u auditor -p briccs_northampton -e " SELECT barcode as 'Participant', enrollment_id as 'S number', gender as 'Gender', birth_date as 'DOB', start_date as 'Questionnaire Date', participant_attribute_value.text_value as 'Ethnic Origin' FROM participant, questionnaire_participant, interview, question_answer, category_answer, participant_attribute_value WHERE participant.id = questionnaire_participant.participant_id AND participant.id = interview.participant_id AND question_answer.questionnaire_participant_id = questionnaire_participant.id AND ( question_answer.question_name = 'epi_pridiag' OR question_answer.question_name = 'epi_secdiag' ) AND category_answer.question_answer_id = question_answer.id AND category_answer.category_name = 'Aortic_Stenosis' AND participant_attribute_value.participant_id = participant.id AND participant_attribute_value.attribute_name = 'pat_ethnicity' UNION

SELECT barcode as 'Participant', enrollment_id as 'S number', gender as 'Gender', birth_date as 'DOB',start_date as 'Questionnaire Date', participant_attribute_value.text_value as 'Ethnic Origin' FROM participant, interview, questionnaire_participant, question_answer, category_answer, participant_attribute_value WHERE participant.id = questionnaire_participant.participant_id AND participant.id = interview.participant_id AND question_answer.questionnaire_participant_id = questionnaire_participant.id AND question_answer.question_name = 'epi_type' AND category_answer.question_answer_id = question_answer.id AND category_answer.category_name = 'Study_specific' AND participant_attribute_value.participant_id = participant.id AND participant_attribute_value.attribute_name = 'pat_ethnicity'; " > northampton-output.txt

LEICESTER

mysql -h briccsdb -u auditor -p briccs -e " SELECT barcode as 'Participant', enrollment_id as 'S number', gender as 'Gender', birth_date as 'DOB', start_date as 'Questionnaire Date', participant_attribute_value.text_value as 'Ethnic Origin' FROM participant, questionnaire_participant, interview, question_answer, category_answer, participant_attribute_value WHERE participant.id = questionnaire_participant.participant_id AND participant.id = interview.participant_id AND question_answer.questionnaire_participant_id = questionnaire_participant.id AND ( question_answer.question_name = 'epi_pridiag' OR question_answer.question_name = 'epi_secdiag' ) AND category_answer.question_answer_id = question_answer.id AND category_answer.category_name = 'Aortic_Stenosis' AND participant_attribute_value.participant_id = participant.id AND participant_attribute_value.attribute_name = 'pat_ethnicity' UNION

SELECT barcode as 'Participant', enrollment_id as 'S number', gender as 'Gender', birth_date as 'DOB',start_date as 'Questionnaire Date', participant_attribute_value.text_value as 'Ethnic Origin' FROM participant, interview, questionnaire_participant, question_answer, category_answer, participant_attribute_value WHERE participant.id = questionnaire_participant.participant_id AND participant.id = interview.participant_id AND question_answer.questionnaire_participant_id = questionnaire_participant.id AND question_answer.question_name = 'epi_type' AND category_answer.question_answer_id = question_answer.id AND category_answer.category_name = 'Study_specific' AND participant_attribute_value.participant_id = participant.id AND participant_attribute_value.attribute_name = 'pat_ethnicity'; " > leicester-output.txt

Note these searches will export ethnicity but not exclude by it.

Note: See TracWiki for help on using the wiki.