I am currently trying to map some CPRD data with the tool provided on GitHub.
Unfortunately, I get an error during chunk creation. The tool needs a table called HES_fielddefinitions.
After looking at the documentation of the CPRD ETL (CPRD_ETL_CDM_V5.doc), it appears that the following tables are needed in the raw database: HES_fielddefinitions and HES_fieldmappings. According to the documentation, these tables were created from HES data lookups that were originally provided by HES in a PDF.
I searched the HES documentation but wasn't able to find the data that would allow me to create these tables.
Could you help by pointing me to which document, and where in it, to look so I can create them?
Just wondering if you made any progress with this. I am also attempting to convert CPRD data to the CDM using the ETL-CDMBuilder. It appears to hang on the Chunks creation part of the process, but fails to provide me with any error messages.
You need to drop all files like HES_… in {Builder folder}/CPRDV5/Definitions/.
Then, in the {Builder folder}/CPRDV5/Definitions/L_PROVIDER.xml file,
replace the <Query> ... </Query> block with:
<Query>
select
staffid as PROVIDER_ID,
-- the last 3 digits of staffid are used as the care site id
cast(Right(cast(staffid as varchar), 3) as int) as CARE_SITE_ID,
cast(staffid as varchar) as PROVIDER_SOURCE_VALUE,
-- missing role and gender values default to 0
cast(ISNULL(role, 0) as varchar) as SPECIALTY_SOURCE_VALUE,
cast(ISNULL(role, 0) as varchar) as SPECIALTY_SOURCE_KEY,
ISNULL(gender, 0) as gender
from {sc}.Staff
</Query>
Reset the chunk creation step: right click on the “Chunk creation” step -> Reset step.
Then try again with the following parameters: Number of batches=0, Batch size=1000 and MaxDegreeOfParallelism=1.
If it gets stuck again, check SQL Server Activity Monitor; maybe there is an issue with your data, e.g. no indexes.
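If you prefer a query over the Activity Monitor GUI, something like this (just a sketch using the standard SQL Server DMVs, run against the server while the builder is working) shows whether the chunk creation query is still running or is blocked:

select
r.session_id,
r.status,
r.blocking_session_id,                        -- non-zero means this request is blocked by that session
r.wait_type,
r.total_elapsed_time / 1000 as elapsed_seconds,
t.text as running_sql
from sys.dm_exec_requests as r
cross apply sys.dm_exec_sql_text(r.sql_handle) as t
where r.session_id <> @@SPID                  -- exclude this monitoring session itself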
I tried what you suggested, but to no avail. Unfortunately the Activity Monitor failed to provide me with any pertinent information. You mentioned that there may be an issue with the source data; is there any way to validate the source data? There are no indexes, only the odd PK.
The only thing that I might not have done correctly is the creation of the OMOP vocab database; does this even get utilised at the chunk creation part?
Regarding data validation, you can find all the queries in the CDM Builder folders.
For example, for CPRD:
for the chunk creation step - {Builder Folder}/CPRDV5/Batch.sql
for the conversion to CDM step - {Builder Folder}/CPRDV5/Definitions/…
And I would recommend at least creating a CLUSTERED INDEX on the patid column (the person identifier in CPRD) in all tables that have a patid column; it should significantly improve performance.
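For example (a sketch only; “Clinical” just stands in for whichever of your CPRD tables contain a patid column, and you may need to drop or rebuild an existing clustered PK first):

-- create a clustered index on patid; repeat for every table that has a patid column
-- (the table name "Clinical" is only an example)
create clustered index IX_Clinical_patid
on dbo.Clinical (patid)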
I did previously download the required OMOP vocab databases from (http://athena.ohdsi.org). My question was more related to working with the CPRD files located in man/VOCAB_ADDITIONS. Would I be correct in saying that the contents of these files need to be imported into the corresponding tables in the OMOP Vocabulary? i.e.
I don't seem to have corresponding tables for the remaining CPRD files located in VOCAB_ADDITIONS.
I will take a closer look at the Batch.sql and Definitions to determine where I could be going wrong. I appreciate all your help with this and recognise that this was originally built as an in-house application. I myself come from an IT background and am helping to support some researchers in getting this up and running, so I have to plead a certain degree of ignorance.
Yep, you're right, the files contain codes that did not exist in the OMOP vocab; they were added by the JnJ team, and you need to import them into the OMOP Vocabulary.
Plus, import the following files from man/VOCAB_ADDITIONS into dbo.source_to_concept_map (see the import sketch after the list):
JNJ_CPRD_ET_LOINC.txt
JNJ_CPRD_GS_RXN.txt
JNJ_CPRD_HES_LOINC.txt
JNJ_CPRD_PROV_CMS2.txt
JNJ_CPRD_PROV_SPEC.txt
JNJ_CPRD_SCORE_LOINC.txt
JNJ_CPRD_T_ET_LOINC.txt
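A rough way to load one of these tab-delimited files is a BULK INSERT, something like the sketch below (the file path, terminators and FIRSTROW are assumptions, and the column order in the file must match dbo.source_to_concept_map, so check the actual file layout first):

-- load one VOCAB_ADDITIONS file into source_to_concept_map; repeat per file
bulk insert dbo.source_to_concept_map
from 'C:\man\VOCAB_ADDITIONS\JNJ_CPRD_GS_RXN.txt'
with (
fieldterminator = '\t',   -- tab-delimited
rowterminator   = '\n',
firstrow        = 2       -- skip the header row, if the file has one
)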