I am currently trying to map some CPRD data with the tool provided on GitHub.
Unfortunately, I get an error during chunk creation. The tool needs a table called HES_fielddefinitions.
After looking at the documentation of the CPRD ETL (CPRD_ETL_CDM_V5.doc), it appears that the following tables are needed in the raw database: HES_fielddefinitions and HES_fieldmappings. According to the documentation, these tables were created from HES data lookups that were originally provided by HES in a PDF.
I searched the HES documentation but wasn't able to find the data that would allow me to create these tables.
Could you help by pointing me to which document, and where in it, to look so I can create them?
Just wondering if you made any progress with this. I am also attempting to convert CPRD data to the CDM using the ETL-CDMBuilder. It appears to hang on the Chunks creation part of the process, but fails to provide me with any error messages.
You need to drop all files like HES_… in {Builder folder}/CPRDV5/Definitions/.
Then, in the {Builder folder}/CPRDV5/Definitions/L_PROVIDER.xml file,
replace the <Query> ... </Query> block with:
<Query>
select
staffid as PROVIDER_ID,
-- the last 3 digits of staffid are used as the care site id
cast(Right(cast(staffid as varchar), 3) as int) as CARE_SITE_ID,
cast(staffid as varchar) as PROVIDER_SOURCE_VALUE,
-- missing role and gender values default to 0
cast(ISNULL(role, 0) as varchar) as SPECIALTY_SOURCE_VALUE,
cast(ISNULL(role, 0) as varchar) as SPECIALTY_SOURCE_KEY,
ISNULL(gender, 0) as gender
from {sc}.Staff
</Query>
Reset the chunk creation step: right click on the “Chunk creation” step -> Reset step.
Then try again with the following parameters: Number of batches=0, Batch size=1000 and MaxDegreeOfParallelism=1.
If it gets stuck again, check SQL Server Activity Monitor; maybe there is an issue with your data, e.g. no indexes.
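If you prefer a query over the Activity Monitor GUI, something like this (just a sketch using the standard SQL Server DMVs, run against the server while the builder is working) shows whether the chunk creation query is still running or is blocked:

select
r.session_id,
r.status,
r.blocking_session_id,                        -- non-zero means this request is blocked by that session
r.wait_type,
r.total_elapsed_time / 1000 as elapsed_seconds,
t.text as running_sql
from sys.dm_exec_requests as r
cross apply sys.dm_exec_sql_text(r.sql_handle) as t
where r.session_id <> @@SPID                  -- exclude this monitoring session itself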
I tried what you suggested, but to no avail. Unfortunately the Activity Monitor failed to provide me with any pertinent information. You mentioned that there may be an issue with the source data; is there any way to validate the source data? There are no indexes, only the odd PK.
The only thing that I might not have done correctly is the creation of the OMOP vocab database; does this even get utilised at the chunk creation part?
Regarding data validation, you can find all the queries in the CDM Builder folders.
For example, for CPRD:
for the chunk creation step - {Builder Folder}/CPRDV5/Batch.sql
for the conversion to CDM step - {Builder Folder}/CPRDV5/Definitions/…
And I would recommend at least creating a CLUSTERED INDEX on the patid column (the person identifier in CPRD) in all tables that have a patid column; it should significantly improve performance.
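For example (a sketch only; “Clinical” just stands in for whichever of your CPRD tables contain a patid column, and you may need to drop or rebuild an existing clustered PK first):

-- create a clustered index on patid; repeat for every table that has a patid column
-- (the table name "Clinical" is only an example)
create clustered index IX_Clinical_patid
on dbo.Clinical (patid)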
I did previously download the required OMOP vocab databases from (http://athena.ohdsi.org). My question was more related to working with the CPRD files located in man/VOCAB_ADDITIONS. Would I be correct in saying that the contents of these files need to be imported into the corresponding tables in the OMOP Vocabulary? i.e.
I don't seem to have corresponding tables for the remaining CPRD files located in VOCAB_ADDITIONS.
I will take a closer look at the Batch.sql and Definitions to determine where I could be going wrong. I appreciate all your help with this and recognise that this was originally built as an in-house application. I myself come from an IT background and am helping to support some researchers in getting this up and running, so I have to plead a certain degree of ignorance.
Yep, you're right, the files contain codes that did not exist in the OMOP vocab; they were added by the JnJ team, and you need to import them into the OMOP Vocabulary.
Plus, import the following files from man/VOCAB_ADDITIONS into dbo.source_to_concept_map (see the import sketch after the list):
JNJ_CPRD_ET_LOINC.txt
JNJ_CPRD_GS_RXN.txt
JNJ_CPRD_HES_LOINC.txt
JNJ_CPRD_PROV_CMS2.txt
JNJ_CPRD_PROV_SPEC.txt
JNJ_CPRD_SCORE_LOINC.txt
JNJ_CPRD_T_ET_LOINC.txt
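A rough way to load one of these tab-delimited files is a BULK INSERT, something like the sketch below (the file path, terminators and FIRSTROW are assumptions, and the column order in the file must match dbo.source_to_concept_map, so check the actual file layout first):

-- load one VOCAB_ADDITIONS file into source_to_concept_map; repeat per file
bulk insert dbo.source_to_concept_map
from 'C:\man\VOCAB_ADDITIONS\JNJ_CPRD_GS_RXN.txt'
with (
fieldterminator = '\t',   -- tab-delimited
rowterminator   = '\n',
firstrow        = 2       -- skip the header row, if the file has one
)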