
ETL-Synthea produces empty postgres tables

Hello everyone! I’m new to the OHDSI community so some of my terminology may be inaccurate, but I will try my best to describe what I’m trying to accomplish, the steps I’ve taken so far, and the issues I’m encountering.

The general goal is to generate synthetic patient data using Synthea, load that data into an OMOP CDM postgresql database, and then use OMOP2OBO mappings to query for patients with conditions and measurements that map to OBO ontological concepts (e.g., MONDO terms and HPO terms). This is intended to be a proof-of-concept for a larger clinical data processing workflow that I am prototyping.

I began by identifying the prerequisites suggested by the ETL-Synthea “Step-by-step Example”:

  1. I installed and ran Synthea, making sure to check out the 2.7.0 version branch and alter the configuration file to output CSV files, as indicated in ETL-Synthea. Synthetic data was successfully generated for 5,000 patients from Massachusetts. I can see CSV tables populated for allergies, careplans, conditions, devices, encounters, imaging_studies, immunizations, medications, observations, organizations, patients, payer_transitions, payers, procedures, providers, and supplies within the synthea/output/csv directory.
  2. I downloaded the Vocabulary release v5.0 02-JUN-22 from Athena. I then extracted the compressed directory and ran the CPT4 utility using the cpt.bat file along with my personal UMLS API key. The files within this “vocabulary” directory include the CPT4 scripts, the readme, a directory for logs, and CSV files for CONCEPT, CONCEPT_ANCESTOR, CONCEPT_CLASS, CONCEPT_CPT4, CONCEPT_RELATIONSHIP, CONCEPT_SYNONYM, DOMAIN, DRUG_STRENGTH, RELATIONSHIP, and VOCABULARY. The log file contains no warnings or errors from the CPT4 run.
  3. I created a PostgreSQL database called “synthea10” and, within it, two empty schemas called “cdm_synthea10” and “native.” I granted connection and usage permissions for user postgres on all schemas and tables. I also downloaded the PostgreSQL JDBC driver, as required by the DatabaseConnector R package.
  4. I installed the ETLSyntheaBuilder library in R (along with its DatabaseConnector dependency) and ran the example script from ETL-Synthea exactly as written, except for altering the password, pathToDriver, syntheaFileLoc, and vocabFileLoc variables to match what I have on my machine. NOTE: I had some PostgreSQL configuration issues and R crashes that forced me to drop and recreate the schemas in Postgres a couple of times, but eventually I was able to run all five commands from the example without a fatal error preventing them from completing.
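For step 1, the relevant switch lives in Synthea’s configuration file (src/main/resources/synthea.properties in the 2.7.0 branch). The fragment below is the change I mean, not the full file:

```properties
# synthea.properties — enable CSV output so ETL-Synthea has files to load
exporter.csv.export = true
```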
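For step 3, the database and schema setup can be sketched in psql roughly as follows (names taken from my description above; the exact GRANT statements will depend on your local setup, so treat this as an approximation):

```sql
-- Create the database, then connect to it before creating schemas
CREATE DATABASE synthea10;

-- Run these while connected to synthea10:
CREATE SCHEMA cdm_synthea10;   -- target schema for the OMOP CDM tables
CREATE SCHEMA native;          -- staging schema for the raw Synthea tables

-- Allow user postgres to connect and use both schemas
GRANT CONNECT ON DATABASE synthea10 TO postgres;
GRANT USAGE ON SCHEMA cdm_synthea10, native TO postgres;
```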
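For step 4, the example script from the ETL-Synthea README looks approximately like this in R (password and paths elided as in my actual run; argument names may differ slightly between package versions, so check the README for your version):

```r
library(ETLSyntheaBuilder)

# Connection details for the local PostgreSQL database (placeholders elided)
cd <- DatabaseConnector::createConnectionDetails(
  dbms         = "postgresql",
  server       = "localhost/synthea10",
  user         = "postgres",
  password     = "...",
  port         = 5432,
  pathToDriver = "..."
)

cdmSchema      <- "cdm_synthea10"
cdmVersion     <- "5.4"
syntheaVersion <- "2.7.0"
syntheaSchema  <- "native"
syntheaFileLoc <- "..."   # synthea/output/csv
vocabFileLoc   <- "..."   # extracted Athena vocabulary directory

# The five commands, in order:
ETLSyntheaBuilder::CreateCDMTables(connectionDetails = cd, cdmSchema = cdmSchema, cdmVersion = cdmVersion)
ETLSyntheaBuilder::CreateSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaVersion = syntheaVersion)
ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaFileLoc = syntheaFileLoc)
ETLSyntheaBuilder::LoadVocabFromCsv(connectionDetails = cd, cdmSchema = cdmSchema, vocabFileLoc = vocabFileLoc)
ETLSyntheaBuilder::LoadEventTables(connectionDetails = cd, cdmSchema = cdmSchema, syntheaSchema = syntheaSchema, cdmVersion = cdmVersion, syntheaVersion = syntheaVersion)
```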

During the final “working” run, I noticed that step 4 in the process (ETLSyntheaBuilder::LoadVocabFromCsv…) returned numerous errors, all of which looked like the following:

ERROR: current transaction is aborted, commands ignored until end of transaction block
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO cdm_synthea10.CONCEPT (concept_id,concept_name,domain_id,vocabulary_id,concept_class_id,standard_concept,concept_code,valid_start_date,valid_end_date,invalid_reason) VALUES(2103412,'Biopsy, soft tissue of back or flank; superficial','Procedure','CPT4','CPT4','S','21920','1970-01-01 -05','2099-12-31 -05',NULL) was aborted: ERROR: current transaction is aborted, commands ignored until end of transaction block  Call getNextException to see other errors in the batch.
        at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:165)
        at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:871)
        at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:910)
        at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1663)
        at org.ohdsi.databaseConnector.BatchedInsert.executeBatch(BatchedInsert.java:114)
Caused by: org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:316)
        at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:868)
        ... 3 more
Caused by: org.postgresql.util.PSQLException: ERROR: null value in column "concept_name" of relation "concept" violates not-null constraint
  Detail: Failing row contains (36311145, null, Meas Value, LOINC, Answer, S, LA12334-1, 1970-01-01, 2099-12-31, null).
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2099)
        at org.postgresql.core.v3.QueryExecutorImpl.flushIfDeadlockRisk(QueryExecutorImpl.java:1456)
        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1481)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:546)
        at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:887)
        ... 3 more

I am able to connect to the database using SQL Workbench, and I can see 44 tables, some of which are populated with data from Synthea. However, as alluded to in the title, some of these contain no records (rows). These include the condition_occurrence table, which I was hoping to start with for my proof-of-concept.

I suspect there was an issue in the vocabulary loading, since that was the step where I saw all of those errors, and it was also the step I was least sure about (step 2 above). However, I have no idea how to confirm this or what steps to take next to correct it. I’d greatly appreciate any help and will provide any extra details needed. Thanks!

It does sound as if this is an issue with your vocabulary load. If you query your concept table, how many records are loaded? If the vocabulary fails to load properly, the conversion will definitely fail, and the empty tables you currently see would be the result.
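The check suggested above could look like this in SQL Workbench or psql (schema name taken from the original post; a fully loaded Athena vocabulary is typically in the millions of concept rows, so a count far below that suggests the load aborted part-way through):

```sql
-- Total concepts loaded
SELECT COUNT(*) FROM cdm_synthea10.concept;

-- Per-vocabulary counts, to spot vocabularies that failed to load at all
SELECT vocabulary_id, COUNT(*) AS n
FROM cdm_synthea10.concept
GROUP BY vocabulary_id
ORDER BY n DESC;
```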