
Confused about ETL and mappings

Hello all,

I am in the process of developing a web app that attempts to convert our REDCap survey data into the OMOP CDM. Currently, we are at the point where we are able to map our survey questions and answers to standard SNOMED ids. It’s actually similar to Usagi in that it searches for high-similarity matches and then there is an approval process.

However, I’m a bit stumped on what to do next in the process. I suppose I would be at the loading part of the ETL process. I am looking around and seeing mentions of the ‘concept_relationship’ table and that ‘source_to_concept_map’ is legacy now. Is ‘concept_relationship’ where I would insert and define our mappings?

Or is it that I would be entirely responsible for writing out all the pieces to populate all of the OMOP tables (person, observation, etc…) using the pieces of data I am able to obtain from the mappings?

For example, our REDCap survey question “What is your birth date?” might map to the SNOMED “Date of birth” concept in Athena. I have all these associations and links, but what is the best way to load our data now into the OMOP CDM?

Thanks all!

I have a project where users use an online splenic tumor risk calculator and complete an outcomes survey 30 days later. The questions and answers are also mapped to OMOP SNOMED/LOINC concepts. When a user enters a case, the backend is responsible for generating a person record and a visit_occurrence record using the information provided (age, gender, date/time, etc.). The assessment questions and answers then go into observations and measurements using the person.person_id and visit_occurrence.visit_occurrence_id previously created. Similarly, the outcomes questions/answers are then tied to these person and visit_occurrence records. Hope that helps.
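The flow described above (create the person and visit first, then attach answers using the generated IDs) can be sketched roughly like this. This is a hypothetical illustration, not the poster's actual backend: `insert_row` is a stand-in for your real database layer, and the column names follow the OMOP CDM v5.x tables.

```python
# Hypothetical sketch: create person -> visit_occurrence -> observations,
# threading the generated IDs through, as described in the post above.
import datetime

def insert_row(table, row):
    """Stand-in for a real INSERT; returns a fake generated primary key."""
    insert_row.next_id += 1
    print(f"INSERT INTO {table}: {row}")
    return insert_row.next_id
insert_row.next_id = 0

def load_case(case):
    # 1. PERSON row: demographics, with the birth date split into columns.
    person_id = insert_row("person", {
        "gender_concept_id": case["gender_concept_id"],
        "year_of_birth": case["birth_date"].year,
        "month_of_birth": case["birth_date"].month,
        "day_of_birth": case["birth_date"].day,
    })
    # 2. VISIT_OCCURRENCE row tied to that person.
    visit_id = insert_row("visit_occurrence", {
        "person_id": person_id,
        "visit_start_date": case["visit_date"],
        "visit_end_date": case["visit_date"],
    })
    # 3. One OBSERVATION row per mapped question/answer pair.
    for answer in case["answers"]:
        insert_row("observation", {
            "person_id": person_id,
            "visit_occurrence_id": visit_id,
            "observation_concept_id": answer["concept_id"],
            "value_as_concept_id": answer.get("value_concept_id"),
            "observation_date": case["visit_date"],
        })
    return person_id, visit_id
```

Measurements would follow the same pattern as observations, just targeting the measurement table instead.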

@mlperkin:

The problem you have is that you have some REDCap question-answer pairs, and you have to convert them into a list of facts. And these facts have to live in the right place in the right way. Essentially, you have to do a mini-OMOP ETL on the fly. Your example shows the problem well: you have the question for the birthday. Obviously, you cannot use that “Date of birth” Observation concept and write it into the OBSERVATION table. Instead, you have to split it into day_of_birth, month_of_birth and year_of_birth and write those into the PERSON table.
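As a toy illustration of the birthday example above (assuming the REDCap answer arrives as an ISO-formatted date string, which is an assumption about the survey export):

```python
# The answer to "What is your birth date?" should NOT become an
# OBSERVATION row; it is split into the PERSON table's columns instead.
from datetime import date

def birth_answer_to_person_fields(answer: str) -> dict:
    """Parse an ISO-formatted birth date and split it for PERSON."""
    dob = date.fromisoformat(answer)
    return {
        "year_of_birth": dob.year,
        "month_of_birth": dob.month,
        "day_of_birth": dob.day,
    }

# birth_answer_to_person_fields("1980-07-04")
# -> {"year_of_birth": 1980, "month_of_birth": 7, "day_of_birth": 4}
```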

I don’t think there is a generic way of doing it. You will have to process each question and its possible answers using their own logic.

Sorry. At some point, GPT will be smart enough. Till then, you have some job security. :slight_smile:

Thank you for the insight. This is where my lack of knowledge really shows so please forgive me. It appears that I will need an ETL process to recreate our data into the OMOP CDM. This seems like it would require a thorough understanding of all the OMOP tables and columns which I do not have, yet.

I’m mostly familiar with just our questions and answers themselves. I haven’t actually seen the response-type data, which I suppose my ETL will also need in order to fully populate the OMOP CDM.

To quote the OHDSI book, this is quite a large undertaking then. Having to parse all the varied responses could be quite the challenge, but I believe I could accomplish it to some degree. One thing I’m confused about is why the SNOMED term “Date of birth” shows “Observation” in Athena. I was under the impression that Athena was sort of a guide to link standardized vocabulary to the OMOP CDM. I’m not sure how I could use these mappings to create the OMOP CDM if the Athena metadata doesn’t point me to the correct CDM table.

It does, you just have to use the concept’s DOMAIN to pick the table the data goes into. Placing the date of birth into the PERSON table does not need a DOMAIN because it does not need a concept_id: it is top-level data (since you appear to be a programmer, think of a concept_id as a pointer).
Look at this: OMOP CDM v5.3
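The domain-to-table routing described above can be sketched as a simple lookup. The pairs below follow the OMOP CDM convention for the common domains; this is an illustrative sketch, so extend it for whatever domains your survey actually produces.

```python
# Route a standard concept to its CDM table by domain_id.
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Drug": "drug_exposure",
    "Procedure": "procedure_occurrence",
    "Measurement": "measurement",
    "Observation": "observation",
    "Device": "device_exposure",
}

def target_table(domain_id: str) -> str:
    """Return the CDM table name for a concept's domain_id."""
    try:
        return DOMAIN_TO_TABLE[domain_id]
    except KeyError:
        raise ValueError(f"No CDM table mapped for domain {domain_id!r}")

# target_table("Observation") -> "observation"
```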

Hello @mlperkin and welcome to OHDSI!

There are two types of “mappings” one must do in order to properly populate an OMOP CDM.

First is the field to field mapping. You take a source field (example: date of birth) and map it to the correct field in the OMOP CDM (this example maps to 3 fields in the PERSON table: day_of_birth, month_of_birth, and year_of_birth).

Second mapping uses Athena and/or Usagi to semantically harmonize codes from different vocabularies into a standard we use in research. And as @Mark says, the domain_id of the standard concept tells you where the data live in the CDM.
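Conceptually, that second mapping is what the vocabulary's CONCEPT_RELATIONSHIP table encodes: follow the "Maps to" relationship from a source code to its standard concept, then read the standard concept's domain_id. A minimal in-memory sketch (the concept IDs and rows here are made up for illustration, not real vocabulary content):

```python
# Illustrative rows, NOT real vocabulary content.
CONCEPT_RELATIONSHIP = [
    # (concept_id_1, relationship_id, concept_id_2)
    (1000001, "Maps to", 2000002),  # source survey code -> standard concept
]
CONCEPT = {
    2000002: {"concept_name": "Date of birth", "domain_id": "Observation"},
}

def map_to_standard(source_concept_id: int):
    """Resolve a source concept to (standard_concept_id, domain_id)."""
    for c1, rel, c2 in CONCEPT_RELATIONSHIP:
        if c1 == source_concept_id and rel == "Maps to":
            return c2, CONCEPT[c2]["domain_id"]
    return None, None
```

In a real ETL this would be a join against the CONCEPT and CONCEPT_RELATIONSHIP tables you download from Athena, not an in-memory list.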

You should read the Book of OHDSI chapters 4, 5 and 6. The book explains the process in more detail. And, yes, it is quite a bit of work to ETL your data. But once it’s in the CDM correctly, the fun begins :slight_smile:

Thanks for the information. I have been skimming through the Book of OHDSI, but I think I’ll need to actually read it thoroughly and get a firm grasp.

I can understand the use for two mappings. I think my initial plan was to do it in the reverse of how you listed. I was going to assist in mapping our questions first to Athena and then hopefully using the combined data to then do the field to field mappings.

I’m stuck in trying to develop the logic and overall flow of it, though.

Here, the Book of OHDSI shows the order of dependence.

As far as the logic of the ETL goes, you are either going to have to store data in objects and/or a temp table as you update the data one datum at a time (depending on what language/DB you are using).
Start by building a Person ‘object’ and filling in the data, one point at a time, before you attempt to convert it to the CDM.
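The advice above could be sketched like this: accumulate PERSON data one datum at a time in a plain object, and only emit a CDM row once the required fields are present. A minimal sketch; the choice of which fields count as required is an assumption, not a CDM rule beyond the table's NOT NULL columns.

```python
# Accumulate PERSON fields incrementally; emit a row only when complete.
REQUIRED = ("person_id", "gender_concept_id", "year_of_birth")

class PersonBuilder:
    def __init__(self):
        self.fields = {}

    def set(self, name, value):
        """Record one datum; returns self so calls can be chained."""
        self.fields[name] = value
        return self

    def is_complete(self):
        return all(f in self.fields for f in REQUIRED)

    def to_row(self):
        """Return the finished PERSON row, or fail with what's missing."""
        if not self.is_complete():
            missing = [f for f in REQUIRED if f not in self.fields]
            raise ValueError(f"PERSON row incomplete, missing: {missing}")
        return dict(self.fields)
```

The same builder pattern works for visit_occurrence and the clinical event tables, with each survey answer filling in one piece as it is processed.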
