THEMIS Question: What do people put in the source_value fields

Friends:

We may or may not want to standardize what goes into the various source_value fields. Every table has one, and currently ETLers are free to use it any way they want. For standardization we have the source_concept_id field. However, it only works if people (i) use it and (ii) either the source code is covered by a Standardized Vocabulary or people make their own concepts (the so-called 2-Billionaires).

We have seen several uses of the field:

  • Source code (e.g. Q4095)
  • Source text (e.g. Injection, zoledronic acid (reclast), 1 mg). Often the source is not coded, and this is the only way to capture the source information.
  • Source code concatenated to the source text (e.g. Q4095:Injection, zoledronic acid (reclast), 1 mg)
  • Source vocabulary concatenated with source code (e.g. HCPCS:Q4095)
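
As a rough illustration of the four conventions above, here is a minimal Python sketch. The function name `build_source_value` and the `style` labels are hypothetical and not part of any OHDSI tool; only the example code, text, and vocabulary come from the list.

```python
def build_source_value(style: str, code: str, text: str, vocabulary: str) -> str:
    """Build a source_value string in one of the four conventions listed above.

    The style labels are made up for this sketch.
    """
    if style == "code":
        return code
    if style == "text":
        return text
    if style == "code_text":
        return f"{code}:{text}"
    if style == "vocab_code":
        return f"{vocabulary}:{code}"
    raise ValueError(f"unknown style: {style}")


# The same source record rendered in each convention.
examples = {
    style: build_source_value(
        style,
        code="Q4095",
        text="Injection, zoledronic acid (reclast), 1 mg",
        vocabulary="HCPCS",
    )
    for style in ("code", "text", "code_text", "vocab_code")
}
```

An exact-match query for the bare code `Q4095` would only find rows written in the first convention, which is why the choice matters to anyone who queries the field directly.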

Any use cases of how the source_value is utilized you want to bring up?

By way of full disclosure, we only use the source value, and we query that directly. That is particularly useful when trying to use a code set that has been published (e.g., a set of ICD9 or ICD10 codes). Given that, our vote is clearly that just the concept id for the source value be used. As for the other permutations listed above, there is no problem for someone to add a column to any table that would store the text or the vocabulary name (or to add both columns).
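
Querying the source value directly against a published code set might look like the sketch below. It assumes the source value holds the bare code; the in-memory SQLite table is a cut-down stand-in for `condition_occurrence`, and the rows and the `published_codes` list are invented for illustration.

```python
import sqlite3

# Cut-down stand-in for the CDM condition_occurrence table (illustrative rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        condition_source_value TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
    [(1, 101, "250.00"), (2, 102, "E11.9"), (3, 103, "401.9")],
)

# A published code set mixing ICD9 and ICD10 codes (hypothetical list).
published_codes = ["250.00", "E11.9"]
placeholders = ",".join("?" for _ in published_codes)

rows = conn.execute(
    f"""
    SELECT person_id
    FROM condition_occurrence
    WHERE condition_source_value IN ({placeholders})
    ORDER BY person_id
    """,
    published_codes,
).fetchall()
person_ids = [row[0] for row in rows]
```

This only works when every row stores the bare code; a row written as `ICD10CM:E11.9` would be missed by the `IN` filter.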

I vote to NOT standardize the source_value fields. I view this field as the "local" implementer's field; we do what we wish. This enables us, the community, to keep the rest of the CDM "pure", for the most part.

We use the OMOP supported vocabularies for source_concept_id when available. When there isn't an OMOP supported concept_id for our source_value, we create a concept_id > 2000000000.
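
One way to allocate such local concept_ids is sketched below. This is an assumption about how it could be done, not a description of any site's actual ETL: the helper names, the `LOCAL_LAB` vocabulary, and the codes are hypothetical, and the `concept` table is reduced to a few columns.

```python
import sqlite3

CUSTOM_CONCEPT_FLOOR = 2_000_000_000  # local concepts live above this value

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        vocabulary_id TEXT,
        concept_code TEXT,
        standard_concept TEXT
    )
    """
)


def next_custom_concept_id(conn: sqlite3.Connection) -> int:
    """Return the next unused concept_id above the 2-billion floor."""
    (max_id,) = conn.execute(
        "SELECT MAX(concept_id) FROM concept WHERE concept_id > ?",
        (CUSTOM_CONCEPT_FLOOR,),
    ).fetchone()
    return (max_id or CUSTOM_CONCEPT_FLOOR) + 1


def add_custom_concept(conn, name: str, vocabulary_id: str, code: str) -> int:
    """Insert a non-standard local concept and return its new concept_id."""
    new_id = next_custom_concept_id(conn)
    conn.execute(
        "INSERT INTO concept VALUES (?, ?, ?, ?, NULL)",
        (new_id, name, vocabulary_id, code),
    )
    return new_id


first_id = add_custom_concept(conn, "Local glucose panel", "LOCAL_LAB", "GLU-01")
second_id = add_custom_concept(conn, "Local lipid panel", "LOCAL_LAB", "LIP-02")
```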

We do use the source value for varying combinations of source code and source text. And we reserve the right to use it for other data in the future :slight_smile:

I vote to NOT standardize the source_value fields for the same reason as @MPhilofsky.

The data owners in Korean hospitals and the national insurance service often complain that they lose some of their data because of vocabulary standardization. The source_value fields are an important place to store their own data in the CDM.

I do understand what @Christian_Reich said and I admire his passion for standardization. An ungoverned source_value field might be contrary to the spirit of the CDM. But sometimes people want to conduct their own studies, or regional collaborative studies, using the CDM.

I also vote no. Similarly, in the VA we sometimes use source_value to hold the source-reported LOINC code that we believe to be mismapped (strongly in some cases, where we did chart review or NLP with gold-standard verification, i.e. clinical eyes on the review). The user can still review it to determine the mapped value in OMOP, and it preserves source fidelity. At other times, when no concept is available, we have 0 or null and the source value is not null. This is heavily used in labs and drugs, where, as you may guess, with over 2.6 billion rows each, human error occurs with some regularity.

I vote no too, but I would like a concept_id field that is something like source_vocabulary_concept_id.

Ex. In the condition table, the source value may be an ICD9, ICD10, or SNOMED code, or whatever else. How do we know?

Adding source_vocabulary_concept_id will improve standardization of *_source_value.

@gowtham_rao I'm not sure we need a column for source_vocabulary_concept_id. This information is already available in the concept table so it only requires one join.

I also vote to not standardize the source_values. To me, the standardization occurs in the mapping to a standard concept, and the source_value is there either for error checking or for a last-resort analysis where the standard concepts are too broad. I see the point of the source_values being a way for the ETL-er to communicate with the end users about how the source values were mapped.

True when *_source_concept_id > 0.

What about when *_source_concept_id = 0, i.e. no OMOP concept is available? How do you know what vocabulary the source value belongs to? As described here
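
The two cases can be seen side by side in a small sketch. The tables are reduced to the relevant columns, concept_id 900001 and the code `XYZ-17` are made-up placeholders, and the toy `concept` table deliberately has no row for concept_id 0, so the join returns nothing useful for the unmapped record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        vocabulary_id TEXT,
        concept_code TEXT
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        condition_source_value TEXT,
        condition_source_concept_id INTEGER
    );
    -- 900001 is a placeholder id, not a real OMOP concept_id.
    INSERT INTO concept VALUES (900001, 'ICD9CM', '250.00');
    INSERT INTO condition_occurrence VALUES
        (1, '250.00', 900001),  -- source code known to the vocabulary
        (2, 'XYZ-17', 0);       -- source code with no concept
    """
)

# One join recovers the source vocabulary, but only when the id is > 0.
rows = conn.execute(
    """
    SELECT co.condition_source_value, c.vocabulary_id
    FROM condition_occurrence co
    LEFT JOIN concept c
        ON c.concept_id = co.condition_source_concept_id
    ORDER BY co.condition_occurrence_id
    """
).fetchall()
```

For the second row the vocabulary comes back empty, which is the gap the question above points at.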

I agree with the rest to not standardize the source_value fields.

However, we might want to standardize the way in which we store non-OMOP source concepts. As far as I can see, there are now two ways: storing the source concept info (code, vocabulary, description) either in the source_to_concept_map or in a new 2-Billionaire concept. I would suggest using the latter and removing the source_code, source_vocabulary_id and source_description from the source_to_concept_map table, so that it only references the 2-Billionaire source_concept_id. Or even deprecating the source_to_concept_map table and creating the mapping as a new 2-Billionaire record in the concept_relationship table. Any thoughts?

@Gowtham_Rao. Do you store the description of your non-omop source concepts somewhere? e.g. in the source_to_concept_map?

@MaximMoinat, @Gowtham_Rao:

Aaaah. @ericaVoss and I already had some good crossing of swords on this one. She wants to keep the SOURCE_TO_CONCEPT_MAP, I want to put it into its well-deserved retirement with V6. Not really a THEMIS job, let's take it to the CDM Working Group and duke it out.

I'm strongly with @ericaVoss on this one. I think we do need conventions for how to use the SOURCE_TO_CONCEPT_MAP table, but it has proved to be a valuable standardized structure for handling the reality that not all source codes will be in the standard vocabulary. Providing a defined solution to this problem would be a help for the whole community.

Argh!!! She came tattling and whining to her "big brother" to get some backup! :smile:

Let's discuss in a new Forum. I'll open.

I ACTUALLY DIDN'T TATTLE THIS TIME! :smile:

Agree with @patrick_ryan, but we need to know how to do it the right way.

I have been thinking about how to handle proprietary codesets, and don't have a good solution. There are proprietary codes that are specific to one organization, and nobody outside has any interest in or use for these codes. Describing these codes in some OMOP tables is one option, but then it duplicates work that is already being done by the source system, creating a data duplication and maintenance nightmare. I don't know the answer.

Maybe it is source_to_concept_map; I just need to study it.

@Gowtham_Rao as discussed above, if it's a proprietary vocabulary we put these in the SOURCE_TO_CONCEPT_MAP complete with all available information from the source. That way, even if you can't map it to a standard concept, you retain the vocabulary_id, etc. for later use. In our data, if there is an ICD9 code, for example, that doesn't have a concept_id, this is often due to an error in the code, and the code is useless to us anyway. In Truven CCAE, the top code with a source_concept_id of 0 that is not proprietary occurs 893 times and is not mapped because it is probably an ICD9 code, though the database told us it should have been an ICD10. It occurs correctly as an ICD9 code over 27 million times.

We have so very many custom source codes. They are throughout every domain of our EHR data. I'm throwing out our process in hopes of getting some feedback. The following refers to mapping our EHR data. I know the claims data is very different.

When we first started our ETL process, we were under the impression the Source to Concept Map table was being deprecated. So, we decided to use the Concept and Concept Relationship table in place of the Source to Concept Map. We create a > 2 billion concept_id for our custom, source codes. We add in all the attributes for every concept. Then we map the > 2 billions to standard concepts in the Concept Relationship table. We use a combination of Usagi, Atlas, and hand mapping.

Two differences in the structure of the tables:

  1. Source to Concept Map has a source description field which is not present in the Concept table. And the Concept table has a concept name field which is not present in the Source to Concept Map table. I find the concept name field is adequate and granular enough that a source description of the code is not needed.

  2. The Concept Relationship table has relationship start and end dates. The Source to Concept Map table does not.

Our ETL takes the source value and matches it to the concept code field of the Concept table. Regardless of whether it is an OMOP supported concept or a custom (>2Billion) concept, the ETL will find the concept_id associated with the code. If the code is not a standard concept, the ETL looks to the Concept Relationship table for a "Maps to" relationship where concept_id_2 is a standard concept. We do not have any source_concept_id = 0 EXCEPT when the source updates their source values before we create a custom source value.
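
The lookup described above could be sketched roughly as follows. This is a reading of the description, not the poster's actual ETL code: the function name `resolve`, the concept_ids, and the codes are hypothetical, and the tables carry only the columns the lookup needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        vocabulary_id TEXT,
        concept_code TEXT,
        standard_concept TEXT
    );
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER,
        concept_id_2 INTEGER,
        relationship_id TEXT
    );
    -- Placeholder ids and codes, invented for this sketch.
    INSERT INTO concept VALUES
        (1001, 'Standard glucose test', 'LOINC', 'STD-1', 'S'),
        (2000000001, 'Local glucose panel', 'LOCAL_LAB', 'GLU-01', NULL);
    INSERT INTO concept_relationship VALUES (2000000001, 1001, 'Maps to');
    """
)


def resolve(conn: sqlite3.Connection, source_value: str) -> tuple:
    """Return (source_concept_id, standard_concept_id); 0 means no match."""
    row = conn.execute(
        "SELECT concept_id, standard_concept FROM concept WHERE concept_code = ?",
        (source_value,),
    ).fetchone()
    if row is None:
        return (0, 0)
    source_id, standard_flag = row
    if standard_flag == "S":
        # The code is already a standard concept, so it maps to itself.
        return (source_id, source_id)
    mapped = conn.execute(
        """
        SELECT cr.concept_id_2
        FROM concept_relationship cr
        JOIN concept c2 ON c2.concept_id = cr.concept_id_2
        WHERE cr.concept_id_1 = ?
          AND cr.relationship_id = 'Maps to'
          AND c2.standard_concept = 'S'
        """,
        (source_id,),
    ).fetchone()
    return (source_id, mapped[0] if mapped else 0)


custom_result = resolve(conn, "GLU-01")
standard_result = resolve(conn, "STD-1")
unknown_result = resolve(conn, "NOT-A-CODE")
```

Matching on concept_code alone is a simplification: the same code can appear in more than one vocabulary, so a real lookup would probably also filter on vocabulary_id.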

@ericaVoss: How do you use the Source to Concept Map table? I'm always open to ideas and other solutions to manage the custom source concepts.

@MPhilofsky
Thanks for describing your approach. It demonstrates again that we need to standardize the approach for mapping new, local source codes to target concepts. I think your 2 billion concept approach should be the standard way of dealing with local mappings (although we have been using the source_to_concept_map).

Some thoughts:

  • In my opinion the source_description and source_name are different names for the same thing. We have been using source_description as the source_name of our local source codes. Using both seems redundant indeed.
  • The source_to_concept_map table (stcm) does have valid_start_date and valid_end_date fields. Or do they have a different meaning?
  • And I am curious how you have used Atlas for the concept mapping. Does Athena now offer the same features?

And the way we use the stcm is very similar to your approach, but then both the source code info and the 'Maps to' relationship are contained in the stcm. For every source vocabulary we insert rows into the stcm, e.g. these procedure codes. No modification of the standard vocabulary tables is needed, except for adding the source vocabulary id to the vocabulary table.

Thank you. For the proprietary source codes, do you assign a 2-billion+ concept id? The vocabulary id may also be a 2-billion+ one if it is a proprietary vocabulary, correct?

Do you also append these 2-billion+ codes to ohdsi maintained vocabulary files?

This is a common problem shared by many. Claims data is not immune to it. To support payment innovation, a lot of codes are introduced.

Same here. The description seems to suggest that here https://github.com/OHDSI/CommonDataModel/blob/master/Documentation/CommonDataModel_Wiki_Files/StandardizedVocabularies/SOURCE_TO_CONCEPT_MAP.md

We need to clean this up if we want to use this table.

Yup, same here.

Very interested in knowing this. Adding local 2-billion concepts to the OMOP vocabulary tables (i.e. concept, concept ancestor, concept relationship) is not easy.
Source to Concept Map may be an easier alternative.

  1. What do we do when we can't map to any OMOP standard code but can map to non-standard OMOP vocabulary codes?
  2. What do we do when we can't map to any OMOP vocabulary codes?

@chris_knoll does webapi/circe-be support Source to Concept Map table?

This is more of a 'concept set' question, but if you have custom source concepts for which you can provide a 'Maps to' mapping to an existing concept in the concept_relationship table, then you can use the 'mapped' option in a concept set.

However, in the case you're talking about, the source_to_concept_map table just maps a source code value to a target concept. If you've created custom concepts in your CDM so that they appear in CONCEPT, then you can just add the custom concept directly to your concept set expression, and it will be used in the query.

Note that custom concepts are not unique across different CDM nodes: there is no guarantee that someone's concept 2bil+1 is the same as another CDM's 2bil+1 concept. So be cautious when creating studies that leverage custom concepts. The durable approach would be to get the concept standardized in the OMOP CDM and then map your custom source values to the standard concepts.
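
A tiny sketch of that collision risk, with two hypothetical sites that each assigned the first free id above 2 billion to an unrelated local concept (the site dictionaries and the helper name are invented for illustration):

```python
# Each site maps its own custom concept_ids to local concept names.
site_a = {2_000_000_001: "Local glucose panel"}
site_b = {2_000_000_001: "In-house allergy code", 2_000_000_002: "Local lipid panel"}


def conflicting_custom_ids(a: dict, b: dict) -> list:
    """Return custom concept_ids used by both sites with different meanings."""
    return sorted(cid for cid in a.keys() & b.keys() if a[cid] != b[cid])


conflicts = conflicting_custom_ids(site_a, site_b)
```

A concept set that references 2000000001 would select different things at each site, which is the caution raised above for network studies.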

The structure of the table is very simple: for every source code you just list where you think it should map to in the standard terminology. Here are our examples that we can share (i.e. the lists that contain nothing proprietary).

Additionally there are tools to help you map source codes to standard terminology and then ultimately produce a SOURCE_TO_CONCEPT_MAP table (see Usagi).

One of our databases' coding systems is considered proprietary. Our mapping method combines information provided by the vendor, which links the proprietary codes to other source codes in the OMOP Vocabulary, with mapping in USAGI to get the proprietary codes mapped to standard terminology.

To use it in an ETL, if you open one of our ETL documents, in the "Source to Standard Terminology" section you'll see we have a standard query. This query pulls from either the Vocabulary or the SOURCE_TO_CONCEPT_MAP table. If I need a map we generated, I filter my query using one of our defined VOCABULARY_IDs found in the SOURCE_TO_CONCEPT_MAP like this instead:

WHERE SOURCE_VOCABULARY_ID IN ('JNJ_TRU_P_SPCLTY') AND TARGET_STANDARD_CONCEPT IS NOT NULL AND TARGET_INVALID_REASON IS NULL

But if I need a standard map found in the OMOP Vocabulary I would use the same query referenced above and call it with filters like this:

WHERE SOURCE_VOCABULARY_ID IN ('LOINC') AND TARGET_STANDARD_CONCEPT IS NOT NULL

Once I have built all our SOURCE_TO_CONCEPT_MAP files, I just load them into the table. What I like about this is if I do something wrong in the load, I can just truncate this table because the OMOP Vocabulary doesn't use this table for anything. I'm not touching any of the core OMOP Vocabulary tables and can't accidentally screw them up.
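
The thread does not show the standard query itself, so the sketch below is only a guess at its general shape: a CTE over SOURCE_TO_CONCEPT_MAP joined to CONCEPT that exposes the SOURCE_VOCABULARY_ID, TARGET_STANDARD_CONCEPT and TARGET_INVALID_REASON columns used in the filters above. The tables are reduced to a few columns, and the codes and concept_ids are invented; only the vocabulary id `JNJ_TRU_P_SPCLTY` comes from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        standard_concept TEXT,
        invalid_reason TEXT
    );
    CREATE TABLE source_to_concept_map (
        source_code TEXT,
        source_vocabulary_id TEXT,
        source_code_description TEXT,
        target_concept_id INTEGER,
        invalid_reason TEXT
    );
    -- Placeholder target concepts: one valid standard, one deprecated.
    INSERT INTO concept VALUES
        (5001, 'Cardiology', 'S', NULL),
        (5002, 'Retired specialty', NULL, 'D');
    INSERT INTO source_to_concept_map VALUES
        ('CARD', 'JNJ_TRU_P_SPCLTY', 'Cardiologist', 5001, NULL),
        ('OLD', 'JNJ_TRU_P_SPCLTY', 'Retired code', 5002, NULL);
    """
)

SOURCE_TO_STANDARD = """
WITH source_to_standard AS (
    SELECT stcm.source_code,
           stcm.source_vocabulary_id,
           stcm.target_concept_id,
           c.standard_concept AS target_standard_concept,
           c.invalid_reason   AS target_invalid_reason
    FROM source_to_concept_map stcm
    JOIN concept c ON c.concept_id = stcm.target_concept_id
    WHERE stcm.invalid_reason IS NULL
)
SELECT source_code, target_concept_id
FROM source_to_standard
WHERE source_vocabulary_id IN ('JNJ_TRU_P_SPCLTY')
  AND target_standard_concept IS NOT NULL
  AND target_invalid_reason IS NULL
"""
mapped = conn.execute(SOURCE_TO_STANDARD).fetchall()

# A bad load can be undone by emptying the map; CONCEPT is left untouched.
conn.execute("DELETE FROM source_to_concept_map")
remaining_concepts = conn.execute("SELECT COUNT(*) FROM concept").fetchone()[0]
```

SQLite has no TRUNCATE statement, so the sketch uses `DELETE FROM` to stand in for truncating the table.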

No, when you use the SOURCE_TO_CONCEPT_MAP you don't need to worry about giving them CONCEPT_IDs and managing any of that.
