I’m currently mapping data for patients who underwent colorectal surgery. Our complications are classified with a widely used system called Clavien-Dindo, which unfortunately I can’t find as a concept in Athena. Have any of you implemented surgical complications in your CDM with the Clavien-Dindo classification, and if so, what was your approach?
Yes, there is a classification that uses Clavien grades to specify the severity of complications. I don’t know of any official terminology that contains it, though.
What can be done: you can create custom concepts with concept_ids of 2 billion and above (2,000,000,000+) to represent those grades, e.g. Clavien Grade I etc., and link them to your actual complications through fact_relationship, or create separate concepts for each complication (e.g. Pneumonia CG I, Pneumonia CG II etc.). The first approach seems less messy.
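A minimal sketch of the first approach, assuming the grades become site-local rows in the CONCEPT table numbered from 2,000,000,000 upward (the range set aside for local concepts). The vocabulary_id `ClavienDindo_local`, the `CD-…` concept codes and the helper names are hypothetical, and only a subset of the CONCEPT columns is shown.

```python
from dataclasses import dataclass

# concept_ids at or above this value are reserved for site-specific concepts
LOCAL_CONCEPT_ID_FLOOR = 2_000_000_000


@dataclass(frozen=True)
class LocalConcept:
    # Subset of the CONCEPT table columns, for illustration only
    concept_id: int
    concept_name: str
    domain_id: str
    vocabulary_id: str
    concept_code: str


def make_grade_concepts(grades, start_id=LOCAL_CONCEPT_ID_FLOOR + 10,
                        vocabulary_id="ClavienDindo_local"):  # hypothetical vocabulary_id
    """Create one local concept per Clavien-Dindo grade, numbered upward from start_id."""
    if start_id < LOCAL_CONCEPT_ID_FLOOR:
        raise ValueError("local concept_ids must be >= 2,000,000,000")
    return [
        LocalConcept(
            concept_id=start_id + offset,
            concept_name=f"Clavien Grade {grade}",
            domain_id="Observation",
            vocabulary_id=vocabulary_id,
            concept_code=f"CD-{grade}",  # hypothetical code scheme
        )
        for offset, grade in enumerate(grades)
    ]


grade_concepts = make_grade_concepts(["I", "II", "IIIA", "IIIB", "IVA", "IVB", "V"])
```

Linking each grade to the complication it describes would then go through FACT_RELATIONSHIP rather than through the concept itself.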
That’s a good question. In general, observational data contain the raw primary facts. Classifications or scores like these are usually derived, or “abstracted”. So in your case you would create an algorithm for deriving, e.g., a Clavien-Dindo Grade III classification by defining a patient population with a surgery and a subsequent “surgical, endoscopic or radiological intervention”. Currently, such algorithms would result in membership in a cohort (records in the COHORT table). Alternatively, you could do what @aostropolets suggests and create your own internal vocabulary, but that will not be interoperable with the rest of the Network.
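To make the “derive, don’t store” idea concrete, here is a rough sketch of such an algorithm in plain Python rather than as an Atlas cohort definition. It assumes surgeries and interventions have already been extracted as (person_id, date) pairs, uses a 30-day follow-up window that is purely an assumption (it is not part of the Clavien-Dindo definition), and emits rows shaped like the COHORT table. The function name and cohort_definition_id are hypothetical.

```python
from datetime import date, timedelta


def derive_grade_iii_cohort(surgeries, interventions, cohort_definition_id, window_days=30):
    """Return COHORT-style rows for patients who had a surgical, endoscopic or
    radiological intervention within window_days after a surgery.

    surgeries, interventions: lists of (person_id, date) pairs.
    window_days: follow-up window; 30 is an assumption, not part of the scale.
    """
    rows = []
    for person_id, surgery_date in surgeries:
        window_end = surgery_date + timedelta(days=window_days)
        # Interventions strictly after the surgery day, up to the end of the window
        follow_ups = sorted(
            when for pid, when in interventions
            if pid == person_id and surgery_date < when <= window_end
        )
        if follow_ups:
            rows.append({
                "cohort_definition_id": cohort_definition_id,
                "subject_id": person_id,
                "cohort_start_date": follow_ups[0],
                "cohort_end_date": follow_ups[0],
            })
    return rows


surgeries = [(1, date(2020, 1, 10)), (2, date(2020, 2, 1))]
interventions = [(1, date(2020, 1, 15)), (2, date(2020, 5, 1))]
cohort = derive_grade_iii_cohort(surgeries, interventions, cohort_definition_id=1001)  # hypothetical ID
```

Patient 2’s intervention falls outside the assumed window, so only patient 1 enters the cohort.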
However, we know that there are data sources which contain the abstracted information. Because of this, and the desire to create standardized OHDSI-wide algorithms, we are working on a proposal for an EPISODE table. This is particularly important for cancer research.
Do you have Clavien-Dindo data, or are you working on the algorithm?
Thanks to both of you for your input. In the registry we’re currently mapping, we unfortunately only have detailed data on the primary surgery; for complications, we only have the complication itself with its Clavien-Dindo grade. So even though creating the algorithm would be a very elegant solution, I wouldn’t be able to do it from this source alone.
I think your solution, Anna, is a great way to solve this for now.
@awrosen: Nothing wrong with Anna’s proposal, of course, but just keep in mind: Once we have the Episode table that’s where these things should live, NO MATTER if they are abstracted from the raw data or you are getting them directly.
Anna’s proposal has two disadvantages: first, as I said before, it is not interoperable, and second, you will have a hard time defining the domains. And even when you define the domains, they will differ, resulting in your scores landing in different tables: Conditions (e.g. “multi organ dysfunction”), Observations (e.g. “any deviation from the normal postoperative course”), Drugs (e.g. “antiemetics, antipyretics, analgetics, diuretics and electrolytes”) or Procedures (e.g. “surgical, endoscopic or radiological intervention”). You could make them all Observations and call them “Grade I Clavien-Dindo Score” etc., but then you would miss the fact that these indicate clinical events. Ugly one way or another.
Clavien Grades will be observations, so no need to mess with domains. Then, as proposed, you can link them to the actual events through fact_relationship, so that they will be linked. Should work as a temporary solution. As a permanent one, these grades should be added to a vocabulary (which one?).
NCI vocabulary has these Clavien-Dindo Scores.
NCI is not OMOPed. And it will never be due to messy structure and lack of relationships to other vocabularies, right, @Christian_Reich?
@Christian_Reich, @aostropolets, @Dymshyts - I am working with @awrosen and this question of the Clavien-Dindo score came up. I see Anna’s proposal and think this makes sense. Let me know if your thoughts have changed since.
1. Create codes for the grades (something like this):
2000000010 = Clavien-Dindo Grade I
2000000020 = Clavien-Dindo Grade II
2000000030 = Clavien-Dindo Grade IIIA
2000000031 = Clavien-Dindo Grade IIIB
2000000040 = Clavien-Dindo Grade IVA
2000000041 = Clavien-Dindo Grade IVB
2000000050 = Clavien-Dindo Grade V
2. Store the Clavien-Dindo classification in the OBSERVATION table, with the grade in VALUE_AS_STRING (since it looks like a grade can contain a letter, e.g. IIIA).
3. There can be a condition linked to the score (e.g. brain hemorrhage). The condition would be placed in the CONDITION_OCCURRENCE table, and a FACT_RELATIONSHIP could be generated between the OBSERVATION record and the CONDITION_OCCURRENCE record.
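A sketch of the storage and linking steps just described, assuming the grade lives in an OBSERVATION row and the complication in a CONDITION_OCCURRENCE row. FACT_RELATIONSHIP rows are conventionally written in both directions. The domain concept IDs 27 (Observation) and 19 (Condition) should be checked against your vocabulary; the relationship concept IDs are passed in because no specific relationship was agreed here, and the values in the example call are hypothetical placeholders, as are the function names.

```python
OBSERVATION_DOMAIN_CONCEPT_ID = 27  # Domain concept for Observation; verify in your vocabulary
CONDITION_DOMAIN_CONCEPT_ID = 19    # Domain concept for Condition; verify in your vocabulary


def grade_observation(observation_id, person_id, observation_concept_id, grade):
    """OBSERVATION row holding the Clavien-Dindo grade as text (e.g. 'IIIA')."""
    return {
        "observation_id": observation_id,
        "person_id": person_id,
        "observation_concept_id": observation_concept_id,
        "value_as_string": grade,
    }


def link_grade_to_condition(observation_id, condition_occurrence_id,
                            relationship_concept_id, reverse_relationship_concept_id):
    """Two FACT_RELATIONSHIP rows, one per direction, linking the grade
    observation to the CONDITION_OCCURRENCE record it grades."""
    return [
        {
            "domain_concept_id_1": OBSERVATION_DOMAIN_CONCEPT_ID,
            "fact_id_1": observation_id,
            "domain_concept_id_2": CONDITION_DOMAIN_CONCEPT_ID,
            "fact_id_2": condition_occurrence_id,
            "relationship_concept_id": relationship_concept_id,
        },
        {
            "domain_concept_id_1": CONDITION_DOMAIN_CONCEPT_ID,
            "fact_id_1": condition_occurrence_id,
            "domain_concept_id_2": OBSERVATION_DOMAIN_CONCEPT_ID,
            "fact_id_2": observation_id,
            "relationship_concept_id": reverse_relationship_concept_id,
        },
    ]


obs = grade_observation(501, 1, 2000000030, "IIIA")  # 2000000030 = local "Grade IIIA" concept
links = link_grade_to_condition(501, 9001, 2000000100, 2000000101)  # hypothetical relationship IDs
```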
@awrosen - I’d need to understand better how the conditions are derived and how you might use them in analytics. I would say we need to do Steps 1 and 2, but I still need to be convinced on Step 3.
You don’t need to create custom concept_ids. Use MEASUREMENT.measurement_concept_id = 37311607 (Clavien-Dindo complication scale) and then put the Clavien-Dindo grade in MEASUREMENT.value_as_string. I don’t see any concept_ids for Grade IIIa, so you will have to use the value_as_string field.
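A minimal sketch of this suggestion. One caveat worth checking: to my knowledge, the MEASUREMENT table in CDM v5.3/v5.4 has no value_as_string column (that column exists on OBSERVATION), so this sketch keeps the grade text in value_source_value instead; adjust to your CDM version. The function name is hypothetical, and required columns such as measurement_type_concept_id are omitted.

```python
from datetime import date

CLAVIEN_DINDO_SCALE = 37311607  # "Clavien-Dindo complication scale", as cited in the post


def clavien_dindo_measurement(measurement_id, person_id, measurement_date, raw_grade):
    """MEASUREMENT row that records the scale as the measured concept and
    keeps the grade only as text, since not every grade (e.g. IIIa) has a concept."""
    return {
        "measurement_id": measurement_id,
        "person_id": person_id,
        "measurement_concept_id": CLAVIEN_DINDO_SCALE,
        "measurement_date": measurement_date,
        "value_as_concept_id": None,              # no concept available for the grade
        "value_source_value": raw_grade.strip(),  # grade text as received
    }


row = clavien_dindo_measurement(1, 42, date(2020, 3, 2), " IIIa ")
```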
To add to @aostropolets’ suggestion, since the concept_id comes from the SNOMED vocabulary, maybe petition them to add the grades?
It would be much better to use value_as_concept_id than value_as_string. To do that, please use Clavien-Dindo complication grade.
Then you can use simple values, not specifically grades.
E.g., II, IIIA. Using LOINC should not be a concern, since it’s the only vocabulary that provides this set of values.
The only thing to be polished here is moving the ‘Clavien-Dindo complication grade’ concept into the Measurement domain.
So when we have a score we map to 37311606-Clavien-Dindo complication grade
Instead of: 37311607-Clavien-Dindo classification of surgical complications
Right now, 37311606 would map us to OBSERVATION. @Alexdavv, will you put in a GitHub request to change the domain, or would you rather I do it?
Then in VALUE_AS_STRING we could put the grade as received, but also map the grade to VALUE_AS_CONCEPT_ID (which I can do better with NAACCR than with LOINC):
35919088 - I
35919571 - II
35919065 - III
35919029 - IIIA
35919093 - IIIB
35919154 - IVA
35919532 - IVB
36310342 - V (except this one is LOINC)
This all seems like ontology hacking to me. We are swiping values to shove into value_as_concept_id from syntactically satisfying lists of values. But the semantics of these values are not an actual list of possible values for ‘Clavien-Dindo complication grade’. If the vocabulary is missing concepts, I don’t think it serves us well to just grab what seems kind of right. We should add the missing concepts.
Yeah, after posting I felt like that as well. Then the recommendation would be:
When we have a score we map to 37311606-Clavien-Dindo complication grade
Then in VALUE_AS_STRING we could put the grade as received, but also map the grade to VALUE_AS_CONCEPT_ID using these new 2B concepts:
2000000010 = Clavien-Dindo Grade I
2000000020 = Clavien-Dindo Grade II
2000000030 = Clavien-Dindo Grade IIIA
2000000031 = Clavien-Dindo Grade IIIB
2000000040 = Clavien-Dindo Grade IVA
2000000041 = Clavien-Dindo Grade IVB
2000000050 = Clavien-Dindo Grade V
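Registries rarely spell grades consistently, so a mapping step like this usually needs some normalization first. The sketch below assumes the 2B concept IDs listed in this recommendation and a hypothetical `normalize_grade` helper that turns inputs such as “Grade 3b” or “iiib” into “IIIB”; the raw text is returned alongside the concept so nothing is lost. Note that a plain “III” has no concept in this list and comes back unmapped.

```python
import re

# Local 2B concepts proposed in this recommendation
CD_GRADE_CONCEPTS = {
    "I": 2000000010, "II": 2000000020,
    "IIIA": 2000000030, "IIIB": 2000000031,
    "IVA": 2000000040, "IVB": 2000000041,
    "V": 2000000050,
}

ARABIC_TO_ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV", "5": "V"}


def normalize_grade(raw):
    """Hypothetical cleanup: 'Grade 3b', 'IIIb', ' iiib ' -> 'IIIB'."""
    text = re.sub(r"(?i)^\s*grade\s*", "", raw).strip().upper()
    match = re.fullmatch(r"([1-5])([AB]?)", text)
    if match:  # Arabic numeral plus optional sub-letter
        text = ARABIC_TO_ROMAN[match.group(1)] + match.group(2)
    return text


def grade_values(raw):
    """Return (raw text, value_as_concept_id); the concept is None if unmapped."""
    return raw, CD_GRADE_CONCEPTS.get(normalize_grade(raw))
```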
There is supposed to be a condition spin off from these, but I’m not clear on that yet. Need to talk to @awrosen.
Other sources also have CD grades. Wouldn’t it make more sense to either create concepts for the grades as well, or just use I/II/III etc. from LOINC? It may not be clean ontology-wise, but it is better than having disparate 2B concepts representing the same thing in different sources.
And if using existing LOINC concepts seems shady, then why not create scale-agnostic numbers? They could be used across multiple scales: drinking, smoking, sleep hours, whatever you can think of.
First, I’d like to thank @Alexdavv for clearly explaining the difference between pre-coordination and post-coordination [1]:
Pre-coordination term - a term that was coordinated and assigned a code before you needed it
(e.g. “Clavien-Dindo complication Grade IIIB”)
Post-coordination - a term that you assembled from other terms at the point when you needed it
(e.g. “Clavien-Dindo complication grade” & “Grade IIIB”)
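To make the distinction concrete: a pre-coordinated fact is matched by a single concept_id, while a post-coordinated fact only has meaning as a (question, answer) pair, so every query has to check both fields. The IDs 37311606 and 2000000031 come from this thread; the answer concept 2000000103 and all names are purely hypothetical.

```python
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    concept_id: int                            # the concept that was recorded
    value_as_concept_id: Optional[int] = None  # only used by post-coordination


PRE_GRADE_IIIB = 2000000031    # "Clavien-Dindo Grade IIIB" as one concept
CD_GRADE_QUESTION = 37311606   # "Clavien-Dindo complication grade"
ANSWER_IIIB = 2000000103       # hypothetical "IIIB" answer concept

pre_coordinated = Fact(PRE_GRADE_IIIB)
post_coordinated = Fact(CD_GRADE_QUESTION, ANSWER_IIIB)


def matches_pre(fact, concept_id):
    # One concept says everything, so a single comparison is enough
    return fact.concept_id == concept_id


def matches_post(fact, question_id, answer_id):
    # The meaning lives in the combination, so both fields must match
    return fact.concept_id == question_id and fact.value_as_concept_id == answer_id
```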
Based on our discussion, we have decided to use post-coordination for the Clavien-Dindo complication grade:
When we have a score we would write a record to the MEASUREMENT table with MEASUREMENT_CONCEPT_ID as: 37311606-Clavien-Dindo complication grade
Currently this code is of the OBSERVATION domain, however I’ve made a request to the Vocabulary team to move it to MEASUREMENT [2].
The grade can be stored in VALUE_AS_CONCEPT_ID, following @aostropolets’ recommendation to get concepts created for the grades we don’t have in LOINC:
45879260 = I
45883600 = II
36309815 = IIIA
2000000000 OR REQUEST ONE = IIIB [3]
2000000010 OR REQUEST ONE = IVA [3]
2000000020 OR REQUEST ONE = IVB [3]
36310342 = V
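A sketch of the post-coordinated record described in this plan, assuming the domain change for 37311606 goes through. Grades marked “OR REQUEST ONE” are represented as None until a concept exists, so they can be back-filled later; the raw grade is kept in value_source_value, the MEASUREMENT column for the value as received. Function names are hypothetical.

```python
CD_GRADE = 37311606  # "Clavien-Dindo complication grade"

# Answer concepts from this plan; None marks grades still awaiting a concept
CD_ANSWERS = {
    "I": 45879260,
    "II": 45883600,
    "IIIA": 36309815,
    "IIIB": None,
    "IVA": None,
    "IVB": None,
    "V": 36310342,
}


def cd_measurement(person_id, grade):
    """Post-coordinated MEASUREMENT row: question concept plus answer concept."""
    key = grade.strip().upper()
    if key not in CD_ANSWERS:
        raise ValueError(f"not a Clavien-Dindo grade: {grade!r}")
    return {
        "person_id": person_id,
        "measurement_concept_id": CD_GRADE,
        "value_as_concept_id": CD_ANSWERS[key],  # None until the concept is created
        "value_source_value": grade,             # grade as received
    }


def pending_grades(rows):
    """Grades stored without an answer concept, to back-fill once concepts exist."""
    return sorted({r["value_source_value"] for r in rows if r["value_as_concept_id"] is None})
```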
We also had a similar conversation about the American Society of Anesthesiologists (ASA) Physical Status Classification System.
When we have a score we would write a record to the MEASUREMENT table with MEASUREMENT_CONCEPT_ID as: 4159411 - American Society of Anesthesiologists physical status classification
Then the score would be stored in VALUE_AS_CONCEPT_ID using the following codes:
45879260 = I
45883600 = II
45883601 = III
45879261 = IV
36310342 = V
2000000000 OR REQUEST ONE = VI [3]
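Putting the Clavien-Dindo and ASA proposals side by side shows the catch: several answer concept_ids are reused by both scales, so value_as_concept_id alone does not say which scale a record belongs to, and every query also has to filter on measurement_concept_id. This sketch only uses IDs listed in the thread (the “REQUEST ONE” placeholders are left out); the names are hypothetical.

```python
CD_GRADE = 37311606   # Clavien-Dindo complication grade
ASA_PS = 4159411      # ASA physical status classification

# Proposed answer concepts per question concept, as listed in the thread
ANSWERS = {
    CD_GRADE: {"I": 45879260, "II": 45883600, "IIIA": 36309815, "V": 36310342},
    ASA_PS: {"I": 45879260, "II": 45883600, "III": 45883601, "IV": 45879261, "V": 36310342},
}


def shared_answer_ids(answers):
    """Answer concept_ids that appear under more than one question concept."""
    questions_by_answer = {}
    for question, values in answers.items():
        for concept_id in values.values():
            questions_by_answer.setdefault(concept_id, set()).add(question)
    return {cid for cid, questions in questions_by_answer.items() if len(questions) > 1}


# A filter on the answer alone mixes the two scales together
records = [(CD_GRADE, 45879260), (ASA_PS, 45879260), (ASA_PS, 45883601)]
grade_i_naive = [r for r in records if r[1] == 45879260]
grade_i_cd = [r for r in records if r == (CD_GRADE, 45879260)]
```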
So I am not sure reusing them for grades of a totally different stripe makes good ontological sense. We need to start taking seriously that when we bring in these external standardized vocabularies, happening to find syntactically equivalent strings is not the same as semantic equivalence. We should introduce new concepts in the Meas Value domain, so that we have an ontology that actually represents the concepts (not just strings), along with the list of possible values for a Measurement concept that are allowable as a Meas Value entry in MEASUREMENT.value_as_concept_id.
It’s not so straightforward if we look into the LOINC files.
Indeed, when LOINC LA15460-1 ‘IV’ concept becomes a part of LL4442-1 answer list, it has displaytext (the answer string) = ‘IV’ and SubsequentTextPrompt = ‘Melanoma invades reticular dermis’. By LOINC specs, SubsequentTextPrompt is the text associated with answers such as “Other” that indicates what extra information the user should enter, for example, “Please specify:”
When we look into the details of the same LOINC LA15460-1 ‘IV’ concept when it’s a part of LL1685-8 answer list, it has displaytext = ‘IV’ and SubsequentTextPrompt = ‘NULL’. The only concept that uses this answer list is 67213-9 Stage only [PhenX] and it’s not in the area of tumor invasion.
This is how LOINC questions/answers and most post-coordinated content are built: the only thing that determines the meaning of a concept is its description and its vertical hierarchical relationships. Once it matches the context, we can use it. LOINC doesn’t provide such relationships for answers (leaving them as mere syntactic string equivalents), but SNOMED does. E.g. the SNOMED 4125539 ‘IV’ concept is a Roman numeral, so it definitely should not be used for intravenous immunoglobulin.
The opposite are SNOMED pre-coordinated concepts, e.g. 40481923 ‘pT1b category’, which implies the result of a tumor pathology finding measured using TNM. Such concepts are not used as values; they are sufficient by themselves.
This is definitely what should be introduced, but we’re just at the beginning of a long walk. You can already see how good LOINC is at this. A lot of other issues still need to be addressed:
selection of the Standard among LOINC/SNOMED/NAACCR;
whether or not to replace LOINC concepts in answer lists with another Standard;
value completeness: choosing between ‘1b’, ‘T1b’, ‘pT1b’, ‘pT1b stage’ or ‘pT1b tumor stage’, consistent with Measurements of different degrees of detail;
splitting of pre-coordinated concepts.
Well, the only vocabulary that provides all the options is NAACCR. But it has duplicates, it is still in the workshop, and we all understand the context that comes with it. Some of the SNOMED 4152511 Roman numeral concepts are good, but we can’t use 4152513 Upper case Roman letter here.
I really think that the above-mentioned mixture of LOINC and SNOMED is a good choice at this point, but we still have nothing but NAACCR for IIIB, IVA and IVB. Let’s hear from @Christian_Reich, @Dymshyts, @mik, @aostropolets and @mgurley on what exactly should be created.
You didn’t tag me, but let me still add 2 cents here.
I would strongly, strongly suggest pre-coordinating: it works better for analytical use cases and Atlas, it says all it has to say in a single concept, and post-coordination is a mess:
Post-coordinated facts are harder to query, and Atlas needs to know about the coordination.
Either you pre-define which concepts can be coordinated (through “has answer” relationships), but then you may as well pre-coordinate.
Or you don’t pre-define them, but then you will get a lot of garbage (Clavien-Dindo complication grades “3” or “VII” or “100.3” or “high”).
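The “pre-define what can be coordinated” option amounts to a whitelist check like the one sketched here. The ALLOWED_ANSWERS table is a hypothetical stand-in for “has answer” relationships (which could, for example, be loaded from CONCEPT_RELATIONSHIP); without such a table, every value in the example would be accepted.

```python
# Hypothetical stand-in for "has answer" relationships: question -> allowed answers
ALLOWED_ANSWERS = {
    "Clavien-Dindo complication grade": {"I", "II", "IIIA", "IIIB", "IVA", "IVB", "V"},
}


def split_valid(question, raw_values):
    """Separate answers the question allows from values that unconstrained
    post-coordination would otherwise let through."""
    allowed = ALLOWED_ANSWERS[question]
    valid, rejected = [], []
    for value in raw_values:
        target = valid if value.strip().upper() in allowed else rejected
        target.append(value)
    return valid, rejected


valid, rejected = split_valid("Clavien-Dindo complication grade",
                              ["II", "3", "VII", "100.3", "high", "IIIb"])
```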
Also, you always run into the problem of deciding whether or not the value_as_concept_id or answer has full semantic identity. In other words, is the “IIIB” only a Clavien-Dindo complication grade, or also a TNM Pathology Stage Group? Are all these “IIIB” one concept, even though they mean different things? Or are there many, in which case we have a ton of concepts “IIIB”, and we wouldn’t know the difference unless we follow the concept_relationship? Of course we could characterize the IIIB as a Clavien-Dindo IIIB or Stage Group IIIB, but that would be botched pre-coordination!
Bottom line: Don’t do that. There are only two reasons why post-coordination is better, which really is only one reason:
There are too many answers to create all these pre-coordinations. Bad reason, because you have to have all the answers anyway.
You need to incorporate an infinite or unknowable amount of answers, like in truly numerical values (not the categorical ones I, II, III etc.). In that case you have no choice.
Atlas supports this in cohort building. For the rest, custom covariates is a solution for now. But methods should follow the needs. And they will - we cannot pre-coordinate everything anyway.
It could look into both concept_relationship (in cohort definitions, to support it) and the data (in characterization, to generate covariates from frequent post-coordinated combinations).
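The characterization idea could look roughly like this: count (measurement_concept_id, value_as_concept_id) pairs in the data, keep the frequent ones, and turn each into a binary covariate. This is a plain-Python sketch with hypothetical names and an arbitrary threshold, not how Atlas or its feature extraction actually implement custom covariates.

```python
from collections import Counter


def frequent_pairs(measurements, min_count):
    """Count (measurement_concept_id, value_as_concept_id) pairs and keep
    those seen at least min_count times as candidate covariates."""
    counts = Counter(
        (m["measurement_concept_id"], m["value_as_concept_id"])
        for m in measurements
        if m["value_as_concept_id"] is not None
    )
    return {pair: n for pair, n in counts.items() if n >= min_count}


def covariates_for_person(measurements, person_id, pairs):
    """Binary covariates: 1 if the person has the question/answer pair, else 0."""
    own = {
        (m["measurement_concept_id"], m["value_as_concept_id"])
        for m in measurements
        if m["person_id"] == person_id
    }
    return {pair: int(pair in own) for pair in pairs}


data = [
    {"person_id": 1, "measurement_concept_id": 37311606, "value_as_concept_id": 45883600},
    {"person_id": 2, "measurement_concept_id": 37311606, "value_as_concept_id": 45883600},
    {"person_id": 3, "measurement_concept_id": 37311606, "value_as_concept_id": 36310342},
]
pairs = frequent_pairs(data, min_count=2)
```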
Agreed, but there are always users who want ‘II or III’, ‘between II and III’, and many other things that we can’t even imagine but that they consider useful.
And unless we put the new pre-coordinated OMOP Extension concepts and their SNOMED ancestor 37311607 Clavien-Dindo complication scale into the Condition domain (blocking the use of value_as_concept_id) and de-standardize its SNOMED counterpart 37311606 Clavien-Dindo complication grade, people will use both designs for one clinical entity, which is an even bigger mess (just remember the COVID/Influenza testing cohorts). Can we do such forced standardization and leave only pre-coordinated Conditions? Or do we have to deal with a mix of pre- and post-coordination among the same terms forever?
We’re playing around with domains, but by definition it should be a Measurement (please don’t say to pre-coordinate there), maybe an Observation (which would still keep the mess), but not a Condition.
… while post-coordinated mess might be resolved by proper phenotyping and validation.
We do. But making 10-20% of the concepts pre-coordinated will not solve the whole issue.
When we resolve one specific problem, yes. But once it comes to a general approach for the model, we need to multiply answers by questions, and then it becomes a good reason.
Numerical values go to value_as_number, so there is no issue there.
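The split being described, sketched under the assumption that raw results arrive as text: anything that parses as a number goes to value_as_number, known categorical answers go to value_as_concept_id, and the original text is kept in value_source_value either way. The answer lookup and function name are hypothetical.

```python
def route_value(raw, answer_lookup):
    """Place a raw result in the MEASUREMENT value field that fits it."""
    row = {"value_as_number": None, "value_as_concept_id": None, "value_source_value": raw}
    try:
        row["value_as_number"] = float(raw)  # truly numerical: no answer list needed
    except ValueError:
        # Categorical: look up a pre-defined answer; None if it is not on the list
        row["value_as_concept_id"] = answer_lookup.get(raw.strip().upper())
    return row


CD_LOOKUP = {"I": 45879260, "II": 45883600, "V": 36310342}  # subset of the IDs in this thread
```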
But where is the borderline, and who will decide it, and when? The domain could be that borderline.
We still need a good solution for:
lab tests;
allergy to substance;
history/family history of;
disease suspected;
clinical finding absent/disorder excluded;
and others where people are forced to use pre- and post-coordination at once.
This would definitely work too.
@ericaVoss @awrosen @sbicty Sorry for the long discussion, but these are going to be the first OMOP Extension concepts after the COVID-related ones.