Condition/Procedure types for clinicians

Hi there.

I am converting a hospital system into OMOP and have a quick question about the conditions, which I assume will come up again when I am inserting the procedures.

The files I received are keyed by a unique MRN/FIN combination (MRN is unique per patient, FIN is unique per visit), followed by up to 100 diagnosis columns. The most any one person has is 51 diagnoses, but the average is 14.

I am trying to figure out where to put each of these diagnoses according to Condition Type. I see a lot of types: to mention a few, Inpatient Header and Detail, Outpatient Header and Detail, Carrier Claim Header and Detail, and some others.

Could anybody point me to what these mean, so I know which corresponds to what?

Thanks in advance.

@Emanuel_Villa:

51 diagnoses. Not bad. :smile:

Just list them all out, one record each. For Condition Type, don’t use any of the header or detail ones; they all refer to claims. If the diagnoses are recorded by the care provider, 38000245 “EHR problem list entry” is probably the best.
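
A minimal sketch of the "one record each" unpivot, in Python. The column names (`MRN`, `FIN`, `DX1` to `DX100`) and the output field names are assumptions about the source file, not something stated in the thread, and mapping source codes to standard concepts is left out.

```python
EHR_PROBLEM_LIST_ENTRY = 38000245  # Condition Type suggested above


def unpivot_diagnoses(row, max_columns=100):
    """Return one record per non-empty diagnosis column of a wide visit row.

    `row` is a dict with hypothetical keys "MRN", "FIN", "DX1".."DX100".
    """
    records = []
    for position in range(1, max_columns + 1):
        code = (row.get(f"DX{position}") or "").strip()
        if not code:
            continue  # empty or missing column: nothing to record
        records.append({
            "person_source_value": row["MRN"],
            "visit_source_value": row["FIN"],
            "condition_source_value": code,
            "condition_type_concept_id": EHR_PROBLEM_LIST_ENTRY,
        })
    return records


# Illustrative row with placeholder diagnosis codes.
example = {"MRN": "123", "FIN": "A1", "DX1": "DX_A", "DX2": "", "DX3": "DX_B"}
records = unpivot_diagnoses(example)
```

Here `records` holds two entries, one for `DX1` and one for `DX3`; the empty `DX2` column is skipped.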

Thank you so much for the answer, really appreciate it.

EV

But we should revisit these Condition Types and document their definitions. I am having trouble myself with all those that are not classic claims types:

38000245 EHR problem list entry
42894222 EHR Chief Complaint
42898140 Referral record - @rimma - are you using these?
43542353 Observation recorded from EHR - @rimma - why did we call it “observation” if it is a condition, and how is it different from the problem list?
44786627 Primary Condition - @ericaVoss, @DTorok - you cooked these up. Are you using them?
44786628 First Position Condition - @ericaVoss, @DTorok - what’s the difference to the previous one?
44786629 Secondary Condition
45754805 EHR Episode Entry - @Patrick_Ryan that was yours, was it not? Any good in daily life?

44786627 - Primary Condition
44786628 - First Position Condition

I thought we requested these for SYNPUF ETL but I cannot find them in the documentation and I can’t even find the email/forum post where I asked you for them! @aguynamedryan, @Mark_Danese, or @lee_evans did we use these at all? Otherwise I cannot remember when/why I asked for them. :frowning:

Some claims datasets have PDX (primary diagnosis) in addition to DX1, first position diagnosis.

Christian,

All the condition types that mention position and detail or header were
based on the claim form and/or source data that we had access to. However,
we no longer use any of them and now use Primary and Secondary, which I
believe are relevant. Primary and First Position Condition are probably
synonymous, but we opted for creating Primary and Secondary to make a clean
break from condition types that referenced position.

Don

First Position Condition is different from Primary Condition. I believe there are databases where you get N diagnosis code fields plus a “primary diagnosis” which is often, but not always, the same as the first diagnosis field.

@DTorok:

Define “we”.

So, all these 57 position diagnoses are now lumped into “second”? Who is doing that?

@Mark_Danese:

And are you using these positions for anything?

We are using the positions as well as the notions of “line” and “header” in the SynPUF ETL. We are using line for claims where the procedure is directly linked with diagnosis codes (on the same “line”). In Medicare, this is a 1:1, but other datasets allow for 1 procedure code to be directly linked to multiple diagnosis codes. There are no line claims in the SynPUF data, but there are in regular Medicare and in the Medicare data linked with SEER data.

We are using header for claims where there are multiple services that are linked to a pool of diagnosis codes. One can think of this as an “invoice” for services rendered where there is no direct correlation between the service and the diagnosis. Inpatient hospital data is often like this.

In both cases, we are using the positions as well. Having said this, I know there is utility in keeping the provenance for the first diagnosis code and possibly the primary diagnosis code (“admitting diagnosis”). There is also utility for linking conditions and procedures on claims where there is line level information.
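
To make the line vs. header distinction concrete, here is a rough sketch of the two shapes as Python data classes. The class names, field names, and codes are all hypothetical placeholders; this is not how the SynPUF ETL is actually written.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ClaimLine:
    procedure_code: str
    diagnosis_codes: List[str]  # diagnoses recorded on the same line


@dataclass
class Claim:
    lines: List[ClaimLine] = field(default_factory=list)
    header_procedures: List[str] = field(default_factory=list)
    header_diagnoses: List[str] = field(default_factory=list)  # shared pool


def condition_rows(claim: Claim) -> List[Tuple[str, int, str, Optional[str]]]:
    """Return (diagnosis, position, level, linked_procedure) per diagnosis."""
    rows = []
    # Line level: each diagnosis is tied to the procedure on its own line.
    for line in claim.lines:
        for position, dx in enumerate(line.diagnosis_codes, start=1):
            rows.append((dx, position, "line", line.procedure_code))
    # Header level: diagnoses belong to the claim as a whole, so no single
    # procedure can be attached to them.
    for position, dx in enumerate(claim.header_diagnoses, start=1):
        rows.append((dx, position, "header", None))
    return rows


claim = Claim(
    lines=[ClaimLine("PROC_A", ["DX_1"])],
    header_procedures=["PROC_B", "PROC_C"],
    header_diagnoses=["DX_2", "DX_3"],
)
rows = condition_rows(claim)
```

The position is kept in both cases, but only line-level rows carry a linked procedure.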

Chris,

Here is our utilization of the concepts from your inventory:

38000245 EHR problem list entry: We will definitely be using it for PCORnet v2 to differentiate billing diagnoses from the ones coming from the problem list.
42898140 Referral record - @rimma - are you using these? No.
43542353 Observation recorded from EHR - @rimma - why did we call it “observation” if it is a condition, and how is it different from the problem list? Not guilty. We are not using this concept.
44786627 Primary Condition - @ericaVoss, @DTorok - you cooked these up. Are you using them? We are using this to differentiate primary vs. secondary billing diagnoses.
44786629 Secondary Condition: We are using this to differentiate primary vs. secondary billing diagnoses.

Since these concepts’ purpose is to identify the condition source, I’d like you to consider adding the source name to those concepts where the source is not explicitly stated. For example, change “Primary Condition” to “Billing Primary Condition” or “Claims Primary Condition”.

Thank you.

Just to follow on Rimma’s comments, is it worth considering a “provenance” table in the CDM? A place where we can store details about where the data came from? There seem to be a lot of codes that people want to have, but don’t get used in analysis very often. Wanting to have them is usually a proxy for provenance so that things can be checked and validated. Personally, in the example of Medicare data, I would like to know the file type (inpatient, outpatient, physician, etc.), year, unique record identifier, and maybe type of record (line vs claim). In terms of reproducibility, this would be a nice thing. It also allows for debugging when you get data that was ETL’d by someone else. We have discovered that there is a lot of variability in how ETL is done (no surprise, I am sure).

This could be a slippery slope, I realize. But maybe it is better to put this stuff in an acceptable place rather than providing a never-ending list of concept ids to cover all possible provenance information codes.

@Mark_Danese:

Interesting. But I am not quite getting what you mean by “use”. You somehow establish a connection between procedures and diagnoses by line, correct? What for? To establish the indication for the procedure? If so, how reliable is that?

I suggest the addition of “Admitting diagnosis” and “Discharge diagnosis”

For the line diagnoses, they can be used to attribute costs, or to distinguish the indication for a drug. For example, rituximab can be used for rheumatoid arthritis, ITP, or lymphoma and the line detail can be used for this. Some insurers require specific diagnosis codes for specific drugs (and procedures). @jenniferduryea is out until the end of September or I would ask her to provide some more details. To answer your question, I don’t know how reliable it is but I can imagine people with specific utilization questions wanting to know about associated diagnoses.

We used it in the past when we were trying to attribute costs to chemotherapy, other cancer care, and all other care. I don’t think that was our best algorithm, but it was useful to be able to simply require that there was a J code for chemotherapy and a diagnosis code for cancer to get “chemotherapy costs”.
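
A rough sketch of that rule, assuming line-level records shaped as dicts with hypothetical `hcpcs`, `diagnoses`, and `cost` keys. The codes are placeholders, and the set of cancer diagnosis codes is something the caller would have to supply.

```python
def is_chemo_line(line, cancer_codes):
    """True when a line has a J code and at least one cancer diagnosis."""
    has_j_code = line["hcpcs"].upper().startswith("J")
    has_cancer_dx = any(dx in cancer_codes for dx in line["diagnoses"])
    return has_j_code and has_cancer_dx


def chemotherapy_cost(lines, cancer_codes):
    """Sum the cost of all lines that satisfy the rule above."""
    return sum(line["cost"] for line in lines if is_chemo_line(line, cancer_codes))


# Placeholder codes only; real HCPCS and diagnosis code lists would go here.
CANCER_CODES = {"DX_CANCER"}
claim_lines = [
    {"hcpcs": "J0001", "diagnoses": ["DX_CANCER"], "cost": 1200},  # counted
    {"hcpcs": "J0002", "diagnoses": ["DX_OTHER"], "cost": 300},    # no cancer dx
    {"hcpcs": "A0001", "diagnoses": ["DX_CANCER"], "cost": 80},    # not a J code
]
total = chemotherapy_cost(claim_lines, CANCER_CODES)
```

Only the first line meets both conditions, so `total` is 1200. The rule only works where the diagnosis sits on the same line as the J code, which is the point of keeping the line-level link.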

I believe the conventional OMOP use of concept_type_id fields is to attribute the type of source, rather than a particular source, which would indicate provenance.

To address the need of being able to generalize (inpatient condition, outpatient condition, etc.) or specialize (Inpatient detail - 10th position) the type, we need to build a hierarchy of the source type terms: Inpatient -> Inpatient Condition -> Inpatient detail - 10th position, where the top concept in the hierarchy would indicate a generic Inpatient source. This hierarchy will give the ability to align multiple representations by normalizing them to the topmost available concept.
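
The normalization step could look roughly like this. The parent map below is a hypothetical illustration of the proposed hierarchy (no such hierarchy exists in the vocabulary yet), and the outpatient terms are included only to show a second branch.

```python
# Child term -> parent term. Terms without an entry are top-level.
PARENT = {
    "Inpatient detail - 10th position": "Inpatient Condition",
    "Inpatient Condition": "Inpatient",
    "Outpatient detail - 1st position": "Outpatient Condition",
    "Outpatient Condition": "Outpatient",
}


def path_to_top(term):
    """Return the chain from `term` up to its topmost ancestor."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def topmost(term):
    """Normalize a source type term to the top of its hierarchy."""
    return path_to_top(term)[-1]


generic = topmost("Inpatient detail - 10th position")
```

Two datasets that recorded "Inpatient Condition" and "Inpatient detail - 10th position" would both normalize to "Inpatient" and become comparable at that level.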

Yes, we need Admitting/Discharge/Preliminary/Final. However, I believe we need a designated field for this attribute in the Condition_Occurrence table because it indicates diagnosis staging rather than diagnosis source.

We have agreed to store this attribute in the Observation table and link it to Condition_Occurrence via Fact_Relationship. This is convoluted and creates a lot of overhead in ETL. So if there is a recognized need for this attribute, making it first class would be very beneficial.
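
To illustrate the overhead, here is a sketch of the extra linking row the ETL has to build for every such diagnosis. The keys follow the Fact_Relationship columns; every concept id passed in is a placeholder assumption that would need to be looked up in the vocabulary.

```python
def link_observation_to_condition(observation_id, condition_occurrence_id,
                                  observation_domain_concept_id,
                                  condition_domain_concept_id,
                                  relationship_concept_id):
    """Build one Fact_Relationship row tying an Observation to a condition."""
    return {
        "domain_concept_id_1": observation_domain_concept_id,
        "fact_id_1": observation_id,
        "domain_concept_id_2": condition_domain_concept_id,
        "fact_id_2": condition_occurrence_id,
        "relationship_concept_id": relationship_concept_id,
    }


# Placeholder ids only, not real concept ids.
row = link_observation_to_condition(
    observation_id=501,
    condition_occurrence_id=9001,
    observation_domain_concept_id=0,
    condition_domain_concept_id=0,
    relationship_concept_id=0,
)
```

So each diagnosis with a status costs a Condition_Occurrence row, an Observation row, and at least one linking row, where a dedicated field would need only the first.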

I like that idea, and I agree that the type concept ids are intended to attribute the type of source. It might be really nice to work out the hierarchy and terminology for this vocabulary.

Just to expand a little bit, I find the type concepts confusing. But they are important because they let us choose the kind of information we return on a query.

For example “inpatient” and “outpatient” are also indicated in the visit table, so why are they repeated in the condition type concept id? One reason is that they are pointing to the file from which the data came. The other is that conditions don’t require a visit, so it is helpful to be able to identify the type of condition in the absence of visit information. But it leads to the question – if I want a diagnosis from an outpatient visit, how do I find it? Is it in the condition type or the visit type?

I don’t think we have types for “physician” and “facility”, which would be a meaningful type in the context of conditions in claims data. But they might be useless for EHR data. Since each system organizes their data differently, we might need a lot of types. In that sense, if we go down this path, the type concept ids become more like pointers to the original data.

The number 10 (“diagnosis 10”) is not strictly a type – it is just a pointer to the location in the original record. However, “primary diagnosis” or “diagnosis 1” could be a type since it indicates that a condition is somehow the most important. There is some thought that diagnosis order might be important, but I don’t know of any use cases for it in claims data. If order is important, then the positions might be considered a type.

The notions of “line” and “detail” are a type and reflect a different kind of information. Line means “associated with procedures on the same line within a claim”, and claim means “associated with a set of procedures and visits on a claim”.

So, to my naive view, to some extent we are mixing what I would consider to be “types” with information about the original data structure. Maybe it is all my lack of understanding, but I believe if we worked out a vocabulary to handle this, perhaps with guidelines and best practices, it might help organizations with their ETL.

Friends:

You know the drill for proposing changes to the CDM: We need to list the use cases. To abuse the Type Concepts is definitely very suboptimal. We should model them out. So:

@DTorok: What is the scientific question that the distinction between “Admitting diagnosis” and “Discharge diagnosis” would answer?
@Mark_Danese: What are the “specific utilization questions wanting to know about associated diagnoses”?
@rimma: I like your idea of the hierarchy, but what use case would “inpatient detail - 10th position” actually help with? And do we ever need to know the 10th position, or would “secondary” do the trick?
@Mark_Danese: I understand your cost of indication idea. But using the position (or line) of diagnosis is only the way you get that information. After ETL, this should be irrelevant. You want the condition which is the indication for that treatment (drug, procedure, etc.).

In addition to costing, utilization questions include off-label use, or types of “on label” use. For example, what proportion of use of rituximab is for ITP, RA, and lymphoma? What proportion of angioplasty is for unstable angina vs. MI?
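
As a sketch of how such a proportion could be computed from line-level data: the record layout (`hcpcs`, `diagnoses`), the drug code, and the diagnosis-to-indication mapping are all hypothetical placeholders.

```python
from collections import Counter


def indication_proportions(lines, drug_code, indication_of):
    """Share of a drug's claim lines attributed to each indication.

    `indication_of` maps a diagnosis code to an indication label; codes
    missing from it are ignored. Lines with zero or several distinct
    indications are counted as "unclear".
    """
    counts = Counter()
    for line in lines:
        if line["hcpcs"] != drug_code:
            continue
        labels = {indication_of[dx] for dx in line["diagnoses"] if dx in indication_of}
        counts[labels.pop() if len(labels) == 1 else "unclear"] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Placeholder codes standing in for the drug and its diagnoses.
INDICATION_OF = {"DX_RA": "RA", "DX_ITP": "ITP", "DX_LYMPHOMA": "lymphoma"}
drug_lines = [
    {"hcpcs": "DRUG_X", "diagnoses": ["DX_RA"]},
    {"hcpcs": "DRUG_X", "diagnoses": ["DX_RA", "DX_OTHER"]},
    {"hcpcs": "DRUG_X", "diagnoses": ["DX_ITP"]},
    {"hcpcs": "DRUG_X", "diagnoses": ["DX_LYMPHOMA"]},
    {"hcpcs": "DRUG_Y", "diagnoses": ["DX_RA"]},
]
shares = indication_proportions(drug_lines, "DRUG_X", INDICATION_OF)
```

How trustworthy these shares are still depends on how reliably the line diagnosis reflects the true indication.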

I am not sure how to code the notion of an “indication” (visit/drug/procedure). Is that the fact relationship table? I didn’t see anything about conditions/indication in the visit, procedure or drug exposure tables (though it sounds familiar – was that in v4?).