I like that idea, and I agree that the type concept ids are intended to identify the type of source. It would be really nice to work out the hierarchy and terminology for this vocabulary.
To expand a little: I find the type concepts confusing, but they are important because they let us choose the kind of information a query returns.
For example, “inpatient” and “outpatient” are also indicated in the visit table, so why are they repeated in the condition type concept id? One reason is that they point to the file from which the data came. Another is that conditions don’t require a visit, so it is helpful to be able to identify the type of condition in the absence of visit information. But this raises a question: if I want a diagnosis from an outpatient visit, how do I find it? Is it in the condition type or the visit type?
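To make the ambiguity concrete, here is a minimal Python sketch of the two ways one might answer that question. It uses in-memory stand-ins for the condition_occurrence and visit_occurrence tables, with readable placeholder labels instead of real concept IDs; the rows and the helper function names are hypothetical and only illustrate the two query paths.

```python
# Hypothetical rows. Real OMOP CDM tables store integer concept IDs;
# readable placeholder labels are used here instead.
visit_occurrence = [
    {"visit_occurrence_id": 1, "visit_concept_id": "Outpatient Visit"},
    {"visit_occurrence_id": 2, "visit_concept_id": "Inpatient Visit"},
]

condition_occurrence = [
    # Typed as outpatient and linked to an outpatient visit.
    {"condition_occurrence_id": 10, "condition_type_concept_id": "outpatient",
     "visit_occurrence_id": 1},
    # Typed as outpatient, but no visit is recorded.
    {"condition_occurrence_id": 11, "condition_type_concept_id": "outpatient",
     "visit_occurrence_id": None},
    # Hypothetical inconsistency: typed as inpatient, linked to an outpatient visit.
    {"condition_occurrence_id": 12, "condition_type_concept_id": "inpatient",
     "visit_occurrence_id": 1},
]


def outpatient_by_condition_type(conditions):
    """Path 1: filter on the condition type alone (works without a visit)."""
    return {c["condition_occurrence_id"] for c in conditions
            if c["condition_type_concept_id"] == "outpatient"}


def outpatient_by_visit(conditions, visits):
    """Path 2: join to the visit and filter on the visit type."""
    visit_type = {v["visit_occurrence_id"]: v["visit_concept_id"] for v in visits}
    # Conditions with no visit look up None and never match.
    return {c["condition_occurrence_id"] for c in conditions
            if visit_type.get(c["visit_occurrence_id"]) == "Outpatient Visit"}


print(sorted(outpatient_by_condition_type(condition_occurrence)))
print(sorted(outpatient_by_visit(condition_occurrence, visit_occurrence)))
```

The two paths return different sets for the same data: condition 11 has no visit, so only the type-based filter finds it, while condition 12 is typed as inpatient but linked to an outpatient visit, so only the visit-based filter finds it.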
I don’t think we have types for “physician” and “facility”, which would be meaningful types for conditions in claims data, though they might be useless for EHR data. Since each system organizes its data differently, we might need a lot of types. In that sense, if we go down this path, the type concept ids become more like pointers to the original data.
The number 10 in “diagnosis 10” is not strictly a type; it is just a pointer to a location in the original record. However, “primary diagnosis” or “diagnosis 1” could be a type, since it indicates that a condition is in some sense the most important. There is some thought that diagnosis order might matter, but I don’t know of any use cases for it in claims data. If order is important, then the positions might be considered types.
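One way to picture that distinction is to keep the two kinds of information in separate fields during ETL. The following is a minimal Python sketch, assuming a hypothetical MappedCondition record and map_diagnosis_position helper (neither is part of the OMOP CDM or its vocabularies). Only position 1 is promoted to a meaningful type; every other position keeps a generic type, and its number survives only as a pointer back into the source file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedCondition:
    # Hypothetical record, not an OMOP CDM structure.
    type_label: str      # meaningful type: primary vs. not primary
    source_pointer: str  # where in the original record the code was found


def map_diagnosis_position(file_name: str, position: int) -> MappedCondition:
    """Split a source diagnosis position into a type and a source pointer.

    Assumes positions start at 1 and that only position 1 carries meaning
    as a type; all other positions are treated as location information.
    """
    if position < 1:
        raise ValueError("diagnosis positions are assumed to start at 1")
    type_label = "primary diagnosis" if position == 1 else "secondary diagnosis"
    return MappedCondition(type_label, f"{file_name}:diagnosis {position}")


print(map_diagnosis_position("outpatient_claims", 1))
print(map_diagnosis_position("outpatient_claims", 10))
```

Whether order beyond position 1 should also become a type is exactly the open question above; the sketch simply assumes it should not.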
The notions of “line” and “detail” are types too, and they reflect a different kind of information. “Line” means “associated with procedures on the same line within a claim”, whereas “claim” means “associated with a set of procedures and visits on a claim”.
So, in my naive view, we are to some extent mixing what I would consider “types” with information about the original data structure. Maybe it is just my lack of understanding, but I believe that if we worked out a vocabulary to handle this, perhaps with guidelines and best practices, it could help organizations with their ETL.