I would say we'd want to know the provider, because in some cases the provider specialty (like a dermatologist for skin conditions) is considered when selecting the diagnosis. To maintain that, you'd have a row with the same diagnosis code and type, but with a different provider. We could say this isn't a dupe in that case, and add provider ID to the 'distinguishing factors'.
I always thought of a diagnosis as a 'moment in time' and not something that would have a duration, but the CDM calls for a start and end of the condition. So maybe it's up to the ETL to collapse these records into a continuous duration of the condition, but you run into issues when multiple providers are in the mix (there's only 1 provider per condition occurrence record).
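To make the 'collapse' idea concrete, here is a rough Python sketch of what such an ETL step might look like. It is only an illustration, not how any existing ETL does it: the `ConditionRecord` and `ConditionSpan` types, the `collapse_conditions` function, and the `gap_days` setting are all made-up names. It also shows the provider problem directly: once records are merged, the span ends up with a set of providers, which a single provider column can't hold.

```python
from datetime import date
from itertools import groupby
from typing import NamedTuple, Optional


class ConditionRecord(NamedTuple):
    # Hypothetical, simplified stand-in for one condition occurrence row.
    person_id: int
    condition_concept_id: int
    start: date
    end: date
    provider_id: Optional[int]


class ConditionSpan(NamedTuple):
    # Hypothetical result type: one continuous duration of a condition.
    person_id: int
    condition_concept_id: int
    start: date
    end: date
    provider_ids: frozenset  # every provider that contributed a record


def collapse_conditions(records, gap_days=0):
    """Merge records of the same person and condition whose dates overlap
    or start within gap_days of the running span's end."""
    def person_condition(r):
        return (r.person_id, r.condition_concept_id)

    # Sort so each person/condition group is contiguous and ordered by start.
    ordered = sorted(records, key=lambda r: (person_condition(r), r.start))
    spans = []
    for (person_id, concept_id), group in groupby(ordered, key=person_condition):
        current = None
        for r in group:
            if current is not None and (r.start - current.end).days <= gap_days:
                # Extend the running span and remember the extra provider.
                current = current._replace(
                    end=max(current.end, r.end),
                    provider_ids=current.provider_ids | {r.provider_id},
                )
            else:
                if current is not None:
                    spans.append(current)
                current = ConditionSpan(
                    person_id, concept_id, r.start, r.end,
                    frozenset({r.provider_id}),
                )
        spans.append(current)
    return spans


records = [
    ConditionRecord(1, 100, date(2020, 1, 1), date(2020, 1, 5), 10),
    ConditionRecord(1, 100, date(2020, 1, 4), date(2020, 1, 10), 20),
    ConditionRecord(1, 100, date(2020, 2, 1), date(2020, 2, 1), 10),
]
spans = collapse_conditions(records)
```

With the sample rows, the first two records overlap and merge into one span that carries both providers, while the February record stays separate. Whether to keep one provider, all of them, or none is exactly the decision that is hard to make once for everyone at ETL time.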
I think you bring up a lot of great points that probably are only a consideration during the design of a specific study/analysis, so perhaps it makes sense to leave the granular details in the tables and let the researcher decide how to 'roll-up' multiple records into a single event.
Part of my interest in this conversation is how CIRCE cohort expressions are built. On one hand, I want to try to enforce some rules so that the cohort criteria have expected results. On the other hand, there isn't a one-size-fits-all solution. What I'm leaning toward, based on your feedback, @jenniferduryea, is making the criteria that select condition records support settings that allow a 'group by' on just the start/end, provider, condition_type, etc., and leaving it to the researcher to decide what constitutes a distinct diagnosis for their research question. The devil is in the details, tho, so I don't have a specific implementation in mind, but this information is very helpful.
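Just to illustrate the 'group by' setting (this is a toy Python sketch, not CIRCE code, and `group_condition_records` is a made-up helper), the idea is that the researcher picks which columns define a distinct diagnosis, and records that agree on those columns are treated as one event. The dictionary keys follow the CDM condition occurrence column names; the row values are invented sample data.

```python
from collections import defaultdict


def group_condition_records(records, group_by):
    """Bucket condition records by the researcher-chosen columns.

    records:  iterable of dicts keyed by column name
    group_by: sequence of column names that define a distinct diagnosis
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[col] for col in group_by)
        groups[key].append(rec)
    return dict(groups)


# Same person, diagnosis, dates, and type, but two different providers.
rows = [
    {"person_id": 1, "condition_concept_id": 100,
     "condition_start_date": "2020-01-01", "condition_end_date": "2020-01-01",
     "condition_type_concept_id": 7, "provider_id": 10},
    {"person_id": 1, "condition_concept_id": 100,
     "condition_start_date": "2020-01-01", "condition_end_date": "2020-01-01",
     "condition_type_concept_id": 7, "provider_id": 20},
]

base = ["person_id", "condition_concept_id",
        "condition_start_date", "condition_end_date",
        "condition_type_concept_id"]

ignoring_provider = group_condition_records(rows, base)
with_provider = group_condition_records(rows, base + ["provider_id"])
```

Grouping on the base columns treats the two rows as one diagnosis, while adding `provider_id` to the key keeps them as two. The same records give different counts depending on the setting, which is why the choice probably belongs with the researcher rather than in a fixed rule.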