I think this topic deserves more discussion. In addition to the community sharing best practices for how to populate the observation_period table, it might be good to work toward including a more formal representation in the CDM of the assumptions made about data completeness. Codes mapped to a taxonomy of justifications for confidence in data completeness within observation periods could inform interpretation of analytic results. Codes associated with a low likelihood of completeness could trigger flags that signal the need for cautious interpretation.
For example, EHR data might support relatively high confidence in completeness for brief observation periods during hospital stays or ED visits; low confidence for multiyear periods within primary care clinics; and somewhere in between for months-long periods within specialty care settings.
For claims data, periods of complete insurance loss might be better treated as one end of a spectrum of coverage rather than as a binary indicator of expected completeness.
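To make the idea a bit more concrete, here is a rough sketch of what such a taxonomy might look like, using Python purely for illustration. The category names, confidence levels, and the idea of attaching a justification concept to each observation period are my own assumptions, not an existing CDM convention or proposal.

```python
from enum import Enum
from dataclasses import dataclass
from datetime import date

class CompletenessConfidence(Enum):
    """Hypothetical levels of confidence that events are fully captured."""
    HIGH = "high"          # e.g., inpatient stay or ED visit in EHR data
    MODERATE = "moderate"  # e.g., months-long specialty care period
    LOW = "low"            # e.g., multiyear primary care period, partial coverage

@dataclass
class ObservationPeriodAssumption:
    """A completeness assumption attached to one observation period.

    justification_concept_id stands in for a (hypothetical) standard concept
    drawn from a taxonomy of justifications for confidence in completeness.
    """
    observation_period_id: int
    start_date: date
    end_date: date
    confidence: CompletenessConfidence
    justification_concept_id: int
```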
If we can encode our knowledge about our data's risk of incompleteness, analytic routines could inform the interpretation of analytic results in ways that might be especially valuable for users with less intimate knowledge of the source data. A rough taxonomy of expected completeness might be useful, though a very precise one might be hard to create.
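Building on the sketch above, an analytic routine might consume those codes to flag results that warrant cautious interpretation; again, the function and the example periods are hypothetical.

```python
def flag_low_confidence_periods(assumptions):
    """Return IDs of observation periods whose completeness assumption
    suggests results should be interpreted cautiously."""
    return [
        a.observation_period_id
        for a in assumptions
        if a.confidence is CompletenessConfidence.LOW
    ]

# Example: a multiyear primary care period would be flagged,
# while a brief inpatient stay would not.
periods = [
    ObservationPeriodAssumption(1, date(2015, 1, 1), date(2020, 12, 31),
                                CompletenessConfidence.LOW, justification_concept_id=0),
    ObservationPeriodAssumption(2, date(2021, 3, 1), date(2021, 3, 5),
                                CompletenessConfidence.HIGH, justification_concept_id=0),
]
print(flag_low_confidence_periods(periods))  # -> [1]
```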