We are curious what the community thinks about a problem that arises when there is no “enrollment” information. We are using a large, unadjudicated claims database from a large clearinghouse, so we have no enrollment data. The observation period table is populated from each person's first and last claim dates. When we calculate rates, the denominator comes out too large and the rates too small compared with other data sources. We ran into this when we calculated historical background rates for Covid vaccine side effects and got unusually low rates compared with other large US databases. Since we are fairly sure we are capturing events accurately, we think the culprit must be the denominator.
The default for populating the observation period table is to use the first encounter as the Start Date and the last encounter as the End Date of the Observation Period. But in between there could be years of no activity, and by counting that time in the denominator we are assuming the patient was simply healthy. Do people have a better approach?
We were thinking about the following: what if we build the observation period table the same way the drug_era/condition_era tables are built in the OMOP ETL? That is, we would string together claims from all sources (condition, procedure, drug, etc.) that are close together in time, and if the gap between consecutive claims is too long, we would end the Observation Period and start a new one at the next claim. We would like help defining what counts as close enough, though it may be an empirical question. In building these observation period eras we would treat all claim types identically, so that each era represents a stretch in which the person was using services. We would need to specify the persistence window, i.e. the maximum gap before a new era starts, but beyond that the ETL code for condition and drug eras already exists and could be reused. In essence, instead of one long observation period per person you would have several, each separated from the next by a gap longer than the chosen maximum (half a year, perhaps?).
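To make the idea concrete, here is a minimal Python sketch of the era-style collapsing for a single person. It is only an illustration, not existing ETL code: the function names `build_observation_eras` and `person_days`, the `max_gap_days` parameter, the 180-day default (taken from the "half year perhaps" guess above), and the sample claim dates are all made up for this example.

```python
from datetime import date


def build_observation_eras(event_dates, max_gap_days=180):
    """Collapse one person's event dates into observation eras.

    A new era starts whenever the gap between consecutive event dates
    exceeds max_gap_days (hypothetical persistence window).
    """
    eras = []
    for d in sorted(set(event_dates)):
        if eras and (d - eras[-1][1]).days <= max_gap_days:
            eras[-1][1] = d          # close enough: extend the current era
        else:
            eras.append([d, d])      # gap too long (or first event): new era
    return [(start, end) for start, end in eras]


def person_days(periods):
    """Total days covered by a list of (start, end) periods, inclusive."""
    return sum((end - start).days + 1 for start, end in periods)


# Hypothetical person: activity in early 2015, then nothing until mid-2018.
claims = [
    date(2015, 1, 10), date(2015, 2, 3), date(2015, 3, 20),
    date(2018, 6, 1), date(2018, 7, 15),
]

single_period = [(min(claims), max(claims))]   # current first-to-last default
eras = build_observation_eras(claims)          # era-style alternative

print(person_days(single_period), person_days(eras))
```

For this made-up person the first-to-last approach counts the whole 2015 to 2018 span as observed time, while the era approach counts only the two short stretches with activity, which is the denominator effect we suspect. Whether an era should also be extended past its last claim by some tail is a separate choice this sketch does not make.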
Has anyone solved this problem? What are your thoughts? Thanks in advance.