
Why does the number of records in the data source decrease?

Hello.

While watching some Ehden tutorials about Atlas I found something strange.

The source data density report shows that the number of records increases over time on average, but if we take a closer look we can also see small drops.

Why does the total number of records decrease so often?
If it only happened once, it could be an error, but it happens hundreds of times.

Maybe it’s not the (cumulative) total number of records but new records?
Maybe the data is full of errors and is corrected every day, or has duplicates?

You mean from quarter to quarter (or is it 2-month periods)? Actually, there are patterns. The beginning and end of each year have more density than the time in between, because it is winter and people are sicker. But I am sure there are also capture artifacts and random noise.

Why does it bother you?

Yes, you are calling it artifacts and noise.
What causes them (the descending parts)?
Are we losing data? Redundant data? Aggregation? Missing data?

We are supposed to be adding new data incrementally, not deleting data, aren’t we?

Artifacts: Timing of data feeds and aggregations. Weather. Different lengths of the months. People deciding to go to see the doctor or not. Just life.


But isn’t that plot showing the cumulative total number of records until a given day?

Shouldn’t it look like this…? (without any drawdown)

Hello @JuanPi,

The number of records is not cumulative over time. It shows the number of records (clinical events) for each particular domain that happened in a particular month. If the line drops, it means there are fewer clinical events in month N than in month N-1.
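To make the difference concrete, here is a minimal Python sketch. The event dates are made up and the variable names are hypothetical; it is not how ATLAS builds the report, only a toy comparison of per-month counts with a running total.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical event dates (YYYY-MM-DD) for a single domain; illustration data only.
event_dates = [
    "2020-01-03", "2020-01-15", "2020-01-28",
    "2020-02-10",
    "2020-03-05", "2020-03-20",
]

# Records per month: the kind of series the data density plot shows.
per_month = Counter(date[:7] for date in event_dates)
months = sorted(per_month)
monthly_counts = [per_month[m] for m in months]

# Running total: the cumulative series assumed in the question.
cumulative_counts = list(accumulate(monthly_counts))

print(months)
print(monthly_counts)     # can go up or down from month to month
print(cumulative_counts)  # can never decrease
```

In this toy data the monthly series drops from January to February simply because fewer events happened in February, while the running total keeps growing. No records were deleted in either case.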

Reference (tutorial video): ATLAS Tutorial: Data Sources - Data Density - YouTube


OK, I was assuming it was the total prevalence because it’s increasing.
