Hi,
Our source data currently has a discrepancy in its visit_ids: subsequent related visits share the same visit_id, as in the table below. Please note that the accompanying clinical data carries the same visit_ids.
Person_id | Visit_start_date | Visit_end_date | Visit_id |
---|---|---|---|
1 | 21/11/2008 | 23/11/2008 | A1 |
1 | 28/3/2009 | 28/3/2009 | A1 |
1 | 27/5/2010 | 29/5/2010 | A1 |
The data should instead have looked like the table below, where the visit_id also indicates the chronological order (so that we can tell the visits are related):
Person_id | Visit_start_date | Visit_end_date | Visit_id |
---|---|---|---|
1 | 21/11/2008 | 23/11/2008 | A1-01 |
1 | 28/3/2009 | 28/3/2009 | A1-02 |
1 | 27/5/2010 | 29/5/2010 | A1-03 |
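To make the intended numbering concrete, here is a minimal Python sketch that derives the suffixed ids from the shared visit_id by sorting each person's visits by start date. The function name `add_sequence_suffix`, the dictionary layout, and the `dd/mm/yyyy` date format are assumptions made for this illustration, not part of any OHDSI tool.

```python
from collections import defaultdict
from datetime import datetime


def add_sequence_suffix(rows):
    """Return copies of rows whose visit_id gets a chronological suffix.

    Rows are grouped by (person_id, visit_id); within each group the
    earliest visit becomes <visit_id>-01, the next <visit_id>-02, etc.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["person_id"], row["visit_id"])].append(row)

    result = []
    for (_, visit_id), group in groups.items():
        # Assumed date format: day/month/year, as in the tables above.
        ordered = sorted(
            group,
            key=lambda r: datetime.strptime(r["visit_start_date"], "%d/%m/%Y"),
        )
        for position, row in enumerate(ordered, start=1):
            result.append({**row, "visit_id": f"{visit_id}-{position:02d}"})
    return result


# Deliberately out of order to show that the dates drive the numbering.
rows = [
    {"person_id": 1, "visit_start_date": "27/5/2010", "visit_end_date": "29/5/2010", "visit_id": "A1"},
    {"person_id": 1, "visit_start_date": "21/11/2008", "visit_end_date": "23/11/2008", "visit_id": "A1"},
    {"person_id": 1, "visit_start_date": "28/3/2009", "visit_end_date": "28/3/2009", "visit_id": "A1"},
]
numbered = add_sequence_suffix(rows)
for row in numbered:
    print(row["visit_start_date"], row["visit_id"])
```

The original A1 stays recoverable as the prefix of each new id, so the link to the clinical data is not lost.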
Since our data looks like the first table above (with the visit_id issue), we decided to store it in the visit tables as shown below. Let's not worry about field formats for now; I have chosen values such as A1-01 only to keep the example simple.
Visit_occurrence
Person_id | Visit_start_date | Visit_end_date | Visit_id |
---|---|---|---|
1 | 21/11/2008 | 23/11/2008 | A1-01 |
Visit_detail
Person_id | Visit_start_date | Visit_end_date | Visit_detail_id | visit_occurrence_id |
---|---|---|---|---|
1 | 28/3/2009 | 28/3/2009 | 2 | A1-01 |
1 | 27/5/2010 | 29/5/2010 | 3 | A1-01 |
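The split we applied can be sketched in Python as follows: the earliest record for a shared visit_id becomes the visit_occurrence row, and every later record becomes a visit_detail row pointing back to it. The function name `split_visits` and the dictionary layout are assumptions for illustration, and the ids follow the simplified example above rather than the integer keys a real CDM table would need.

```python
from datetime import datetime


def split_visits(rows):
    """Split records sharing one source visit_id into a parent and children.

    Returns (visit_occurrence_row, visit_detail_rows). The earliest record
    is the parent; later records reference it via visit_occurrence_id.
    """
    ordered = sorted(
        rows,
        key=lambda r: datetime.strptime(r["visit_start_date"], "%d/%m/%Y"),
    )
    first, later = ordered[0], ordered[1:]
    parent_id = f"{first['visit_id']}-01"

    visit_occurrence = {
        "person_id": first["person_id"],
        "visit_start_date": first["visit_start_date"],
        "visit_end_date": first["visit_end_date"],
        "visit_id": parent_id,
    }
    visit_detail = [
        {
            "person_id": r["person_id"],
            "visit_start_date": r["visit_start_date"],
            "visit_end_date": r["visit_end_date"],
            # Chronological position within the group, as in the example.
            "visit_detail_id": position,
            "visit_occurrence_id": parent_id,
        }
        for position, r in enumerate(later, start=2)
    ]
    return visit_occurrence, visit_detail


source_rows = [
    {"person_id": 1, "visit_start_date": "21/11/2008", "visit_end_date": "23/11/2008", "visit_id": "A1"},
    {"person_id": 1, "visit_start_date": "28/3/2009", "visit_end_date": "28/3/2009", "visit_id": "A1"},
    {"person_id": 1, "visit_start_date": "27/5/2010", "visit_end_date": "29/5/2010", "visit_id": "A1"},
]
occurrence, details = split_visits(source_rows)
print(occurrence)
print(details)
```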
I know visit_detail is meant to store only transfers within a single visit. We have not used the episode tables yet because we are on CDM V5.3.1, and I think Atlas and the other OHDSI packages support V5.3.1 much better than V6.0.
a) Would this cause any problem during Atlas cohort generation?
For example, I know that when we generate cohorts, Atlas runs some SQL in the background to get the concept_hierarchy etc. If I select only visit_occurrence, would Atlas also pick up the child visits from the visit_detail table (because we have the linking visit_occurrence_id in the Visit_detail table), or do I have to select the visit_detail table separately?
b) Would this cause any problem when using the other R packages (HADES)? For example, in the real data we may have 100 records (with the quality issue described above). We would store the first visits in visit_occurrence, which would be 40 records, and the child visits in visit_detail, which would be 60 records. For all analyses we want all 100 records from both tables to be considered, not just those in visit_occurrence.
c) We don't wish to tamper with the original information, where each visit and its corresponding clinical information can be identified easily using the visit_ids (A1). If we were to create our own visit_ids from scratch, we would have to settle for some heuristic that assigns visit_ids by date-time intervals (which may not be accurate).
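To illustrate what "considering both tables" in point b) would mean, here is a minimal sketch using Python's built-in sqlite3 with heavily simplified, hypothetical versions of the two tables. It only shows a manual `UNION ALL` that brings the parent and child visits back together. Whether Atlas or any HADES package does something equivalent is exactly the open question, and this sketch does not answer it. Dates are written in ISO format here so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the CDM tables; the real tables have more
# columns and integer ids. Text ids are kept to match the example above.
conn.executescript(
    """
    CREATE TABLE visit_occurrence (
        visit_occurrence_id TEXT,
        person_id INTEGER,
        visit_start_date TEXT,
        visit_end_date TEXT
    );
    CREATE TABLE visit_detail (
        visit_detail_id INTEGER,
        person_id INTEGER,
        visit_detail_start_date TEXT,
        visit_detail_end_date TEXT,
        visit_occurrence_id TEXT
    );
    INSERT INTO visit_occurrence VALUES ('A1-01', 1, '2008-11-21', '2008-11-23');
    INSERT INTO visit_detail VALUES (2, 1, '2009-03-28', '2009-03-28', 'A1-01');
    INSERT INTO visit_detail VALUES (3, 1, '2010-05-27', '2010-05-29', 'A1-01');
    """
)

# Stack parent visits and child visits into one chronological list.
all_visits = conn.execute(
    """
    SELECT person_id, visit_start_date AS start_date,
           visit_end_date AS end_date, visit_occurrence_id
    FROM visit_occurrence
    UNION ALL
    SELECT person_id, visit_detail_start_date,
           visit_detail_end_date, visit_occurrence_id
    FROM visit_detail
    ORDER BY start_date
    """
).fetchall()

for visit in all_visits:
    print(visit)
```

With the three example records this returns one row per original visit. Any analysis that reads visit_occurrence alone would see only the first of them.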