
Data Collection Best Practices

Hello everyone, and thanks in advance for your thoughts.

Not only am I transforming historical health data to the CDM, but I am also continuously collecting new data in the form of spreadsheets. Several of the fields I collect could be treated as factors with a handful of acceptable levels, for example the answers to surgical technique questions. Do you know of ways to format the data I receive that minimize the risk of getting unacceptable values? One idea I had was to accept only binary values, with one column for each acceptable factor level.

For example, these could be columns:
Surgical_Technique-Direct, Surgical_Technique-Indirect, Surgical_Technique-Open.

I would then accept only 1 or 0 in each of these columns.
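If you go with the 1/0 scheme, the check on incoming rows stays small. The following is a minimal Python sketch, assuming each spreadsheet row has already been read into a dict of column name to cell text (e.g. from a CSV export) and assuming the technique levels are mutually exclusive. The function name `validate_one_hot` and the `exactly_one` switch are hypothetical, not part of any OHDSI tool.

```python
# Hypothetical column names taken from the example above; adjust to your form.
TECHNIQUE_COLUMNS = [
    "Surgical_Technique-Direct",
    "Surgical_Technique-Indirect",
    "Surgical_Technique-Open",
]


def validate_one_hot(row, columns=TECHNIQUE_COLUMNS, exactly_one=True):
    """Return a list of problems found in one spreadsheet row (empty if valid).

    row: dict mapping column name -> cell value (strings, as read from a CSV).
    exactly_one: assumes the levels are mutually exclusive; pass False if
    more than one technique may legitimately be flagged for a row.
    """
    problems = []
    flags = []
    for col in columns:
        # A missing column is treated as an empty cell and reported as invalid.
        value = str(row.get(col, "")).strip()
        if value not in ("0", "1"):
            problems.append(f"{col}: expected 0 or 1, got {value!r}")
        else:
            flags.append(value == "1")
    # Only check the count once every cell is a valid 0/1.
    if not problems and exactly_one:
        count = sum(flags)
        if count != 1:
            problems.append(f"expected exactly one technique flagged, got {count}")
    return problems


rows = [
    {"Surgical_Technique-Direct": "1", "Surgical_Technique-Indirect": "0", "Surgical_Technique-Open": "0"},
    {"Surgical_Technique-Direct": "1", "Surgical_Technique-Indirect": "1", "Surgical_Technique-Open": "0"},
    {"Surgical_Technique-Direct": "yes", "Surgical_Technique-Indirect": "0", "Surgical_Technique-Open": "0"},
]
for i, row in enumerate(rows, start=2):  # start=2: spreadsheet row 1 is the header
    for problem in validate_one_hot(row):
        print(f"row {i}: {problem}")
```

Note that reading an .xlsx file with some libraries yields numbers such as `1.0` rather than the text `"1"`, which this sketch would reject; normalizing cell types before validation avoids that.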

Is this reasonable? Is there a best practice for collecting data in a spreadsheet that then has to go through ETL?

The goal is to minimize the overhead of transforming and loading incoming data without exposing data submitters to CDM terminology directly.
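One way to keep CDM terminology on the ETL side only: submitters fill in plain-language flag columns, and the ETL collapses them back into a single source value string before any vocabulary mapping happens. The following is a minimal Python sketch of that collapsing step, assuming the column naming pattern from the example above. The function `collapse_one_hot` and the `PREFIX` constant are hypothetical, and how the resulting string is mapped to a standard concept (for instance through a source-to-concept mapping) is left to your ETL.

```python
# Hypothetical prefix shared by the flag columns in the example above.
PREFIX = "Surgical_Technique-"


def collapse_one_hot(row, prefix=PREFIX):
    """Turn one-hot flag columns back into a single categorical source value.

    Returns the level name (the part of the column name after the prefix)
    whose flag is "1", or None if no flag is set. Raises ValueError if more
    than one flag is set, because one source value cannot represent that.
    """
    chosen = [
        col[len(prefix):]
        for col, value in row.items()
        if col.startswith(prefix) and str(value).strip() == "1"
    ]
    if len(chosen) > 1:
        raise ValueError(f"multiple levels flagged: {chosen}")
    return chosen[0] if chosen else None


row = {
    "Patient_ID": "A001",  # hypothetical non-technique column, ignored by the prefix filter
    "Surgical_Technique-Direct": "0",
    "Surgical_Technique-Indirect": "0",
    "Surgical_Technique-Open": "1",
}
print(collapse_one_hot(row))  # prints: Open
```

The design choice here is that the spreadsheet vocabulary (the level names) lives in exactly one place, the column headers, so adding or renaming a level means changing the form and the ETL's mapping table rather than retraining submitters on concept names.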

I just thought I would bounce this off the community for your thoughts.

@imlay
What data are in these spreadsheets, and what surgical techniques are you talking about? Techniques conducted during operations on the patient? What are you trying to achieve?
