Hello everyone - and thanks in advance for your thoughts:
Not only am I transforming historical health data to the CDM, but I am also continuously collecting new data in the form of spreadsheets. Several fields I collect could be considered factors with a handful of acceptable levels, e.g., surgical technique questions. Do you know of ways to format the data I receive to minimize the risk of getting unacceptable values? One idea I had was to accept only binary values, with one column for each acceptable factor level.
For example, these could be columns:
Surgical_Technique-Direct, Surgical_Technique-Indirect, Surgical_Technique-Open.
And then I would only accept 1/0 for the columns.
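To make the idea concrete, here is a minimal sketch in Python of how the check might look on the ETL side. It assumes each spreadsheet row arrives as a dict of strings (as `csv.DictReader` would produce) and that the techniques are mutually exclusive; the function name `collapse_technique` is hypothetical, and only the column names come from my example.

```python
# Column names from the example above; everything else is an illustrative assumption.
TECHNIQUE_COLUMNS = [
    "Surgical_Technique-Direct",
    "Surgical_Technique-Indirect",
    "Surgical_Technique-Open",
]


def collapse_technique(row):
    """Return the single selected level, or raise ValueError for a bad row."""
    selected = []
    for column in TECHNIQUE_COLUMNS:
        # Treat a missing or blank cell as invalid rather than guessing 0.
        value = str(row.get(column, "")).strip()
        if value not in ("0", "1"):
            raise ValueError(f"{column}: expected 1 or 0, got {value!r}")
        if value == "1":
            # "Surgical_Technique-Open" -> "Open"
            selected.append(column.split("-", 1)[1])
    # Assumption: exactly one technique applies per row.
    if len(selected) != 1:
        raise ValueError(f"expected exactly one technique, got {selected}")
    return selected[0]
```

One thing this makes visible: restricting cells to 1/0 does not by itself prevent a row with two 1s or none at all, so a per-row check like the last one would still be needed.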
Is this reasonable? Is there a best practice for collecting data in a spreadsheet that then has to go through ETL?
The idea is to minimize the overhead of transforming and loading incoming data without exposing data submitters to CDM terminology directly.
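As a rough illustration of what I mean by keeping CDM terminology away from submitters, the mapping could live in a small lookup on the ETL side. This is only a sketch: the names `TECHNIQUE_TO_CONCEPT_ID` and `to_cdm_fields` and the output field names are hypothetical, and the concept IDs are placeholders (0, which the CDM uses for "No matching concept") that would need to be replaced with real concepts from the vocabulary.

```python
# Hypothetical lookup maintained by the ETL team; submitters never see it.
# The concept IDs below are placeholders, NOT real mappings.
TECHNIQUE_TO_CONCEPT_ID = {
    "Direct": 0,
    "Indirect": 0,
    "Open": 0,
}


def to_cdm_fields(level):
    """Translate a submitter-facing level into ETL-side fields."""
    if level not in TECHNIQUE_TO_CONCEPT_ID:
        raise ValueError(f"unknown surgical technique level: {level!r}")
    return {
        # Keep the submitter's own wording alongside the mapped concept.
        "source_value": level,
        "concept_id": TECHNIQUE_TO_CONCEPT_ID[level],
    }
```

With something like this, adding or renaming a level means editing one table in the ETL rather than retraining the people filling in the spreadsheet.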
I just thought I would bounce this off the community for your thoughts.