Dear OHDSI Community,
I’m currently working with the Eye Care and Vision WG to design ETLs for semi-structured fields in ophthalmology. Specifically, we are designing parsers to transform free text fields into structured representations. The fields are generally well-behaved but, as expected, there are abnormalities in there use that occur with varying frequency.
We would like to evaluate the impact of specific choices we make, i.e. a simple parser vs a more complex one. Are there any best practices in terms of metrics and methods for evaluating ETL (or NLP) implementations?
To highlight our thoughts so far: 1) there are some simple things we can do, such as row counts, prevalence agreement etc. 2) The impact will depend on the downstream observational studies that are run. We’ve considered doing a form of sensitivity analysis, running the analysis on two different versions to check. It’s also worth noting that we already have information on the rate at which particular instances are occurring.
Any advice from the collective expertise would be greatly appreciated!