I believe the NOTE_NLP table is flawed because it violates the central OHDSI/OMOP tenet that every fact normalized to a standardized vocabulary belongs in one and only one domain/table. NOTE_NLP is a tolerated heretic within the OHDSI/OMOP church. I think we should modify the NOTE_NLP table by replacing the note_nlp_concept_id and note_nlp_source_concept_id columns with a polymorphic foreign key (like the COST table’s cost_event_id/cost_domain_id):
note_nlp_event_id (NOT NULL, integer): A foreign key identifier to the event record (e.g. Condition, Observation, Measurement, Procedure, Visit, etc.) that the NLP note represents.
note_nlp_domain_id (NOT NULL, varchar(20)): The concept representing the domain of the note NLP event, from which the table containing the referenced event record can be inferred.
We would then use @MPhilofsky's proposal to track provenance within the type_concept_id column in each corresponding clinical event table.
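To make the idea concrete, here is a rough sketch of how a consumer might follow the proposed polymorphic key from a NOTE_NLP row to the clinical event it describes. It uses an in-memory SQLite database with heavily trimmed tables. The DOMAIN_TABLES mapping, the resolve_event and build_demo helpers, and the NLP_TYPE_CONCEPT_ID value are my own illustrative assumptions, not part of the CDM specification, and the type concept id in particular should be checked against the vocabulary.

```python
import sqlite3

# Hypothetical mapping from note_nlp_domain_id to (table, primary key column).
# A real implementation would cover every domain that NLP output can land in.
DOMAIN_TABLES = {
    "Condition": ("condition_occurrence", "condition_occurrence_id"),
    "Measurement": ("measurement", "measurement_id"),
}

# Illustrative placeholder for the "NLP" type concept; verify in your vocabulary.
NLP_TYPE_CONCEPT_ID = 32858


def build_demo():
    """Create trimmed-down tables and one NLP-derived condition record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER NOT NULL,
            condition_concept_id INTEGER NOT NULL,
            condition_type_concept_id INTEGER NOT NULL
        );
        CREATE TABLE measurement (
            measurement_id INTEGER PRIMARY KEY,
            person_id INTEGER NOT NULL,
            measurement_concept_id INTEGER NOT NULL,
            measurement_type_concept_id INTEGER NOT NULL
        );
        CREATE TABLE note_nlp (
            note_nlp_id INTEGER PRIMARY KEY,
            note_id INTEGER NOT NULL,
            lexical_variant TEXT NOT NULL,
            note_nlp_event_id INTEGER NOT NULL,
            note_nlp_domain_id VARCHAR(20) NOT NULL
        );
        """
    )
    # The clinical fact lives in its domain table, with provenance in type_concept_id.
    conn.execute(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
        (10, 1, 201826, NLP_TYPE_CONCEPT_ID),  # 201826 is an example concept id
    )
    # NOTE_NLP keeps only extraction metadata plus the polymorphic pointer.
    conn.execute(
        "INSERT INTO note_nlp VALUES (?, ?, ?, ?, ?)",
        (1, 500, "type 2 diabetes", 10, "Condition"),
    )
    return conn


def resolve_event(conn, note_nlp_id):
    """Return the clinical event row (as a dict) that a NOTE_NLP row points to."""
    row = conn.execute(
        "SELECT note_nlp_event_id, note_nlp_domain_id "
        "FROM note_nlp WHERE note_nlp_id = ?",
        (note_nlp_id,),
    ).fetchone()
    if row is None:
        return None
    event_id, domain_id = row
    # Table and key names come only from the fixed mapping above, never from data.
    table, pk = DOMAIN_TABLES[domain_id]
    cur = conn.execute(f"SELECT * FROM {table} WHERE {pk} = ?", (event_id,))
    columns = [d[0] for d in cur.description]
    record = cur.fetchone()
    return dict(zip(columns, record)) if record else None
```

Calling resolve_event(build_demo(), 1) returns the condition_occurrence row, so the condition itself is queryable like any other condition while NOTE_NLP only holds the extraction details.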
This would confine the NOTE_NLP table to its proper function of recording the metadata details of the NLP extraction/derivation and leave the representation of clinical events to its proper compatriots, the clinical event tables.
This would be a step in the direction of making OMOP/OHDSI less focused on putting ‘reliable’ EHR/Claims data in one place and ‘unreliable’ NLP/abstracted/curated data in some dark corner.