Hello all!
I have been working on a project where we’ve been attempting to build the most flexible source-data-to-OMOP CDM conversion possible. That has proven very difficult, if not impossible.
One part that I think works somewhat well is how we map our source data to standard concepts: we implemented a word-embedding search. We created embeddings of the concept text from Athena, and we then find the closest matches using FAISS. We initially used a plain cosine-similarity comparison, but that did not scale well at all; FAISS has given us very similar results with much better performance.
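For anyone unfamiliar with the approach, here is a minimal sketch of the brute-force cosine-similarity lookup we started from, assuming the concept embeddings are already computed. The vectors and concept IDs are toy placeholders, not real Athena data, and the function names are made up for illustration. FAISS produces the same kind of ranking much faster: if the vectors are L2-normalized (faiss.normalize_L2), its inner-product index (IndexFlatIP) ranks by cosine similarity.

```python
import math
from typing import Dict, List, Tuple


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_concepts(
    query: List[float], concept_vectors: Dict[int, List[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Score every concept against the query (brute force) and return the k best."""
    q = normalize(query)
    scored = []
    for concept_id, vec in concept_vectors.items():
        v = normalize(vec)
        scored.append((concept_id, sum(a * b for a, b in zip(q, v))))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings keyed by placeholder concept IDs (not real Athena IDs).
CONCEPTS = {
    1001: [1.0, 0.0, 0.0],
    1002: [0.0, 1.0, 0.0],
    1003: [0.7, 0.7, 0.0],
}

print(top_k_concepts([0.9, 0.1, 0.0], CONCEPTS, k=2))
```

Every query is compared against every concept, which is presumably why this version slowed down as the vocabulary grew.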
Ok, now let’s say that I have all my source tables/columns and my data mapped to standard concepts. What would be the best next step? I’m having a lot of difficulty generating SQL statements from the mapped data that populate the OMOP tables/columns.
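To make the question concrete, this is roughly the kind of generation step I mean, as a minimal sketch: a hand-written map from target column to source expression is turned into an INSERT ... SELECT statement. The build_insert_select function, the patients table and its patient_id/birthdate columns, and the PostgreSQL-style EXTRACT syntax are all assumptions for illustration. A real person insert would also need required columns such as gender_concept_id, race_concept_id and ethnicity_concept_id, which are omitted here.

```python
from typing import Dict


def build_insert_select(target_table: str, source_from: str, column_map: Dict[str, str]) -> str:
    """Build an INSERT ... SELECT from a {target_column: source_expression} map.

    Column order follows the dict's insertion order.
    """
    target_cols = ", ".join(column_map)
    select_list = ",\n       ".join(f"{expr} AS {col}" for col, expr in column_map.items())
    return (
        f"INSERT INTO {target_table} ({target_cols})\n"
        f"SELECT {select_list}\n"
        f"FROM {source_from};"
    )


# Hypothetical mapping for a source table "patients" aliased as p.
PERSON_MAP = {
    "person_id": "p.patient_id",
    "year_of_birth": "EXTRACT(YEAR FROM p.birthdate)",
    "month_of_birth": "EXTRACT(MONTH FROM p.birthdate)",
    "day_of_birth": "EXTRACT(DAY FROM p.birthdate)",
    "birth_datetime": "CAST(p.birthdate AS TIMESTAMP)",
}

print(build_insert_select("person", "patients p", PERSON_MAP))
```

Keeping the mapping as data rather than hand-written SQL means a new source only needs a new map; SQL dialect differences would still need handling separately (OHDSI's SqlRender package exists for translating SQL between database platforms).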
For example, let’s say I have birthdate as a source column in the patients table. I can find its concept ID using our embedding comparison. Athena shows that this concept goes to the observation table, which I guess makes sense, but we also need to create the OMOP person table from it. I’m guessing we would need a process that watches for these codes to create the person table? Also, how do you handle the column names inside tables like observation? Athena doesn’t seem to provide any information in that regard. Where would I place the birthdate value in observation?
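One way the “watch for these codes” idea could look, as a sketch only: check a hand-maintained list of person-level concepts first, and otherwise route by the concept’s domain. The table names follow OMOP CDM v5.x, but the route function, the placeholder concept ID 900001 and the choice of target columns are hypothetical, not an OHDSI convention.

```python
from typing import Dict, List, Tuple

# Domain -> CDM table for common clinical-event domains (OMOP CDM v5.x table names).
DOMAIN_TO_TABLE: Dict[str, str] = {
    "Condition": "condition_occurrence",
    "Drug": "drug_exposure",
    "Procedure": "procedure_occurrence",
    "Measurement": "measurement",
    "Observation": "observation",
    "Device": "device_exposure",
}

# Hypothetical: concepts we decide are person-level attributes.
# 900001 is a placeholder, not a real Athena concept ID.
PERSON_ATTRIBUTE_CONCEPTS: Dict[int, List[str]] = {
    900001: ["year_of_birth", "month_of_birth", "day_of_birth", "birth_datetime"],
}


def route(concept_id: int, domain_id: str) -> Tuple[str, List[str]]:
    """Return (target_table, target_columns) for a mapped source column."""
    # Person-level attributes win, regardless of the concept's Athena domain.
    if concept_id in PERSON_ATTRIBUTE_CONCEPTS:
        return "person", PERSON_ATTRIBUTE_CONCEPTS[concept_id]
    table = DOMAIN_TO_TABLE.get(domain_id)
    if table is None:
        raise ValueError(f"no routing rule for domain {domain_id!r}")
    # In these tables the concept column is <domain>_concept_id,
    # e.g. condition_concept_id or observation_concept_id.
    return table, [f"{domain_id.lower()}_concept_id"]


print(route(900001, "Observation"))
print(route(42, "Condition"))
```

Checking the person-attribute list before the domain lookup is what would send a birth date concept to person even though its Athena domain is Observation. Domains without a rule (Visit, for instance) raise an error instead of being silently dropped.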
I’m at a loss as to how to do this programmatically. I’ve been looking at Perseus and Usagi to see whether they can help with this step.
Thanks for any and all help!!!