No, I think it makes good sense to bundle those files/tables together in your broadsea deployment.
Will it, @Chris_Knoll? Isn’t the problem that the recommendations depend on the data that created them? How do we solve that problem? Or are you thinking of a generic or community set of recommendations?
A generic/community set of recommendations by default. The main post on this points to a file that was generated on some set of data, so the vocabulary providing this table by default would be no different. If the Phoebe authors want to release instructions on how to derive your own concept recommendations from local data, they could do that, and then they (the data owners) could just replace/augment the default recommendations with their own.
The only data-specific piece may be the patient context-derived recommendations. The rest is agnostic, and users can prioritize their review based on the network counts or their local data source counts.
On where to store the file and how to distribute it: there have been discussions, but we haven't settled on anything yet. If users have a preference, I'd gladly hear it here.
As I'm planning to work on enhancing the recommendations, that would be a good time to decide on the place for the file as well.
My preference would be that the vocabulary includes this table as part of the vocabulary download, but with an option that if people want to get updated recommendations or generate their own patient-context recommendations, they can generate those themselves (via a new R package that could be hosted in a git repo).
Maybe if there is going to be an R package to do this, the ‘default’ file can be saved inside that repo.
This sounds good to me, very flexible. Plus we can pull these files into Broadsea for easier deployments.
Ajit, are you using Phoebe? And is the current option of downloading the file from the forum not flexible enough? I want to make sure I get it right.
I think since it is such a key feature in Atlas 2.12+ (with a dedicated UI), it’s important that sites know where to find it.
A more formal place to me would be GitHub and/or part of the Athena downloads. Plus, there's a lifecycle that users can follow.
Okay, I forgot the CSV is 130 MB. But zipped it's 30 MB, which is fine for GitHub (the maximum file size is 100 MB).
It makes sense in general that there would be a more formal place; I'm just not sure what it would be. GitHub makes sense to me personally if it's an R package; the vocab pack if it is not. Not sure if that requires additional tweaking on the Athena side; gotta ask.
Well, for now, I have the zip file of the CSV itself here in my fork of Broadsea, along with a bash script:
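Roughly, the load it performs boils down to something like the following SQL (a sketch with placeholder schema and file path, not the actual script; Postgres syntax):

```sql
-- Create the table in the vocabulary schema ("cdm" is a placeholder)
-- and bulk-load the unzipped CSV.
CREATE TABLE cdm.concept_recommended
(
  concept_id_1    bigint,
  concept_id_2    bigint,
  relationship_id character varying(30)  -- generous width for relationship_id values
);

-- Server-side COPY; use \copy from psql instead if the file
-- sits on the client rather than the database server.
COPY cdm.concept_recommended
FROM '/tmp/concept_recommended.csv'
WITH (FORMAT csv, HEADER true);
```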
@aostropolets Would you have more information about how the patient context works? All I could find is this snippet from the Phoebe 2.0 manual. This hints that the network counts from Phoebe 1.0 were not used for this, but a different source.
PHOEBE 2.0: CREATING COMPREHENSIVE CONCEPT SETS
Now with:
• enhanced lexical match (with lemmatization, bigrams, conversion to a common part of speech, tf-idf and pairwise cosine similarity)
• patient context (pairwise cosine similarity based on the vectors of the concept co-occurring with a given concept)
Would be great to know e.g. on which dataset(s) the patient context was derived. Thanks in advance.
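For intuition, the patient-context similarity could be computed along these lines. This is purely an illustrative sketch, not the actual Phoebe code; concept_cooccurrence (columns concept_id, cooccurring_concept_id, cnt) is a hypothetical table of co-occurrence counts:

```sql
-- Each concept's "patient context" vector holds the counts of concepts
-- co-occurring with it; cosine similarity is the dot product of two such
-- vectors divided by the product of their norms.
WITH norms AS (
  SELECT concept_id, SQRT(SUM(cnt * cnt)) AS norm
  FROM concept_cooccurrence
  GROUP BY concept_id
)
SELECT a.concept_id AS concept_id_1,
       b.concept_id AS concept_id_2,
       SUM(a.cnt * b.cnt) / (na.norm * nb.norm) AS cosine_similarity
FROM concept_cooccurrence a
JOIN concept_cooccurrence b
  ON b.cooccurring_concept_id = a.cooccurring_concept_id
 AND a.concept_id < b.concept_id          -- each pair once, no self-pairs
JOIN norms na ON na.concept_id = a.concept_id
JOIN norms nb ON nb.concept_id = b.concept_id
GROUP BY a.concept_id, b.concept_id, na.norm, nb.norm;
```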
Linking in a post where another user had the same question: PHOEBE recommendations and standard codes
Thanks, Maxim. I’m working on more extended docs along with code and will share the links once I’m done. The counts for all recommendations are coming from the network (or your ds if you use Phoebe on your local Atlas instance). Lexical and ontology recommendations were generated off of vocab tables, patient context ones were generated off of patient data from a collection of US claims and EHR data sources. We can run the code against your ds if you are interested in enhancing the recommendations with ones specific to your ds.
The recommendations file has been updated; for the latest recommendations, use this file:
concept_recommended_20240527.zip
I do not have permissions to edit the original post to include this updated file.
While uploading the latest concept_recommended_20240527.csv, I ran into the following error:
ERROR: value too long for type character varying(20)
CONTEXT: COPY concept_recommended, line 4938309, column relationship_id: "Ontology-relationship"
So, the following DDL needs to be updated:
CREATE TABLE {schema}.concept_recommended
(
concept_id_1 bigint,
concept_id_2 bigint,
relationship_id character varying(20)
)
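For example, widening relationship_id would accommodate values like "Ontology-relationship" (21 characters):

```sql
-- Same DDL with relationship_id widened; 30 is an arbitrary choice
-- that leaves some headroom over the longest current value.
CREATE TABLE {schema}.concept_recommended
(
  concept_id_1    bigint,
  concept_id_2    bigint,
  relationship_id character varying(30)
);
```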
Thanks
The concept_recommended table must be created in the vocabulary schema referenced in the source_daimon table. Often this is the same schema as the CDM schema but they can be different.
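To confirm which schema that is, you can query WebAPI's configuration. A sketch, assuming WebAPI's tables live in the default webapi schema (daimon_type = 1 is the Vocabulary daimon):

```sql
-- List each source and the schema its Vocabulary daimon points at;
-- concept_recommended must be created in that schema.
SELECT s.source_key,
       sd.table_qualifier AS vocabulary_schema
FROM webapi.source s
JOIN webapi.source_daimon sd
  ON sd.source_id = s.source_id
WHERE sd.daimon_type = 1;
```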
Thanks Chris & Anthony for providing this clarification.
Hi @aostropolets, I was trying to better understand how the Phoebe CSV was generated. Is the code used available on GitHub somewhere?
I would like to build on top of the Phoebe results, and if people have an OMOP CDM with the concept_recommended table included alongside their other vocabulary tables, that is great, as I can then work with it like with the other vocabulary tables. But I'm struggling to understand how I can know whether the concepts in concept_recommended are from the same vocabulary version as the concept table itself.
In the longer term, will concept_recommended become another vocabulary table, or would these rows be appended to concept_relationship? I see they are in a similar format, and that would probably make it even easier to incorporate into tools.
My understanding was that this table would be generated and distributed as part of the vocabulary download. This is why we decided to instruct users to place this table in the CDM schema (with the other vocab tables).
Thanks @Chris_Knoll. @MaximMoinat and I can ask the data partners we work with to upload the concept_recommended.csv into the database as is.
For the longer term, what I’m a bit confused about is whether:
- Phoebe is like any other vocabulary table provided by OHDSI which the data partner uploads (and then it seems like it could indeed be provided from Athena with the others, and maybe the code used to generate it could live in https://github.com/OHDSI/Vocabulary-v5.0 for transparency), or
- Phoebe is more like Achilles in that it is derived from the CDM data somebody has mapped, in which case maybe the code for it could even be added to Achilles to increase uptake (as that is something people are used to running as they map a data source).
More than anything, the problem we currently have is that if a data partner does upload the concept_recommended.csv, how can we know that the results are consistent with the other vocabulary tables they have? And how would we know whether the data from that data partner was used to generate the recommendations?
Any thoughts @aostropolets @Christian_Reich @Patrick_Ryan @Frank ?
There are several types of recommendations that are dependent on the vocabulary version. For example, ontology-parent and ontology-descendant recommendations are derived from the concept_ancestor table of a given vocabulary release (the recent Phoebe update uses the most recent version of the vocabularies). So, to be fully consistent, users want to be on the Feb 2024 vocabulary release.
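As a quick consistency check, the vocabulary release version of a CDM can be read from the special vocabulary_id = 'None' row and compared against the release the recommendations were generated on:

```sql
-- By convention, the overall vocabulary release version string
-- is stored on the vocabulary_id = 'None' row.
SELECT vocabulary_version
FROM vocabulary
WHERE vocabulary_id = 'None';
```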
The source currently is the aggregated data from 22 CDMs collected in the OHDSI Concept Prevalence study from a couple of years ago.
Unfortunately, we did not have many EU partners contributing to the study and would be very happy to expand the coverage. Happy to discuss!