
Phoebe 2.0

Atlas 2.12.0 includes PHOEBE 2.0: concept recommendations. This feature is available in the concept set editor under the ‘Recommend’ tab. However, for it to work, a custom table, concept_recommended, must be created in your CDM schema alongside your other vocabulary tables and loaded with the concept recommendations.

These recommendations are provided as a .csv file and table DDL here.

If your environment does not have the concept_recommended table, Atlas will display a message directing users to this forum post, which has information about the latest version of the concept recommendations.
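To check ahead of time whether the table is where it needs to be, one option is to ask the database catalog. This is a minimal sketch, not how Atlas itself performs the check; it assumes that "in place" means concept_recommended lives in the same schema as concept. information_schema.tables is part of the SQL standard, so the query should work on PostgreSQL and other databases that implement it.

```sql
-- Lists every schema that holds a concept or concept_recommended table,
-- so you can confirm concept_recommended sits next to concept.
-- Note: information_schema only shows tables the current user has privileges on.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_name IN ('concept', 'concept_recommended')
ORDER BY table_schema, table_name;
```

If concept_recommended is missing from the output, or appears in a different schema than concept, that is the first thing to fix.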

The current concept recommendations are in the file concept_recommended_20221006.zip.


This is awesome work, and I know it will be a huge help for concept set authors both novice and expert!

@Chris_Knoll could we host these files elsewhere, perhaps a new OHDSI github repo?

My understanding is that this will be provided as a standard part of the vocabulary download (i.e., along with concept, concept_ancestor, concept_relationship, etc.).

So, for now, this is a temporary arrangement until it becomes part of the formal process.

@Patrick_Ryan and @aostropolets, is there any ETA on when this table will be included?


Gotcha. I’ve got a new branch of Broadsea in progress that will load this into an OMOP vocabulary Postgres schema. Any issues with these files being part of that for now?


No, I think it makes good sense to bundle those files/tables together in your broadsea deployment.


Will it, @Chris_Knoll? Isn’t the problem that the recommendations depend on the data that created them? How do we solve that problem? Or are you thinking of a generic or community set of recommendations?

A generic/community set of recommendations by default. The main post points to a file that was generated from some set of data, so having the vocabulary provide this table by default would be no different. If the Phoebe authors want to release instructions on how to derive your own concept recommendations from local data, data owners could then replace or augment the default recommendations with their own.

The only data-specific piece may be the patient context-derived recommendations. The rest is data-agnostic, and users can prioritize their review based on the network counts or their local data source counts.

On where to store the file and how to distribute it: there have been discussions, but we haven’t settled on anything yet. If users have a preference, I’d gladly hear them out here.

Since I’m planning to enhance the recommendations, that would also be a good time to decide where the file should live.

My preference would be for the vocabulary to include this table as part of the vocabulary download, with the option for people who want updated recommendations, or their own patient-context recommendations, to generate them themselves (via a new R package hosted in a git repo).

If there is going to be an R package for this, maybe the ‘default’ file could be stored in that repo.


This sounds good to me, very flexible. Plus we can pull these files into Broadsea for easier deployments.

Ajit, are you using Phoebe? And is the current option of downloading the file from the forum not flexible enough? I want to make sure I get it right.

I think since it is such a key feature in Atlas 2.12+ (with a dedicated UI), it’s important that sites know where to find it.

To me, a more formal place would be GitHub and/or the Athena downloads. Plus, users would then have a release lifecycle to follow.


Okay, I forgot the CSV is 130 MB. But zipped, it’s 30 MB, which is fine for GitHub (its per-file limit is 100 MB).

In general, it makes sense for it to live in a more formal place; I’m just not sure which one. Personally, GitHub makes sense if it’s an R package, and the vocabulary pack if it is not. I’m not sure whether that requires additional tweaking on the Athena side; I’ll have to ask.

Well, for now, I have the zip file of the CSV itself here in my fork of Broadsea, along with a bash script:


@aostropolets Would you have more information about how the patient context works? All I could find is this snippet from the Phoebe 2.0 manual. It suggests that the patient context was derived not from the Phoebe 1.0 network counts but from a different source.

Now with:
• enhanced lexical match (with lemmatization, bigrams, conversion to a common part of speech, tf-idf and pairwise cosine similarity)
• patient context (pairwise cosine similarity based on the vectors of the concept co-occurring with a given concept)
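To make the "patient context" bullet concrete, here is a toy PostgreSQL sketch of pairwise cosine similarity between concepts, each represented by a vector of co-occurrence counts with other concepts. It is not Phoebe's code: the table cooccurrence, its columns, and the counts are hypothetical, and Phoebe's actual pipeline (weighting, thresholds, source data) may differ.

```sql
-- Hypothetical toy data: each row is one component of a concept's
-- co-occurrence vector.
CREATE TEMP TABLE cooccurrence (
    concept_id    bigint,            -- concept being described
    co_concept_id bigint,            -- concept seen in the same patients
    n             double precision   -- co-occurrence count (vector component)
);

INSERT INTO cooccurrence VALUES
    (1, 10, 3), (1, 11, 4),          -- concept 1 -> vector (3, 4, 0)
    (2, 10, 6), (2, 11, 8),          -- concept 2 -> vector (6, 8, 0)
    (3, 12, 5);                      -- concept 3 -> vector (0, 0, 5)

-- cosine(a, b) = dot(a, b) / (|a| * |b|). Pairs that share no co-occurring
-- concept have a dot product of 0 and simply produce no row here.
CREATE TEMP VIEW concept_similarity AS
WITH norms AS (
    SELECT concept_id, sqrt(sum(n * n)) AS norm
    FROM cooccurrence
    GROUP BY concept_id
), dots AS (
    SELECT a.concept_id AS c1, b.concept_id AS c2, sum(a.n * b.n) AS dot
    FROM cooccurrence a
    JOIN cooccurrence b
      ON a.co_concept_id = b.co_concept_id
     AND a.concept_id < b.concept_id   -- each unordered pair once
    GROUP BY a.concept_id, b.concept_id
)
SELECT d.c1, d.c2, d.dot / (n1.norm * n2.norm) AS cosine_similarity
FROM dots d
JOIN norms n1 ON n1.concept_id = d.c1
JOIN norms n2 ON n2.concept_id = d.c2;

SELECT * FROM concept_similarity ORDER BY cosine_similarity DESC;
```

Concepts 1 and 2 co-occur with the same concepts in the same proportions, so their similarity is 1.0 even though concept 2 is seen twice as often; that scale-invariance is the usual reason for choosing cosine similarity over raw counts. Concept 3 shares nothing with either, so it is never paired.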

It would be great to know, e.g., which dataset(s) the patient context was derived from. Thanks in advance.

Linking in a post where another user had the same question: PHOEBE recommendations and standard codes

Thanks, Maxim. I’m working on more extended docs along with code and will share the links once I’m done. The counts for all recommendations come from the network (or from your data source if you use Phoebe on your local Atlas instance). Lexical and ontology recommendations were generated from the vocabulary tables; patient context recommendations were generated from patient data in a collection of US claims and EHR data sources. We can run the code against your data source if you are interested in enhancing the recommendations with ones specific to it.


The recommendations file has been updated; for the latest recommendations, use this file:

I do not have permissions to edit the original post with this updated file.

While uploading the latest recommendations, I ran into the following error:

ERROR: value too long for type character varying(20)
CONTEXT: COPY concept_recommended, line 4938309, column relationship_id: "Ontology-relationship"

So, the following command needs to be updated:

    CREATE TABLE {schema}.concept_recommended (
        concept_id_1 bigint,
        concept_id_2 bigint,
        relationship_id character varying(20)
    );
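The value in the error, "Ontology-relationship", is 21 characters long, one more than varchar(20) allows. Here is a minimal PostgreSQL sketch of the fix, assuming column width is the only problem. The width of 50 is an arbitrary choice, not an official OHDSI value. The statements are unqualified, assuming your search_path points at the vocabulary schema; otherwise, prefix the table name with your schema as in the command above.

```sql
-- Fresh install: create the table with a wider relationship_id.
-- varchar(50) is an assumed width, not an official value.
CREATE TABLE IF NOT EXISTS concept_recommended (
    concept_id_1    bigint,
    concept_id_2    bigint,
    relationship_id varchar(50)
);

-- Existing install created with varchar(20): widen the column in place.
-- Widening a varchar cannot truncate existing values.
ALTER TABLE concept_recommended
    ALTER COLUMN relationship_id TYPE varchar(50);
```

Both statements are safe to run together: on a fresh install the ALTER is a no-op, and on an existing one the CREATE is skipped. After that, re-run the load that failed.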