
Tagging System

There has been discussion for a while now about the need for a tagging system within ATLAS, so I’ve put together an initial database design for comment. The tagging system would allow us to tag cohorts, concept sets, or other assets within the OHDSI schema.

An example use case would be to tag multiple cohorts and concept sets that are all being used for a particular project with a project name.

Another use case would be creating a ‘validated’ tag and then tagging cohorts that have gone through an internal review or validation process.

We would incorporate features into the UI that would allow users to filter listings using tags and also display the tag in the header of various assets.
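To make the two use cases and the filtering idea concrete, here is a minimal in-memory sketch in Python. It is not ATLAS or WebAPI code: `Asset`, `TagIndex`, and the case-insensitive tag matching are all assumptions made for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class Asset(NamedTuple):
    """Hypothetical reference to a taggable asset (not an actual ATLAS type)."""
    kind: str       # e.g. "cohort" or "concept_set"
    asset_id: int


class TagIndex:
    """Flat many-to-many mapping between tag names and assets."""

    def __init__(self):
        self._assets_by_tag = defaultdict(set)
        self._tags_by_asset = defaultdict(set)

    @staticmethod
    def _normalize(name):
        # Assumption: tags are matched case-insensitively, ignoring outer spaces.
        return name.strip().lower()

    def tag(self, asset, name):
        """Attach a tag to an asset; tagging twice has no extra effect."""
        key = self._normalize(name)
        self._assets_by_tag[key].add(asset)
        self._tags_by_asset[asset].add(key)

    def assets_with(self, *names):
        """Return the assets that carry every one of the given tags."""
        if not names:
            return set()
        groups = [self._assets_by_tag.get(self._normalize(n), set()) for n in names]
        return set.intersection(*groups)

    def tags_for(self, asset):
        """Return an asset's tags, sorted, e.g. for display in its header."""
        return sorted(self._tags_by_asset.get(asset, ()))


# Tag several assets with a project name, and one of them as validated.
index = TagIndex()
index.tag(Asset("cohort", 1), "Project X")
index.tag(Asset("cohort", 2), "Project X")
index.tag(Asset("concept_set", 7), "Project X")
index.tag(Asset("cohort", 1), "validated")
```

Filtering a listing then amounts to calling `index.assets_with("Project X", "validated")`, and the header display to `index.tags_for(asset)`.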

This is an initial suggestion and I welcome all feedback and discussion. I’m including a simple database design diagram below.

I would also need to see how you would manage these tags, so could you provide the UI that would select/edit/update/delete these tags? If it was a simple tagging system without a hierarchy, I think I could imagine it, but the TagCategory idea you have above complicates things by introducing a hierarchy. Will there be some sort of “tag explorer” that users will need to be familiar with to understand the hierarchy? If you filter on a tag, does that include tags that are descendants?

This might be a good opportunity to include some ownership of items. The ability to grant read-only, write, etc. access to other users would help to protect the objects from inadvertent or unwanted changes. As a UI, I think something resembling the Dropbox web UI might work. I might like to keep something private while I’m in the early stages of development but open it up to either the world or specific collaborators later on.

Thanks @Chris_Knoll - based on your suggestion, I did some more reading on tagging systems in general. For those interested, here are links to some of the papers I reviewed:

http://www.hpl.hp.com/research/idl/papers/tags/tags.pdf
http://ilpubs.stanford.edu:8090/775/1/2006-10.pdf
http://www.shirky.com/writings/ontology_overrated.html?goback=.gde_1838701_member_179729766

I also took a look at the StackOverflow data model for tags here:

http://data.stackexchange.com/stackoverflow/query/new

Of all of that, the Shirky paper on how ontologies are overrated was the most impactful, if you just want to read one. Ultimately, while I had given thought to allowing tags to have a hierarchy to support folder/subfolder types of categorization, it seems tagging systems have been shown to be much more effective without one.

The data model in my original post is an embedded document, so the changes to it should show up automagically: the TagCategory table is gone, and I have added a few fields that appear to be valuable in other large tagging systems (Del.icio.us, StackOverflow).
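For readers who cannot see the embedded diagram, a flat tag model with no category or parent table could look roughly like the sketch below, using Python's built-in `sqlite3` module. The table and column names (`tag`, `asset_tag`, `asset_type`, `asset_id`) and the helper functions are hypothetical, and the additional fields mentioned above are not reproduced here.

```python
import sqlite3

# Hypothetical flat schema: one table of tags, one association table.
# There is deliberately no TagCategory table and no parent/child column.
SCHEMA = """
CREATE TABLE tag (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE asset_tag (
    tag_id     INTEGER NOT NULL REFERENCES tag(id),
    asset_type TEXT    NOT NULL,   -- e.g. 'cohort' or 'concept_set'
    asset_id   INTEGER NOT NULL,
    PRIMARY KEY (tag_id, asset_type, asset_id)
);
"""


def open_db():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_tag(conn, name, asset_type, asset_id):
    """Create the tag if needed, then link it to the asset (idempotent)."""
    conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
    (tag_id,) = conn.execute(
        "SELECT id FROM tag WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO asset_tag (tag_id, asset_type, asset_id) "
        "VALUES (?, ?, ?)",
        (tag_id, asset_type, asset_id),
    )


def assets_for_tag(conn, name):
    """List (asset_type, asset_id) pairs carrying the given tag."""
    rows = conn.execute(
        "SELECT a.asset_type, a.asset_id "
        "FROM asset_tag AS a JOIN tag AS t ON t.id = a.tag_id "
        "WHERE t.name = ? "
        "ORDER BY a.asset_type, a.asset_id",
        (name,),
    )
    return rows.fetchall()
```

With no hierarchy, filtering on a tag is a single join and the question of whether descendant tags are included never arises.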

The features @jswerdel mentioned for item ownership and read/write type privileges will be coming with the pending release of the integration of Apache Shiro as a security model.

Thanks for doing the research, @Frank! +1 on removing parent-child relationship support from the data model: simpler in this instance is better.

+1 on the tagging proposal overall.
