
Ingesting fixed tissue slides into the specimen table: Unit of measure?

Hey all,

We are ingesting some data from our biobank (FYI, OpenSpecimen) into the specimen table. In the source, “paraffin embedded slide” has unit = “count” and value = 1, because the source doesn’t use a standard unit vocabulary such as UCUM.

If we are ingesting the fact that we have 1 paraffin embedded slide (specimen_concept_id = 46274042) available for a patient, would the quantity be 1, and unit_concept_id be 0? I cannot find an appropriate concept for such a representation that there is 1 slide specimen available. Measuring the dimensions of the slide is of no scientific interest to our researchers (:stuck_out_tongue: ) so I haven’t been able to find a good UCUM approach to this.

Second part of my question:
My assumption is that OMOP would like there to be a row for every individual slide, even if they come from the same block, since we want to represent each individual physical specimen on its own row. If we ever have “Count” = “3” in OpenSpecimen for a Fixed Tissue Slide sample type, should I explode this specimen out into 3 different rows in the specimen table in OMOP v5.4, each with quantity = 1 and unit_concept_id = 0?

@MPhilofsky, looks like you previously worked with the Specimen table back in 2019, and @roger.carlson, you previously ingested paraffin fixed slide samples. How did you both deal with this?

Here’s your original post on specimen table:

Hello @Daniel_Smith!

Wow, it feels like 10 years ago! So, first things first. What’s your use case?

Per the CDM v5.4 conventions, “The specimen domain contains the records identifying biological samples from a person”. A paraffin embedded slide is not a biological sample from a person. It is the Device used to hold the biological sample. Depending on your use case, I don’t think you need to create a record for the slide, but should create one for the sample. But your use case might dictate otherwise. Most specimen questions need to know what tissue is sourced for the sample.

If you have one slide, you have a quantity of 1. And I don’t think a slide would have an associated unit. Body fluids and tissue samples would have units, such as oz, ml, kg, cm, etc.

Then I would create one row per specimen and give it a quantity = 1.
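
For anyone following along, the "one row per specimen" approach could be sketched roughly as below. This is only an illustration: the source field names (`person_id`, `count`) and the `next_specimen_id` counter are hypothetical, not OpenSpecimen's actual export format, and `unit_concept_id = 0` follows the proposal in the question rather than a confirmed convention.

```python
from typing import Any


def explode_specimen_rows(source: dict[str, Any], next_specimen_id: int) -> list[dict[str, Any]]:
    """Turn one source record with count = N into N specimen rows of quantity 1."""
    count = int(source["count"])  # hypothetical source field holding the slide count
    return [
        {
            "specimen_id": next_specimen_id + i,
            "person_id": source["person_id"],
            "specimen_concept_id": source["specimen_concept_id"],
            "quantity": 1,          # one physical slide per row
            "unit_concept_id": 0,   # assumption: no unit concept for a slide count
        }
        for i in range(count)
    ]


# Example: a source record with Count = 3 for the slide concept from the question.
rows = explode_specimen_rows(
    {"person_id": 101, "specimen_concept_id": 46274042, "count": 3},
    next_specimen_id=1,
)
```

A real ETL would also need to populate the other required specimen columns (for example specimen_type_concept_id and specimen_date), which are left out here for brevity.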

Thanks @MPhilofsky! Much appreciated.

We’d like to represent some of the specimens stored in our biobank, associated with an OMOP patient, in our OMOP instance. We’re going to do this sparingly, such that cursory information can be obtained about specimen availability for a cohort, while more specific information regarding specimen location (freezer, shelf, box, row, etc.) stays in the source system. Think of using Atlas to define cohort inclusion criteria that require biopsy tissue to be available (on a block, a slide, or an already processed slide) for a patient population. One possible purpose of collecting this data is the evaluation of tissue samples for a digital microscopy project.

I understand this can be done in our datalake, but if self-service is possible for cohorts in Atlas, that would be ideal!

Makes sense!

A follow-up question not yet addressed: we want to represent currently available quantities of specimens. The quantity field is going to represent the current available quantity, while the mere fact that a record exists in the system indicates that the specimen was collected, so the original quantity isn’t as important to us. There may be cases, however, where the quantity is truly unknown, as we would need the biobank staff to go back to the sample location to pull/confirm. My assumption is that unknown quantities are represented by null values on the Specimen table?

Correct, this field is nullable and that is the best way to represent the data.
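
As a small illustration of the null convention, here is a sketch using an in-memory SQLite table. The table is a deliberately cut-down stand-in for the CDM specimen table (only a few columns, and the person IDs are made up), so treat it as a demonstration of NULL handling rather than real DDL.

```python
import sqlite3

# Simplified, hypothetical subset of the specimen table; quantity is nullable.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE specimen (
        specimen_id         INTEGER PRIMARY KEY,
        person_id           INTEGER NOT NULL,
        specimen_concept_id INTEGER NOT NULL,
        quantity            REAL,
        unit_concept_id     INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?, ?, ?)",
    [
        (1, 101, 46274042, 1, 0),     # one slide confirmed available
        (2, 102, 46274042, None, 0),  # quantity unknown until staff confirm it
    ],
)

# Unknown quantities are found with IS NULL, which keeps them distinct from 0.
unknown = conn.execute(
    "SELECT person_id FROM specimen WHERE quantity IS NULL"
).fetchall()
known = conn.execute(
    "SELECT person_id FROM specimen WHERE quantity IS NOT NULL"
).fetchall()
```

Keeping unknown as NULL rather than 0 means a cohort filter such as "quantity > 0" won't silently treat unconfirmed specimens as depleted or as available.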

@MPhilofsky thanks so much for your previous input! For your specimens what are you putting for “specimen_type_concept_id”?

https://athena.ohdsi.org/search-terms/terms?standardConcept=Standard&domain=Type+Concept&page=1&pageSize=15&query=

I am no longer working on that project, but looking at it now I would use the generic EHR type_concept_id = 32817 if the data came from the EHR. If it came from an outside lab system, I would use type_concept_id = 32856.
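
If it helps, that choice could be captured in a small lookup like the sketch below. The provenance labels ("ehr", "outside_lab") are hypothetical names made up for the example; only the two concept IDs come from the reply above.

```python
EHR_TYPE_CONCEPT_ID = 32817  # generic EHR type concept, per the reply above
LAB_TYPE_CONCEPT_ID = 32856  # used when the data came from an outside lab system


def specimen_type_concept_id(provenance: str) -> int:
    """Map a (hypothetical) provenance label to a specimen_type_concept_id."""
    mapping = {
        "ehr": EHR_TYPE_CONCEPT_ID,
        "outside_lab": LAB_TYPE_CONCEPT_ID,
    }
    try:
        return mapping[provenance]
    except KeyError:
        # Fail loudly rather than guessing a type concept for unmapped sources.
        raise ValueError(f"No type concept mapped for provenance {provenance!r}") from None
```

Other source systems would need their own type concept picked from the Athena Type Concept domain linked above.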
