I’m afraid I’m more disoriented after that name change; “ETL working group” suggests a huge scope.
I guess I should elaborate on what I’m after:
The main project I work on is HERON, an installation of i2b2 at KU Med Center. When we started development, we sketched out things like our PatientCountStory. i2b2 has a user interface and I more or less know who the HERON user community is.
In contrast, the goal stated above…
“to clarify how to load the data into tables, decide what data goes in what tables, whether to do vocabulary mapping or create an oncology vocabulary, and to generate ETL code to help populating tables. The focus is on SEER and NAACCR data.”
… is hard for me to get my head around. In my experience, the way you put data into tables depends entirely on how you want to query it.
When we started adding NAACCR data to HERON, our use case was something like “count patients with grade 1 tumors.” We do better when we capture use cases that are actually interesting to researchers. For example, when our trauma registry folks wanted us to integrate their data with HERON, we met with them and came up with:
Use case: labs for people who came in with liver ulcerations.
In GPC, we’re working on integrating geocoding data (#140). There you’ll see a March 22 comment that captures use cases such as “How many diabetic patients (ICD9: 250) reside in rented property vs. own home.”
When I started working on site-specific factors, the use case was basically “count ER+ breast cancer patients.” It was implied by the fact that ER+ was in there as “site specific factor #1,” but you had to have the NAACCR and ICD-O manuals at your side and wield some level-7 i2b2 query magic to actually use it. Now it’s straightforward to navigate into **Site-specific factors / Breast / 01: Estrogen Receptor (ER) Assay** and drag 010: Positive/elevated in as a query term.
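To make that concrete, here is a rough sketch of what the ER+ count boils down to once the registry data is loaded as coded facts. The table and column names follow i2b2's `observation_fact` convention, but everything else is an assumption for illustration: the `demo-site:` and `demo-ssf1:` concept codes are made up (they are not HERON's actual codes), the sample rows are invented, and using `encounter_num` to tie two facts to the same tumor is just one possible design.

```python
import sqlite3


def count_er_positive_breast(conn):
    """Count distinct patients with a breast primary site and an ER-positive
    site-specific factor recorded on the same (hypothetical) tumor record."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT site.patient_num)
        FROM observation_fact AS site
        JOIN observation_fact AS er
          ON er.patient_num = site.patient_num
         AND er.encounter_num = site.encounter_num
        WHERE site.concept_cd LIKE 'demo-site:C50%'  -- ICD-O topography C50.x = breast
          AND er.concept_cd = 'demo-ssf1:010'        -- 010: Positive/elevated
        """
    ).fetchone()
    return row[0]


# Toy in-memory data; codes and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation_fact "
    "(patient_num INTEGER, encounter_num INTEGER, concept_cd TEXT)"
)
conn.executemany(
    "INSERT INTO observation_fact VALUES (?, ?, ?)",
    [
        (1, 10, "demo-site:C509"), (1, 10, "demo-ssf1:010"),  # breast, ER+
        (2, 20, "demo-site:C509"), (2, 20, "demo-ssf1:020"),  # breast, not ER+
        (3, 30, "demo-site:C619"), (3, 30, "demo-ssf1:010"),  # not breast
        (4, 40, "demo-site:C504"), (4, 40, "demo-ssf1:010"),  # breast, ER+
    ],
)

er_positive_count = count_er_positive_breast(conn)
print(er_positive_count)
```

The point is that the same code, 010, means something different depending on the primary site. A load that keeps site and factor together lets the query stay this simple; one that doesn't pushes the manual lookup back onto the person writing the query.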
The “FORDS/NAACR tumor registry data” thread started with “We have interest at our institution on getting our ACOS tumor registry data into OMOP.” That could take any number of forms… anything from a simple flag that says “this patient is in the ACOS tumor registry” up to some sophisticated integration with imaging data. I’d like to hear some stories of how somebody would use the results of this working group once it has achieved its goals.