Medicare ETL development

Patrick_Ryan · February 4, 2015, 8:07pm

For the use case listed, I would argue this makes a very compelling case
for trying to classify the physician claims as inpatient or outpatient,
rather than separating them out as a different type. But I agree, this is
the type of discussion the group should focus on to reach consensus and
best practices. The CDM can handle whatever decision the group makes.

Mark_Danese · February 4, 2015, 11:34pm

Here is the GitHub repo for documents, code, and anything else that needs to be shared for the project.

We will try to put the meeting notes there as well.

(Edited above to link to OHDSI page per Patrick’s note below.)

ericaVoss · February 4, 2015, 9:34pm

We could create an OHDSI GitHub repo; I think we just need to ask someone like @Patrick_Ryan or @jon_duke.

aguynamedryan · February 4, 2015, 9:47pm

@ericaVoss, I’m happy to transfer the repo over to OHDSI, but please make @Mark_Danese and me collaborators so we can commit directly to the repo.

Whoever wants to handle the transfer, just email me. Thanks!

Patrick_Ryan · February 4, 2015, 10:10pm

Team:

I’ve created an OHDSI GitHub repository for you to use to share code for the CMS ETL:
https://github.com/OHDSI/ETL-CMS. All GitHub users in the OHDSI Developer
membership list have rights to commit to this repo. I noticed
@aguynamedryan and @mark_danese were not in the Developer group, so I
invited them. If there are other folks in the workgroup who plan to commit
to the project and who don’t yet have permissions, let me know and I’ll be
happy to add them.

In terms of managing workgroup logistics and administrative items, like
meeting invitations and notes, I’d encourage you to use the OHDSI Wiki and
not clutter the GitHub repo with these materials. Here’s a link to the
LAERTES workgroup:
http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:kb-wg. I
think @rkboyce is doing a brilliant job of organizing their team’s
activities through this page, so it serves as an effective model. I
particularly like how he’s been using a single Google Docs document to
maintain a rolling inventory of all meeting notes, so you can quickly go
through the proceedings. I’m trying to follow this pattern for the general
OHDSI meetings, but I’m nowhere near as proficient as Rich, so I am doing
my best to learn from and follow his example.

Thanks for getting this effort kicked off! I’m excited to see a community
ETL come out that we can all benefit from.

aguynamedryan · February 4, 2015, 11:29pm

Thanks, @Patrick_Ryan. I agree that @rkboyce’s work serves as a great model. I borrowed the layout of the LAERTES WG’s Wiki page to create the CMS-ETL WG Wiki page. I hope I got everyone’s information correct. Please excuse (and hopefully correct) any errors or omissions you find.

Christian_Reich · February 9, 2015, 9:15pm

Friends:

Did we agree on anything? I need clear marching orders and don’t want to cobble them together from the commenting cascade. Could somebody please summarize?

Mark_Danese · February 9, 2015, 10:23pm

You meant the other thread on procedures, right?

Christian_Reich · February 9, 2015, 11:00pm

Yes.

Mark_Danese · February 17, 2015, 6:52pm

Just a reminder that we will touch base as a group at noon (Pacific) today. We can summarize where the mapping to the CDM stands (Erica, Jen, Amy, Michelle), and also talk about loading the data and planning the ETL (Ryan, Lee, Don, and Bill).

ericaVoss · February 17, 2015, 9:37pm

I’ve updated the Rabbit-In-A-Hat file in GitHub and generated a DRAFT document:

I had two follow-ups:

(1) What gets used for RACE_CONCEPT_ID/RACE_SOURCE_VALUE/RACE_SOURCE_CONCEPT_ID and for ETHNICITY_CONCEPT_ID/ETHNICITY_SOURCE_VALUE/ETHNICITY_SOURCE_CONCEPT_ID?

For RACE_CONCEPT_ID and ETHNICITY_CONCEPT_ID, we would use the standard concepts returned by the following:

SELECT *
FROM CONCEPT c
WHERE vocabulary_id = 'Race'
AND INVALID_REASON IS NULL;

SELECT *
FROM CONCEPT c
WHERE vocabulary_id = 'Ethnicity'
AND INVALID_REASON IS NULL;

We will simply use BENE_RACE_CD for both RACE_SOURCE_VALUE and ETHNICITY_SOURCE_VALUE.

However, I do not think we will populate RACE_SOURCE_CONCEPT_ID and ETHNICITY_SOURCE_CONCEPT_ID, because the documentation does not call out specific lookups. Instead, we will put these lookups in the SOURCE_TO_CONCEPT_MAP.

Centers for Medicare and Medicaid Services (CMS) Linkable 2008–2010 Medicare Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF)
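A minimal sketch of how the BENE_RACE_CD lookups could sit in SOURCE_TO_CONCEPT_MAP-style rows and fill the PERSON race and ethnicity fields. Several things here are assumptions, not decisions from this thread: the BENE_RACE_CD codes as we read them from the DE-SynPUF codebook (1 = White, 2 = Black, 3 = Others, 5 = Hispanic), the OMOP concept IDs (8527, 8516, 38003563, 38003564, which should be checked against the CONCEPT queries above), the made-up source vocabulary labels `SynPUF_Race`/`SynPUF_Ethnicity`, the hypothetical function name, and treating every code other than 5 as Not Hispanic. The *_SOURCE_CONCEPT_ID fields stay 0, consistent with not populating them.

```python
from typing import NamedTuple

# Hypothetical in-memory stand-in for SOURCE_TO_CONCEPT_MAP rows, keyed by
# (source_vocabulary_id, source_code). Vocabulary labels are made up; codes
# and concept IDs are assumptions to verify against the codebook/vocabulary.
SOURCE_TO_CONCEPT_MAP = {
    ("SynPUF_Race", "1"): 8527,           # White
    ("SynPUF_Race", "2"): 8516,           # Black or African American
    ("SynPUF_Ethnicity", "5"): 38003563,  # Hispanic or Latino
}
NO_MATCHING_CONCEPT = 0                # OMOP convention for "no matching concept"
NOT_HISPANIC_OR_LATINO = 38003564      # assumption: codes other than 5 are not Hispanic


class RaceEthnicity(NamedTuple):
    race_concept_id: int
    race_source_value: str
    race_source_concept_id: int
    ethnicity_concept_id: int
    ethnicity_source_value: str
    ethnicity_source_concept_id: int


def map_bene_race_cd(bene_race_cd: str) -> RaceEthnicity:
    """Map one BENE_RACE_CD value to PERSON race/ethnicity fields (hypothetical helper)."""
    code = bene_race_cd.strip()
    race = SOURCE_TO_CONCEPT_MAP.get(("SynPUF_Race", code), NO_MATCHING_CONCEPT)
    ethnicity = SOURCE_TO_CONCEPT_MAP.get(("SynPUF_Ethnicity", code), NOT_HISPANIC_OR_LATINO)
    return RaceEthnicity(
        race_concept_id=race,
        race_source_value=code,                        # BENE_RACE_CD feeds both source values
        race_source_concept_id=NO_MATCHING_CONCEPT,    # not populated
        ethnicity_concept_id=ethnicity,
        ethnicity_source_value=code,
        ethnicity_source_concept_id=NO_MATCHING_CONCEPT,
    )


print(map_bene_race_cd("5"))
```

Keeping the lookups as data rather than branching logic means the group can change a mapping by editing rows, which is the point of routing them through SOURCE_TO_CONCEPT_MAP.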

(2) County Lookup (with Mark)

For BENE_COUNTY_CD, we will use the SSA county codes (http://www.resdac.org/cms-data/variables/County-Code). The variable is described as follows:

THIS CODE SPECIFIES THE SSA CODE FOR THE COUNTY OF RESIDENCE OF
THE BENEFICIARY. EACH STATE HAS A SERIES OF CODES BEGINNING WITH ‘000’
FOR EACH COUNTY WITHIN THAT STATE. CERTAIN CITIES WITHIN THAT STATE
HAVE THEIR OWN CODE. COUNTY CODES MUST BE COMBINED WITH STATE
CODES IN ORDER TO LOCATE THE SPECIFIC COUNTY. THE CODING SYSTEM IS
THE SSA SYSTEM, NOT THE FEDERAL INFORMATION PROCESSING STANDARD
(FIPS).
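Because SSA county codes restart within each state, the county code alone is not a usable key. A minimal sketch of combining the two, assuming the state comes from the DE-SynPUF SP_STATE_CODE field (our assumption) and that both codes are numeric strings of at most 2 and 3 digits. The helper name is hypothetical, and the resulting 5-character key is still an SSA code, not FIPS.

```python
def ssa_state_county_key(sp_state_code: str, bene_county_cd: str) -> str:
    """Combine SSA state and county codes into a 5-character key (hypothetical helper).

    County codes restart within each state, so prefixing the zero-padded
    state code is what makes the key identify a single county.
    """
    state = sp_state_code.strip()
    county = bene_county_cd.strip()
    if not (state.isdigit() and county.isdigit()):
        raise ValueError(f"non-numeric SSA code: {sp_state_code!r}, {bene_county_cd!r}")
    if len(state) > 2 or len(county) > 3:
        raise ValueError(f"SSA code too long: {sp_state_code!r}, {bene_county_cd!r}")
    # Zero-pad to 2 + 3 digits. This is SSA coding, so it cannot be joined
    # directly to FIPS-keyed county data without a crosswalk.
    return state.zfill(2) + county.zfill(3)


# The same county code in two states yields two distinct keys.
print(ssa_state_county_key("5", "200"))   # 05200
print(ssa_state_county_key("39", "200"))  # 39200
```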

Mark_Danese · February 21, 2015, 10:14am

Just a reminder that the meeting on Monday will be for a more detailed discussion of the ETL process. If @lee_evans, @Frank, @aguynamedryan, @donohara, and @wstephens can make it, that would be helpful. (Others are certainly more than welcome.)

The main discussion will be about how we might take advantage of everyone’s skills to write an ETL process for the Medicare data.

Frank · February 21, 2015, 3:43pm

Thanks for the reminder, I will definitely be attending.

donohara · February 23, 2015, 8:00pm

Here is a quick-and-dirty SynPUF data viewer. It will be up for today’s call:

http://54.213.235.187:5000/

lee_evans · February 23, 2015, 8:19pm

Hi @Mark_Danese

Unfortunately, I wasn’t able to attend the call today. As mentioned on previous calls, I think I can best contribute by helping with the hosting of the CMS synthetic data and associated tools.

Lee.

Mark_Danese · February 23, 2015, 9:32pm

This is VERY nice.

jenniferduryea · February 23, 2015, 9:41pm

I love how you can see all of the claims (across all files) for one patient id. Very cool.

ericaVoss · February 25, 2015, 10:31pm

Some Follow-Ups from Today’s 2/25/15 Meeting:

I added all the CONDITION_TYPES we needed for CARRIER_CLAIMS, INPATIENT_CLAIMS, and OUTPATIENT_CLAIMS.
I put our updated Rabbit-In-A-Hat file and DOC on GitHub.

ericaVoss · February 26, 2015, 2:20am

@Christian_Reich,

We would like to be able to tell the difference between different types of claims on the CONDITION_OCCURRENCE table:

INPATIENT_CLAIMS
OUTPATIENT_CLAIMS
CARRIER_CLAIMS

We want to have a TYPE for CARRIER_CLAIMS, just as we do for IP and OP. What are your thoughts? If you need more information, we can chat about it.

We are thinking we need something like this, though we could perhaps generalize the titles.

Carrier Claims header - 1st position
Carrier Claims header - 2nd position
Carrier Claims header - 3rd position
Carrier Claims header - 4th position
Carrier Claims header - 5th position
Carrier Claims header - 6th position
Carrier Claims header - 7th position
Carrier Claims header - 8th position

Carrier Claims details - 1st position
Carrier Claims details - 2nd position
Carrier Claims details - 3rd position
Carrier Claims details - 4th position
Carrier Claims details - 5th position
Carrier Claims details - 6th position
Carrier Claims details - 7th position
Carrier Claims details - 8th position
Carrier Claims details - 9th position
Carrier Claims details - 10th position
Carrier Claims details - 11th position
Carrier Claims details - 12th position
Carrier Claims details - 13th position
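A minimal sketch of deriving these labels from the claim columns, so each carrier diagnosis lands on CONDITION_OCCURRENCE with its own type. It assumes (our reading of the DE-SynPUF carrier file, not something settled in this thread) that the 8 header positions correspond to ICD9_DGNS_CD_1 through ICD9_DGNS_CD_8 and the 13 detail positions to LINE_ICD9_DGNS_CD_1 through LINE_ICD9_DGNS_CD_13. The function names are hypothetical, and the labels follow the list as written.

```python
import re

HEADER_POSITIONS = 8
DETAIL_POSITIONS = 13

# Assumed column naming: header ICD9_DGNS_CD_n, detail LINE_ICD9_DGNS_CD_n.
_DIAGNOSIS_COLUMN = re.compile(r"^(LINE_)?ICD9_DGNS_CD_(\d+)$")


def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 13 -> '13th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def carrier_condition_type(column: str) -> str:
    """Return the proposed condition type label for a carrier diagnosis column."""
    match = _DIAGNOSIS_COLUMN.match(column)
    if match is None:
        raise ValueError(f"not a carrier diagnosis column: {column!r}")
    is_detail = match.group(1) is not None
    position = int(match.group(2))
    limit = DETAIL_POSITIONS if is_detail else HEADER_POSITIONS
    if not 1 <= position <= limit:
        raise ValueError(f"position {position} out of range 1..{limit}: {column!r}")
    part = "details" if is_detail else "header"
    return f"Carrier Claims {part} - {ordinal(position)} position"


# All 21 proposed labels: 8 header positions, then 13 detail positions.
ALL_CARRIER_CONDITION_TYPES = [
    carrier_condition_type(f"ICD9_DGNS_CD_{i}") for i in range(1, HEADER_POSITIONS + 1)
] + [
    carrier_condition_type(f"LINE_ICD9_DGNS_CD_{i}") for i in range(1, DETAIL_POSITIONS + 1)
]

print(carrier_condition_type("LINE_ICD9_DGNS_CD_11"))  # Carrier Claims details - 11th position
```

Validating the position against the 8/13 limits means an unexpected column fails loudly during the ETL instead of silently receiving an unmapped type.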

jenniferduryea · February 25, 2015, 10:55pm

@ericaVoss - Instead of “Carrier Claims - #th position”, I believe we want “Carrier Claims Detail - #th position” up to position 13. This will more closely follow the header/detail format of Inpatient vs Outpatient.