
Medicare ETL development

For the use case listed, I would argue this makes a compelling case for
classifying the physician claims as inpatient or outpatient, rather than
keeping them separate as a distinct category. But I agree, this is the
type of discussion the group should focus on to reach consensus and best
practices. The CDM can handle whatever decision the group makes.

Here is the Github repo for documents, code, and anything else that needs to be shared for the project.

We will try to put the meeting notes there as well.

(Edited above to link to OHDSI page per Patrick’s note below.)

We could create an OHDSI GitHub repo; I think we just need to ask someone like @Patrick_Ryan or @jon_duke.

@ericaVoss, I’m happy to transfer the repo over to OHDSI, but please make @Mark_Danese and myself collaborators so I can directly commit to the repo.

Whoever wants to handle the transfer, just email me. Thanks!

Team:

I’ve created an OHDSI GitHub repo for you to use to share code for the CMS ETL:
https://github.com/OHDSI/ETL-CMS. All GitHub users in the OHDSI Developer
membership list have rights to commit to this repo. I noticed
@aguynamedryan and @mark_danese were not in the Developer group, so I
invited them. If there are other folks in the workgroup who plan to commit
to the project and who don’t yet have permissions, let me know and I’ll be
happy to add them.

In terms of managing workgroup logistics and administrative items, like
meeting invitations and notes, I’d encourage you to use the OHDSI Wiki and
not clutter the GitHub repo with these materials. Here’s a link to the
LAERTES workgroup:
http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:kb-wg. I
think @rkboyce is doing a brilliant job of organizing his team’s
activities there, so it serves as an effective model. I particularly
like how he’s been using one Google Docs document to maintain a rolling
inventory of all meeting notes, so you can quickly go through the
proceedings. I’m trying to follow this pattern for the general OHDSI
meetings, but I’m nowhere near as proficient as Rich, so I’m doing my best
to learn from and follow his example.

Thanks for getting this effort kicked off! I’m excited to see a community
ETL come out that we can all benefit from.

Thanks, @Patrick_Ryan. I agree that @rkboyce’s work serves as a great model. I ripped off the layout of the LAERTES WG’s Wiki Page to form the CMS-ETL WG Wiki Page. I hope I got everyone’s information correct. Please excuse (and hopefully correct) any errors/omissions you find.


Friends:

Did we agree on anything? I need clear marching orders and don’t want to cobble them together from the comment cascade. Could somebody please summarize?

You meant the other thread on procedures, right?

Yes.

Just a reminder that we will touch base as a group at noon (Pacific) today. We can summarize where the mapping to the CDM stands (Erica, Jen, Amy, Michelle), and also talk about loading the data and planning for the ETL (Ryan, Lee, Don, and Bill).

I’ve updated the Rabbit-In-A-Hat file in GitHub and generated a DRAFT document:


I had two follow-ups:

(1) What gets used for RACE_CONCEPT_ID/RACE_SOURCE_VALUE/RACE_SOURCE_CONCEPT_ID and for ETHNICITY_CONCEPT_ID/ETHNICITY_SOURCE_VALUE/ETHNICITY_SOURCE_CONCEPT_ID?

For the RACE_CONCEPT_ID and ETHNICITY_CONCEPT_ID we would use the following:

-- Valid concepts in the Race vocabulary
SELECT *
FROM CONCEPT c
WHERE c.vocabulary_id = 'Race'
AND c.invalid_reason IS NULL;

-- Valid concepts in the Ethnicity vocabulary
SELECT *
FROM CONCEPT c
WHERE c.vocabulary_id = 'Ethnicity'
AND c.invalid_reason IS NULL;

Obviously we will just use BENE_RACE_CD for RACE_SOURCE_VALUE and ETHNICITY_SOURCE_VALUE.

However, I do not think we will populate the RACE_SOURCE_CONCEPT_ID and ETHNICITY_SOURCE_CONCEPT_ID. The documentation does not call out specific lookups. Instead we will take these lookups and put them in the SOURCE_TO_CONCEPT_MAP.
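To make the mapping concrete, here is a minimal sketch of how one BENE_RACE_CD value could be turned into the PERSON race and ethnicity fields. The code meanings, the concept IDs, and the function name `map_race_ethnicity` are all assumptions for illustration; they need to be checked against the DE-SynPUF codebook and the results of the vocabulary queries above before anyone relies on them.

```python
# Hypothetical lookup from DE-SynPUF BENE_RACE_CD to OMOP concept IDs.
# Every code meaning and concept ID here is an assumption to be verified
# against the DE-SynPUF codebook and the Race/Ethnicity vocabulary queries.
RACE_CONCEPTS = {
    "1": 8527,  # White (assumed)
    "2": 8516,  # Black or African American (assumed)
}
HISPANIC_CODE = "5"                # assumed BENE_RACE_CD value for Hispanic
ETHNICITY_HISPANIC = 38003563      # Hispanic or Latino (assumed)
ETHNICITY_NOT_HISPANIC = 38003564  # Not Hispanic or Latino (assumed)


def map_race_ethnicity(bene_race_cd):
    """Return the PERSON race/ethnicity fields for one BENE_RACE_CD value."""
    code = str(bene_race_cd).strip()
    is_hispanic = code == HISPANIC_CODE
    return {
        # Codes without a known race concept fall back to 0 (no matching concept).
        "race_concept_id": RACE_CONCEPTS.get(code, 0),
        "race_source_value": code,
        "ethnicity_concept_id": (
            ETHNICITY_HISPANIC if is_hispanic else ETHNICITY_NOT_HISPANIC
        ),
        "ethnicity_source_value": code,
    }


if __name__ == "__main__":
    for value in ("1", "2", "3", "5"):
        print(value, map_race_ethnicity(value))
```

The source concept ID fields are left out on purpose, matching the plan not to populate them. Whether every non-Hispanic code should be recorded as "Not Hispanic or Latino", and how the "Other" code should map, are decisions for the group rather than something this sketch settles.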

Centers for Medicare and Medicaid Services (CMS) Linkable 2008–2010 Medicare Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF)


(2) County Lookup (with Mark)

For BENE_COUNTY_CD we will use the SSA county codes (http://www.resdac.org/cms-data/variables/County-Code) as the county lookup:

THIS CODE SPECIFIES THE SSA CODE FOR THE COUNTY OF RESIDENCE OF
THE BENEFICIARY. EACH STATE HAS A SERIES OF CODES BEGINNING WITH ‘000’
FOR EACH COUNTY WITHIN THAT STATE. CERTAIN CITIES WITHIN THAT STATE
HAVE THEIR OWN CODE. COUNTY CODES MUST BE COMBINED WITH STATE
CODES IN ORDER TO LOCATE THE SPECIFIC COUNTY. THE CODING SYSTEM IS
THE SSA SYSTEM, NOT THE FEDERAL INFORMATION PROCESSING STANDARD
(FIPS).
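Since the county code only identifies a county once it is combined with the state code, a lookup key could be built along these lines. This is only a sketch: the function names are made up, the two-digit state and three-digit county widths are assumed from the SSA convention, and the lookup table passed in is a placeholder for whatever SSA reference file we end up loading.

```python
def ssa_state_county_key(state_code, county_code):
    """Build a five-character SSA key: 2-digit state followed by 3-digit county.

    Assumes SSA state codes are two digits and county codes are three digits;
    shorter values are left-padded with zeros.
    """
    state = str(state_code).strip().zfill(2)
    county = str(county_code).strip().zfill(3)
    if len(state) != 2 or len(county) != 3:
        raise ValueError(f"unexpected SSA code lengths: {state!r}, {county!r}")
    return state + county


def lookup_county(state_code, county_code, ssa_table):
    """Return the entry for a state/county pair, or None if it is not listed.

    `ssa_table` is a hypothetical dict keyed by the five-character SSA key.
    """
    return ssa_table.get(ssa_state_county_key(state_code, county_code))


if __name__ == "__main__":
    # Placeholder table; real entries would come from an SSA reference file.
    placeholder_table = {"01000": "placeholder county entry"}
    print(ssa_state_county_key(1, 0))
    print(lookup_county("01", "000", placeholder_table))
```

Keeping the codes as zero-padded strings avoids losing leading zeros such as the ‘000’ county mentioned above. If a FIPS value is needed later, that would require a separate SSA-to-FIPS crosswalk, which this sketch does not attempt.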

Just a reminder that the meeting on Monday will be for a more detailed discussion of the ETL process. If @lee_evans, @Frank, @aguynamedryan, @donohara, and @wstephens can make it, that would be helpful. (Others are certainly more than welcome.)

The main discussion will be to start thinking about how we might take advantage of everyone’s skills and write an ETL process for the Medicare data.

Thanks for the reminder, I will definitely be attending.

Here is a quick-and-dirty SynPUF data viewer. It will be up for today’s call.

http://54.213.235.187:5000/

Hi @Mark_Danese

Unfortunately, I wasn’t able to attend the call today. As mentioned on previous calls, I think my contribution can be to help out with the hosting of the CMS synthetic data and associated tools.

Lee.

This is VERY nice.

I love how you can see all of the claims (across all files) for one patient id. Very cool.

Some Follow-Ups from Today’s 2/25/15 Meeting:

  1. I added all the CONDITION_TYPES we needed for CARRIER_CLAIMS, INPATIENT_CLAIMS, and OUTPATIENT_CLAIMS.
  2. Put our updated Rabbit-In-A-Hat file and DOC on GitHub.

@Christian_Reich,

We would like to be able to tell the difference between different types of claims on the CONDITION_OCCURRENCE table:

  • INPATIENT_CLAIMS
  • OUTPATIENT_CLAIMS
  • CARRIER_CLAIMS

We want to be able to have a TYPE for CARRIER_CLAIMS just like we do IP and OP. What are your thoughts? If you need more information we can chat about it.

We are thinking we need something like this. We could maybe generalize the titles.

  • Carrier Claims header - 1st position
  • Carrier Claims header - 2nd position
  • Carrier Claims header - 3rd position
  • Carrier Claims header - 4th position
  • Carrier Claims header - 5th position
  • Carrier Claims header - 6th position
  • Carrier Claims header - 7th position
  • Carrier Claims header - 8th position

  • Carrier Claims details - 1st position
  • Carrier Claims details - 2nd position
  • Carrier Claims details - 3rd position
  • Carrier Claims details - 4th position
  • Carrier Claims details - 5th position
  • Carrier Claims details - 6th position
  • Carrier Claims details - 7th position
  • Carrier Claims details - 8th position
  • Carrier Claims details - 9th position
  • Carrier Claims details - 10th position
  • Carrier Claims details - 11th position
  • Carrier Claims details - 12th position
  • Carrier Claims details - 13th position
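The names listed above follow a regular pattern, so they could be generated rather than typed by hand. Here is a rough sketch; the function names are hypothetical, and the output is just the proposed wording, not concepts that already exist in the vocabulary.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def condition_type_names(prefix, positions):
    """List the proposed type names for positions 1..positions."""
    return [f"{prefix} - {ordinal(i)} position" for i in range(1, positions + 1)]


if __name__ == "__main__":
    # Prefixes and position counts are taken from the proposal above.
    for name in condition_type_names("Carrier Claims header", 8):
        print(name)
    for name in condition_type_names("Carrier Claims details", 13):
        print(name)
```

Because the prefix is a parameter, the wording can be changed in one place once the group settles on the final titles.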

@ericaVoss - Instead of “Carrier Claims - #th position”, I believe we want “Carrier Claims Detail - #th position” up to position 13. This will more closely follow the header/detail format of Inpatient vs Outpatient.
