
The new Working Group for Hadoop


(Naga Eskala) #41

Hi Shawn, how/where can we get the conference call details for the Hadoop Working Group?

(Shawn Dolley) #42

Hi Naga - go to this page on the Wiki and scroll down and you will see it.
Or here is the info:
Schedule: Every other Friday at 8am US PST/11am US EST/4pm UK time zone

Next Meeting: (Date and Time) October 28, 2016, 11am US EST

Call in Number: 1-650-479-3208

Attendee access code: 624188217

WebEx: http://cloudera.webex.com/meet/sdolley

(Naga Eskala) #43

Thank You Shawn.

(Shawn Dolley) #44

Great first meeting guys and gals. Here is a link to the raw, original meeting log: http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:hadoopwgmeetingloglandingpage . To find it yourself, or if the link is broken:

  1. Go to the OHDSI Wiki by clicking on the word ‘Wiki’ at the top of any page in the forum.
  2. Click on Projects and working groups link at the left side of that Wiki home page
  3. Click on Hadoop Working Group
    Wow! Look at all this great info, I better bookmark this page!
  4. Scroll down until you see Hadoop Working Group Meeting Logs link (or something like that), click it.
  5. Now you should see a page where I will post the logs. For now, they will be in MS Word format (sorry if that is painful).

(Al Pivonka) #45

I wanted to follow up on something from the other day’s meeting and to share a bit of information.
I believe that targeting SQL as the standard language for integration with the Hadoop platform is ideal.

The underlying architecture(s) (Hadoop/YARN/Mesos, or storage such as HDFS/S3) should not be the focus. These are all configurable.

I would suggest that the team look at Hadoop as an application/data container: the container provides services on top of the data.

Our focus should be on standards for accessing the data utilizing the tools within the container.

Here is an ever-growing list of projects and their classifications within the Hadoop container.

In the context of Hadoop being an application/data container, one can also build out a container with specific tools, so that the container only contains the tools needed. For example:

  1. (Open-source) Ambari (Management and provisioning) + HDFS + Spark + Spark SQL + Sqoop + Oozie + Hive/
  2. Both Cloudera CDH and Hortonworks HDP (Full stack or only deploy the tools needed)
    So if we stay focused on the standards and our use cases, we will be able to work within any mixture/stack someone puts together.
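As a rough illustration of the point above (a sketch, not something from the meeting): if the query is plain SQL and the code only relies on Python's standard DB-API 2.0 connection interface, the engine underneath can be swapped. Here an in-memory SQLite database stands in for the cluster so the example runs anywhere. The `person` table, its two columns, and the helper names are assumptions made up for the demo.

```python
import sqlite3

# Illustrative query against an OMOP-style `person` table.
# Table and column names are assumptions for this sketch.
COUNT_BY_GENDER_SQL = """
SELECT gender_concept_id, COUNT(*) AS n
FROM person
GROUP BY gender_concept_id
ORDER BY gender_concept_id
"""


def count_by_gender(conn):
    """Run the same SQL on any DB-API 2.0 style connection."""
    cur = conn.cursor()
    cur.execute(COUNT_BY_GENDER_SQL)
    rows = cur.fetchall()
    cur.close()
    return rows


def make_demo_connection():
    """Hypothetical stand-in for a cluster: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (person_id INTEGER, gender_concept_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [(1, 8507), (2, 8532), (3, 8532)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(count_by_gender(make_demo_connection()))
```

In principle, a Hive, Impala, or Spark SQL client that exposes a DB-API style connection could be passed to `count_by_gender` instead of the SQLite one, though SQL dialect differences would still need checking per engine.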

Just my thoughts

(Shawn Dolley) #46

Folks, I have put the formatted notes from Friday’s architecture call on the Wiki, FYI.

(Shawn Dolley) #47

Hello. As promised, here is a ‘voting doc’ (or, better named, a standard doc) for entering your feedback on the use cases and the next things to build or work out. If you all rank order your priorities (which can be hard!) I will tally the results (or someone else can tally them). The benefit of this is that we can ensure the agenda time and people hours we can get hold of are being spent on the things that benefit the most members. I don’t know how to attach the doc to the forum, but I will figure it out. The doc right now is on the Wiki at Projects & Workgroups… Hadoop WG…Meeting Logs: Hadoop WG Meeting Logs. When you go to that page, you will see Other Docs down below, and the ‘voting doc’ for lack of a better name. Please either email it back to sdolley@cloudera.com or post it somewhere with a pointer to it so I can tally it.

Link to page with Priorities Voting Doc

(Chetan Vora) #48


This is Chetan (@QuintilesIMS). Please add me to the list.


(Shawn Dolley) #49

Here is a link to a survey format of the questions on the table. No one responded to the Word doc approach, so I am going to keep hounding you all: https://www.surveymonkey.com/r/D6C3W32 . I have never set up a SurveyMonkey survey before, so if this link doesn’t work, or asks for money or some other weird thing, let me know. It would be great to get lots of data back and analyze it! (have you heard that before…)

(Shawn Dolley) #50

Hadoop Interested People! Below is a prospective agenda for our Hadoop Working Group call tomorrow. Please consider this post a call for additional agenda items. Post them in the forum or, if you would rather not, email them to me at sdolley@cloudera.com

  1. Intros of first time attendees to the call/working group
  2. Old Business: the questionnaire/survey. I want to bring the survey up on screen during the meeting, and get commits from people to fill it out!
  3. New Business
    a) do we want or need a reference architecture
    b) what can we be doing in parallel over next few weeks
    c) I will navigate on screen to what Tom White has coded so everyone knows where it is (or Tom will) and Tom can describe what you can and can’t do with it, its limitations (assuming Tom is on)
    d) can we get a checklist of things we agree should be developed (that’s what survey is for in part)
    e) throw it open to audience for topics, questions, lines of discussion

The call will be recorded as per OHDSI standard.

(Shawn Dolley) #51

Hadoop Working Group members and interested parties: Our next scheduled workgroup meeting would occur on 25th of November, 2016. Since this is the day after the US Thanksgiving holiday and often a work holiday, there will be no call that day. We will resume at our next scheduled date, which would be 9th of December. In the interim, I will be tallying and publishing survey results, working on reference architecture template(s), and perhaps soliciting members to contribute time on some tasks. Have a wonderful holiday (or if not on holiday have a wonderful time getting ahead on work). I am very thankful for all your participation, I feel like this group is working very well!

(Shawn Dolley) #52

The meeting log for November 11, 2016 has been posted. Find it here: http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:hadoopwgmeetingloglandingpage which is on the Wiki, or go to the Wiki and choose Projects and working groups and then Hadoop WG and then the link for Hadoop WG Meeting Logs and other docs. My webex recording was 50MB. At this point, I am going to simply be taking notes for the WG and on the next call (see previous post, it will be Dec 9) we can discuss the issue of work involved in recording and if the other groups are recording or not.

(corina bennett) #53

Hi Shawn - Could you please add me to the working group? I have been working with Hadoop, Impala, and OMOP for the past year and would be interested in contributing.


(Mui Van Zandt) #54

Hi Shawn - Can you add me to the working group? We are launching a new project with OMOP on Hadoop and would be interested in contributing.


(Shawn Dolley) #55

Absolutely Mui, if you could send me your email address, to sdolley@cloudera.com, that would be super.



(Mui Van Zandt) #56


Thanks. My email address is muivanzandt@yahoo.com

(Shawn Dolley) #57

Here is a link to the Wiki page for Hadoop WG. Please click on the link (under the roster) for Documents & Meeting Logs. http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:hadoop-wg
Then click on the last file which is a prospective reference architecture for Hadoop. For folks who have provided me their email address I have also sent it via that method.

(Taha Abdul-Basser) #58

Hi Shawn,

Thank you. This is great!



I’d like to participate.

(Wanghaisheng) #60

long time no see