Sorry for the lack of updates over the past couple of weeks; I was on vacation. Thanks to @ericaVoss and @jenniferduryea, we now have a pretty solid ETL spec. The next step is to program it so we can get data to the community.
There are a couple of options:
1. If you want to do it all yourself, you are free to do so, and can ask Jen and/or Erica for clarification at any time. Use whatever language you know best.
2. If you want to contribute and do part of the ETL with others, we will have to agree on a language and a division of labor.
Possible languages for option 2 include Java, Python, SQL with a database of your choice, and R. Of course, if you want to use COBOL, Perl, Ruby, or something else, you can take option 1 and possibly do it in parallel with someone else (good for QC). Even if you pick something more “mainstream”, you may still end up working in parallel if nobody else is comfortable with it.
The goal is to get this done and get the data out there for everybody. Depending on who does the work and how it is done, we may also want to make the ETL code available via GitHub, but this is not absolutely essential. In other words, we would rather get it done fast than get it done pretty.
So, if you are willing to contribute via option 1 or 2, please email me back. Once I have that list, I think it will be pretty easy to figure out how this will work.
Note: Before we finish, we will also create a test ETL dataset to test the code against. So, if you don’t want to do the ETL, you can contribute by making up data.
Also, I will put this out on the forums, so others who are interested can join. @Frank, @Patrick_Ryan, @Christian_Reich, @wstephens, @lee_evans, @donohara, @aguynamedryan, @mlgleeson
Regards,
Mark