I’m planning on using ACHILLES as part of a class project, and while there is good information on how to install and use ACHILLES itself, I don’t see a straightforward set of instructions for creating a database of example data to run it against. I’m starting everything from scratch, so I don’t have any requirements around specific databases, and I don’t need the data for anything other than populating all of the charts in AchillesWeb with visually correct results. The python_etl package seems to be along the lines of what I want (and the SynPUF data should be sufficient for my needs), but there are disclaimers saying it’s still a work in progress. I’m a novice with both the OHDSI tools and data analysis in general, so any basic beginner advice would be helpful. I’m currently planning on using Ubuntu for this, if that matters, but I’m flexible if another environment is easier.
@donohara wrote the Python script (python_etl) and it should work. If not, I’m sure he can help with simple questions, or someone else can. If you just need data, @lee_evans has a subset of data available on his website, so you could test the Achilles part separately from the ETL part. He may have a larger set available, or know where one exists.
I got python_etl working and generating the CSV output files, and after loading them into PostgreSQL I was able to install Achilles and get the data generated and processed for AchillesWeb to display. The one problem I’m running into is that the CommonDataModel import scripts reject the values in some fields of the CSV files generated by python_etl. Many of the *_source_concept_id fields are declared INTEGER in the DDL script, but most of the actual values are not integers (values with decimal points, alphanumeric codes, numbers larger than the INTEGER maximum, etc.), which causes errors during import. For now I’m forcing those values into valid integers, but I suspect this is keeping ACHILLES from processing the data correctly, since most of the reports are empty. Does anyone know what might be going wrong and how to fix it?
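To see exactly which values violate the INTEGER columns before coercing anything, a quick scan of the generated CSVs can help. This is a minimal Python sketch, assuming the files are comma-separated with a header row and that the target columns are PostgreSQL INTEGER (a signed 32-bit type). `is_pg_integer` and `find_bad_source_concept_ids` are hypothetical helper names, not part of python_etl or the CommonDataModel scripts, and the sample rows are made up.

```python
import csv
import io
import re
from typing import Iterable, Iterator, Tuple

# PostgreSQL INTEGER is a signed 32-bit type.
PG_INT_MIN = -(2**31)
PG_INT_MAX = 2**31 - 1
# Plain ASCII base-10 integer with an optional sign.
_PLAIN_INT = re.compile(r"[+-]?[0-9]+")


def is_pg_integer(value: str) -> bool:
    """True if value is a plain base-10 integer that fits in a PostgreSQL INTEGER."""
    text = value.strip()
    if not _PLAIN_INT.fullmatch(text):
        return False
    return PG_INT_MIN <= int(text) <= PG_INT_MAX


def find_bad_source_concept_ids(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, column, value) for *_source_concept_id cells that would not load as INTEGER.

    Empty cells are skipped, since they load as NULL.
    """
    reader = csv.DictReader(lines)
    # Data rows start on line 2 (after the header), assuming no field spans multiple lines.
    for line_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            # Skip non-string keys (DictReader stores surplus fields under None).
            if not (isinstance(column, str) and column.endswith("_source_concept_id")):
                continue
            if value and not is_pg_integer(value):
                yield line_number, column, value


if __name__ == "__main__":
    # Made-up rows; real input would be a file opened with newline="",
    # e.g. a hypothetical "condition_occurrence.csv".
    sample = io.StringIO(
        "condition_occurrence_id,condition_source_concept_id\n"
        "1,12345\n"
        "2,284.4\n"
        "3,V5869\n"
        "4,99999999999\n"
    )
    for problem in find_bad_source_concept_ids(sample):
        print(problem)
    # (3, 'condition_source_concept_id', '284.4')
    # (4, 'condition_source_concept_id', 'V5869')
    # (5, 'condition_source_concept_id', '99999999999')
```

If a scan like this shows raw codes such as 284.4 or V5869 in a concept ID column, that points at the ETL writing the wrong field, rather than at a type problem to coerce away.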
Source concept IDs are supposed to be integers: they hold the vocabulary concept ID of the source code that the CDM record was derived from. For example, say the ICD9 code 284.4 has concept ID 16928342 (I’m just making that up). The source_concept_id would be set to 16928342, and the source_code_value would be set to 284.4 (although most source systems I’ve seen drop the decimal, so it would actually go into our CDM as 2844; it should be whatever value the source system had). So if you see decimal values in that column, there’s a problem with the ETL.
Source_code_value, on the other hand, is a text field holding the raw/native value from the source record that the CDM row was mapped from. Could you be confusing the source_concept_id field with source_code_value?
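The split between the two fields can be sketched in a few lines of Python. This is only an illustration, assuming a hypothetical in-memory lookup in place of a real query against the vocabulary tables. `SourceFields` and `map_source_code` are made-up names, and the generic `source_value` / `source_concept_id` attributes stand in for CDM column pairs such as condition_source_value / condition_source_concept_id. Unmapped codes get 0, which I believe is the usual OMOP "no matching concept" value.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical ETL-side lookup from the source system's code (decimal already
# dropped, as many sources do) to a vocabulary concept ID. 16928342 is the
# made-up ID from the example above, not a real concept.
SOURCE_CODE_TO_CONCEPT_ID: Mapping[str, int] = {"2844": 16928342}


@dataclass(frozen=True)
class SourceFields:
    source_value: str       # raw/native code copied verbatim from the source record (text)
    source_concept_id: int  # vocabulary concept ID for that code (always an integer)


def map_source_code(raw_code: str, lookup: Mapping[str, int]) -> SourceFields:
    """Keep the raw code as text; put only an integer concept ID in the concept ID field.

    Codes missing from the lookup get 0 (assumed "no matching concept" value).
    """
    return SourceFields(source_value=raw_code, source_concept_id=lookup.get(raw_code, 0))


if __name__ == "__main__":
    print(map_source_code("2844", SOURCE_CODE_TO_CONCEPT_ID))
    # SourceFields(source_value='2844', source_concept_id=16928342)
    print(map_source_code("V5869", SOURCE_CODE_TO_CONCEPT_ID))
    # SourceFields(source_value='V5869', source_concept_id=0)
```

The raw code never lands in the integer column; it only ever goes into the text field, so an INTEGER DDL type should load cleanly.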
-Chris