There is a LAERTES Minutes thread, and I thought a good way to share updates with the LAERTES team would be to start a general LAERTES thread here.
I have been working on integrating ClinicalTrials.gov data into LAERTES.
I recently made some more incremental progress.
We can now extract links from trials to CDM-Vocabulary (CDMV) coded drugs. For example:
drug_CID, HOI_CID, trial, drug_name
1338512 0 NCT00003907 doxorubicin
1389036 0 NCT00003907 mitomycin
1344381 0 NCT00004054 bicalutamide
1378382 0 NCT00004054 paclitaxel
1350504 0 NCT00004054 etoposide
1356461 0 NCT00004054 flutamide
19012585 0 NCT00004228 asparaginase
1310317 0 NCT00004228 cyclophosphamide
1437379 0 NCT00004228 thioguanine
1436650 0 NCT00004228 mercaptopurine
1311078 0 NCT00004228 cytarabine
1518254 0 NCT00004228 dexamethasone
1305058 0 NCT00004228 methotrexate
1551099 0 NCT00004228 prednisone
1310317 0 NCT00004563 cyclophosphamide
We can get 3000+ rows like these, each with a concept cleanly extracted by NLP (CID = concept_id).
All rows come from trials that have posted results (and hence AEs), and all have concepts that we reliably linked to CDMV CIDs.
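As a rough illustration of how rows like these could be consumed, here is a minimal Python sketch that groups drugs by trial. It assumes the rows are whitespace-separated in the order drug_CID, HOI_CID, trial, drug_name, as in the excerpt above; the function name `group_drugs_by_trial` is just a name picked for this example, not part of LAERTES.

```python
from collections import defaultdict

# Rows copied from the excerpt above: drug_CID, HOI_CID, trial, drug_name
SAMPLE_ROWS = """\
1338512 0 NCT00003907 doxorubicin
1389036 0 NCT00003907 mitomycin
1310317 0 NCT00004563 cyclophosphamide
"""

def group_drugs_by_trial(text):
    """Return {trial_id: [(drug_cid, drug_name), ...]} from whitespace-separated rows."""
    by_trial = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # maxsplit=3 keeps a multi-word drug name intact in the last field
        drug_cid, _hoi_cid, trial, drug_name = line.split(maxsplit=3)
        by_trial[trial].append((int(drug_cid), drug_name))
    return dict(by_trial)

trials = group_drugs_by_trial(SAMPLE_ROWS)
# Trials with exactly one linked drug in this excerpt
single_drug_trials = [t for t, drugs in trials.items() if len(drugs) == 1]
```

Grouping by trial makes it easy to see how many linked drugs each trial has, which matters for the HOI problems described next.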
The problems with the HOI (health outcome of interest) part, based on manual review of some of the rows, are the following:
It is difficult to reliably parse the HOIs in CT.gov computationally.
For example, the problems are:
drug1 and drug2
an arm (or a whole single-arm trial) can have 2 drugs being given.
Then it is hard (impossible) to know exactly which drug caused the AE
drug1 or drug2
the trial arm is titled: Ciprofloxacin or Ofloxacin
(both drugs are detected by NLP), so it is impossible to assign the AE
https://clinicaltrials.gov/ct2/show/results/NCT00002850?sect=X30156#evnt
drug and radiation
the AE can be due to a non-drug intervention (radiation), and each arm has drug+radiation
https://clinicaltrials.gov/ct2/show/results/NCT00003377?sect=X30156#evnt
WHEN IT IS EASY (and possible):
a clinical trial with one intervention (no non-drug interventions) and with only one group defined
https://clinicaltrials.gov/ct2/show/results/NCT00001941?sect=X30156#evnt
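The cases above can be sketched as a simple check. This is only an illustration under assumptions: the `Arm` structure and the `attributable_drug` function are hypothetical names for this example, and it assumes that NLP has already produced, for each arm, the list of detected drugs and any non-drug interventions.

```python
from dataclasses import dataclass, field

@dataclass
class Arm:
    """Hypothetical representation of one trial arm after NLP."""
    title: str
    drugs: list = field(default_factory=list)     # drug names detected by NLP
    non_drug: list = field(default_factory=list)  # e.g. ["radiation"]

def attributable_drug(arms):
    """Return the one drug that AEs can be attributed to, or None if ambiguous."""
    if len(arms) != 1:
        return None  # more than one group defined: not the easy case
    arm = arms[0]
    if arm.non_drug:
        return None  # "drug and radiation": the AE may come from the non-drug part
    if len(arm.drugs) != 1:
        return None  # "drug1 and drug2" / "drug1 or drug2": cannot pick one
    return arm.drugs[0]

easy = attributable_drug([Arm("Drug A", drugs=["drug_a"])])
two_drugs = attributable_drug(
    [Arm("Ciprofloxacin or Ofloxacin", drugs=["ciprofloxacin", "ofloxacin"])]
)
with_radiation = attributable_drug(
    [Arm("Drug A + radiation", drugs=["drug_a"], non_drug=["radiation"])]
)
```

Only the first call returns a drug; the other two return None, which is where a manual review would be needed.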
CONCLUSION
For LAERTES, we may either have to give up on CT.gov as a data provider, since it cannot always reliably produce a drug-HOI[-trial] data row,
or we could introduce a “semi-automatic” mode of evidence review: we give you
links for you to disambiguate (and even a human will not be able to disambiguate
in many cases, e.g., radiation+drug).
For that, we would add a new table to the LAERTES schema with only 2 key columns:
(drug_CID, link_out_link), where link_out_link is for manual review and leads to the CT.gov AE tables and trial arm structure.
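To make the proposal concrete, here is a minimal sketch of such a table using Python's built-in sqlite3 module. The table name `ctgov_manual_review` and the helper `review_link` are made up for this example, and the URL pattern is simply copied from the links earlier in this post; whether it holds for every trial is an assumption.

```python
import sqlite3

# URL pattern copied from the example links in this post (assumed to generalize)
RESULTS_URL = "https://clinicaltrials.gov/ct2/show/results/{trial}?sect=X30156#evnt"

def review_link(trial_id):
    """Build the link-out to the CT.gov AE tables for one trial."""
    return RESULTS_URL.format(trial=trial_id)

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    """
    CREATE TABLE ctgov_manual_review (
        drug_cid      INTEGER NOT NULL,  -- CDMV concept_id of the drug
        link_out_link TEXT    NOT NULL,  -- link for manual review
        PRIMARY KEY (drug_cid, link_out_link)
    )
    """
)

# Two rows taken from the excerpt earlier in this post
rows = [
    (1338512, review_link("NCT00003907")),  # doxorubicin
    (1389036, review_link("NCT00003907")),  # mitomycin
]
conn.executemany("INSERT INTO ctgov_manual_review VALUES (?, ?)", rows)
conn.commit()

# All review links for one drug
links = conn.execute(
    "SELECT link_out_link FROM ctgov_manual_review WHERE drug_cid = ?",
    (1338512,),
).fetchall()
```

Making both columns the primary key keeps one row per drug and link, so the same trial can be listed for several drugs without duplicates.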