
New GitHub repo for Pregnancy Algorithm

Hi,

I’d like to get an OHDSI GitHub repo for submitting the Pregnancy Algorithm code developed by @amatcho. I’ve converted the original code to be more OHDSql-friendly, and it includes some support for large inserts in MPP databases like Redshift (complete) and PDW (in progress). I’m still in the testing phase to ensure the episodes generated are correct, but I anticipate being finished with that in another week.

Tagging @Chris_Knoll and @Jill_Hardin who have been managing this code’s usage at Janssen.

Thanks,
Ajit


We did a study on drugs in pregnant women (presented at AMIA, F.Dhombres), and we could definitely use better logic for detecting pregnancy (delivery time).

Friends:

Are you asking to have a repo initiated? I can do that. Let me know.

I don’t think we should create a repo just for this algorithm. Instead, we need to think as a community about how we build out an open source phenotype library, which can be used for cohorts of interest, covariates, outcomes, etc. Pregnancy episodes fit nicely in our generic cohort framework.

So I see there’s a WG: http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:library-wg, but I’m not sure who is attached to it, or how to become involved. Anyone know?

A few thoughts on this community library:
The metadata mentioned on the WG page seems like a good set of characteristics to catalogue each phenotype. Perhaps more broadly, the library would have 2 branches: (1) Atlas-based definitions and (2) custom developed definitions. From a logistics perspective, a phenotype GitHub repo and a wiki would be used to share, collaborate, and inform. A governance process would separate clinically and practically viable definitions from those needing further review. Definitions would require approval from multiple (at least 2) sites before they can become standard definitions. Within a site, a clinical reviewer and a technical reviewer would be identified for approving a phenotype definition. A recurring WG meeting (if this doesn’t already exist) would be established in which new definitions could be teed up and existing ones could be reviewed. Any studies leveraging the definitions would be tagged on the wiki to demonstrate the real world application.

@Ajit_Londhe:

Did we ever get anything loaded up? I am looking for a pregnancy definition.

So the Pregnancy Algorithm is in my repo here: https://github.com/alondhe/PregnancyAlgorithm.

But in terms of where it should go, in THEMIS, we discussed having this be a part of the CDM Builder process to populate condition_occurrence / condition_era, and then use a type concept to qualify that it is derived from an algorithm.

I’m going to work with @bradanton to get this added to the .NET CDM Builder, but for others who are using their own Builder code, not sure. I know Patrick’s vision from the earlier response was to get it into a Phenotype library, so we should land it there when that is available. In the meantime, perhaps it could be a package or set of scripts hosted in the THEMIS WG repo?

Hello, Ajit Londhe! I have seen the Pregnancy Algorithm in your repo.
It inserts the same concepts as “pregnancy episodes and outcomes”: https://github.com/OHDSI/PhenotypeLibrary/blob/master/pregnancy%20episodes%20and%20outcomes/inst/sql/init.sql
and https://github.com/alondhe/PregnancyAlgorithm/blob/master/inst/sql/sql_server/inserts.sql are almost the same.
The “Pregnancy Algorithm” package is based on the “pregnancy episodes and outcomes” package, am I right?
Is it more convenient to run the algorithm from your repo?

And I have another question. We can see 3051 concepts inserted into the database schema. Among them, 21 concept_ids are inserted more than once (1305058, 40767416, 440795, etc.). Could you please explain this if you have a chance?
Thank you!

Hi @Maria_ya, my repo is essentially an update to the one @Chris_Knoll developed in the PhenotypeLibrary repo. My repo makes the SQL more OHDSql compliant – meaning that it can run on all supported OHDSI SQL dialects. The original code uses T-SQL loops that aren’t supported by Amazon Redshift. These loops are handled in R instead. Additionally, my version uses bulk loading for the table inserts when using PDW/Redshift, which saves a lot of time.

Aside from that, they are the same algorithm. It’s probably a good idea for my version to move into the PhenotypeLibrary; I’ll add some more documentation before doing this.

Regarding your 2nd question, these concepts map to many outcome candidates. For instance, concept 40767416:

“Was your pregnancy a live birth, stillbirth, miscarriage, abortion, or ectopic pregnancy [PhenX]” maps to multiple outcome possibilities:

40767416 LB Live birth
40767416 DELIV Unknown
40767416 POST Stillborn
40767416 POST Live
40767416 LB Neonatal death
40767416 SB Intra-partum death
40767416 SB Still birth
40767416 SA Miscarriage

The algorithm, using the patient history, then determines which of these outcomes is the correct one before assigning it in the pregnancy_episodes table.
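To see why a concept_id shows up more than once in the inserts, it can help to group the rows by concept. This is only a minimal Python sketch using the eight rows listed above; the tuple layout and variable names are illustrative assumptions, not the package's actual schema, and it does not reproduce how the algorithm picks the final outcome.

```python
from collections import defaultdict

# (concept_id, category, outcome) rows copied from the listing above.
rows = [
    (40767416, "LB", "Live birth"),
    (40767416, "DELIV", "Unknown"),
    (40767416, "POST", "Stillborn"),
    (40767416, "POST", "Live"),
    (40767416, "LB", "Neonatal death"),
    (40767416, "SB", "Intra-partum death"),
    (40767416, "SB", "Still birth"),
    (40767416, "SA", "Miscarriage"),
]

# Collect every outcome candidate attached to each concept.
candidates = defaultdict(list)
for concept_id, category, outcome in rows:
    candidates[concept_id].append((category, outcome))

# Concepts inserted more than once are the ones with several candidates.
repeated = {cid: c for cid, c in candidates.items() if len(c) > 1}
categories = sorted({cat for cat, _ in repeated[40767416]})
print(len(repeated[40767416]), categories)
```

Run against the full insert list, the same grouping would presumably surface the other repeated concept_ids as well.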

Hope this helps,
Ajit

Ajit Londhe, thank you for the answer and explanation!

I have now executed the Pregnancy Algorithm from your repo.
But I had to change something in the code to get the algorithm to run.

We can see in \PregnancyAlgorithm\inst\sql\sql_server\algorithm\step9.sql file the following operations:
dateadd(day, (-1 * p.gest_value) + 1, p.event_date) [please see lines 46, 53, cteGestStartDates table]
dateadd(day, -14 + 1, p.event_date)
dateadd(day, -89, p.event_date)
dateadd(day, -123, p.event_date)
dateadd(day, -56 + 1, p.event_date).

I don’t yet understand where these numbers come from: 14, 89, 123, 56.
They do not appear in the gest_est, outcome_limit, pregnancy_concepts, or term_durations tables that are created in init.
The other question is the error caused by “(-1 * p.gest_value)”. I removed it from the code, and the PregnancyAlgorithm::execute function ran successfully only after this edit. So I now have “dateadd(d, 1, p.event_date)” instead of “dateadd(d, (-1 * p.gest_value) + 1, p.event_date)” in this code, but I think that is not correct.
If it is convenient for you, could you please help me with these issues?
Thanks,
Maria

Hi @Ajit_Londhe:

Given that all of those pregnancy outcomes map to the same code, how do you handle different outcomes for a multiple pregnancy (e.g., the situation where one twin is a stillbirth and one twin is born live)? How does the algorithm distinguish the entire pregnancy timeframe as ending in stillbirth or livebirth if there are two outcomes which differ based only on the offspring?

Many thanks in advance!

Hi, Mary:

In the case where your single pregnancy episode had multiple outcomes (such as LB and SB), a single pregnancy episode will be recorded, and only one of the outcomes will be associated with the episode.

Maria,
The paper describing the algorithm can be found here: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0192033, and in the paper you will find the details about why each value was used for each type of outcome (basically, those values are used to offset the pregnancy marker in the data to determine a pregnancy start). Those numbers are also used to determine the minimum pregnancy episode length and the period of time before a new episode may begin after a prior one ends.

I’m not sure why you would need to alter the code in this way to get it to execute, but that expression is saying to return a value of -1 * the gestational value (for example, 310 days for a live birth). Not every marker has a value for gest_value, so it is expected that some of the rows inserted in init() have no gest_value.

But the change you made isn’t correct for the algorithm. The line dateadd(d, (-1 * p.gest_value) + 1, p.event_date) means that the resulting date should be the event marker (p.event_date) minus the number of days in p.gest_value. I.e., if all you have is a live birth record, then you would set the start of the pregnancy to the live birth date - 310 days. Your change would set the start of the pregnancy to 1 day after the birth!
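As a rough illustration of the difference, here is a small Python sketch of the two expressions. The event date is a made-up example, dateadd_days is a hypothetical stand-in for T-SQL dateadd(d, ...), and 310 is the live-birth figure quoted above. Note that because of the + 1 in the expression, the start lands gest_value - 1 days before the marker.

```python
from datetime import date, timedelta

def dateadd_days(days: int, d: date) -> date:
    """Hypothetical stand-in for T-SQL dateadd(d, days, d)."""
    return d + timedelta(days=days)

event_date = date(2020, 12, 31)  # made-up live-birth marker date
gest_value = 310                 # days, the live-birth figure quoted above

# Original expression: dateadd(d, (-1 * p.gest_value) + 1, p.event_date)
original_start = dateadd_days((-1 * gest_value) + 1, event_date)

# Modified expression: dateadd(d, 1, p.event_date)
modified_start = dateadd_days(1, event_date)

print(original_start)  # 2020-02-26, i.e. 309 days before the marker
print(modified_start)  # 2021-01-01, i.e. the day after the birth
```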

-Chris

Thank you for the answer, Chris!
There are two markers (concept_id = 3048230, 3036844) with category ‘GEST’ that have no value for gest_value (please see the inserts in init.sql and the cteGestStartDates table in algorithm.sql). So we can’t apply (-1 * p.gest_value) in these cases.
I have now changed “dateadd(d,(-1 * p.gest_value) + 1, p.event_date)” to
“case when p.gest_value is not null then dateadd(d,(-1 * cast(p.gest_value as int)) + 1, p.event_date) else dateadd(d, -1, p.event_date) end” for the cteGestStartDates table. Is it right to put “dateadd(d, -1, p.event_date)” in the ‘else’ branch?
The pregnancy_episodes result for my data is the same as before. Outcomes: ‘LB/DELIV’, ‘SA/AB’, ‘SB’; episode_length: from 49 to 295.

Many thanks!

Those 2 concepts are Measurement concepts of ‘gestational age in weeks’, so you won’t have that value in the set of concepts up front; it’s based on the patient record.

I skimmed through the algorithm (from the paper, not from the repo, but my own notes on it) and there is a part of the algorithm that does this:

select PERSON_ID, Category,  /* convert gestational weeks to days */
  case when category='GEST' and value_as_number is not null then 7*value_as_number
       when category='GEST' and gest_value is not null then 7*gest_value
       else null end as gest_value, start_date

This part of the algorithm finds those codes with category = ‘GEST’ and returns either the value_as_number from the measurement/observation record or the hard-coded gest value from the algorithm specification. The idea is that one way or another, you will get a gest value to subtract from the marker event date. There is a possible null value in that table (the final else), so if a GEST category comes back with a null gest_value, I think the right thing to do is drop those pregnancy markers (via a WHERE clause filter). The purpose of the GEST category markers is to tell you ‘age of gestation’, and if we don’t have the value in that category, then it’s an incomplete record and it should be dropped.

I would not change the dateadd() in the algorithm to sometimes subtract the gest_value and sometimes subtract 1 day. The purpose of the gest_value is to offset the event date to a pregnancy start date, therefore, it’s better to drop the rows that do not have a gest_value in that pregnancy marker event.
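A minimal Python sketch of that normalize-then-drop idea, mirroring the CASE expression above. The three marker rows are made-up examples and gest_days is a hypothetical helper name; the real code does this in SQL over the event tables.

```python
from typing import Optional

def gest_days(category: str,
              value_as_number: Optional[float],
              gest_value: Optional[float]) -> Optional[float]:
    """Mirror the CASE expression: weeks to days, or None if no value."""
    if category == "GEST" and value_as_number is not None:
        return 7 * value_as_number
    if category == "GEST" and gest_value is not None:
        return 7 * gest_value
    return None

# Made-up markers: (category, value_as_number, gest_value), values in weeks.
markers = [
    ("GEST", 12, None),    # value comes from the patient record
    ("GEST", None, 41),    # value comes from the hard-coded specification
    ("GEST", None, None),  # incomplete record, no gestational age at all
]

normalized = [gest_days(*m) for m in markers]

# Equivalent of adding "and gest_value is not null" to the WHERE clause.
kept = [days for days in normalized if days is not None]
print(normalized, kept)
```

Dropping the incomplete rows this way avoids inventing a start date for markers that carry no gestational age.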

But I would need to do a thorough review of the algorithm to ensure that this didn’t have a negative impact. From your summary report, it looks like you never had a case where the start date was p.event_date - 1 day, so you are probably finding all the gestational values in your data to produce the episodes.

Thank you for the clarification.

So I have made the following change in the cteGestStartDates code:
where p.CATEGORY = ‘GEST’ and p.gest_value is not null
instead of
where p.CATEGORY = ‘GEST’
(please see the corresponding line 1059 in your algorithm code https://github.com/OHDSI/PhenotypeLibrary/blob/master/pregnancy%20episodes%20and%20outcomes/inst/sql/algorithm.sql)

Indeed, the non-null gest_value looks like pregnancy weeks (the minimum is 12, the maximum is 41). So I made the following correction on lines 1053 and 1060 of your algorithm code:
dateadd(d, -1 * 7 * p.gest_value + 1, p.event_date)
instead of
dateadd(d, -1 * p.gest_value + 1, p.event_date)
Am I right?

And it looks like my data doesn’t contain ‘GEST’ concepts, because changes in cteGestStartDates don’t change the pregnancy_episodes result.

Many thanks for your algorithm!

No, do not do that there; p.gest_value has already been normalized to days back in line 39. By the time it gets into #pregnancy_events here:

select PERSON_ID, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY event_date) AS event_id,
  category, event_date, gest_value
into #pregnancy_events
from #events_all
;

all the gest_value entries have already been converted into days. Don’t multiply by 7 again.
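To make the effect of a second multiplication concrete, here is a short Python sketch. The 40-week value and the event date are made-up examples, not values from the algorithm specification.

```python
from datetime import date, timedelta

gest_weeks = 40                 # made-up gestational age in weeks
gest_value = 7 * gest_weeks     # already normalized to days upstream
event_date = date(2021, 6, 30)  # made-up marker date

# dateadd(d, -1 * p.gest_value + 1, p.event_date)
correct_start = event_date + timedelta(days=-1 * gest_value + 1)

# dateadd(d, -1 * 7 * p.gest_value + 1, p.event_date): converts a second time
wrong_start = event_date + timedelta(days=-1 * 7 * gest_value + 1)

print((event_date - correct_start).days)  # 279
print((event_date - wrong_start).days)    # 1959, more than five years
```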
