Demonstrating whether CTG monitoring does its job
There are (at least) two ways to consider CTG monitoring in labour. One is to think of it as an intervention, aimed at reducing perinatal mortality and morbidity. The best way to test this is with randomised controlled trials. The evidence to date fails to demonstrate that intrapartum CTG monitoring is effective in achieving this goal (read more here, here, and here).
Another way to conceptualise intrapartum CTG monitoring is to regard it as a diagnostic test. The way to establish whether something is a good test or not is known as validity and variability testing. Validity testing aims to establish the following:
- Positive predictive value – when the test is positive, how likely is it that the condition being tested for will be present
- Negative predictive value – when the test is negative, how likely is it that the condition being tested for will not be present
- Sensitivity – when the condition being tested for is present, how likely is it that the test will be positive
- Specificity – when the condition being tested for is not present, how likely is it that the test will be negative
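To make these four definitions concrete, here is a minimal Python sketch that computes them from a 2×2 table of test result against condition. The class name, the function name, and the counts in the example are invented for illustration; they do not come from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-tabulating a test result against the true condition."""
    tp: int  # test positive, condition present
    fp: int  # test positive, condition absent
    fn: int  # test negative, condition present
    tn: int  # test negative, condition absent


def validity_measures(t: TwoByTwo) -> dict:
    """Return the four validity measures as proportions between 0 and 1."""
    return {
        # Of all positive tests, how many had the condition?
        "ppv": t.tp / (t.tp + t.fp),
        # Of all negative tests, how many did not have the condition?
        "npv": t.tn / (t.tn + t.fn),
        # Of all with the condition, how many tested positive?
        "sensitivity": t.tp / (t.tp + t.fn),
        # Of all without the condition, how many tested negative?
        "specificity": t.tn / (t.tn + t.fp),
    }


# Hypothetical counts, chosen only to show the arithmetic.
example = TwoByTwo(tp=20, fp=30, fn=10, tn=140)
for name, value in validity_measures(example).items():
    print(f"{name}: {value:.1%}")
```

Note that the predictive values are calculated across the rows of test results, while sensitivity and specificity are calculated across the columns of true condition. This is why the predictive values shift with how common the condition is in the group being tested, while sensitivity and specificity do not.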
Variability testing usually seeks to find out whether repeating the test on the same person will produce similar results each time. In the case of CTG monitoring, the question is whether different clinicians interpreting the same CTG trace will come to the same conclusion. This is known as inter-observer variability.
In research examining the quality of CTG monitoring as a diagnostic test, the million-dollar question is: what are we trying to diagnose? The goal is not to make a diagnosis of intrapartum stillbirth (which is fortunately rare) or to predict those babies that will die in the early neonatal period (also fortunately rare). The aim is to prevent these outcomes by making a diagnosis when reversible harm to the fetus begins to occur. Measuring non-lethal damage to the fetus is more difficult, and it has become accepted that a low arterial cord pH (meaning a high acid level in the blood) be used for such testing. Many assumptions underlie this choice, and it is important to keep in mind that an abnormal pH test at birth does not always correlate with a health problem in the baby (Johnson, et al., 2021).
So what’s new?
I have written about research comparing the validity of various intrapartum CTG monitoring guidelines previously (here). Another research team has recently produced new research along the same lines. del Pozo, et al. (2021) assessed the FIGO (European), ACOG (United States of America), NICE (United Kingdom), and Chandraharan (used in some places in the United Kingdom as an alternative to the NICE guideline) guidelines, examining the validity and variability of each. To do so, they asked three reviewers, blinded to the outcome, to evaluate the final 30 minutes of 150 CTG records by applying each of the guidelines. The research was conducted in Madrid, Spain, and the women being monitored all had singleton, term pregnancies with cephalic presentation. The paper doesn’t indicate what, if any, clinical information about the woman and her pregnancy was provided to the reviewers, nor the profession of the reviewers, who were all described as “expert” without any explanation of how this was determined. The ability of each guideline to discriminate between babies born with a pH above 7.1 and those at or below that level was tested.
The positive predictive value (the percentage of abnormal CTGs where the baby had a low pH) was highest for the ACOG guideline, but still fairly low at 50%. The lowest positive predictive value was seen with the Chandraharan guideline (29.5%). A low positive predictive value potentially drives a higher surgical birth rate for babies who had a normal pH and would therefore not have benefitted from early birth.
The guidelines did better in their negative predictive value, with the best being the Chandraharan guideline at 88.7%. It is important to note that this still means that 11.3% of babies with a normal CTG as defined by this guideline were born with an abnormally low pH. The remaining guidelines all had negative predictive values of 80% or more.
Sensitivity testing showed low numbers except for the Chandraharan guideline. When acidosis was present, between 15.2% (ACOG) and 78.8% (Chandraharan) of CTGs were classified as abnormal. Specificity numbers were higher, ranging from 47% (Chandraharan) to 95.7% (ACOG). That is, when the baby had a pH of more than 7.1, 53% of CTGs were classified as abnormal when the Chandraharan guideline was used, but this only happened for 4.3% of CTGs when the ACOG guideline was used.
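As a rough consistency check, the percentages quoted for the ACOG and Chandraharan guidelines can all be reproduced from one set of counts. The counts in this sketch are not stated in this post. They are my own back-calculation, assuming that 33 of the 150 babies had a pH at or below 7.1, so treat them as illustrative and not as the authors' data. The helper name `percentages` is also just for this example.

```python
def percentages(tp, fp, fn, tn):
    """Return the four validity measures as percentages, rounded to 1 d.p."""
    def pct(numerator, denominator):
        return round(100 * numerator / denominator, 1)

    return {
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "ppv": pct(tp, tp + fp),
        "npv": pct(tn, tn + fn),
    }


# ASSUMED counts (back-calculated, not reported in the post):
# 33 babies with pH <= 7.1 and 117 with pH > 7.1, 150 in total.
acog = percentages(tp=5, fp=5, fn=28, tn=112)
chandraharan = percentages(tp=26, fp=62, fn=7, tn=55)

print("ACOG:", acog)
print("Chandraharan:", chandraharan)
```

Under these assumed counts, the trade-off is easy to see. Moving from ACOG to Chandraharan picks up 21 more of the 33 babies with a low pH, at the cost of labelling 57 more of the 117 babies with a normal pH as having an abnormal CTG.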
Interobserver variability was also assessed in this study. The highest levels of agreement were seen for the baseline heart rate in the FIGO (Fleiss' kappa of 0.53) and ACOG (0.55) guidelines, and for the categorisation of the CTG as showing no hypoxia using the Chandraharan guideline (0.56). The lowest levels of agreement were seen in the abnormal CTG categories II and III in the ACOG guideline (0.17 and 0.09 respectively), and for gradually evolving hypoxia, compensated or decompensated (both at 0.11), using the Chandraharan guideline. These levels of agreement are similar to those seen in previous studies, confirming the ongoing problem that one clinician’s interpretation that the CTG is normal doesn’t mean that everyone else will see it the same way.
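For readers unfamiliar with Fleiss' kappa, the sketch below shows how the statistic is conventionally calculated when a fixed number of reviewers each place every trace into one category. A value of 1 means perfect agreement and a value of 0 means agreement no better than chance. The ratings are invented for illustration, and I am assuming the standard formula; the post does not describe exactly how the authors computed their values.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of reviewers who placed trace i in category j.
    Every trace must be rated by the same number of reviewers (at least two).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each trace needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_category = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement: for each trace, the proportion of reviewer pairs
    # that agreed, averaged over all traces.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_subject) / n_subjects
    # Agreement expected by chance, given how often each category was used.
    expected = sum(p * p for p in p_category)
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical example: 5 traces, 3 reviewers, 3 categories
# (columns could stand for normal / suspicious / pathological).
ratings = [
    [3, 0, 0],  # all three reviewers agree
    [2, 1, 0],
    [1, 1, 1],  # complete disagreement
    [0, 3, 0],
    [0, 2, 1],
]
print(f"Fleiss' kappa: {fleiss_kappa(ratings):.2f}")
```

Because the statistic subtracts the agreement expected by chance, reviewers can agree on a majority of traces in a commonly used category and still produce a low kappa.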
So which one to use?
No one guideline performed well across all measures. The authors of the paper preferred the Chandraharan guideline because of its high sensitivity. This favours babies with a low pH being placed into an abnormal CTG category, where clinicians are more likely to (appropriately) consider intervention. Unfortunately, the low specificity of the Chandraharan guideline also favours unnecessary intervention, because the CTG is more often classified as abnormal when the pH is normal. The decision about which guideline to use will continue, I suspect, to relate to the history of specific settings (“we’ve always done it that way”) and professional allegiances (choosing to use the ACOG guideline in the UK is likely to get you referred to the regulator).
It is important that all changes to intrapartum CTG interpretation guidelines be assessed to determine the validity of the algorithm. From this study, and others that preceded it, we now have good evidence about many guidelines. It is noteworthy that here in Australia the RANZCOG guideline has not been assessed in a similar manner, and it is time that someone got around to doing that.
del Pozo, C. Z., Ezquerro, M. C., Mejía, I., de Terán Martínez-Berganza, E. D., Esteban, L. M., Alonso, A. R., Larraz, B. C., García, M. A., & Cornudella, R. S. (2021). Diagnostic capacity and interobserver variability in FIGO, ACOG, NICE and Chandraharan cardiotocographic guidelines to predict neonatal acidemia. Journal of Maternal-Fetal & Neonatal Medicine. https://doi.org/10.1080/14767058.2021.1986479
Johnson, G. J., Salmanian, B., Denning, S. G., Belfort, M. A., Sundgren, N. C., & Clark, S. L. (2021). Relationship between umbilical cord gas values and neonatal outcomes: Implications for electronic fetal heart rate monitoring. Obstetrics & Gynecology, 138(3), 366-373. https://doi.org/10.1097/AOG.0000000000004515