What we already know

It is hardly news that one of the problems clinicians face when working with CTG monitoring is that there is a large degree of variation in how the same portion of the recording can be interpreted by different people. This has been reported in research many times before. Here’s a sample:

  • Way back in 1982, Beaulieu and colleagues from Canada showed marked variation in classification of CTG recordings from one person to another. Given 150 tracings to classify, the proportion considered normal ranged from 39% to 74%, and those considered abnormal from 3% to 43%. Only 29% of the tracings were classified the same way by all five obstetricians in this study. Yeah but we got better at guidelines and education since 1982, so we have solved this problem, right?
  • Jump forward to 2014, and across to the Netherlands, and we see Rhöse and colleagues ask nine observers (two midwives and seven obstetricians) to apply the International Federation of Gynecology and Obstetrics (FIGO) guidelines to 79 CTG recordings. They did some fancy statistics, using weighted kappa values, and found poor inter-observer agreement. Midwives, however, performed better than obstetricians.
  • The following year in Prague, Hruban et al. asked nine obstetricians to look at 634 CTG recordings. (Exhausting! I hope there were tea breaks!) Overall, there was only 48% agreement between clinicians, with this being slightly higher (51%) when the CTG was “normal”, than when the CTG was “abnormal” (41%).
  • The French did no better in 2015. Sabiani and colleagues asked obstetricians who were registered as obstetric experts, and who provided medicolegal opinions about CTG monitoring in court, to interpret CTG recordings. You would think this bunch should know what they are doing when it comes to CTG interpretation. Again, agreement was measured using kappa values, and agreement between experts was poor. It was only marginally better when the same CTG was re-presented to the same observer and they interpreted it for a second time.
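For anyone curious about what the weighted kappa used in several of these studies actually measures: it compares the disagreement observed between two raters with the disagreement you would expect by chance, and it penalises a "normal" versus "pathological" split more heavily than a "normal" versus "suspect" one. Here is a minimal Python sketch, assuming two raters, three ordered classes coded 0 to 2, and linear or quadratic weights. The function name and the example ratings are hypothetical and for illustration only. This is not taken from any of the studies above, which may have used different weights or software.

```python
def weighted_kappa(rater_a, rater_b, n_categories, weights="linear"):
    """Weighted kappa for two raters over ordered categories 0..n_categories-1.

    Illustrative sketch only; `weights` is "linear" or "quadratic".
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must classify the same non-empty set of traces")
    if n_categories < 2:
        raise ValueError("need at least two categories")

    n = len(rater_a)
    k = n_categories

    # Proportion of traces falling in each (rater A class, rater B class) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions: how often each rater used each class.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 for identical classes, 1 for classes at opposite ends of the scale.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed_dis = sum(disagreement(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(disagreement(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k))

    if expected_dis == 0:
        raise ValueError("kappa is undefined when every trace is put in one class")
    return 1.0 - observed_dis / expected_dis


# Hypothetical ratings: 0 = normal, 1 = suspect, 2 = pathological.
midwife = [0, 0, 1, 2, 1, 0, 2, 1]
obstetrician = [0, 1, 1, 2, 2, 0, 1, 1]
kappa = weighted_kappa(midwife, obstetrician, n_categories=3)
```

A value of 1 means the two raters agreed on every trace, 0 means they agreed no more often than chance would predict, and negative values mean they agreed less often than chance.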

What’s new?

Joining this body of evidence is new research from Amadori and colleagues (2022). This Italian team once again made use of the FIGO guideline, but the updated 2015 version this time. Four midwives and four obstetricians (with a range of seniority among both professional groups) interpreted 73 CTG recordings at two time periods, at least two months apart. No clinical information was provided at the time of the first interpretation, but information about the woman, her pregnancy history, and current clinical status was provided for the second reading. Weighted kappa coefficients were again used.

Midwives had better concordance with other midwives at 77%, than obstetricians did with obstetricians at 66%. There was moderate agreement regarding CTGs classified as “normal” and no consensus on the “suspect” and “pathological” classified CTGs. Adding clinical data improved agreement for the abnormal classes of CTGs but not for those considered normal. Overall, agreement scores were lower at the second time period.

What to make of this

I have two observations to make based on the evidence regarding the lack of consistency in CTG interpretation. First, let’s assume that better agreement between interpreters means they are more likely to correctly identify the fetus who will benefit from expedited birth and distinguish this fetus from the one who will not. If this is the case, then why do we continue the clinical practice of insisting on obstetric review of the CTG when the midwife decides it is abnormal, adding in another individual who might not agree? Midwives appear to do better at this task than obstetricians, so perhaps another midwife should be the final decider in whether the CTG is normal or not. I am not suggesting this as a practice or policy change in the absence of further research, however! For the obstetricians reading this, consider turning to the midwife beside you next time you are asked to look at a CTG and ask them for their ideas, then think really hard before you decide you are right and they aren’t.

Second, we should be looking at the CTG as a poor quality test, rather than at clinicians as having poor interpretation skills. If we developed a pregnancy test that was as difficult to interpret as a CTG trace, and with similar positive and negative predictive values, we would chuck it in the bin and start again! We would not invest time, energy, and vast sums of money in teaching people how to use such a useless test. So why do we maintain the ruse that it is clinicians, and not CTG monitoring itself, that is the problem with getting the CTG to work as promised?

A final thought

Consider what poor inter-observer agreement means when it comes to central fetal monitoring systems. One of the core beliefs about central fetal monitoring is that the more people who see and interpret the trace, the better the perinatal outcomes will be. Given the evidence on inter-observer variability, it is more likely that the more people who see the trace, the wider the range of possible interpretations becomes. The ability to develop a clear plan of management may decrease as people battle it out to decide who reigns supreme. In this situation, the loudest voice, or the most authoritarian figure in the room, could be more likely to win. In my experience, that tends to be the consultant obstetrician. Of all the possible participants in this conversation, the consultant obstetrician called in to review the CTG is generally the person who has the least knowledge about the woman and the context in which the CTG recording was generated.

While being relatively remote from the birthing woman, this decision maker is usually in close proximity to the “journey board” – a visible display of all the women in the birthing service at that point in time and their clinical situation. As Newnham and colleagues (2017) have pointed out, journey boards in birth suites tend to shift the focus from an accurate assessment of risk for the birthing woman and her fetus, towards a focus on the risk for the institution. It worries me that increased use of central fetal monitoring might increase the number of disagreeing observers of a CTG, who are likely to be standing next to the journey board while they consider the CTG. Will this mean the clinical response to the CTG is strongly influenced by the current demands on the service, and not by what is in the best interests of the woman and her fetus? We desperately need research on the safety of central fetal monitoring systems.


  Brilliant work again Kirsten! Thank you for laying out this conundrum about interpreting fetal surveillance cardiotocography traces so clearly.


  Thank you DrSmall….nothing small about this information!!


