Application and potential of artificial intelligence in neonatal medicine

Neonatal care is becoming increasingly complex with large amounts of rich, routinely recorded physiological, diagnostic and outcome data. Artificial intelligence (AI) has the potential to harness this vast quantity and range of information and become a powerful tool to support clinical decision making, personalised care, precise prognostics


Introduction
Artificial intelligence (AI) is an integral part of our daily lives and, with the advent of deep learning techniques and the "big data" environment, has begun to enter adult healthcare settings. A PubMed search of AI papers found 23,442 citations in adult healthcare compared with just 848 citations in neonatal care, with a marked increase in publications in both groups over the last 3-4 years (Fig. 1). Despite its tremendous potential, AI application in neonatology is still in its infancy. In this review, we explore examples of AI applications, the challenges of AI integration and its potential expansion in neonatal medicine.

Neonatal AI application
The use of AI in neonatal care has huge potential, especially given the increasingly complex intensive care provided for high-risk infants. AI should be viewed as one tool within the healthcare professionals' (HCPs') armoury, alongside blood investigations and imaging, to support shared clinical decision making, provide efficient personalised neonatal care and reduce avoidable errors.

AI application in large neonatal clinical datasets
Prediction of neonatal mortality and morbidity
The establishment of high-quality, validated, multi-dimensional neonatal datasets has led to the development of prediction models for neonatal mortality or morbidity using deep learning approaches [1]. A good example is preterm survival without bronchopulmonary dysplasia (BPD) [2]; BPD is an important preterm birth research priority and is associated with significant respiratory and neurological problems into adulthood [3].
Twenty-six BPD prediction models had been identified by 2012 [4], and a further 27 have been identified since then (TCK/DS unpublished data). None is in routine clinical use, despite the need for an objective measure to identify high-risk infants for timely, targeted preventative treatments such as postnatal corticosteroids. Currently, preventative strategies are guided by subjective approaches, often based on a few birth characteristics and previous experience. Uncertainty about whom and when to treat may delay treatment and miss narrow therapeutic windows, leading to additional ventilator-induced lung injury (VILI). A dynamic approach that uses deep learning to map each infant's trajectory and provide a personalised BPD risk estimate would be invaluable in supporting clinical decision making. This approach has already been used to predict respiratory failure in patients with cystic fibrosis and could be adopted in neonatology if it is validated and proven to add value [5].

Identifying hidden patterns within data
Deep learning approaches have been deployed on large clinical datasets to identify hidden patterns in the data. One such application explores variation in neonatal nutritional practices and its association with clinical outcomes [6]. This could identify "optimal" nutritional practices and improve understanding of the underlying pathophysiology and of the impact of nutritional practices on neonatal outcomes.

AI application in real-time routinely recorded neonatal intensive care vital signs
Continuous monitoring of vital signs is an essential component of the cardiorespiratory care of infants admitted to the neonatal intensive care unit (NICU) [7]. The abundance of data generated by multiple sensors can offer insight into the infant's clinical status. Pre-set alarm thresholds in bedside monitors can alert HCPs to an acute change in the infant's condition. However, HCPs are often drawn to the bedside by false-positive alarms, leading to 'alarm fatigue', which reduces their vigilance when immediate action is warranted [8]. Applying machine learning (ML) algorithms to monitoring data has been shown to improve the selection of alarms requiring immediate intervention, enable earlier recognition for pre-emptive clinical action, and direct care towards a more efficient, individualised approach [9]. Many NICUs still depend on infrequent, usually hourly, snapshots of vital signs, often entered manually in the patient's medical record, leaving most of the abundantly available digital data unused [10]. Others have already implemented continuous visualisation of monitoring data in their electronic medical record, enabling HCPs to look for specific patterns in the combined data. This can lead to early recognition of disease states, sometimes hours before they become clinically apparent [10]. ML algorithms can outperform HCPs at this task, recognising disease and supporting clinical decision making [7,10-12].
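To make the false-alarm problem concrete, the sketch below contrasts a plain threshold alarm with a persistence (delay) filter that only fires once a value has stayed below the threshold for several consecutive samples. This is a simple rule-based baseline, not the ML alarm-selection approach cited above; the function names, the 88% SpO2 threshold, 1-s sampling and 3-sample persistence window are illustrative assumptions, not clinical recommendations.

```python
def threshold_alarms(values, low):
    """Return the sample indices at which a plain threshold alarm starts,
    i.e. each time the signal crosses from >= low to < low."""
    alarms = []
    below = False
    for i, v in enumerate(values):
        if v < low and not below:
            alarms.append(i)
        below = v < low
    return alarms


def persistent_alarms(values, low, min_duration):
    """Return the indices at which the signal has stayed below `low` for
    exactly `min_duration` consecutive samples (one alarm per episode)."""
    alarms = []
    run = 0
    for i, v in enumerate(values):
        run = run + 1 if v < low else 0
        if run == min_duration:
            alarms.append(i)
    return alarms


# Hypothetical 1-s SpO2 trace (%): two brief dips, then a sustained desaturation.
trace = [95, 84, 96, 95, 83, 95, 86, 85, 84, 83, 84, 95]
print(threshold_alarms(trace, low=88))                   # [1, 4, 6]
print(persistent_alarms(trace, low=88, min_duration=3))  # [8]
```

The persistence filter suppresses the two transient dips but reports the sustained episode two samples later than the threshold alarm. This trade-off between fewer false alarms and later recognition is what learned alarm models aim to improve on.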
Late-onset sepsis (LOS) causes a significant burden of morbidity and mortality in preterm infants. Timely recognition of disease onset, together with prompt initiation of antibiotic therapy, is crucial to prevent adverse outcomes in these infants. Because LOS is often accompanied by changes in the infant's vital signs [10], it is an ideal use case for ML model development. AI algorithms can help recognise LOS before sepsis is clinically apparent [11]. The performance of these models is often evaluated on a small snapshot of data preceding the sepsis diagnosis, testing whether the disease is identified at a given time point before the clinical sepsis call was made. Although timely recognition is a key component of a model's success, precision (the proportion of alerts that are true positives) remains a major challenge to its clinical usefulness. To evaluate how the model behaves on data from the infant's entire NICU stay, it should be subjected to continuous performance analysis, in which its output is evaluated at every time point [7]. Moreover, ML algorithms should be compared with current clinical tools for diagnostic accuracy, undergo internal validation, and be tested for generalisability through external validation [12].
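The difference between snapshot and continuous evaluation can be sketched in a few lines. The example assumes one risk score per hour of the admission, an alert threshold, and a rule that an alert counts as a true positive only if the clinical sepsis call follows within a fixed horizon; the function names, scores, 0.5 threshold and 2-hour horizon are all hypothetical.

```python
def evaluate_snapshot(scores, onset, threshold, horizon):
    """Snapshot evaluation: was the score above threshold at any point in the
    `horizon` hours before (and including) the sepsis call?"""
    window = scores[max(0, onset - horizon): onset + 1]
    return any(s >= threshold for s in window)


def evaluate_continuously(scores, onset, threshold, horizon):
    """Continuous evaluation: judge the model's output at every hour of the stay.
    An alert at hour t is a true positive if onset - horizon <= t <= onset,
    otherwise a false positive. `onset` is None for infants without sepsis."""
    tp = fp = 0
    for t, s in enumerate(scores):
        if s < threshold:
            continue  # no alert at this hour
        if onset is not None and onset - horizon <= t <= onset:
            tp += 1
        else:
            fp += 1
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return tp, fp, precision


# Hypothetical hourly risk scores; sepsis was called at hour 8.
scores = [0.2, 0.7, 0.1, 0.8, 0.3, 0.2, 0.6, 0.9, 0.9, 0.4]
print(evaluate_snapshot(scores, onset=8, threshold=0.5, horizon=2))      # True
print(evaluate_continuously(scores, onset=8, threshold=0.5, horizon=2))  # (3, 2, 0.6)
```

The snapshot view reports a successful early detection, whereas the continuous view also counts the two earlier false alerts (hours 1 and 3), giving a precision of 0.6.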
Once AI models of proven benefit become available, integrating them into new wireless monitoring devices [13] could lead to disruptive change across the entire NICU ecosystem and better care for infants in terms of comfort, family-centred care and outcomes. Linking these data-rich measures, from fetal monitoring and care in the delivery room through to the NICU and early home monitoring, with longer-term outcomes could help advance the understanding of optimal early-life care and improve the outcomes of high-risk infants.

AI application in neuroimaging and neurophysiological investigations
Magnetic resonance imaging
The past five years have seen enormous strides in the use of AI to improve the value of, and inference from, brain MRI. AI methods have led to technical advances, including strategies to mitigate the effects of movement artefact and increase information yield [14], and improvements in tissue classification, an essential step in many processing pipelines [15]. These have enabled researchers to probe the developing brain in ever-increasing depth, leading to new insights into the impact of perinatal adversity on structural and functional network architecture [16]. In the clinical realm, AI is enabling innovation in three key areas: defining neuroanatomic phenotypes, predicting outcome, and facilitating the scale-up of imaging studies.
Fig. 1. Increase in the number of PubMed citations over the last 20 years for the use of artificial intelligence in neonates and adults, using the combined Medical Subject Headings (MeSH) terms "Artificial Intelligence" OR "Machine Learning" AND either "Infant, Newborn" OR "Intensive Care, Neonatal" for neonates, or "Adult" OR "Intensive Care" for adults. The search was performed on 15/03/2022.

T.C. Kwok et al.

Preterm birth is closely associated with a phenotype that includes atypical brain development and subsequent intellectual disability, cerebral palsy, autism spectrum disorder, attention deficit hyperactivity disorder, psychiatric disease and problems with language, behaviour and socioemotional function [17]. Structural, diffusion and functional MRI have each provided fundamental insights into alterations in structural and functional networks that are common in infants born preterm [18]. However, structure-function relations are better captured by models that integrate data from two or more imaging modalities in a single framework: for instance, multimodal image analysis reveals previously unrecognised patterns of neuroanatomic variants in preterm infants that correlate with clinical exposures and predict cognitive and motor outcomes [19]. ML has taken analytic capability to the next level by enabling "fingerprinting" of an individual's brain using morphometric similarity networks (MSNs). MSNs are computed by integrating different types of MRI data in a single model, including regional volumes, diffusion tensor metrics and neurite orientation dispersion and density imaging measures (Fig. 2). This approach can robustly classify the preterm brain phenotype and provides an accurate estimate of chronological brain age, to within 0.7 ± 0.56 weeks [20] (Fig. 2). It is anticipated that this type of approach will be useful for investigating drivers of brain dysmaturation and resilience, understanding the networks that contribute most to atypical brain development, and determining the neural bases of cognition and behaviour.
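Following the description in Fig. 2, a morphometric similarity network can be sketched as a region-by-region matrix of Pearson correlations between each region's vector of MRI-derived metrics. This pure-Python sketch assumes the metrics have already been averaged within atlas regions, and it z-scores each metric across regions before correlating so that metrics on different scales contribute comparably; whether the cited study applied exactly this step is an assumption. The function names and feature values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(features):
    """Standardise each metric (column) across regions (rows).
    Assumes no metric is constant across all regions."""
    stats = [(mean(col), pstdev(col)) for col in zip(*features)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in features]


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def morphometric_similarity_network(features):
    """features[r] holds the metric vector for region r; returns an R x R matrix."""
    z = zscore_columns(features)
    n = len(z)
    return [[pearson(z[i], z[j]) for j in range(n)] for i in range(n)]


def upper_triangle(matrix):
    """Flatten the unique inter-regional correlations, e.g. as model predictors."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Hypothetical metrics for 3 regions: [volume (mL), FA, neurite density index]
regions = [
    [1.0, 0.40, 0.30],
    [2.0, 0.50, 0.35],
    [3.0, 0.45, 0.50],
]
msn = morphometric_similarity_network(regions)
predictors = upper_triangle(msn)  # 3 values for 3 regions; 3240 for 81 regions
```

With the 81 regions used in Fig. 2, the upper triangle yields 81 × 80 / 2 = 3240 inter-regional correlations per infant, which is the kind of predictor vector a brain-age model would be trained on.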
Early prediction of neurodevelopmental disorders is one of the main goals of precision medicine because it would allow earlier and more effective intervention, which is expected to improve prognosis. To date, most studies report a correlation between a perinatal exposure and a behavioural or cognitive phenotype, and many models cannot account for other known determinants of brain development such as clinical co-morbidities, social gradients, the stress environment, genetic variants and maternal health [21]. ML has begun to bridge the gap between biological, genetic and environmental variables [22] and is proving useful in prediction frameworks. For example, a parsimonious ML model identified a robust set of eight out of 24 imaging and clinical features that predicted language impairment in preterm infants. The most important features were white matter microstructure, twin status, and incomplete or no exposure to antenatal corticosteroids. Female sex and breast milk exposure during NICU care reduced the risk of language delay [23]. Importantly, although these imaging and outcome prediction models have been developed in the context of preterm birth, they could be applied to many neurodevelopmental disorders.
Finally, some experimental designs require substantial scale-up using multi-site MRI data.These include defining 'normal' population variation in brain structure and function, investigating risk and resilience for brain development conferred by the (epi)genome, biomarker validation, and investigating early life origins of common neurologic and psychiatric diseases.This is challenging because of scanner and scan protocol differences, but ML can be used to enhance data interoperability, for example, by training image processing methods that are more robust to variations in input data.

Continuous electroencephalography
Electroencephalography (EEG) has become an invaluable component of neurocritical care in the NICU. EEG can be challenging to implement, largely because of the lack of experts available to review the EEG in real time, and efforts to automate this process have therefore been ongoing for many years.
EEG provides real-time information about brain activity and is now considered essential for the diagnosis and effective treatment of seizures in neonates [24]. A simplified but less accurate form of EEG monitoring, amplitude-integrated EEG (aEEG), has been used in neonatal units for many years because of the difficulty of obtaining conventional EEG. Because infants are born at any time of day or night, providing a 24-h neonatal EEG service is challenging for most centres. However, EEG is often needed very soon after birth, particularly for infants with hypoxic-ischaemic encephalopathy (HIE), as seizures can emerge within the first 24 h [25].
Neonatal seizures are a neurological emergency and require prompt treatment. Up to 85% of neonatal seizures may have no obvious clinical signs, particularly in infants with HIE, making recognition very difficult. The only way to recognise and promptly treat all seizures is to use continuous EEG monitoring [26]. An additional challenge for healthcare providers in the NICU is the uncoupling of clinical and EEG seizures following treatment with antiseizure medication [27]: any clinical expression of a seizure that was present before treatment can become invisible after treatment, while the electrical seizure discharge continues. EEG monitoring is therefore essential to assess the efficacy of treatment.
While EEG is hugely beneficial for neonatal seizure monitoring and treatment, it is also very useful in supporting the diagnosis of neonatal encephalopathy, particularly HIE. Most AI research using neonatal EEG has focused on automated seizure detection, and more recent algorithms have used deep learning. The seizure detection accuracy of these algorithms is impressive and comparable to that of human experts [28]. The report of the first multicentre trial of a neonatal seizure detection algorithm [29] demonstrates that an AI algorithm can be implemented in the NICU and is acceptable to staff and, most importantly, that because such algorithms are 'always on', more seizures were detected in real time. Most seizure detection algorithms developed to date have been for term neonates, but some for preterm infants are now emerging [30]. A persistent bottleneck in developing these algorithms is the availability of sufficient data for training, testing and validation. In addition, all seizures in the datasets must be labelled by several experts before the data can be used for training and testing.
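Because each recording must be annotated by several experts, the individual annotations have to be merged into one reference label set before training. A minimal sketch, assuming per-epoch (for example, per-second) binary annotations where 1 marks seizure and a simple majority-vote rule; real datasets may use other consensus rules, and the function names and annotations here are hypothetical.

```python
def consensus_labels(annotations, min_agree=None):
    """Merge per-expert binary annotations (one list per expert, one label per
    epoch) into consensus labels. By default an epoch is labelled as seizure
    when a strict majority of experts marked it."""
    n_experts = len(annotations)
    if min_agree is None:
        min_agree = n_experts // 2 + 1
    return [1 if sum(epoch) >= min_agree else 0 for epoch in zip(*annotations)]


def unanimous_fraction(annotations):
    """Fraction of epochs on which all experts gave the same label."""
    epochs = list(zip(*annotations))
    return sum(len(set(e)) == 1 for e in epochs) / len(epochs)


expert_a = [0, 1, 1, 1, 0, 0]
expert_b = [0, 0, 1, 1, 1, 0]
expert_c = [0, 1, 1, 0, 0, 0]
experts = [expert_a, expert_b, expert_c]
print(consensus_labels(experts))    # [0, 1, 1, 1, 0, 0]
print(unanimous_fraction(experts))  # 0.5
```

Tracking the unanimous fraction alongside the consensus labels gives a rough measure of inter-rater agreement, which bounds how closely any algorithm can be expected to match "the" expert label.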
Studies are also underway to develop algorithms, many using deep learning methods, that can assess brain maturation, estimate sleep states [31] and grade the background EEG in conditions such as HIE [32] (Fig. 3). Appropriate and sufficient expertly labelled data are key to this effort. Physiological data such as EEG are noisy and subject to both movement and biological artefacts, which must also be considered during training and testing. A multidisciplinary approach is therefore critical to success. The recently funded European Cooperation in Science and Technology (EU COST) AI4NICU Action (https://www.cost.eu/actions/CA20124/#tabs|Name:Working%20Groups) aims to speed up the development of AI technologies that detect brain injury in neonates through such multidisciplinary collaboration.
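The pre- and post-processing stages of the EEG grading system summarised in Fig. 3 can be sketched without the CNN itself. The 64 Hz sampling rate, 5-min segments and four grades come from the figure; averaging the per-segment, per-channel probabilities and taking the most probable grade is our assumption about how they are "combined", the function names are illustrative, and the probabilities below stand in for CNN outputs.

```python
FS_HZ = 64            # sampling rate after downsampling (Fig. 3)
SEGMENT_S = 5 * 60    # 5-min segments (Fig. 3)
N_GRADES = 4          # grades 1-4


def segment(channel, fs=FS_HZ, seg_s=SEGMENT_S):
    """Split one channel into non-overlapping segments; a trailing partial
    segment is dropped."""
    n = fs * seg_s
    return [channel[i:i + n] for i in range(0, len(channel) - n + 1, n)]


def combine_grades(probs):
    """probs[channel][segment] is a length-4 probability vector from the classifier.
    Averages over all segments and channels (an assumed combination rule) and
    returns (grade, mean probabilities)."""
    vectors = [p for channel in probs for p in channel]
    mean_p = [sum(v[k] for v in vectors) / len(vectors) for k in range(N_GRADES)]
    best = max(range(N_GRADES), key=lambda k: mean_p[k])
    return best + 1, mean_p


one_hour = [0.0] * (FS_HZ * 3600)
print(len(segment(one_hour)))  # 12

# Hypothetical classifier outputs: 2 channels x 2 segments
probs = [
    [[0.1, 0.6, 0.2, 0.1], [0.2, 0.5, 0.2, 0.1]],
    [[0.1, 0.3, 0.5, 0.1], [0.1, 0.5, 0.3, 0.1]],
]
grade, mean_p = combine_grades(probs)
print(grade)  # 2
```

Averaging is only one option; a system could instead weight channels or use majority voting across segments, and the choice affects how a short, severe abnormality in one channel influences the final grade.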

AI application in image recognition
Deep learning techniques such as convolutional neural networks (CNNs) and computer vision approaches can address the challenges posed by the small and skewed datasets that often characterise neonatal image recognition tasks. These rely on a two-stage technique: automated segmentation of the image or scene of interest, followed by identification of the outcome of interest. Examples include estimating gestational age at birth from images of newborn infants [33], analysing videos of clinical procedures such as newborn resuscitation [34], and assessing pain [35].
Gestational age at birth often guides the treatment delivered by HCPs. An early prenatal dating ultrasound scan is the gold standard for assessing gestational age [33], but this may not always be available, particularly in low-resource settings. A novel automated postnatal gestational age estimation tool [36] used images of a newborn infant's face, ear and foot. The images were first passed through a segmentation stage using fully convolutional networks (FCNs) (Fig. 4), in which the system was trained to identify the body part of interest and disregard the background. The FCNs create binary masks labelling the pixels of the body part of interest, which are compared with masks created by manual annotation of the images. After segmentation, the regions of interest were passed through CNNs that learn features and classify the images into gestational classes for each body part. After training, test images are passed through the CNNs and assigned a probability vector for each body part. The probability vectors are then combined with the newborn's weight and a regression is performed, outputting an estimate of gestational age in days rather than a broad gestational class. The tool estimated gestational age to within approximately six days, outperforming postnatal clinical assessment using the Ballard score as well as estimation from the last menstrual period [37]. Integrating the algorithms into smartphones could allow rapid, photo-based estimation of gestational age in settings without other antenatal dating resources. Similar approaches could be adapted to support the diagnosis of neonatal conditions, including syndromes, especially when linked with other data [38].
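The segmentation step compares each predicted binary mask with the manually annotated one. One common way to score that agreement is the Dice coefficient, sketched below on toy 3 x 3 masks. The source does not state which overlap measure or loss the cited tool used, so this choice, the `dice` function name and the masks are illustrative assumptions.

```python
def dice(pred, truth):
    """Dice overlap 2|A∩B| / (|A| + |B|) between two binary masks given as
    equal-sized nested lists of 0/1 pixels. Two empty masks count as a perfect match."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(p) + sum(t)
    return 1.0 if total == 0 else 2 * intersection / total


predicted = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
annotated = [[0, 1, 1],
             [0, 1, 0],
             [0, 1, 0]]
print(dice(predicted, annotated))  # 0.75
```

Three of the four foreground pixels in each mask overlap, so the score is 2 × 3 / (4 + 4) = 0.75; a perfect segmentation would score 1.0.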
During newborn resuscitation, HCPs follow a defined protocol consisting of a sequence of actions performed by several people. Because of the high-pressure environment during resuscitation, errors can be made. Smith et al. [34] developed an automated scene segmentation and action recognition tool using deep learning to analyse videos of newborn resuscitation, with good performance from as few as 20 training images. Such a tool could be useful not only for training HCPs by providing real-time feedback, but also as the basis of early warning systems that alert HCPs in resuscitation settings when they deviate from standard protocols.
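An early warning system of this kind would need to compare the sequence of recognised actions against the expected protocol. A minimal sketch, assuming the video model already outputs a time-ordered list of action labels; the labels, their order and the `skipped_step_alerts` function are hypothetical and deliberately simplified, not a resuscitation guideline.

```python
# Hypothetical, simplified step order (illustration only, not clinical guidance).
PROTOCOL = ["dry_and_stimulate", "open_airway", "ventilate", "chest_compressions"]


def skipped_step_alerts(observed, protocol=PROTOCOL):
    """Return recognised actions that were started before the preceding protocol
    step had been seen. Labels not in the protocol are ignored."""
    rank = {action: i for i, action in enumerate(protocol)}
    furthest = -1  # furthest protocol step reached so far
    alerts = []
    for action in observed:
        r = rank.get(action)
        if r is None:
            continue
        if r > furthest + 1:
            alerts.append(action)  # at least one earlier step was skipped
        furthest = max(furthest, r)
    return alerts


print(skipped_step_alerts(["dry_and_stimulate", "open_airway", "ventilate"]))  # []
print(skipped_step_alerts(["dry_and_stimulate", "ventilate", "open_airway"]))  # ['ventilate']
```

Repeating or returning to an earlier step is not flagged here; a real system would need a richer protocol model (timings, parallel actions, branching) than this linear ordering.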

AI application in predicting response to neonatal treatment
Premature infants are commonly diagnosed with respiratory distress syndrome, which often requires intubation and mechanical ventilation (MV). MV of preterm infants presents several challenges, including meeting specific oxygenation targets and minimising ventilator-induced lung injury (VILI). Personalised treatment requires rapid and frequent interventions based on changes in the patient's state, which are often not achievable within current NICU constraints. For example, oxygen saturation targets in mechanically ventilated neonates were achieved only 40% of the time [39], a gap that AI techniques could potentially address.

Fig. 3. Overview of an AI system to grade the neonatal EEG. The system assigns a grade (1-4) based on 1 h of multi-channel EEG. Pre-processing: the EEG is filtered and downsampled to 64 Hz, then divided into 5-min segments. System model: each 5-min segment, per channel, is converted to a quadratic time-frequency distribution (TFD). This 2-dimensional image is passed through a pre-trained convolutional neural network (CNN). The CNN has parallel layers to extract information across time, frequency, and in the joint time-frequency direction of the TFD. The CNN produces a separate probability for each of the 4 grades. Post-processing: the probabilities from the CNN are combined over all 5-min segments and across channels to give a final grade prediction. Adapted from Raurale et al. J Neural Eng [32].
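The proportion of time spent within an oxygen saturation target, such as the 40% figure cited in this section, is straightforward to compute from routinely recorded monitor data, which makes it a natural benchmark for automated oxygen control. A minimal sketch, assuming regularly sampled SpO2 values with `None` marking missing readings; the `time_in_range` name and the 91-95% target range are illustrative assumptions, since targets differ between units and guidelines.

```python
def time_in_range(spo2, low=91, high=95):
    """Fraction of valid SpO2 readings within [low, high] (inclusive).
    Missing readings (None) are excluded from the denominator."""
    valid = [v for v in spo2 if v is not None]
    if not valid:
        return float("nan")
    return sum(low <= v <= high for v in valid) / len(valid)


# Hypothetical readings at regular intervals (%), one missing.
readings = [90, 92, 94, 96, 93, None, 89, 95, 97, 91]
print(round(time_in_range(readings), 3))  # 0.556
```

Excluding missing samples from the denominator is a design choice; counting them as out of range instead would give a more conservative estimate when signal dropout is frequent.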
Minimising VILI is one of the key challenges in neonatal care, as it reduces the risk of associated adverse outcomes that contribute to morbidity, mortality and poor long-term quality of life. Weaning from MV remains a complex clinical problem in the NICU, with 15%-40% of infants failing extubation [40]. AI techniques have been employed to help design decision support tools that predict extubation readiness and neonatal outcomes [41,42]. Precup et al. [41] suggested that a predictor model based on support vector machines could reduce the extubation failure rate by more than 80% compared with current clinical measures. The performance of predictive models of extubation outcome based on different ML algorithms has also been studied [42]. Although some models showed satisfactory performance, HCPs' predictions still outperformed all the developed models, indicating the need for further refinement of detailed mechanistic models that can be fully validated against individual patient data.
AI techniques can also be used to optimise drug dosing. Prediction models using optimised support vector machines, decision tree ensembles and deep learning have been developed to predict the effectiveness of therapeutic caffeine regimens in preventing apnoeas and reducing the need for prolonged MV [43].
Artificial neural networks have been designed to suggest the most protective ventilator settings, minimising VILI while maintaining blood gases within acceptable ranges, and have demonstrated advantages over rule-based systems [44]. Recurrent neural networks for predicting future ventilation parameters have demonstrated good accuracy when predicting 1.5 s ahead [45] but performed poorly when attempting to predict further into the future, again indicating the need for more refined models with greater predictive capability [46].

Challenges of AI in neonatology
Alongside recognising the far-reaching potential of AI in neonatal medicine, it is crucial to understand its limitations and pitfalls if it is to be integrated within routine care pathways. Currently, there is a lack of uniformity in how the development and performance assessment of AI healthcare tools are critically appraised in the following areas [47,48].

Dataset quality
High-quality data are needed to train AI tools. The datasets used to develop AI tools should be reported fully by researchers and appraised carefully by reviewers [49]. Common pitfalls include small sample sizes, inappropriate handling of missing values and inadequate assessment of heterogeneity across population subsets or healthcare settings [47]. Care is also needed to detect biases against underserved groups that may have been unintentionally embedded in the AI tool.

Model performance assessment based on the dataset type
Cross-validation and bootstrapping are commonly used to assess model performance while making use of the full range of data collected. Depending on the aim of the AI tool, performance should also be assessed continuously on the full dataset throughout the neonatal admission, rather than on a snapshot of the data.
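When many time points come from the same infant, cross-validation folds are best split by infant rather than by sample, so that data from one infant never appear in both the training and the test fold and inflate the apparent performance. A minimal pure-Python sketch of such patient-level k-fold splitting; the `grouped_kfold` name and the infant identifiers are hypothetical, and it assumes k does not exceed the number of infants.

```python
import random


def grouped_kfold(patient_ids, k=5, seed=0):
    """Yield (train_indices, test_indices) for k folds, keeping every sample
    from the same infant in the same fold. Assumes k <= number of infants."""
    infants = sorted(set(patient_ids))
    random.Random(seed).shuffle(infants)
    fold_of = {pid: i % k for i, pid in enumerate(infants)}  # round-robin assignment
    for fold in range(k):
        train = [i for i, pid in enumerate(patient_ids) if fold_of[pid] != fold]
        test = [i for i, pid in enumerate(patient_ids) if fold_of[pid] == fold]
        yield train, test


# Hypothetical sample-level infant IDs (e.g. one row per hour of monitoring).
ids = ["a", "a", "b", "b", "b", "c", "d", "d", "e"]
for train, test in grouped_kfold(ids, k=3):
    print(sorted({ids[i] for i in test}))
```

Assigning whole infants round-robin keeps the number of infants per fold balanced, although fold sizes in samples can still differ when infants contribute very different amounts of data.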
The most commonly reported performance measure for AI tools is the area under the receiver operating characteristic curve (AUROC), a measure of discrimination. However, this can be misleading in certain settings. In healthcare time-series datasets with class imbalance, AUROC may give false assurance about model performance: the false-positive rate is calculated relative to the large number of negative samples, so even a large absolute number of false alarms lowers the AUROC only slightly. Other performance measures must therefore also be assessed, including the precision-recall curve, sensitivity and specificity, and calibration measures [48].
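A toy example shows how AUROC can look reassuring on an imbalanced dataset while a precision-based measure does not. The sketch computes AUROC as the probability that a randomly chosen positive is scored above a randomly chosen negative, and average precision as the mean precision at the rank of each positive (assuming no score ties between classes). The function names and scores are hypothetical: 2 positives among 102 samples, with 5 negatives scored above both positives.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive, ranking by score."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


scores = [0.99] * 5 + [0.1] * 95 + [0.9, 0.8]
labels = [0] * 100 + [1, 1]
print(round(auroc(scores, labels), 3))              # 0.95
print(round(average_precision(scores, labels), 3))  # 0.226
```

An AUROC of 0.95 hides the fact that five of the first seven alerts would be false; the average precision of about 0.23 (against a baseline prevalence of about 0.02) makes that cost visible.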

External validation and clinical impact
Overfitting is a common issue affecting AI prediction tools. Complex ML algorithms may be very sensitive to nuances within the dataset, producing excellent performance in the initial dataset used to develop the algorithm. However, because the tool models the training data too closely, its performance may suffer when it is tested on a different dataset or used in clinical practice, making external validation essential [47].
Clinical impact studies are also required to provide robust evidence to inform AI application in healthcare settings. These assess the performance of AI tools in terms of their discrimination and calibration, their impact on the clinical workflow (e.g. changes in the behaviour of HCPs or parents/carers) and their effect on patient outcomes. How seamlessly the AI tool integrates within the current clinical workflow, providing the right information to HCPs and parents/carers, is crucial to bridging the gap between development and implementation in clinical practice [49].
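Calibration asks whether predicted risks match observed event rates. A minimal sketch of two common summaries, the Brier score and a binned reliability table, on hypothetical predictions; the function names and the choice of five equal-width bins are assumptions, and in practice far more patients per bin are needed for a stable estimate.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted risk and observed outcome (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def reliability_table(probs, outcomes, n_bins=5):
    """Return (mean predicted risk, observed event rate, count) for each
    non-empty equal-width probability bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            table.append((mean_p, observed, len(b)))
    return table


probs = [0.1, 0.1, 0.3, 0.85, 0.9, 0.95]
outcomes = [0, 0, 1, 1, 1, 0]
print(round(brier_score(probs, outcomes), 3))  # 0.241
for mean_p, observed, n in reliability_table(probs, outcomes):
    print(f"predicted {mean_p:.2f}  observed {observed:.2f}  n={n}")
```

In the highest bin the model predicts an average risk of 0.90 but only two of three infants had the event, a sign of overconfidence that a discrimination measure such as AUROC would not reveal.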

Interpretability
Interpretability and transparency are crucial to achieving augmented intelligence. There is a growing body of innovative work exploring approaches that improve AI interpretability and explain how a prediction is derived [50]. For bedside clinical use, interpretability should be considered from the outset of model construction and should guide the choice of algorithm used to train a model.

Critical appraisal, regulatory and monitoring guidance
There has been an enormous effort to update current critical appraisal, regulatory and monitoring guidance for AI healthcare devices [48,49]. Once developed, this guidance will help address some of the methodological, critical appraisal and medicolegal challenges, as well as the monitoring needed to ensure the safe and efficient use of these devices for their intended purpose.

Future
AI is likely to become an indispensable part of the neonatal care toolkit, supporting HCPs and parents/carers in providing improved, efficient and safer neonatal care. For this to become a reality, two crucial steps are needed. First, digital literacy among HCPs, including an understanding of AI's principles and limitations, needs to improve, enabling HCPs to appraise newly developed AI tools and to monitor their safety and appropriate use in clinical practice. Second, cross-disciplinary, international collaborations that include data and computer scientists, HCPs, lawyers and policymakers are needed to design and apply AI tools that overcome the challenges highlighted.

Conclusion
AI will be an integral part of the data-rich environment of neonatal care, and this review highlights important areas of its application under investigation, including mortality and disease prediction, image analysis and clinical decision support tools. However, AI application in neonatology lags behind that in adult specialities, and a concerted effort is needed to accelerate neonatal AI research and its translation into meaningful clinical application.

Practice points
• AI will be an integral part of future healthcare, especially in complex settings such as neonatal intensive care, supporting shared clinical decision making and efficient, personalised neonatal care while reducing avoidable errors.
• Novel applications of AI in neonatal medicine are already improving our understanding of neuroanatomic phenotypes, prediction and disease modelling, seizure analysis and optimal ventilatory strategies.
• Addressing these challenges, improving digital literacy among HCPs and building multidisciplinary collaborations are needed to harness the full potential of AI in neonatal care.

Fig. 2. a) Construction of individual morphometric similarity networks (MSNs): different metrics are extracted from structural MRI (sMRI) and diffusion MRI (dMRI) data, the latter using models such as diffusion kurtosis imaging (DKI) and neurite orientation dispersion and density imaging (NODDI). The same labelled atlas is applied to all image types, and the average metric values are computed for 81 regions of interest (ROIs). An MSN (represented here as a connectivity matrix) is built by computing the Pearson correlation between the metric vectors of each pair of ROIs. b) Training of a predictive model from individual MSNs: the inter-regional correlations are used as predictor variables in a machine learning model of chronological brain age. The performance of the model is evaluated on an independent test set. From Galdi P et al. Neuroimage Clin [20].

Fig. 4. a) Plantar surface photographs of a preterm and a post-term infant. A manually labelled region of interest (ROI) is used to train the system to identify the foot and remove the background, with subsequent recognition of plantar crease patterns. This uses fully convolutional networks (FCNs) to segment images and provide per-pixel classification. b) The images then undergo deep learning, passing through the different layers of CNNs, to be classified into one of five classes (extremely preterm, very preterm, moderately preterm, term and post-term). The resulting probability vector is combined with the weight of the infant to improve the decision-making process during regression, and the output is an estimate of gestational age in weeks. Adapted from Torres et al. Image and Vision Computing [36].