When I first dove into the project of predicting Sudden Infant Death Syndrome (SIDS), I had no idea how deeply personal it would become. Our capstone began in January 2024, the last full-time semester of my master’s at Rice University. At the same time, my wife and I were preparing to welcome our first child — a baby girl, due right around mid-April, just as I was wrapping up the semester and transitioning from full-time student to full-time data scientist at my lab. The timing was intentional. We had planned it carefully: I’d finish my graduate coursework, deliver my final presentations, and step into fatherhood and a full-time role with no academic weight holding me back.
But life, like machine learning models trained on noisy real-world data, rarely behaves predictably. Our daughter was born prematurely, weeks before we expected. As I worked through the most technically and emotionally intense project of my academic career (building an AI pipeline to predict sudden death in vulnerable infants), I was also navigating neonatal care in real life, watching over a tiny human who had arrived before she was ready. Each day, I would bounce between debugging code for heart rate variability analysis and learning how to feed, swaddle, and protect a medically fragile newborn.
That convergence made this project real. Not abstract. Not academic. It was personal. I wasn’t just thinking about AI for hypothetical babies in the NICU. I was looking at mine, yearning to hold her in my arms. And that experience sharpened my sense of responsibility in ways I can’t quite put into words.
This blog post explores how our capstone project (Team Breath of Life at Rice University) harnessed AI to predict cardio-respiratory failure in SIDS. More specifically, this post recounts our journey in building a machine-learning pipeline to identify cardio-respiratory signatures of SIDS in a mouse model. It’s a story about data, yes — but also about timing, leadership, life, and a different kind of early warning. Along the way, I’ll share how we engineered features that capture a heartbeat’s shape, trained neural networks on spectrogram “heatmaps,” and tried to help machines learn what parents and doctors struggle to catch in time.
SIDS has haunted new parents and doctors for generations. Coined in the 1970s, the term describes a diagnosis of exclusion where an infant suddenly dies with no apparent cause. Most cases happen during sleep, often when a baby is found face-down and believed to have re-breathed CO₂-rich, oxygen-poor air. In a healthy infant, low oxygen would trigger an auto-resuscitation reflex: the brainstem sounds an internal alarm, making the baby gasp and “reset” breathing and heart rate. In SIDS, for reasons still unknown, this protective reflex fails, and the infant simply doesn’t wake up.
Researchers have uncovered risk factors and theories. Babies sleeping in unsafe positions or exposed to tobacco smoke during pregnancy have a higher risk. Some hypotheses focus on subtle abnormalities in the brainstem or the autonomic nervous system that impair breathing control. Others point to genetic predispositions or cardiac arrhythmias. However, an exact cause remains elusive since it’s likely multi-factorial. What’s clear is that we cannot yet predict which infant will succumb to SIDS. This unpredictability is what makes SIDS so terrifying and motivates work on early detection.
Each year in the United States, over 1,300 infants die from SIDS. This number has plateaued despite public health campaigns about safe sleep. It would be invaluable if we could identify high-risk infants through non-invasive monitoring (say, analyzing an infant’s heartbeat or breathing patterns). This is the vision of predictive healthcare in this context: rather than just reacting to an emergency, we’d love an AI “smoke alarm” that alerts caregivers before a baby’s life is in danger. Recent research is starting to move in this direction. For instance, a 2024 study identified metabolic biomarkers in blood that might help flag infants at risk of SIDS. And in neonatal intensive care units (NICUs), machine learning has been used on vital signs like heart rate variability to warn of impending sepsis infections hours in advance. These efforts all strive toward proactive, preventive monitoring, the theme at the heart of our project.
One big challenge in SIDS research is that you obviously cannot experiment on human infants. To study SIDS triggers in a controlled way, our collaborators at Baylor’s Ray Lab turned to a mouse model. They created a simulation of SIDS in the lab using newborn mouse pups. Each mouse pup was placed in a chamber and fitted with tiny ECG leads to record its heart activity, plus a custom face mask. Through the mask, the system periodically gave the pup a gas mixture with no oxygen (typically a nitrogen/carbon dioxide mix) to challenge its breathing. This low-oxygen, high-CO₂ environment would cause the pup’s heart rate to drop and its breathing to cease, essentially inducing an apnea event as seen in SIDS. The moment the pup stopped breathing, the system returned normal air to the chamber, allowing it to gasp and recover if it could (this is the auto-resuscitation reflex in action). The cycle would repeat: challenge the pup with low oxygen, then rescue it, until, eventually, in some unfortunate cases, the pup’s reflex failed, and it did not recover. In other words, the mouse succumbed to a SIDS-like event.
Over several years, the Ray Lab conducted hundreds of these automated experiments on mouse pups. Each experiment produced a rich collection of data: continuous ECG waveforms (electrocardiogram signals) tracking the heart’s electrical activity, respiratory signals (from a small airflow sensor in the mask), and timestamps of each apnea episode and outcome (recovery or death). The lab’s custom software, Breathe Easy, processed the raw waveforms into a set of numeric features for each mouse. For example, it extracted each “breath” and measured its duration, and it similarly recorded each heartbeat’s timing. By the time we, the Rice capstone team, received the data, we had a database of 357 mice with about 3,584 apnea events in total, each event labeled as either a normal apnea (the mouse recovered) or a fatal apnea (the mouse died, i.e., SIDS in this model).
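To give a sense of the dataset’s shape, here is a minimal sketch of how that event-level table could be summarized once loaded. The file name and column names (mouse_id, outcome) are hypothetical placeholders, not the actual Breathe Easy schema.

```python
# Minimal sketch of summarizing the per-event table; names below are hypothetical placeholders.
import pandas as pd

events = pd.read_csv("apnea_events.csv")    # hypothetical export of the event-level data

print(events["mouse_id"].nunique())         # number of mice (~357 in our dataset)
print(len(events))                          # number of apnea events (~3,584)
print(events["outcome"].value_counts())     # recoveries vastly outnumber fatal apneas
```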
As the only graduate student on the team (and the one with a biomedical signals background), I took the lead on transforming this trove of experimental data into something we could feed into machine learning models. Our goal was straightforward to state: for each apnea event, use the preceding ECG and breathing data to predict whether it will be fatal or not. If our AI could learn the difference between a “recoverable” apnea and a pre-SIDS apnea, it might reveal the hidden warning signs that life itself failed to notice.
I began by digging into the feature lists from Breathe Easy. These included 177 parameters per apnea event, such as the apnea duration, the breathing rate leading up to it, the average heart rate, etc. First, we tried classical statistical analysis: were any of these 177 features significantly different in SIDS-versus-normal events? We applied ANOVA tests and Mann-Whitney U tests to compare groups. The result: no single feature popped out as a clear discriminator. This was deflating at first since we didn’t find any simple “smoking gun”, like “heart rate drops by X% only in the SIDS cases.”
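For the curious, the univariate screening looked roughly like the sketch below. It assumes the events sit in a pandas DataFrame with one row per apnea and a binary label column; the column names are placeholders, not the actual Breathe Easy field names.

```python
# Minimal sketch of the univariate screening: compare each feature between fatal and
# recovered apneas with ANOVA and Mann-Whitney U tests. Column names are hypothetical.
import pandas as pd
from scipy.stats import f_oneway, mannwhitneyu

def screen_features(events: pd.DataFrame, feature_cols, label_col="fatal"):
    fatal = events[events[label_col] == 1]
    recovered = events[events[label_col] == 0]
    results = []
    for col in feature_cols:
        a, b = fatal[col].dropna(), recovered[col].dropna()
        _, p_anova = f_oneway(a, b)       # parametric comparison of group means
        _, p_mwu = mannwhitneyu(a, b)     # non-parametric, rank-based comparison
        results.append({"feature": col, "p_anova": p_anova, "p_mannwhitney": p_mwu})
    return pd.DataFrame(results).sort_values("p_mannwhitney")
```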
We also trained a quick random forest classifier using a subset of these features that our project sponsors (the Baylor doctors) suspected might be important. This was essentially our baseline machine-learning attempt on the raw feature set. Initially, the random forest just predicted the majority class every time (e.g., “no SIDS event”), which isn’t surprising since there were far more recoveries than deaths (a heavy class imbalance). We then adjusted the model’s class weights to penalize SIDS errors more, hoping to coax it into catching the rare SIDS case. The outcome was still poor: the model would achieve a high overall accuracy (because most events are non-fatal, and it got those right), but it rarely actually predicted a SIDS event. In fact, its recall for SIDS was near zero (around 8.4% by one analysis). In other words, it might catch 1 out of 12 actual SIDS cases, missing all the rest, which isn't a proper early warning system at all. This taught us our first lesson: a model can appear “accurate” with imbalanced healthcare data while completely failing the minority class of critical interest. SIDS events were needles in a haystack, and the haystack was winning.
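That baseline looked something like the sketch below, assuming a feature matrix X and binary labels y (1 = fatal apnea) were already assembled; the hyperparameters are illustrative rather than the exact values we used.

```python
# Minimal sketch of the class-weighted random forest baseline; X and y are assumed prepared.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# "balanced" reweights classes inversely to their frequency, penalizing missed SIDS events more.
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=42)
clf.fit(X_train, y_train)

# Per-class precision and recall expose what overall accuracy hides.
print(classification_report(y_test, clf.predict(X_test), target_names=["recovered", "fatal"]))
```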
Frustrated by the flatness of the raw features, I decided to engineer new features that might tease out hidden patterns. This is where domain knowledge in signal processing became invaluable; we know from physiology that subtle changes in heartbeats can indicate distress. For example, researchers have long studied heart rate variability (HRV) in infants, and some works suggested that infants who later succumb to SIDS have “unique” variability patterns in their heart rate. Also, the shapes of ECG waveforms (the QRS complexes) could hold clues about cardiac function. So, we expanded our feature set in a few key ways:
Heartbeat Morphology: We broke down each QRS complex (the spike in the ECG for each heartbeat) into its constituent waves: the Q, R, and S peaks. For each wave, we computed its amplitude (height in mV) and how that amplitude changed relative to the previous heartbeat; subtle shifts in these amplitudes might reveal deterioration in cardiac output. We also measured timing intervals within each heartbeat: the time between Q and R, R and S, and Q and S. These within-complex intervals, along with the between-complex interval (the gap between successive R-waves, i.e., the R–R interval), quantify how regular or irregular the heartbeat timing is. Notably, R–R interval variability is a classic HRV measure and an important marker of infant cardiorespiratory health. Finally, inspired by signal-processing research, we calculated the area under the QRS curve for each beat as a proxy for morphological shape; prior studies have used similar area-based metrics to classify abnormal heartbeats like arrhythmias, so it was a promising feature to include. (A minimal sketch of these beat-level calculations appears after this list.)
Heart Rate Dynamics: Instead of just looking at the instantaneous heart rate or average, we examined how the heart rate fluctuated over time leading up to an apnea. I implemented a Katz Fractal Dimension (KFD) calculation on the heart rate time series. This gave us a single number representing the complexity of the heart rate signal, where a higher fractal dimension means a more erratic, less predictable pattern. The intuition is that a distressed or unstable physiological state might produce more complex, chaotic heart rate fluctuations. (Think of a calm, healthy baby’s heart versus one that’s struggling; one might expect different patterns of variability.) We also considered entropy-based metrics (like fuzzy entropy) to quantify signal irregularity since these have been used in biomedical signal analysis to capture complexity in noisy data. These metrics complement traditional HRV measures by highlighting nonlinear and non-obvious patterns.
Respiratory Features: The breathing waveform, too, held potential clues. The original 177 features (the “breathlist”) came mainly from the respiration signal, but since none were individually predictive, we brainstormed combinations or transformations. For instance, we looked at sequences of breaths prior to the apnea: was there a progressive slowing of breathing? An increase in variability of breath amplitude? One idea from the literature was to examine respiratory entropy or complexity, similar to the ECG approach, since irregular breathing patterns might precede a failure. We also derived features capturing the interaction between heart and breathing signals; for example, the coupling between heart rate and respiration (a phenomenon known as respiratory sinus arrhythmia). These are more exploratory, but the goal was to arm our models with as detailed a description of the pre-apnea state as possible, since we didn’t know what subtle combination might be the harbinger of collapse.
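To make the morphology and dynamics features above concrete, here is a minimal sketch of the kind of calculations involved: R-peak detection, R–R intervals, a crude area-under-the-QRS proxy, and the Katz Fractal Dimension. The detection thresholds and window sizes are illustrative assumptions, not our tuned pipeline values.

```python
# Minimal sketch of beat-level feature extraction; thresholds and windows are illustrative.
import numpy as np
from scipy.signal import find_peaks

def katz_fd(x):
    """Katz Fractal Dimension of a 1-D series (e.g., an instantaneous heart-rate trace)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - 1                                      # number of steps along the curve
    length = np.sum(np.sqrt(1.0 + np.diff(x) ** 2))     # total path length of the waveform
    # maximum distance from the first sample to any later sample
    d = np.max(np.sqrt(np.arange(1, len(x)) ** 2 + (x[1:] - x[0]) ** 2))
    return np.log10(n) / (np.log10(n) + np.log10(d / length))

def beat_features(ecg, fs):
    """R-peak locations, R-R intervals (seconds), and a simple area proxy for QRS shape."""
    ecg = np.asarray(ecg, dtype=float)
    r_peaks, _ = find_peaks(ecg, distance=int(0.05 * fs), prominence=np.std(ecg))
    rr_intervals = np.diff(r_peaks) / fs                # classic HRV building block
    half = int(0.01 * fs)                               # ~10 ms on each side of the R peak
    qrs_area = np.array([
        np.abs(ecg[p - half:p + half]).sum() / fs       # crude area-under-the-QRS proxy
        for p in r_peaks if half <= p < len(ecg) - half
    ])
    return r_peaks, rr_intervals, qrs_area

# Example: complexity of the heart rate (in bpm) over the window before an apnea.
# kfd = katz_fd(60.0 / rr_intervals)
```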
All told, by the end of feature engineering, we had 200+ features for each apnea event, ranging from classic vital-sign statistics to exotic nonlinear metrics. We were throwing the kitchen sink at the problem, but in a principled way grounded in cardio-respiratory physiology. I often toggled between two mindsets: the data scientist tweaking code to calculate features and the biomedical engineer asking, “Does this feature make sense for what an infant’s body is doing?” This dual approach was crucial, and it’s a lesson I carry forward: in healthcare AI, blending domain insight with data-driven methods amplifies the power of both.
I also performed visual exploratory data analysis while computing these features. One technique that made a significant impression on our team was plotting spectrograms of the ECG signals. A spectrogram turns a time-series signal into an image, showing how the signal's frequency content changes over time (via the Short-Time Fourier Transform, or STFT for short). I took ECG data from each mouse and generated spectrogram images for the 5-minute window leading up to an apnea event. In short, the x-axis was time, the y-axis was frequency (0.5–50 Hz band of interest), and pixel intensity showed signal power at each frequency. Then, I compared spectrograms for two scenarios: before a normal apnea (mouse survived) vs. before a fatal apnea (mouse died). The difference was striking: the ECG spectrograms before fatal apneas looked smoother and more regular, while those before recovery apneas looked choppier and noisier. Figure 7 in our report captured this pattern clearly: the “at-risk” ECG had a kind of eerie calm, while the recovery-bound ECG was more erratic. This was a big clue. It suggested that whatever breakdown leads to SIDS might manifest as a loss of variability or loss of complexity in the heart signal shortly before the event. In other words, the spectrogram visualized some of the same phenomena we hoped our features, like fractal dimension, would quantify.
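Generating those spectrograms takes only a few lines with an off-the-shelf STFT. The sketch below assumes a 1 kHz sampling rate and uses a noisy sine wave as a stand-in for a real 5-minute ECG window; the window settings are illustrative, not the exact values from our pipeline.

```python
# Minimal spectrogram sketch; sampling rate, STFT settings, and the signal are stand-ins.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

def ecg_spectrogram(ecg, fs, fmin=0.5, fmax=50.0):
    """STFT-based spectrogram restricted to the 0.5-50 Hz band of interest."""
    f, t, Sxx = spectrogram(ecg, fs=fs, nperseg=int(2 * fs), noverlap=int(1.5 * fs))
    band = (f >= fmin) & (f <= fmax)
    return f[band], t, 10 * np.log10(Sxx[band] + 1e-12)   # power in dB for display

fs = 1000                                                  # assumed sampling rate (Hz)
time = np.arange(0, 300, 1 / fs)                           # 5-minute window
ecg_window = np.sin(2 * np.pi * 7 * time) + 0.3 * np.random.randn(len(time))

f, t, S_db = ecg_spectrogram(ecg_window, fs)
plt.pcolormesh(t, f, S_db, shading="auto")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.show()
```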
This discovery shaped our strategy. If the difference is visible to the eye in a spectrogram, then a computer vision model could also learn it. We decided to pursue two parallel modeling approaches from here on: one leveraging our carefully engineered numerical features (many of which tried to capture the kind of variability differences we saw) and another leveraging the raw spectrogram images directly with deep learning. It was time to move from data exploration to building predictive models.
In modern AI, there’s a bit of a dichotomy: feed the algorithm carefully curated features, or feed it raw data and let it figure out the features itself. We decided to try both.
1. Feature-Based Time-Series Model (1D CNN + LSTM): For the tabular dataset of engineered features, I built a neural network that could process a sequence of feature vectors over time. Each apnea event wasn’t just a single timestamp; we had a time series leading up to it (e.g., 10 seconds or 5 minutes of data, depending on the window, which we could break into smaller sub-intervals). To model temporal patterns in the features (like trends or oscillations in heart rate), we used a hybrid architecture: a 1D Convolutional Neural Network followed by a Long Short-Term Memory (LSTM) network. The 1D CNN component acted as an automated feature extractor across the sequence (for example, spotting a spike or drop in some metric), and the LSTM gave the model a “memory” to connect patterns across time. Essentially, this network looks at a rolling window of our features and learns an internal representation of the cardio-respiratory dynamics before predicting SIDS vs. non-SIDS at the end. We trained this model on all our events, taking care to do proper cross-validation given the limited data. (With only ~3.5k events, we had to be wary of overfitting; I used techniques like dropout regularization and limited the network size accordingly. A minimal sketch of both architectures appears after this list.)
2. Image-Based Deep Learning Model (2D CNN on Spectrograms): In parallel, we treated each event’s ECG (and potentially breathing) spectrogram as an image and trained a convolutional neural network to classify it. If a human can glance at the spectrogram and notice “smoother vs. noisier” patterns, a CNN should also be able to learn the nuanced differences. We used a straightforward 2D CNN architecture (inspired by standard image classifiers with convolutional layers, pooling, etc., ending in a softmax for two classes). Here, each training example was a spectrogram image labeled “SIDS” or “non-SIDS.” To augment the data (since 3,584 images is not a lot by deep learning standards), we did some basic augmentation like flipping or adding slight jitter to the images, and we were careful to filter frequencies to the same range for all (0.5–50 Hz) to reduce noise. The CNN’s job was to implicitly learn features like “lack of high-frequency variation” or “presence of certain oscillatory patterns” that might correlate with impending SIDS. This approach essentially lets the data speak with less human preconception, since the network might discover a pattern we didn’t hypothesize.
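Here is a minimal PyTorch sketch of both architectures. Layer sizes, the number of input features per time step, and the spectrogram dimensions are illustrative assumptions, not our tuned configuration.

```python
# Minimal PyTorch sketch of the two models; all sizes below are illustrative.
import torch
import torch.nn as nn

class CNN1DLSTM(nn.Module):
    """1D CNN feature extractor over the feature sequence, followed by an LSTM."""
    def __init__(self, n_features, hidden=64, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2), nn.Dropout(0.3),
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, time, n_features)
        x = self.conv(x.transpose(1, 2))   # Conv1d expects (batch, channels, time)
        _, (h, _) = self.lstm(x.transpose(1, 2))
        return self.fc(h[-1])              # last hidden state -> SIDS vs. non-SIDS logits

class SpectrogramCNN(nn.Module):
    """Small 2D CNN classifier over single-channel spectrogram images."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Dropout(0.3), nn.Linear(32 * 4 * 4, n_classes))

    def forward(self, x):                  # x: (batch, 1, freq_bins, time_bins)
        return self.classifier(self.features(x))

# Example shapes: 8 events with 20 features over 64 time steps; 8 single-channel 64x64 spectrograms.
seq_logits = CNN1DLSTM(n_features=20)(torch.randn(8, 64, 20))
img_logits = SpectrogramCNN()(torch.randn(8, 1, 64, 64))
```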
Both approaches were implemented in Python (PyTorch for the CNN models), and we managed our experiments in the project’s GitHub repository. As the technical lead, I wrote most of this modeling code and spent long nights tuning hyperparameters, but team collaboration was key in interpreting results and deciding the next steps. For example, when one model would make an incorrect prediction, we’d examine that case together to see if there was a biomedical reason or just noise.
After training and testing, we had a tale of two models. The feature-based CNN+LSTM achieved an overall accuracy of around 91%, which was initially a cause for celebration. Nevertheless, my heart sank when I dug into the confusion matrix: the model had effectively learned to predict “no SIDS” for every case. The 91% accuracy wasn’t because the model was smart; it was just taking advantage of the class imbalance. It failed to catch any of the SIDS events. In other words, it had zero sensitivity (recall) for the positive class. We had seen this pattern before with the random forest; now, even a more complex model fell into the trap. Despite our sophisticated features, the network likely found it safer (in terms of minimizing loss) to ignore the rare positives and focus on getting the majority right. This underscores how tricky imbalanced medical data can be. We did try techniques to mitigate this (e.g., class-weight adjustments in the loss function, resampling), but the bottom line was that this model was not viable if it wouldn’t raise the alarm for an actual SIDS event. A 91% accuracy is meaningless if the 9% it gets wrong are the only cases you care about.
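For reference, the imbalance mitigations looked roughly like this sketch: inverse-frequency class weights in the loss, plus an oversampling sampler so each batch sees the rare class more often. The tiny synthetic dataset exists only to make the snippet self-contained.

```python
# Minimal sketch of class weighting and oversampling; the synthetic data is a stand-in.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Stand-in: 100 events with 20 features, ~8% fatal, mirroring the real imbalance.
train_X = torch.randn(100, 20)
train_labels = torch.zeros(100, dtype=torch.long)
train_labels[:8] = 1
train_dataset = TensorDataset(train_X, train_labels)

# Inverse-frequency class weights: mistakes on the rare fatal class cost more in the loss.
class_counts = torch.bincount(train_labels, minlength=2).float()
class_weights = class_counts.sum() / (2.0 * class_counts)
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Alternatively (or additionally), oversample fatal events during batching.
sample_weights = class_weights[train_labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(train_labels), replacement=True)
loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
```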
On the other hand, the spectrogram CNN told a more promising story. Its accuracy was lower (about 86% on the test set), but importantly, it did catch some SIDS cases. In fact, its recall (sensitivity) for SIDS was ~0.45 (45%). It wasn’t anywhere near perfect, but it was a substantial improvement over 0%. The precision (positive predictive value) was about 50.6%, meaning roughly half of the events flagged as “SIDS likely” were actual SIDS cases. To put it plainly, the CNN would correctly identify ~45% of impending SIDS events before they happened, at the cost of some false alarms (for every event it got right, it also misidentified one that turned out fine). This trade-off (more false positives but catching some true positives) is precisely what we wanted in this context. Given the life-or-death nature of SIDS, it is far better to have an alarm that cries wolf occasionally than one that sleeps through the real danger. We prioritized sensitivity over specificity. In clinical terms, missing a SIDS event (a false negative) is the worst outcome; a false-positive alarm that wakes the parent unnecessarily is a nuisance but not a tragedy.
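Concretely, looking past accuracy means reading the per-class numbers off the confusion matrix, as in the small sketch below. The labels shown are toy stand-ins chosen only to illustrate the calculation, not our actual test set.

```python
# Minimal sketch of the metrics we prioritized; y_true/y_pred are toy stand-ins (1 = SIDS).
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"Sensitivity (SIDS recall): {recall_score(y_true, y_pred):.2f}")
print(f"Precision (PPV): {precision_score(y_true, y_pred):.2f}")
print(f"False alarms per true catch: {fp / max(tp, 1):.1f}")
```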
The confusion matrix for the CNN reflected this balance: it had a meaningful number of true positives (SIDS predicted correctly) and some false positives, whereas the feature model had essentially no true positives. This result validated our hypothesis that the frequency-domain patterns carried predictive signals that the human-crafted features weren’t fully capturing. The CNN learned to notice something in the spectrogram, perhaps a loss of high-frequency heart rate variability or a particular respiratory oscillation, that correlates with failure to auto-resuscitate. It’s intriguing to think what exactly it’s keying in on. Is it essentially measuring heart rate variability in its own convolutional way? Is it picking up on a slow drift or electrical stability in the heart prior to failure? Deep learning is often criticized as a “black box.” However, we plan to apply interpretation techniques (like saliency maps on the spectrograms) in the future to reverse-engineer the features that the CNN found important.
In AI for healthcare, the success metric depends on context. Our two models demonstrated this vividly. If you only looked at accuracy, you’d pick the 91% model and be utterly wrong in practice. You must consider what matters: here, it was catching that one critical event. In fact, our final recommendation was to favor the spectrogram CNN despite its lower accuracy because it was more likely to predict SIDS when it was genuinely going to occur. This lesson aligns with a broader trend in medical AI: you tune the system not for vanity metrics but for the outcome that saves lives. Sometimes, that means accepting more false alarms. A parallel can be found in NICU sepsis alarms: algorithms that monitor infants’ vital signs have dramatically reduced neonatal sepsis mortality by alerting staff earlier, even though they aren’t perfectly accurate. We envisioned our SIDS predictor in a similar light: a somewhat noisy alarm is far better than silence.
We did face the limitation of small data. With only a few thousand training examples and a deep CNN with ~24 million parameters, we knew overfitting was a risk. In fact, by the rule of thumb, one often wants 10× more data points than parameters for robust learning, which is an impossible 240 million data points in our case. We mitigated this by regularization and leveraging domain constraints (frequency filtering, limited input window, etc.), but scaling up the dataset size is essential. One lesson here is that data is often the scarcest resource in biomedical projects. We were fortunate to have any SIDS examples at all (since it’s hard to acquire). However, future studies will need to gather more, possibly by running more mouse experiments or by collecting analogous physiological data from human infants who had apparent life-threatening events (ALTEs) or non-fatal apneas.
On a personal note, this project was as much about people as it was about technology. As the technical lead and sole grad student on a team of undergraduates, I wore many hats: project manager, data engineer, machine learning scientist, and occasional domain translator. I guided the team through brainstorming features, taught newer members about concepts like Fourier transforms and neural networks, and kept us on a rigorous schedule to meet deliverables. One day, I’d be debugging Python code to fix an ECG signal preprocessing bug; the next day, I’d present our latest findings to pediatricians and biologists, explaining what a “spectrogram CNN” is in lay terms. This taught me the importance of communication across disciplines. We had brilliant domain experts at Baylor providing us with context on the biology of SIDS, and it was my job to ensure we correctly translated their knowledge into our data features and that we translated our results back into meaningful biomedical insights.
A concrete example of this synergy was when the doctors suggested certain features (like specific breathing characteristics) based on their experience. Even though our initial inclusion of the entire breathlist didn’t yield a silver bullet, those conversations sparked ideas for feature engineering (such as looking at variability in breathing patterns). In turn, when our models started working, I created visualizations of what the model was seeing, like overlaying the “smoother” versus “noisier” ECG signals, to discuss with the doctors. When a neonatologist nods and says, “Interesting, that makes sense physiologically,” you feel the gap between AI and medicine narrow just a little.
From a leadership perspective, I learned how to balance innovation with pragmatism. It’s easy to get excited about fancy deep learning models (guilty as charged), but part of my role was ensuring we also tried simpler approaches and didn’t overlook straightforward insights. We did, for instance, spend time on classical statistical tests and a baseline random forest; that work isn't glamorous, but it established a point of comparison and justified the need for more complex modeling when those methods failed. Since multiple people were contributing, I also helped manage version control and coding standards in our GitHub repo. Enforcing good practices early (like clear documentation and unit tests for our data preprocessing functions) saved us from chaos later. These are the sorts of project-management lessons that one doesn’t always learn in class but that are critical in real-world tech teams.
Finally, being the lead on a high-stakes healthcare project impressed upon me the ethical responsibility we carry. We often discussed what a “prediction” really means: if our model flags a SIDS risk, how confident can we be? What should the response be: wake the baby? Rush to the hospital? False positives have consequences (parent anxiety, over-treatment), but false negatives have the worst consequence of all. This prediction isn’t just an academic exercise; it’s potentially life and death. That kept us intellectually honest and cautious in how we presented our findings. We were careful not to over-hype the results and emphasized that more validation is needed before anyone trusts an AI with a baby’s life. I think this mindset is crucial for anyone looking to bring AI into healthcare: accuracy metrics are just the start of the conversation, not the end.
Our capstone project was a prototype and a proof of concept that AI can pick up patterns in physiological signals that might precede SIDS. But turning this into a real-world diagnostic tool will require further research and development. Here’s where we see it heading:
Real-Time Monitoring: The next step (already in progress with our collaborators) is to test the spectrogram CNN on real-time streaming data. In the mouse lab, this means running the algorithm during live experiments to see if it can alert the experimenters that a pup is about to succumb, potentially even allowing an intervention. In human infants, real-time means analyzing data from monitors on the fly in the crib or NICU. Real-time operation poses challenges: our model would need to be efficient and robust to noise and motion artifacts, and it would have to decide continuously when to raise the alarm. We may need to combine short-term and long-term analyses (for instance, checking a 5-minute window every 30 seconds). The code we wrote is being refactored for low-latency inference to enable this streaming prediction.
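A streaming version of the predictor could be as simple as the loop sketched below, which re-scores the most recent 5-minute window every 30 seconds. The read_latest_ecg and compute_spectrogram callables are hypothetical stand-ins for the monitor interface and our training-time preprocessing, and the alarm threshold is purely illustrative.

```python
# Minimal sketch of sliding-window streaming inference; interfaces are hypothetical stand-ins.
import time
import torch

WINDOW_SEC, STRIDE_SEC = 300, 30      # 5-minute window, re-scored every 30 seconds
ALARM_THRESHOLD = 0.5                 # illustrative probability cutoff for raising an alert

def monitor(model, read_latest_ecg, compute_spectrogram):
    model.eval()
    while True:
        ecg_window = read_latest_ecg(seconds=WINDOW_SEC)    # latest 5 minutes of ECG
        spec = compute_spectrogram(ecg_window)              # (1, freq_bins, time_bins) tensor
        with torch.no_grad():
            logits = model(spec.unsqueeze(0))               # add a batch dimension
            prob_sids = torch.softmax(logits, dim=1)[0, 1].item()
        if prob_sids > ALARM_THRESHOLD:
            print(f"ALERT: predicted SIDS risk {prob_sids:.2f}; notify caregiver")
        time.sleep(STRIDE_SEC)
```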
Integrated Multi-Modal Signals: Our project focused on ECG and respiration data, but infants in NICUs often have other monitors like pulse oximetry (oxygen levels), temperature, and even video. A future system might integrate multiple data sources for a more confident prediction. Interestingly, the recent study identifying metabolic biomarkers for SIDS hints that researchers could combine biochemical signals (like specific proteins or metabolites in the blood) with physiological signals. Perhaps an AI system could one day synthesize genetic risk factors, metabolic indicators, and real-time vital signs into a single risk score for SIDS. Such a platform is still speculative, but it aligns with the direction of personalized, predictive medicine.
Better Algorithms for Imbalanced Data: We got a crash course in dealing with class imbalance, but there’s room for more advanced techniques. Future models could employ anomaly detection framing by treating SIDS events as anomalies to detect rather than as one-half of a balanced classification. There’s promising research on training models to learn what “normal” looks like and then flagging deviations. In fact, one could imagine training on the abundant non-SIDS data (apneas where recovery occurred) to establish a baseline and using unsupervised or semi-supervised methods to spot the oddball cases. Recent advances in time-series anomaly detection (2023–2025) could be applicable here, ensuring the model doesn’t just learn to be a majority vote predictor.
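As one concrete example of that framing, a detector could be fit only on the abundant non-SIDS apneas and then asked to flag events that deviate from that baseline. The sketch below uses scikit-learn's IsolationForest as an off-the-shelf choice; the feature arrays are assumed to be prepared, and the contamination setting is illustrative.

```python
# Minimal sketch of the anomaly-detection framing: learn "normal" from recovered apneas only.
from sklearn.ensemble import IsolationForest

def fit_normal_baseline(X_train, y_train, contamination=0.05):
    """Fit the detector on recovered apneas only (label 0 = recovery)."""
    normal_X = X_train[y_train == 0]
    return IsolationForest(contamination=contamination, random_state=42).fit(normal_X)

def flag_anomalies(detector, X_test):
    """Boolean mask of test events that look unlike the normal baseline (candidate precursors)."""
    return detector.predict(X_test) == -1
```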
Scaling and Generalization: Ultimately, to have confidence in such a system, we need to test it on larger and more diverse datasets. That could mean more mouse experiments (perhaps including induced SIDS in mice with specific genetic mutations to see if the model picks up genotype-specific patterns). More vitally, it would mean collecting data from infant monitors in hospitals. One idea is to retrospectively analyze monitor data from infants who had ALTEs or who were monitored because they were siblings of SIDS victims, to see if our model’s patterns hold up. These ideas are easier said than done due to privacy and data-availability constraints, but that is the direction to move in before clinicians can deploy such a model.
Towards a Smart Baby Monitor: The moonshot vision that motivated us throughout is a consumer-friendly device that could sit in a crib at home, silently watching over a sleeping infant. Imagine a baby monitor or a wearable sock that not only tracks heart rate and breathing (some products do that already) but runs an AI algorithm trained to recognize the red-flag pattern of impending SIDS. If detected, it could alert parents or stimulate the baby (some have proposed vibrating devices that nudge a baby if they stop breathing). Our report explicitly noted that integrating these findings into consumer health tech is a goal. As someone moving toward industry, I see a startup opportunity here, but there is also a need for rigorous clinical testing and regulatory approval, given the stakes. Any such device would likely need FDA clearance and careful risk-benefit analysis. It’s a challenging road, but the lives saved each year could be well worth it.
The SIDS AI project was a capstone in every sense for me: technically, educationally, and personally. We started with a heartbreaking problem and a heap of messy data and extracted a glimmer of insight and hope. Along the way, I learned a few key lessons:
Marry Domain Knowledge with AI Creativity: Neither alone would have succeeded. Our project sponsors’ deep understanding of neonatal physiology guided our feature engineering (e.g., focusing on heart rate variability and respiratory patterns) to improve our models’ chances. Conversely, letting cutting-edge AI loose on spectrograms revealed patterns no one might have coded as a feature (smooth vs choppy signal textures). This marriage of approaches yielded results where each alone had failed. I’ve come to appreciate that in fields like healthcare, you can’t treat AI as a magic black box; you need to guide it with what you know about the science.
Prioritize the Clinical Objective, Not Just the Metric: Our initial accuracy highs were misleading. The real goal was high recall of SIDS events, even at the expense of false positives. This is a common theme in diagnostic medicine: a screening test should catch all potential issues, and a follow-up test can weed out false alarms. So, we essentially built a screening tool. Keeping that end use in mind (and the value of a life saved versus an inconvenience) helped us set the right targets and choose the CNN model that aligned with them.
Small Data Is a Big Problem (But You Can Get Creative): Working with limited data forced us to be clever with techniques and careful not to overfit. We augmented data, used transfer learning ideas from image recognition, and did extensive cross-validation. Still, there’s no substitute for more data. This challenge is a lesson for anyone working on a similar project: if you can collect more, do it. If you can’t, acknowledge the limitations and design your model evaluation accordingly (we reported confidence intervals and avoided over-claiming performance). Encouragingly, even small datasets can yield meaningful models in this age, especially if combined with knowledge-based feature design, but robustness will always be a concern until validated on larger scales.
Interdisciplinary Communication is Key: I had to learn to speak the language of doctors and biologists, and they had to learn a bit of mine. This mutual education meant that, for example, when our model found something, the physicians on the team could interpret it in terms of autonomic nervous function or hypoxia tolerance. That’s when AI becomes more than numbers; it becomes insight. I suspect the future of AI in healthcare will see many of these partnerships, and success will depend on our ability as technologists to collaborate and communicate beyond our field.
As I finish writing this, I’m struck by how far we’ve come and how much there is to go. Automating healthcare diagnostics with AI is not an overnight revolution but a gradual integration, solving one problem at a time, one project at a time. Our SIDS project was one such step, turning raw signals into a hint of foresight. Standing at the interface of data science and medicine, I’m optimistic. The future of predictive healthcare is emerging now: from NICU algorithms that anticipate infant sepsis, to wearable ECG devices that warn of heart attacks before the patient feels anything, to perhaps one day a crib monitor that whispers an alert and prevents a crib death. These advances are born out of multidisciplinary teams and persistent iteration, just like ours.
In the end, our AI didn’t solve SIDS (that day is still ahead, we hope), but it taught us lessons about what it will take. It reinforced my passion for this field: the chance to save even one life with code and ingenuity is a powerful motivator. On a personal level, leading this capstone showed me the kind of impact a determined engineer can have in healthcare with the right team, the right tools, and a whole lot of heart.
Beckwith, J.B. (1973). The sudden infant death syndrome. Current Problems in Pediatrics, 3(8), 3–36.
Huang, Y. et al. (2023). Statistical report on SIDS incidence (~1,300 annual cases in the US).
Hewitt, A.L. et al. (2020). Study on heart rate variability differences in infants who succumbed to SIDS vs. controls.
Nezamabadi, M. et al. (2022). Research on ECG R-wave feature engineering for infant risk assessment.
Popescu, A. et al. (2009). Use of QRS complex morphological integration (area under the curve) for classifying abnormal heartbeats.
Keles, E., & Bagci, U. (2023). The past, current, and future of NICUs with AI: a systematic review. npj Digital Medicine, 6(220). (Summary of 106 studies on AI in neonatology; highlights use of vital sign analysis and need for data-driven early warning systems.)
Oltman, S. et al. (2024). Metabolic biomarkers and SIDS risk. JAMA Pediatrics. (Identified metabolic signals in infants who died of SIDS, suggesting new avenues for risk prediction.)
(Code Repository) RiceD2KLab/BCM_SIDS_Sp24 – Utilizing Machine Learning to Identify Cardio-Respiratory Signatures Predictive of SIDS. (All project code and models are available on GitHub).