
Towards Epileptic Seizure Prediction with Deep Network | Kata Slama | OpenAI Scholars Demo Day 2020

OpenAI · 5 min read

Based on OpenAI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Seizure prediction is framed as a classification problem using pre-seizure neural signals converted into spectrograms.

Briefing

Epileptic seizure prediction is moving from a theoretical possibility toward a measurable machine-learning task: a deep network trained on intracranial voltage recordings can distinguish “safe” brain activity from patterns occurring up to an hour before an upcoming seizure. The practical significance is straightforward—better prediction could reduce injuries, anxiety, and quality-of-life barriers for people with epilepsy, and it may also support safer driving and closed-loop brain stimulation approaches.

Seizures happen when neurons fire in synchrony, sometimes spreading across the brain and causing loss of consciousness. Epilepsy affects about 1% of people worldwide, with disproportionate impact in developing countries. Because seizures are unpredictable, people with epilepsy face physical risks from accidents and often lose the ability to drive. For many, that unpredictability also fuels anxiety and limits treatment options. Earlier work left open whether any reliable pre-seizure signal exists; more recent evidence suggests there is a detectable brain signature before seizures, even if clinicians can’t yet agree on what that signature looks like.

The project uses publicly available canine recordings from a Kaggle seizure-prediction dataset: voltage sensor time series from multiple brain locations in five dogs. Each recording is segmented into epochs drawn either from well before seizures (at least four hours prior) or from the hour before seizure onset. The pipeline converts these time windows into spectrograms—image-like representations that expose frequency content over time—then frames the task as classification between “danger” epochs (leading to a seizure) and “safe” epochs.
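The time-window-to-spectrogram step can be sketched with standard scientific-Python tooling. This is illustrative only: the 400 Hz sampling rate and 10-minute epoch length below are placeholder assumptions, since the talk does not specify those parameters.

```python
import numpy as np
from scipy.signal import spectrogram

# Assumed parameters for illustration, not confirmed in the talk.
fs = 400                            # samples per second
epoch = np.random.randn(fs * 600)   # one 10-minute voltage trace, single electrode

# Convert the 1-D time series into a time-frequency "image".
freqs, times, Sxx = spectrogram(epoch, fs=fs, nperseg=fs)

# Log-scale the power, as is common for neural data, so the image has usable contrast.
log_spec = np.log1p(Sxx)  # shape: (frequency bins, time bins)
```

The resulting 2-D array of frequency bins over time bins is what lets an image classifier be applied to what was originally a voltage trace.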

A key challenge is imbalance: one dog contributes most of the data, and there are far fewer danger epochs than safe epochs. Without correction, models can “cheat” by overfitting to the dominant dog or by defaulting to the majority class. The work addresses this by subsampling to balance both dogs and class labels. With this balanced setup, chance-level performance sits around 55%. A ResNet-18 model reaches about 69% accuracy, while a logistic regression baseline with standard preprocessing lands near chance.
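The balancing strategy described above can be sketched as subsampling so that every (dog, label) cell contributes equally. The counts and skew below are toy values chosen to mirror the imbalance, not the real dataset's statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy metadata standing in for the real epoch index: a dog id and a label per epoch.
# The skew toward one dog and toward "safe" mirrors the imbalance described above.
dogs = rng.choice([0, 1, 2, 3, 4], size=1000, p=[0.6, 0.1, 0.1, 0.1, 0.1])
labels = rng.choice([0, 1], size=1000, p=[0.85, 0.15])  # 0 = safe, 1 = danger

# Subsample so every (dog, label) cell contributes the same number of epochs.
n_per_cell = min(
    ((dogs == d) & (labels == y)).sum() for d in range(5) for y in (0, 1)
)
keep = np.concatenate([
    rng.choice(np.flatnonzero((dogs == d) & (labels == y)), n_per_cell, replace=False)
    for d in range(5) for y in (0, 1)
])

balanced_dogs, balanced_labels = dogs[keep], labels[keep]
# After subsampling, neither dog identity nor majority-class guessing helps the model.
```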

Model quality is then evaluated through trade-off metrics rather than accuracy alone. An ROC curve area around 0.77 indicates moderate separability: aiming to capture 99% of seizures would still label only about 80% of safe epochs as safe, implying a substantial false-alarm burden. Precision-recall analysis paints a similar picture: targeting 99% seizure recall yields roughly 55% precision, meaning many “danger” predictions would not correspond to true seizures. A confusion matrix shows about 72% of true seizures correctly flagged, with the remaining 28% missed.
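These threshold trade-offs are straightforward to compute with scikit-learn. The scores below are synthetic, tuned only to make the recall/precision tension visible, not to reproduce the model's actual numbers.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_curve

rng = np.random.default_rng(1)

# Synthetic "danger" probabilities: positives score higher on average, with overlap.
y_true = rng.integers(0, 2, size=2000)
scores = np.clip(0.5 * y_true + rng.normal(0.25, 0.2, size=2000), 0.0, 1.0)

# Threshold-free separability summary.
auc = roc_auc_score(y_true, scores)

# The price of a very high recall target: best precision achievable at recall >= 99%.
precision, recall, _ = precision_recall_curve(y_true, scores)
precision_at_high_recall = precision[recall >= 0.99].max()
```

Scanning the precision-recall curve this way makes the clinical question concrete: pick the recall you need, then read off how many false alarms that choice implies.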

The results are framed as promising but incomplete. Future improvements include hyperparameter tuning, stronger regularization, trying alternative architectures such as larger ResNets or sequence models, and revisiting preprocessing. There’s also a plan to interpret what the network learns using OpenAI’s interpretability tooling (including weight visualization and activation maximization). Practical questions raised in Q&A focus on data availability for patients: the training data excludes seizure-time recordings and relies on pre-seizure windows, and real-world deployment may require adapting models from intracranial signals to less invasive scalp EEG. The work also notes that seizure predictability may vary by seizure type and by how far in advance warning signals actually exist, motivating multi-label and hierarchical approaches (e.g., separate models for 10-minute versus 1-hour horizons) and longer temporal segments. Finally, it highlights an architectural opportunity: treating electrode channels as correlated rather than independent observations.

Cornell Notes

A deep learning model can classify intracranial voltage patterns as “safe” or “danger” based on spectrograms derived from time windows up to one hour before seizure onset. Using publicly available canine data, the project balances both class labels and dog identity to prevent the network from overfitting to the dominant subject or the majority class. A ResNet-18 trained from scratch reaches about 69% accuracy on the balanced dataset, with ROC-AUC around 0.77. Trade-off metrics show the cost of high recall: capturing 99% of seizures would still produce many false alarms (precision around 55% in the precision-recall view). Future work targets better accuracy via tuning, alternative architectures, interpretability, and more realistic prediction horizons and electrode correlation modeling.

Why is seizure prediction considered possible at all, given that clinicians can’t agree on a specific pre-seizure “signature”?

The work rests on the idea that seizures are preceded by detectable brain activity changes, even if the exact pattern isn’t well defined. Earlier debates questioned whether any signal existed before seizures; later evidence (including anecdotal reports) supports the presence of a pre-seizure signal. That makes the problem suitable for neural networks that can learn complex patterns from neural time series rather than relying on a hand-crafted biomarker.

How does the project turn raw neural time series into something a vision model can learn from?

It segments voltage sensor recordings into epochs: “safe” windows far from seizures (at least four hours before) and “danger” windows up to one hour before seizure onset. Each epoch is converted into a spectrogram, which represents how frequency content changes over time. Because spectrograms behave like images, the model uses ResNet-18, a standard image classification architecture, to learn discriminative patterns in those time-frequency representations.

What data imbalance problems can break seizure prediction models, and how were they handled here?

Two imbalances matter: (1) dog identity—one dog contributes most of the data, which can cause overfitting to that single subject; and (2) class imbalance—danger epochs are much rarer than safe epochs, which can lead to majority-class guessing. The project subsamples to equalize representation across dogs and across the two classes, raising chance-level performance to about 55% and making improvements more meaningful.

What do ROC-AUC, precision-recall, and the confusion matrix reveal that accuracy alone can hide?

Accuracy can look decent while the model still fails in the ways that matter clinically. ROC-AUC around 0.77 summarizes separability across thresholds. Precision-recall shows the recall/false-alarm trade-off directly: aiming to catch 99% of seizures yields only about 55% precision, so many “danger” labels are false alarms. The confusion matrix similarly indicates incomplete capture of true seizures (about 72% correctly labeled) and misclassifications off the diagonal.
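The confusion-matrix view can be sketched in a few lines; the toy predictions below are invented, and the resulting recall is not the talk's ~72% figure.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels and predictions (1 = danger, 0 = safe), for illustration only.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Fraction of true seizures correctly flagged (the diagonal entry for class 1).
seizure_recall = cm[1, 1] / cm[1].sum()
```

The off-diagonal cells are exactly what accuracy hides: missed seizures (bottom-left) and false alarms (top-right) carry very different clinical costs.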

Why does the dataset design matter for real-world deployment?

The training data includes safe periods and windows up to one hour before seizure onset, but it does not include seizure-time recordings. That means the model learns to forecast from pre-onset signals only. In Q&A, this connects to deployment challenges: patients may have intracranial implants (more signal) or scalp EEG (less signal), so models may need adaptation and validation across recording modalities.

What future modeling changes target the biggest remaining uncertainties?

Several uncertainties drive next steps: warning signals may not exist uniformly across the full one-hour horizon, and predictability may differ by seizure type. The plan includes multi-label classification for different lead times (e.g., 10 minutes vs 1 hour), hierarchical prediction over 10-minute chunks, increasing segment lengths, and modeling electrode correlations rather than treating each electrode’s spectrogram as independent.
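The multi-label targets for different lead times can be sketched as one binary label per horizon. The 10-minute and 60-minute horizons come from the discussion above; the epoch timings are invented for illustration.

```python
import numpy as np

# Minutes until the next seizure, per epoch (placeholder values).
minutes_to_seizure = np.array([5.0, 25.0, 55.0, 240.0])

# One binary target per horizon: is a seizure within that many minutes?
horizons = [10, 60]
targets = np.stack(
    [(minutes_to_seizure <= h).astype(int) for h in horizons], axis=1
)
# targets[:, 0] -> seizure within 10 minutes; targets[:, 1] -> within 60 minutes
```

Framing labels this way lets a single model express "danger soon" and "danger within the hour" separately, instead of forcing one horizon to stand in for both.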

Review Questions

  1. What specific forms of imbalance (subject identity and class frequency) can cause misleading performance in seizure prediction, and how does subsampling address them?
  2. How do precision-recall results change the interpretation of a model that might look acceptable under accuracy or ROC-AUC alone?
  3. Why might a one-hour prediction horizon be unreliable, and what multi-horizon or hierarchical strategies are proposed to handle that?

Key Points

  1. Seizure prediction is framed as a classification problem using pre-seizure neural signals converted into spectrograms.

  2. The dataset uses canine intracranial voltage recordings from multiple electrodes, with labels based on windows up to one hour before seizure onset.

  3. Balancing across both dog identity and class labels is crucial to prevent overfitting and majority-class guessing.

  4. A ResNet-18 trained from scratch achieves about 69% accuracy on the balanced dataset, with ROC-AUC around 0.77.

  5. High-recall targets come with a false-alarm cost: catching 99% of seizures corresponds to roughly 55% precision in the reported precision-recall trade-off.

  6. Planned improvements include hyperparameter tuning, regularization, alternative architectures (e.g., larger ResNets or sequence models), and better preprocessing.

  7. Future work also targets realistic deployment by adapting to less invasive EEG and by modeling electrode correlations and multiple prediction horizons.

Highlights

  • A ResNet-18 model trained on spectrograms of pre-seizure intracranial signals can separate “danger” from “safe” activity with ROC-AUC around 0.77.
  • Balancing the dataset across dogs and classes prevents the model from exploiting subject identity or majority-class bias.
  • Trade-off metrics show the clinical tension: 99% seizure recall can still mean many false alarms (precision near 55%).
  • Next steps focus on multi-horizon prediction (e.g., 10-minute vs 1-hour lead times) and learning relationships across electrodes rather than treating them as independent.

Topics

  • Seizure Prediction
  • Spectrogram Classification
  • ResNet-18
  • Class Imbalance
  • Precision-Recall Tradeoff

Mentioned

  • Johannes
  • ROC
  • AUC
  • EEG