
Brain tumor segmentation with Deep Neural Networks

Mohammad Havaei, Axel Davy, David Warde-Farley, Antoine Biard, Aaron Courville, Yoshua Bengio, Chris Pal, Pierre‐Marc Jodoin, Hugo Larochelle
Medical Image Analysis · 2016 · Neuroscience · 3,193 citations
7 min read

Read the full paper via its DOI or on arXiv

TL;DR

The paper presents a fully automatic multimodal CNN approach for glioma/brain tumor segmentation on BRATS 2013 using efficient, fully convolutional inference.

Briefing

This paper addresses a central problem in neuro-oncology: automatically segmenting glioma/brain tumor subregions in multimodal MRI. The authors’ research question is how to design deep convolutional neural networks (CNNs) that (i) achieve competitive segmentation accuracy on the BRATS 2013 benchmark and (ii) remain computationally efficient enough for practical use. This matters because accurate tumor delineation supports diagnosis, treatment planning, and longitudinal growth assessment, yet glioblastomas are notoriously difficult to segment due to fuzzy boundaries, heterogeneous appearance, and strong variability in intensity scale across scanners and acquisition protocols. The work is motivated by the limitations of earlier pipelines that relied on hand-designed features and separate classifiers, which can be slow, memory-heavy, and not well adapted to tumor-specific structure.

Methodologically, the authors build a fully automatic segmentation system using CNNs trained to classify each pixel (center pixel of an input patch) into one of five BRATS labels: non-tumor, necrosis, edema, non-enhancing tumor, and enhancing tumor. Because BRATS volumes are not isotropic in the third dimension, they perform slice-by-slice 2D axial segmentation. Each CNN input is an M×M 2D patch containing four MRI modalities as channels (T1, T1C, T2, FLAIR). The network architecture is fully convolutional at inference time: rather than using a fully connected layer that would require per-patch evaluation, the authors implement the final classification stage as a convolutional layer with one kernel per label, enabling dense prediction over an entire image in a single forward pass.
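
The equivalence behind this design can be checked directly: a minimal NumPy sketch (toy sizes and variable names are ours, not the paper's) confirms that classifying every M×M multimodal patch with a fully connected layer produces exactly the same scores as correlating the whole slice with one kernel per label.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
M, C, H, W = 5, 4, 16, 16                 # patch size, modalities, toy slice size
image = rng.standard_normal((C, H, W))    # 4 MRI modalities as channels
w = rng.standard_normal((5, C * M * M))   # fully connected weights, 5 labels
b = rng.standard_normal(5)

# Patch-wise view: classify the center pixel of each MxM multimodal patch.
windows = sliding_window_view(image, (M, M), axis=(1, 2))  # (C, H-M+1, W-M+1, M, M)
patches = windows.transpose(1, 2, 0, 3, 4).reshape(-1, C * M * M)
patch_scores = patches @ w.T + b          # one score vector per patch

# Fully convolutional view: the same weights reshaped into one kernel per label,
# applied densely to the whole slice in a single pass.
kernels = w.reshape(5, C, M, M)
conv_scores = np.einsum('chwmn,kcmn->hwk', windows, kernels) + b

assert np.allclose(patch_scores.reshape(H - M + 1, W - M + 1, 5), conv_scores)
```

Because the convolutional form scores every pixel in one forward pass, it avoids re-extracting and re-processing overlapping patches, which is the source of the reported inference speedup.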

The core architectural contributions are (1) a two-pathway CNN that learns both local and global context simultaneously, and (2) cascaded CNNs that model local label dependencies by feeding the output probabilities of one CNN into a second CNN. The two-pathway model (TwoPathCNN) uses two streams with different receptive field sizes: a local pathway with smaller 7×7 receptive fields and a global pathway with larger 13×13 receptive fields, whose top feature maps are concatenated before the output layer. The cascaded models are trained by first training TwoPathCNN, freezing its parameters, and then training a second CNN that receives the first CNN’s output either as additional input channels (InputCascadeCNN), concatenated into the local pathway’s hidden representation (LocalCascadeCNN), or concatenated right before the output layer (MFCascadeCNN). This design is positioned as an efficient alternative to structured prediction methods like conditional random fields (CRFs), which can be computationally expensive.
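
The data flow of the InputCascadeCNN variant reduces to channel concatenation. A NumPy stand-in (the softmax stage below is a placeholder for the trained first-stage TwoPathCNN, not the paper's network):

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 64, 64
modalities = rng.standard_normal((4, H, W))   # T1, T1C, T2, FLAIR channels

# Stand-in for the first-stage TwoPathCNN: per-pixel scores over the 5 labels,
# turned into probabilities with a softmax over the label axis.
logits = rng.standard_normal((5, H, W))
probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# InputCascadeCNN: the second CNN receives the modalities plus the first
# network's probability maps, i.e. 4 + 5 = 9 input channels.
second_stage_input = np.concatenate([modalities, probs], axis=0)
assert second_stage_input.shape == (9, H, W)
```

The LocalCascadeCNN and MFCascadeCNN variants concatenate the same probability maps deeper in the second network (into the local pathway's hidden representation, or just before the output layer) rather than at the input.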

Training addresses a major practical issue: severe class imbalance. The authors state that healthy voxels (label 0) comprise 98% of all voxels, while the remaining 2% is split among necrosis (0.18%), edema (1.1%), non-enhancing tumor (0.12%), and enhancing tumor (0.38%). To prevent the model from being overwhelmed by healthy patches, they use a two-phase training procedure. In phase 1, they construct a patch dataset with uniform label sampling so that all five classes are equiprobable. In phase 2, they retrain only the output layer (keeping earlier convolutional kernels fixed) using the natural imbalanced label distribution to calibrate class probabilities.
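
Phase 1 amounts to a class-balanced sampler over patch centers. A toy sketch (the label proportions follow the paper; the sampler itself is our illustration, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy voxel labels mimicking the BRATS imbalance reported in the paper
# (98% healthy, 0.18% necrosis, 1.1% edema, 0.12% non-enhancing, 0.38% enhancing).
p = np.array([0.98, 0.0018, 0.011, 0.0012, 0.0038])
labels = rng.choice(5, size=200_000, p=p / p.sum())

def equiprobable_batch(labels, batch_size, rng):
    """Phase 1: draw patch centers so all five labels are equally likely."""
    per_class = batch_size // 5
    idx = [rng.choice(np.flatnonzero(labels == c), per_class, replace=True)
           for c in range(5)]
    return np.concatenate(idx)

batch = equiprobable_batch(labels, 100, rng)
counts = np.bincount(labels[batch], minlength=5)  # exactly batch_size // 5 each
```

Phase 2 then switches to patches drawn at their natural frequencies, updating only the output layer so the learned features are preserved while the class priors are recalibrated.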

Optimization uses stochastic gradient descent with momentum. The momentum coefficient starts at 0.5 and is increased to 0.9; the learning rate starts at 0.005 and is decayed by a factor of 10 after each epoch. Regularization includes L1 and L2 penalties on the weights, early stopping based on a validation set, and dropout. The authors also apply minimal preprocessing: intensity clipping (removing the top and bottom 1% of intensities), N4 bias correction for T1 and T1C, and per-channel z-score normalization. Post-processing removes spurious flat blobs via connected components.
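
The update rule is classical SGD with momentum. The toy run below applies the stated schedule (learning rate multiplied by 0.1 each epoch; the exact momentum ramp is not specified, so the step ramp here is an assumption) to a quadratic loss:

```python
import numpy as np

# Toy quadratic loss f(w) = 0.5 * ||w||^2, so grad f(w) = w.
w = np.array([4.0, -2.0])
v = np.zeros_like(w)
lr0 = 0.005                                # initial learning rate from the paper
for epoch in range(5):
    lr = lr0 * 0.1 ** epoch                # decayed by a factor of 10 each epoch
    mu = 0.5 if epoch == 0 else 0.9        # momentum 0.5 -> 0.9 (ramp assumed)
    for _ in range(100):
        grad = w                           # gradient of the toy loss
        v = mu * v - lr * grad             # velocity update (classical momentum)
        w = w + v                          # parameter update
```

Note that because the learning rate shrinks tenfold per epoch, almost all of the progress happens in the first couple of epochs; in the real system early stopping decides when to halt.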

The experiments use the BRATS 2013 dataset. The training set contains 30 patient subjects (20 high grade, 10 low grade) with pixel-accurate ground truth. The test set contains 10 high-grade tumors, and the leaderboard set contains 25 subjects (21 high grade, 4 low grade) without ground truth. The model is trained on slice patches; the authors report that training iterates over approximately 2.2 million tumorous patches and 3.2 million healthy patches. They evaluate using the BRATS online system, which reports Dice (F1), sensitivity, and specificity for three clinically grouped tumor regions: complete tumor (all substructures), core tumor (excluding edema), and enhancing tumor.
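
The region grouping can be made concrete with a short Dice sketch, assuming the usual BRATS label numbering (1 necrosis, 2 edema, 3 non-enhancing, 4 enhancing); the helper names are ours:

```python
import numpy as np

def dice(pred_mask, true_mask):
    """Dice coefficient (F1) between two binary masks."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    denom = pred_mask.sum() + true_mask.sum()
    return 2.0 * inter / denom if denom else 1.0

# BRATS groups the five labels into three clinically motivated regions.
REGIONS = {
    "Complete":  {1, 2, 3, 4},   # all tumor substructures
    "Core":      {1, 3, 4},      # everything except edema (label 2)
    "Enhancing": {4},            # enhancing tumor only
}

def region_dice(pred, truth):
    """Per-region Dice from two label maps (arrays of integer labels)."""
    return {name: dice(np.isin(pred, list(lbls)), np.isin(truth, list(lbls)))
            for name, lbls in REGIONS.items()}
```

A prediction identical to the ground truth scores 1.0 on all three regions; disjoint masks score 0.0.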

Key results show that the proposed architectures improve accuracy while dramatically reducing inference time. Among the uncascaded models, the best is TwoPathCNN* (the two-phase-trained variant), ranked 4 on the BRATS 2013 online scoreboard with Dice scores of 0.85 (Complete), 0.78 (Core), and 0.73 (Enhancing). In contrast, the single-path local model with two-phase training (LocalPathCNN*) is ranked 9 with Dice 0.85 (Complete), 0.74 (Core), 0.71 (Enhancing), and the global-only model (GlobalPathCNN*) is ranked 14 with Dice 0.82 (Complete), 0.73 (Core), 0.68 (Enhancing). The authors also show that two-phase training is critical: for the single-path local model, the rank improves from 15 (one-phase) to 9 (two-phase).

The cascaded architectures further improve boundary quality and overall performance. The best overall model is InputCascadeCNN (ranked 2 on the BRATS 2013 online scoreboard), achieving Dice 0.88 (Complete), 0.79 (Core), and 0.73 (Enhancing). Compared to the best uncascaded model (TwoPathCNN), this corresponds to an absolute Dice improvement of +0.03 on Complete and +0.01 on Core, while maintaining the same Enhancing Dice (0.73). The other cascades are MFCascadeCNN (rank 4) with Dice 0.86 (Complete), 0.77 (Core), 0.73 (Enhancing) and LocalCascadeCNN (rank 4-a) with Dice 0.88 (Complete), 0.76 (Core), 0.72 (Enhancing). The authors report that MFCascadeCNN yields smoother class boundaries, while InputCascadeCNN provides the best overall gains.

Speed is a major practical contribution. The authors report that TwoPathCNN produces a full-brain segmentation in 25 seconds on an NVIDIA Titan Black GPU, while the cascaded variants take on average 1.5 minutes (MFCascadeCNN), 1.7 minutes (LocalCascadeCNN), and 3 minutes (InputCascadeCNN*). Even the slowest cascade is therefore over 30× faster than the BRATS 2013 winner Tustison et al., which takes about 100 minutes per brain, and TwoPathCNN alone is over 200× faster in their comparison.

Limitations are not quantified with formal statistical uncertainty (e.g., no confidence intervals or p-values are reported), and the paper relies on the BRATS 2013 benchmark evaluation protocol. The authors also acknowledge practical constraints: they did not use BRATS 2014 due to issues with evaluation system performance and labeled data quality, and they attempted 3D approaches but found them slower without performance gains. Additionally, the segmentation is performed slice-by-slice in 2D, which may miss inter-slice context. Finally, the cascaded models require larger input patches for the second stage to match spatial dimensions, which can increase compute relative to the base model.

Practically, the results suggest that high-capacity CNNs can achieve state-of-the-art (or better) glioma segmentation on BRATS 2013 while being fast enough for near-real-time deployment. This matters to radiologists and neuro-oncology teams who need consistent tumor-subregion delineations, to developers of clinical decision support systems, and to researchers seeking efficient structured-prediction alternatives to CRFs. The paper's design choices (two-phase imbalance-aware training, a two-pathway local/global architecture, and cascaded probability refinement) provide a blueprint for building accurate, deployable segmentation networks in other medical imaging tasks with class imbalance and fuzzy boundaries.

Cornell Notes

The paper proposes an efficient CNN-based framework for multimodal brain tumor segmentation on BRATS 2013. It introduces a two-pathway architecture to jointly learn local detail and global context, and cascaded CNNs that refine predictions by injecting the first network’s probability maps into a second network, achieving strong Dice scores with much faster inference than prior state-of-the-art.

What is the main research problem and why is it difficult?

Automatically segment glioma subregions in multimodal MRI; difficulty comes from fuzzy tumor boundaries, heterogeneous appearance, tumors occurring anywhere in the brain, and non-standardized MR intensity scales across scanners/protocols.

What study design and dataset are used to evaluate the method?

A benchmark-style evaluation on the MICCAI BRATS 2013 dataset using the official training/test/leaderboard splits and the BRATS online evaluation system.

How is the segmentation formulated for training and inference?

The model predicts the class of the center pixel of an M×M 2D axial patch (with four MRI modalities as channels). At inference, the final layer is implemented convolutionally to produce dense per-pixel probabilities over the whole slice/brain efficiently.

What are the key architectural innovations?

TwoPathCNN uses two receptive-field pathways (local 7×7 and global 13×13) whose features are concatenated. Cascaded architectures (InputCascadeCNN, LocalCascadeCNN, MFCascadeCNN) refine segmentation by feeding the first CNN’s output probabilities into a second CNN.

How does the method handle severe class imbalance?

It uses two-phase training: phase 1 samples patches so all five labels are equiprobable; phase 2 retrains only the output layer using the natural imbalanced label distribution.

What preprocessing and regularization are applied?

Intensity clipping (remove top/bottom 1%), N4 bias correction for T1/T1C, per-channel normalization, connected-component post-processing to remove flat blobs, plus L1/L2 weight regularization, dropout, and early stopping.
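
The clipping and normalization steps can be sketched in a few lines of NumPy (a minimal illustration; N4 bias correction and the connected-component post-processing are not reproduced here):

```python
import numpy as np

def preprocess_channel(x):
    """Clip the 1% intensity extremes, then z-score normalize one channel.

    Minimal sketch of the paper's preprocessing; N4 bias-field correction
    (applied to T1 and T1C in the paper) is omitted.
    """
    lo, hi = np.percentile(x, [1, 99])
    x = np.clip(x, lo, hi)                  # remove top/bottom 1% of intensities
    return (x - x.mean()) / x.std()         # zero mean, unit variance

channel = np.random.default_rng(0).standard_normal((64, 64)) * 50 + 100
normalized = preprocess_channel(channel)
```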

What is the best uncascaded model performance?

TwoPathCNN* (two-phase training) achieves Dice 0.85 (Complete), 0.78 (Core), 0.73 (Enhancing) and is ranked 4 on the BRATS 2013 online scoreboard.

What is the best overall cascaded model performance?

InputCascadeCNN* is ranked 2 and achieves Dice 0.88 (Complete), 0.79 (Core), 0.73 (Enhancing).

How fast is the method compared with prior work?

TwoPathCNN takes about 25 seconds per brain on an NVIDIA Titan Black GPU; InputCascadeCNN* averages about 3 minutes. The authors report this is over 30× faster than the BRATS 2013 winner Tustison et al. (about 100 minutes per brain).

Review Questions

  1. Why does implementing the final classifier as a convolutional layer enable major speedups compared with patch-wise fully connected inference?

  2. Explain how two-phase training changes the learning dynamics under label imbalance, and why retraining only the output layer in phase 2 is sufficient.

  3. Compare the roles of the local and global pathways in TwoPathCNN and predict how each might affect Dice on Core vs Enhancing regions.

  4. How do cascaded CNNs approximate the effect of structured prediction (e.g., CRF mean-field iterations) while remaining computationally efficient?

Key Points

  1. The paper presents a fully automatic multimodal CNN approach for glioma/brain tumor segmentation on BRATS 2013 using efficient, fully convolutional inference.

  2. TwoPathCNN jointly learns local detail and global context via two receptive-field pathways (7×7 local and 13×13 global) concatenated before prediction.

  3. A two-phase training strategy is critical for class imbalance: train with equiprobable labels, then retrain only the output layer using the natural BRATS label distribution.

  4. Cascaded architectures refine predictions by injecting the first CNN’s output probability maps into a second CNN (best: InputCascadeCNN*).

  5. On BRATS 2013, InputCascadeCNN* achieves Dice 0.88 (Complete), 0.79 (Core), 0.73 (Enhancing) and is ranked 2 on the online scoreboard.

  6. Inference is fast: TwoPathCNN takes ~25 seconds per brain and InputCascadeCNN* ~3 minutes on an NVIDIA Titan Black GPU, reported as >30× faster than the BRATS 2013 winner.

Highlights

“InputCascadeCNN* … Dice 0.88 0.79 0.73 for Complete/Core/Enhancing and is ranked 2 on the BRATS 2013 online scoreboard.”
“TwoPathCNN* … Dice 0.85 0.78 0.73 … ranked 4.”
“With this implementation, we are able to produce a segmentation in 25 seconds per brain on the Titan black card with the TwoPathCNN model.”
“This turns out to be 45 times faster than when we extracted a patch at each pixel and processed them individually for the entire brain.”
“The time needed to segment an entire brain with any of these CNN architectures varies between 25 seconds and 3 minutes.”

Topics

  • Medical image analysis
  • Neuroimaging segmentation
  • Deep learning for segmentation
  • Convolutional neural networks (CNNs)
  • Class imbalance handling
  • Structured prediction approximations
  • Multimodal MRI analysis
  • Efficient inference for clinical deployment

Mentioned

  • Pylearn2
  • NVIDIA Titan Black (GPU)
  • N4ITK (bias correction)
  • BRATS online evaluation system
  • Mohammad Havaei
  • Axel Davy
  • David Warde-Farley
  • Antoine Biard
  • Aaron Courville
  • Yoshua Bengio
  • Chris Pal
  • Pierre-Marc Jodoin
  • Hugo Larochelle
  • Yann LeCun
  • Geoffrey Hinton
  • Alex Krizhevsky
  • Nicolas Tustison
  • Bjoern Menze
  • M. Reyes
  • I. Goodfellow
  • Nitish Srivastava
  • BRATS - Multimodal Brain Tumor Segmentation (benchmark/challenge)
  • CNN - Convolutional Neural Network
  • DNN - Deep Neural Network
  • MRI - Magnetic Resonance Imaging
  • T1 - T1-weighted MRI modality
  • T1C - T1-contrast (post-contrast) MRI modality
  • T2 - T2-weighted MRI modality
  • FLAIR - Fluid Attenuated Inversion Recovery MRI modality
  • CRF - Conditional Random Field
  • Dice - Dice similarity coefficient (F-measure)
  • SGD - Stochastic Gradient Descent
  • GPU - Graphics Processing Unit
  • MFCascadeCNN - Cascaded CNN variant with pre-output concatenation