Below you can find the keynotes, oral presentations and poster presentations, with links to the videos, the papers, slides, code, and data.
The original program with timetable can be found here.
Efficient techniques for learning confidence. Graham Taylor, University of Guelph. video & abstract
Modern neural networks are very powerful predictive models, but they are often incapable of recognizing when their predictions may be wrong. I will discuss a method of learning confidence estimates for neural networks that is simple to implement, computationally efficient and produces intuitively interpretable outputs. I will demonstrate that on the task of out-of-distribution detection, our technique surpasses recently proposed techniques which construct confidence based on the network’s output distribution, without requiring any additional labels or access to out-of-distribution examples. I will also show that it can generate per-pixel confidence maps and image-level prediction of failure in medical image segmentation.
The impact of deep learning and artificial intelligence on radiology. Ronald Summers, NIH. video & abstract
Major advances in computer science and artificial intelligence, in particular “deep learning”, are beginning to have an impact on radiology. There has been an explosion of research interest and in the number of publications about the use of deep learning in radiology. In this presentation, I will show examples of how deep learning has led to major improvements in automated radiology image analysis, especially for image segmentation and computer-aided diagnosis. I will also show how radiology reports can be used for bulk annotation of images to train deep learning systems.
Deep learning for genomics and graph-structured data. Adriana Romero, Facebook AI Research. video & abstract
In recent years, deep learning has achieved promising results in medical image analysis. However, to fully exploit the richness of healthcare data, new models able to deal with a variety of modalities have to be designed. In this talk, I will discuss recent advances in deep learning for genomics and graph-structured data. I will present Diet Networks, a recent contribution that copes with the high dimensionality of genomic data. Then, I will introduce our work on Graph Attention Networks, which have recently been shown to improve results on protein-protein interaction networks and mesh-based parcellation of the cerebral cortex.
Detecting lung nodules using deep learning. Tim Salimans, OpenAI. abstract
Lung cancer is the leading cause of cancer-related death worldwide. By screening high-risk individuals for lung nodules using low-dose CT scans, this type of cancer can be detected while it is still treatable. However, large-scale implementation of such screening programs requires radiologists to evaluate a huge number of scans, which is costly and error-prone. Aidence is an Amsterdam start-up developing an AI assistant that helps radiologists detect, report and track lung nodules. This talk covers the deep learning techniques that we use to obtain state-of-the-art accuracy in this domain, as well as the requirements and challenges faced when developing a deep learning system for use in clinical practice.
Breast cancer diagnosis often requires accurate detection of metastasis in lymph nodes through whole-slide images (WSIs). Recent advances in deep convolutional neural networks (CNNs) have shown significant successes in medical image analysis and particularly in computational histopathology. Because of the extremely large size of WSIs, most methods divide each slide into many small image patches and classify each patch independently. However, neighboring patches often share spatial correlations, and ignoring these correlations may result in inconsistent predictions. In this paper, we propose a neural conditional random field (NCRF) deep learning framework to detect cancer metastasis in WSIs. NCRF considers the spatial correlations between neighboring patches through a fully connected CRF, which is directly incorporated on top of a CNN feature extractor. The whole deep network can be trained end-to-end with the standard back-propagation algorithm, with minor computational overhead from the CRF component. The CNN feature extractor can also benefit from considering spatial correlations via the CRF component. Compared to a baseline method that does not consider spatial correlations, we show that the proposed NCRF framework obtains probability maps of patch predictions with better visual quality. We also demonstrate that our method outperforms the baseline in cancer metastasis detection on the Camelyon16 dataset and achieves an average FROC score of 0.8096 on the test set. NCRF is open sourced at https://github.com/baidu-research/NCRF.
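To make the CRF idea concrete, the sketch below runs binary mean-field inference for a fully connected CRF with a Potts pairwise term over a set of patches. It is an illustration under simplifying assumptions, not the NCRF implementation: the pairwise weights here are fixed by hand (NCRF derives them from the CNN patch embeddings and learns them end-to-end), and the function name `mean_field_crf` is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_field_crf(unary_logits, weights, n_iter=10):
    """Approximate P(tumor) per patch with mean-field inference (hypothetical helper).

    unary_logits[i]: CNN log-odds that patch i is tumor.
    weights[i][j]:   non-negative Potts weight; labelling patches i and j
                     differently costs weights[i][j].
    """
    n = len(unary_logits)
    q = [sigmoid(u) for u in unary_logits]  # start from the CNN-only estimate
    for _ in range(n_iter):
        new_q = []
        for i in range(n):
            # Expected Potts cost of choosing label 1 (neighbours labelled 0)
            # and of choosing label 0 (neighbours labelled 1).
            cost1 = sum(weights[i][j] * (1.0 - q[j]) for j in range(n) if j != i)
            cost0 = sum(weights[i][j] * q[j] for j in range(n) if j != i)
            new_q.append(sigmoid(unary_logits[i] - cost1 + cost0))
        q = new_q  # synchronous update of all marginals
    return q


# Usage: a 3x3 grid of patches flattened row by row. The centre patch is
# ambiguous for the CNN (logit -0.5), but all eight neighbours look like tumor.
logits = [3.0] * 9
logits[4] = -0.5
w = [[0.5] * 9 for _ in range(9)]
smoothed = mean_field_crf(logits, w)
print(round(sigmoid(logits[4]), 3), round(smoothed[4], 3))
```

The centre patch's probability is pulled towards its confident neighbours, which is the kind of spatial consistency the abstract attributes to the CRF component.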
We present a method for the automated segmentation of knee bones and cartilage from magnetic resonance imaging that combines a priori knowledge of anatomical shape with Convolutional Neural Networks (CNNs). The proposed approach incorporates 3D Statistical Shape Models (SSMs) as well as 2D and 3D CNNs to achieve a robust and accurate segmentation of even highly pathological knee structures. The method is evaluated on data of the MICCAI grand challenge “Segmentation of Knee Images 2010”. For the first time in this challenge, an accuracy equivalent to the inter-observer variability of human readers has been achieved. Moreover, the quality of the proposed method is thoroughly assessed using various measures for 507 manual segmentations of bone and cartilage, and 88 additional manual segmentations of cartilage. Our method yields sub-voxel accuracy. In conclusion, combining anatomical knowledge encoded in SSMs with localized classification via CNNs results in a state-of-the-art segmentation method.
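A statistical shape model is commonly a point-distribution model: a mean shape plus a few orthonormal modes of variation, each with a variance. The sketch below shows the standard step of projecting a candidate shape (for example, landmarks derived from a CNN segmentation) onto such a model while clamping each mode coefficient to ±3 standard deviations. This is a generic illustration; the abstract does not describe how the paper fits its 3D SSMs, and the function name `constrain_to_ssm` is hypothetical.

```python
import math


def constrain_to_ssm(shape, mean, modes, variances, k_sigma=3.0):
    """Project `shape` (flattened landmark coordinates) onto a PCA shape model.

    modes:     orthonormal mode vectors, each the same length as `shape`.
    variances: eigenvalue (variance) of each mode.
    Each coefficient b_k is clamped to +/- k_sigma * sqrt(variance_k), so the
    result stays within the range of plausible shapes seen in training.
    """
    diff = [s - m for s, m in zip(shape, mean)]
    result = list(mean)
    for mode, var in zip(modes, variances):
        b = sum(d * p for d, p in zip(diff, mode))  # b_k = p_k . (x - mean)
        limit = k_sigma * math.sqrt(var)
        b = max(-limit, min(limit, b))
        result = [r + b * p for r, p in zip(result, mode)]
    return result


# Usage: a toy model with 4 coordinates and a single unit-length mode.
mean = [0.0, 0.0, 0.0, 0.0]
modes = [[0.5, 0.5, 0.5, 0.5]]
print(constrain_to_ssm([10.0, 10.0, 10.0, 10.0], mean, modes, [1.0]))  # [1.5, 1.5, 1.5, 1.5]
```

Components orthogonal to the modes are discarded and implausibly large deformations are clipped, which is how a shape prior can regularise a purely local classifier.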
Disease progression modeling (DPM) using longitudinal data is a challenging task in machine learning for healthcare that can provide clinicians with better tools for diagnosis and monitoring of disease. Existing DPM algorithms neglect temporal dependencies among measurements and make parametric assumptions about biomarker trajectories. In addition, they do not model multiple biomarkers jointly and need to align subjects’ trajectories. In this paper, recurrent neural networks (RNNs) are utilized to address these issues. However, in many cases, longitudinal cohorts contain incomplete data, which hinders the application of standard RNNs and requires a pre-processing step such as imputation of the missing values. We therefore propose a generalized training rule for the most widely used RNN architecture, long short-term memory (LSTM) networks, that can handle missing values in both target and predictor variables. This algorithm is applied to modeling the progression of Alzheimer’s disease (AD) using magnetic resonance imaging (MRI) biomarkers. The results show that the proposed LSTM algorithm achieves a lower mean absolute error for prediction of measurements across all considered MRI biomarkers compared to using standard LSTM networks with data imputation or using a regression-based DPM method. Moreover, applying linear discriminant analysis to the biomarkers’ values predicted by the proposed algorithm results in a larger area under the receiver operating characteristic curve (AUC) for clinical diagnosis of AD compared to the same alternatives, and the AUC is comparable to state-of-the-art AUCs from a recent cross-sectional medical image classification challenge. This paper shows that built-in handling of missing values in LSTM network training paves the way for application of RNNs in disease progression modeling.
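One common way to train a recurrent network when some target measurements are missing is to exclude those time points from the loss, so they contribute neither error nor gradient. The sketch below shows that masking for a mean absolute error. It covers only missing targets; the abstract does not detail how the proposed training rule treats missing predictor (input) values, so that part is not shown, and the function name `masked_mae` is hypothetical.

```python
def masked_mae(predictions, targets):
    """Mean absolute error over observed targets only (hypothetical helper).

    Missing targets are marked with None. Returns (loss, gradient), where the
    gradient is d(loss)/d(prediction) and is zero wherever the target is missing.
    """
    observed = [i for i, t in enumerate(targets) if t is not None]
    if not observed:
        return 0.0, [0.0] * len(predictions)
    n = len(observed)
    loss = sum(abs(predictions[i] - targets[i]) for i in observed) / n
    grad = [0.0] * len(predictions)
    for i in observed:
        diff = predictions[i] - targets[i]
        # Subgradient of |diff|, using 0 at diff == 0.
        grad[i] = (1.0 if diff > 0 else -1.0 if diff < 0 else 0.0) / n
    return loss, grad


# Usage: the second visit has no MRI measurement, so it is ignored.
loss, grad = masked_mae([1.0, 2.0, 3.0], [1.5, None, 2.0])
print(loss, grad)  # 0.75 [-0.5, 0.0, 0.5]
```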
Deep networks have set the state of the art in most image analysis tasks by replacing handcrafted features with learned convolution filters within end-to-end trainable architectures. Still, the specifications of a convolutional network are subject to much manual design: the shape and size of the receptive field for convolutional operations is a very sensitive part that has to be tuned for different image analysis applications. 3D fully convolutional multi-scale architectures with skip connections, which excel at semantic segmentation and landmark localisation, have huge memory requirements and rely on large annotated datasets, an important limitation for wider adoption in medical image analysis. We propose a novel and effective method based on a single trainable 3D convolution kernel that addresses these issues and enables high-quality results with a compact four-layer architecture and without sensitive hyperparameters for convolutions and architectural design. Instead of a manual choice of filter size, dilation of weights, and number of scales, our one binary extremely large and inflecting sparse kernel (OBELISK) automatically learns filter offsets in a differentiable continuous space together with weight coefficients. Geometric data augmentation can be directly incorporated into the training by simple coordinate transforms. This powerful new architecture has fewer than 130,000 parameters, can be trained in a few minutes with only 700 MBytes of memory and achieves an increase in Dice overlap of +5.5% compared to the U-Net for CT multi-organ segmentation.
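The central idea of a sparse kernel with continuous offsets can be illustrated by sampling the input with interpolation at each offset and forming a weighted sum. The sketch below is a simplified 2D version with fixed offsets and weights and border clamping; in the 3D method described above, the offsets and weights are trained, which is possible because interpolation is piecewise differentiable with respect to the sampling position. The function names `bilinear` and `sparse_kernel_response` are hypothetical.

```python
import math


def bilinear(img, y, x):
    """Sample a 2D list-of-lists image at a continuous (y, x), clamping at borders."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * img[y0][x0] + dx * img[y0][x1]
    bottom = (1 - dx) * img[y1][x0] + dx * img[y1][x1]
    return (1 - dy) * top + dy * bottom


def sparse_kernel_response(img, y, x, offsets, weights):
    """Response of a sparse kernel at (y, x): sum_k w_k * I((y, x) + offset_k)."""
    return sum(w * bilinear(img, y + oy, x + ox)
               for (oy, ox), w in zip(offsets, weights))


# Usage: an intensity ramp where each pixel value equals its column index.
ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
print(sparse_kernel_response(ramp, 1, 1, [(0.0, 0.5), (0.0, -1.0)], [2.0, 1.0]))  # 3.0
```

A handful of such taps can cover a large receptive field at a fraction of the parameters of a dense kernel of the same extent.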
Convolutional Neural Networks (CNNs) require a large amount of annotated data to learn from, which is often difficult to obtain in the medical domain. In this paper we show that the sample complexity of CNNs can be significantly improved by using 3D roto-translation group convolutions (G-Convs) instead of the more conventional translational convolutions. These 3D G-CNNs were applied to the problem of false positive reduction for pulmonary nodule detection, and proved to be substantially more effective in terms of performance, sensitivity to malignant nodules, and speed of convergence than a strong and comparable baseline architecture with regular convolutions, data augmentation and a similar number of parameters. For every dataset size tested, the G-CNN achieved a FROC score close to that of the CNN trained on ten times more data.
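The benefit of group convolutions comes from equivariance: rotating the input rotates the feature maps and cyclically shifts the orientation channels, so the network does not need to learn rotated copies of each filter from data. The sketch below demonstrates this property for the simpler 2D case of 90° rotations (a p4 "lifting" layer) rather than the 3D roto-translation groups used in the paper; the function names are hypothetical.

```python
def rot90(a):
    """Rotate a square 2D list-of-lists by 90 degrees counter-clockwise."""
    n = len(a)
    return [[a[j][n - 1 - i] for j in range(n)] for i in range(n)]


def correlate_valid(img, k):
    """Plain 2D cross-correlation without padding (what CNN layers compute)."""
    m = len(img) - len(k) + 1
    return [[sum(img[i + u][j + v] * k[u][v]
                 for u in range(len(k)) for v in range(len(k)))
             for j in range(m)] for i in range(m)]


def lifting_conv(img, k):
    """p4 lifting layer: correlate with the kernel in all four 90-degree orientations."""
    outputs, kr = [], k
    for _ in range(4):
        outputs.append(correlate_valid(img, kr))
        kr = rot90(kr)
    return outputs


img = [[1, 2, 0, 3], [4, 1, 5, 2], [0, 3, 1, 1], [2, 0, 4, 6]]
kernel = [[1, -1], [2, 0]]
base = lifting_conv(img, kernel)
rotated = lifting_conv(rot90(img), kernel)
# Equivariance: orientation channel r for the rotated input equals the rotated
# channel r-1 (mod 4) for the original input.
print(all(rotated[r] == rot90(base[(r - 1) % 4]) for r in range(4)))  # True
```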
Convolutional neural networks (CNNs) have shown remarkable results over the last several years for a wide range of computer vision tasks. A new architecture recently introduced by Sabour et al., referred to as capsule networks with dynamic routing, has shown promising initial results for digit recognition and small image classification. The success of capsule networks lies in their ability to preserve more information about the input by replacing max-pooling layers with convolutional strides and dynamic routing, allowing for preservation of part-whole relationships in the data. This preservation of the input is demonstrated by reconstructing the input from the output capsule vectors. Our work expands the use of capsule networks to the task of object segmentation for the first time in the literature. We extend the idea of convolutional capsules with locally-connected routing and propose the concept of deconvolutional capsules. Further, we extend the masked reconstruction to reconstruct the positive input class. The proposed convolutional-deconvolutional capsule network, called SegCaps, shows strong results for the task of object segmentation with a substantial decrease in parameter space. As an example application, we applied the proposed SegCaps to segment pathological lungs from low-dose CT scans and compared its accuracy and efficiency with other U-Net-based architectures. SegCaps is able to handle large image sizes (512 × 512), as opposed to baseline capsules (typically smaller than 32 × 32). The proposed SegCaps reduced the number of parameters of the U-Net architecture by 95.4% while still providing better segmentation accuracy.
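Routing-by-agreement, as introduced by Sabour et al., can be summarised in a few lines: coupling coefficients are a softmax over routing logits, each output capsule is the "squashed" weighted sum of the predictions sent to it, and logits grow where predictions agree with the output. The sketch below implements that original, fully connected routing on plain lists; SegCaps' locally-constrained routing and its deconvolutional capsules are not reproduced, and the function names are hypothetical.

```python
import math


def squash(s):
    """Shrink vector s to length |s|^2 / (1 + |s|^2), keeping its direction."""
    norm2 = sum(x * x for x in s)
    if norm2 == 0.0:
        return [0.0] * len(s)
    scale = norm2 / (1.0 + norm2) / math.sqrt(norm2)
    return [scale * x for x in s]


def dynamic_routing(u_hat, n_iter=3):
    """u_hat[i][j] is input capsule i's predicted vector for output capsule j.

    Returns (output capsule vectors, coupling coefficients c[i][j]).
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    for it in range(n_iter):
        # Coupling coefficients: softmax over output capsules for each input capsule.
        c = []
        for i in range(n_in):
            m = max(b[i])
            e = [math.exp(x - m) for x in b[i]]
            total = sum(e)
            c.append([x / total for x in e])
        # Weighted sum of predictions per output capsule, then squash.
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        if it < n_iter - 1:
            # Agreement (dot product) between predictions and outputs updates the logits.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v, c


# Usage: both input capsules agree about output 0 but contradict each other about output 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],   # input capsule 0
    [[1.0, 0.0], [0.0, -1.0]],  # input capsule 1
]
v, c = dynamic_routing(u_hat)
print(c[0])  # more weight is routed to the output capsule both inputs agree on
```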
In order to understand the organization of the cerebral cortex, it is necessary to create a map or parcellation of cortical areas. Reconstructions of the cortical surface created from structural MRI scans are frequently used in neuroimaging as a common coordinate space for representing multimodal neuroimaging data. These meshes are used to investigate healthy brain organization as well as abnormalities in neurological and psychiatric conditions. We frame cerebral cortex parcellation as a mesh segmentation task and address it by taking advantage of recent advances in generalizing convolutions to the graph domain. In particular, we propose to assess graph convolutional networks and graph attention networks, which, in contrast to previous mesh parcellation models, exploit the underlying structure of the data to make predictions. We show experimentally on the Human Connectome Project dataset that the proposed graph convolutional models outperform the current state of the art and baselines, highlighting the potential and applicability of these methods for tackling neuroimaging challenges and paving the way towards a better characterization of brain diseases.
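Treating the cortical mesh as a graph, a graph convolutional layer in the widely used formulation of Kipf and Welling mixes each vertex's features with those of its mesh neighbours: H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where A is the adjacency matrix and D the degree matrix of A + I. The sketch below implements one such layer on plain lists. It is a generic illustration of the GCN family the paper evaluates, not the paper's model, and the function name `gcn_layer` is hypothetical.

```python
import math


def gcn_layer(adjacency, features, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adjacency: n x n 0/1 matrix of mesh edges (no self-loops).
    features:  n x f_in vertex features H.
    weight:    f_in x f_out learnable matrix W.
    """
    n = len(adjacency)
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetrically normalised aggregation over each vertex and its neighbours.
    agg = [[sum(a_hat[i][j] * features[j][f] / math.sqrt(deg[i] * deg[j]) for j in range(n))
            for f in range(len(features[0]))] for i in range(n)]
    # Linear projection followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(len(weight))))
             for o in range(len(weight[0]))] for i in range(n)]


# Usage: a single triangle of mesh vertices, one input feature, one output channel.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
out = gcn_layer(triangle, [[1.0], [2.0], [3.0]], [[1.0]])
print(out)
```

Because the same weights are shared across all vertices and the aggregation follows the mesh connectivity, the layer exploits the surface structure that grid-based convolutions cannot use directly.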
The analysis of glandular morphology within colon histopathology images is a crucial step in determining the stage of colon cancer. Despite the importance of this task, manual segmentation is laborious, time-consuming and can suffer from subjectivity among pathologists. The rise of computational pathology has led to the development of automated methods for gland segmentation that aim to overcome the challenges of manual segmentation. However, this task is non-trivial due to the large variability in glandular appearance and the difficulty in differentiating between certain glandular and non-glandular histological structures. Furthermore, within pathological practice, a measure of uncertainty is essential for diagnostic decision making. For example, ambiguous areas may require further examination by multiple pathologists. To address these challenges, we propose a fully convolutional neural network that counters the loss of information caused by max-pooling by re-introducing the original image at multiple points within the network. We also use atrous spatial pyramid pooling with varying dilation rates for resolution maintenance and multi-level aggregation. To incorporate uncertainty, we introduce random transformations at test time for an enhanced segmentation result that simultaneously generates an uncertainty map, highlighting areas of ambiguity. We show that this map can be used to define a metric for disregarding predictions with high uncertainty. The proposed network achieves state-of-the-art performance on the GlaS challenge dataset (held as part of MICCAI 2015) and on a second independent colorectal adenocarcinoma dataset.
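Test-time augmentation for segmentation uncertainty can be sketched as follows: transform the input, predict, undo the transform on the prediction, then take the per-pixel mean as the result and the per-pixel variance as an uncertainty map. The sketch below uses the four flip combinations as its transformations, an assumption made for brevity (the abstract refers to random transformations), and `model` stands for any hypothetical function mapping an image to a probability map of the same size.

```python
def hflip(a):
    return [row[::-1] for row in a]


def vflip(a):
    return a[::-1]


# Each augmentation is paired with its inverse (flips are their own inverse).
FLIPS = [
    (lambda a: a, lambda a: a),
    (hflip, hflip),
    (vflip, vflip),
    (lambda a: hflip(vflip(a)), lambda a: vflip(hflip(a))),
]


def tta_segment(model, image):
    """Return (mean probability map, per-pixel variance map) over flip augmentations."""
    preds = [inverse(model(forward(image))) for forward, inverse in FLIPS]
    h, w, n = len(image), len(image[0]), len(preds)
    mean = [[sum(p[i][j] for p in preds) / n for j in range(w)] for i in range(h)]
    var = [[sum((p[i][j] - mean[i][j]) ** 2 for p in preds) / n for j in range(w)]
           for i in range(h)]
    return mean, var


image = [[0.2, 0.8], [0.6, 0.4]]
# A pixel-wise model is unaffected by flips, so its uncertainty map is (numerically) zero.
pixelwise = lambda img: [[0.5 * v for v in row] for row in img]
# A model that depends on absolute column position disagrees with itself under flips.
position_biased = lambda img: [[v + 0.1 * j for j, v in enumerate(row)] for row in img]
_, var_flat = tta_segment(pixelwise, image)
_, var_biased = tta_segment(position_biased, image)
print(var_biased)
```

Pixels where the augmented predictions disagree receive high variance, which is the kind of map the abstract proposes for flagging ambiguous regions.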
We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This eliminates the need for the explicit external tissue/organ localisation modules used in cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large CT abdominal datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.
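An additive attention gate of this kind computes, for every pixel, a coefficient alpha = sigmoid(psi^T ReLU(W_x x + W_g g + b_g) + b_psi) from the skip-connection feature x and a coarser gating feature g, and then scales x by alpha. The sketch below evaluates such a gate for a single pixel with plain lists; in a U-Net the same operation would be applied with 1×1×1 convolutions over whole feature maps. The function name and argument layout are hypothetical.

```python
import math


def attention_gate(x, g, w_x, w_g, b_g, psi, b_psi):
    """Additive attention gate for one pixel (hypothetical helper).

    x: skip-connection feature vector (length c_x); g: gating vector (length c_g).
    w_x: c_int x c_x matrix, w_g: c_int x c_g matrix, b_g: length c_int,
    psi: length c_int, b_psi: scalar. Returns (gated x, alpha).
    """
    hidden = [max(0.0, sum(wx * xv for wx, xv in zip(row_x, x))
                  + sum(wg * gv for wg, gv in zip(row_g, g)) + bias)
              for row_x, row_g, bias in zip(w_x, w_g, b_g)]
    alpha = 1.0 / (1.0 + math.exp(-(sum(p * h for p, h in zip(psi, hidden)) + b_psi)))
    return [alpha * xv for xv in x], alpha


# Usage: one channel each. A gating signal that reinforces the skip feature opens
# the gate; one that cancels it leaves alpha at sigmoid(b_psi) = 0.5.
params = dict(w_x=[[1.0]], w_g=[[1.0]], b_g=[0.0], psi=[1.0], b_psi=0.0)
print(attention_gate([2.0], [3.0], **params))
print(attention_gate([2.0], [-2.0], **params))  # ([1.0], 0.5)
```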
Precise segmentation of the vertebrae is often required for automatic detection of vertebral abnormalities. In particular, it enables incidental detection of abnormalities such as compression fractures in images that were acquired for other diagnostic purposes. While many CT and MR scans of the chest and abdomen cover a section of the spine, they often do not cover the entire spine. Additionally, the first and last visible vertebrae are likely only partially included in such scans. In this paper, we therefore approach vertebra segmentation as an instance segmentation problem. A fully convolutional neural network is combined with an instance memory that retains information about already segmented vertebrae. This network iteratively analyzes image patches, using the instance memory to search for and segment the first vertebra that has not yet been segmented. At the same time, each vertebra is classified as completely or partially visible, so that partially visible vertebrae can be excluded from further analyses. We evaluated this method on spine CT scans from a vertebra segmentation challenge and on low-dose chest CT scans. The method achieved average Dice scores of 95.8% and 92.1% and mean absolute surface distances of 0.194 mm and 0.344 mm, respectively.
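The control flow of such an iterative scheme can be illustrated on a toy 1D "spine" in which non-zero voxels belong to vertebrae: a stand-in for the network returns the first vertebra not yet stored in the instance memory, the memory is updated, and the loop stops when nothing new is found. Instances touching the scan border are flagged as partially visible. Everything here is a hypothetical simplification: `find_next_instance` replaces the fully convolutional network and simply looks for the first unsegmented run of foreground voxels.

```python
def find_next_instance(volume, memory):
    """Hypothetical stand-in for the network: voxel indices of the first foreground
    run that is not yet in the instance memory, or None if everything is segmented."""
    n = len(volume)
    for i in range(n):
        if volume[i] and i not in memory:
            j = i
            while j < n and volume[j]:
                j += 1
            return list(range(i, j))
    return None


def iterative_instance_segmentation(volume):
    memory = set()  # instance memory: voxels already assigned to a vertebra
    instances = []
    while True:
        voxels = find_next_instance(volume, memory)
        if voxels is None:
            break
        # Completeness flag: a vertebra touching the scan border is only partially visible.
        complete = voxels[0] > 0 and voxels[-1] < len(volume) - 1
        instances.append((voxels, complete))
        memory.update(voxels)
    return instances


# Usage: three vertebrae, the first and last cut off by the field of view.
print(iterative_instance_segmentation([1, 1, 0, 1, 1, 1, 0, 0, 1, 1]))
```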
In recent years, endomicroscopy imaging has been increasingly used for diagnostic purposes. It can provide intraoperative aids for real-time tissue characterization and can help to perform visual investigations aimed at discovering epithelial cancers. However, accurate diagnosis and correct treatment are partially hampered by the low number of informative pixels generated by these devices. In the last decades, progress has been made in improving hardware acquisition and the related image reconstruction in this domain. Nonetheless, due to the imaging environment and the associated physical constraints, images with the desired resolution are still difficult to produce. Post-processing techniques, such as Super Resolution (SR), are an alternative solution to increase the quality of these images. SR techniques are often supervised, requiring aligned pairs of low-resolution (LR) and high-resolution (HR) patches to train a model. However, in some domains, the lack of HR images hinders the generation of these pairs and makes supervised training unsuitable. For this reason, we propose an unsupervised SR framework based on an adversarial deep neural network with a physically-inspired cycle consistency, designed to impose some acquisition properties on the super-resolved images. Our framework can exploit HR images, regardless of the domain they come from, to transfer super-resolution to the initial LR images. This property can be particularly useful in all situations where LR/HR pairs are not available during training. Our quantitative analysis, validated using a database of 238 endomicroscopy video sequences, shows the ability of the pipeline to produce convincing super-resolved images. A Mean Opinion Score (MOS) study also confirms this quantitative image quality assessment.
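A cycle-consistency constraint of this kind can be read as a data-consistency requirement: degrading the super-resolved image with a model of the acquisition should give back the low-resolution input. The sketch below computes such an L1 cycle loss for 1D signals, assuming (hypothetically) that the degradation is plain average pooling by an integer factor; the paper's physically-inspired acquisition model for endomicroscopy is more specific, and its adversarial term is not shown.

```python
def degrade(hr, factor):
    """Hypothetical acquisition model: average-pool a 1D signal by `factor`."""
    return [sum(hr[i:i + factor]) / factor for i in range(0, len(hr) - factor + 1, factor)]


def cycle_loss(lr, sr, factor):
    """Mean absolute difference between the LR input and the degraded SR output."""
    back = degrade(sr, factor)
    return sum(abs(a - b) for a, b in zip(lr, back)) / len(lr)


lr = [0.2, 0.6, 0.4]
nearest = [v for v in lr for _ in range(2)]  # naive 2x upsampling
print(cycle_loss(lr, nearest, 2))  # 0.0
```

Note that many different high-resolution signals satisfy the cycle constraint exactly (naive upsampling among them), which is why such a term is paired with an adversarial loss that pushes outputs towards realistic HR appearance.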
Navigated 2D multi-slice dynamic Magnetic Resonance (MR) imaging enables high-contrast 4D MR imaging during free breathing and provides in-vivo observations for treatment planning and guidance. Navigator slices are vital for retrospective stacking of 2D data slices in this method. However, they also prolong the acquisition sessions. Temporal interpolation of navigator slices can be used to reduce the number of navigator acquisitions without degrading specificity in stacking. In this work, we propose a convolutional neural network (CNN) based method for temporal interpolation via motion field prediction. The proposed formulation incorporates the prior knowledge that a motion field underlies changes in the image intensities over time. Previous approaches that interpolate directly in the intensity space are prone to producing blurry images or even removing structures in the images. Our method avoids such problems and faithfully preserves the information in the image. Further, an important advantage of our formulation is that it provides an unsupervised estimation of bi-directional motion fields. We show that these motion fields can be used to halve the number of registrations required during 4D reconstruction, thus substantially reducing the reconstruction time.
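Why interpolating along a motion field avoids the blur of intensity-space interpolation can be seen in a 1D toy example: an edge moves by two pixels between two frames. Averaging the frames produces two half-intensity steps, whereas backward-warping the first frame by half the displacement yields a single sharp edge halfway. In the sketch below the displacement is given rather than predicted by a CNN, and only one direction is shown; the function names are hypothetical.

```python
import math


def sample_linear(signal, x):
    """Linearly interpolate a 1D signal at continuous position x (clamped)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    x0 = int(math.floor(x))
    x1 = min(x0 + 1, len(signal) - 1)
    t = x - x0
    return (1 - t) * signal[x0] + t * signal[x1]


def interpolate_by_motion(frame0, displacement, alpha):
    """Frame at time alpha in [0, 1] by backward-warping frame0 along alpha * displacement."""
    return [sample_linear(frame0, x - alpha * d) for x, d in enumerate(displacement)]


frame0 = [0, 0, 0, 1, 1, 1, 1, 1]  # edge at position 3
frame1 = [0, 0, 0, 0, 0, 1, 1, 1]  # the same edge moved by +2
motion = [2.0] * len(frame0)

warped = interpolate_by_motion(frame0, motion, 0.5)
averaged = [(a + b) / 2 for a, b in zip(frame0, frame1)]
print(warped)    # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
print(averaged)  # [0.0, 0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 1.0]
```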
The imaging workup in acute stroke can be simplified by reconstructing the non-contrast CT (NCCT) from CT perfusion (CTP) images, resulting in reduced workup time and radiation dose. This work presents a stacked bidirectional convolutional LSTM (C-LSTM) network to predict 3D volumes from 4D spatiotemporal data. Several parameterizations of the C-LSTM network were trained on a set of 17 CTP-NCCT pairs to learn to reconstruct NCCT from CTP and were subsequently quantitatively evaluated on a separate cohort of 16 cases. The results show that the C-LSTM network clearly outperforms basic reconstruction methods and provides a promising general deep learning approach for handling high-dimensional spatiotemporal medical data.
Accelerated MRI reconstruction is important for making MRI faster and thus applicable in a broader range of problem domains. Computational tools allow for high-resolution imaging without the need to perform time-consuming measurements. Most recently, deep learning approaches have been applied to this problem. However, none of these methods has been shown to transfer well across different measurement settings. We propose to use Recurrent Inference Machines as a framework for accelerated MRI, which allows us to leverage the power of deep learning without explicit domain knowledge. We show in experiments that the model generalizes well across different setups while outperforming another deep learning method and a compressed sensing approach.
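A Recurrent Inference Machine refines an estimate x_t by feeding the gradient of the data log-likelihood, together with x_t and a hidden state, into a learned recurrent network. For undersampled MRI with a Gaussian noise model, that log-likelihood gradient is proportional to -F^H M (M F x - y), where F is the Fourier transform, M the sampling mask and y the measured k-space. The sketch below computes the data-term gradient F^H M (M F x - y) for a 1D signal with an orthonormal DFT and, purely as a placeholder for the learned RIM update, takes one plain gradient step; all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Orthonormal discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / math.sqrt(n)
            for k in range(n)]


def idft(x):
    """Inverse of the orthonormal DFT (its conjugate transpose)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / math.sqrt(n)
            for t in range(n)]


def data_term_gradient(x, y, mask):
    """Gradient of 0.5 * ||M F x - y||^2 with respect to x: F^H M (M F x - y).

    mask[k] is 1 for sampled k-space locations and 0 otherwise; y holds the
    measured k-space values (0 at unsampled locations).
    """
    residual = [m * (k - yk) for m, k, yk in zip(mask, dft(x), y)]
    return idft(residual)


# Usage: a 4-sample signal with half of k-space measured.
truth = [1.0, 2.0, 0.0, -1.0]
mask = [1, 0, 1, 0]
y = [m * k for m, k in zip(mask, dft(truth))]
x0 = [0.0] * 4
grad = data_term_gradient(x0, y, mask)
x1 = [a - g for a, g in zip(x0, grad)]  # placeholder for the learned RIM update
```

With an orthonormal DFT, a unit gradient step makes the estimate exactly consistent with the measured k-space samples, while the unmeasured frequencies stay undetermined; filling those in plausibly is the part a learned update is meant to handle.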
Automatic detection of anatomical landmarks is an important step for a wide range of applications in medical image analysis. Manual annotation of such landmarks is a tedious task and prone to observer errors. In this paper, we evaluate novel deep Reinforcement Learning (RL) strategies to train agents that can precisely localize target landmarks in medical scans. An artificial RL agent learns to identify the optimal path to the point of interest by interacting with an environment, in our case 3D images. Furthermore, we investigate the use of fixed- and multi-scale search strategies with hierarchical action steps in a coarse-to-fine manner. We experiment with multiple Deep Q-Network (DQN)-based architectures for training the proposed RL agents and achieve good results for detecting multiple landmarks in a challenging fetal head ultrasound dataset.
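In landmark-detection agents of this kind, the agent moves a point through the volume and a common reward is how much a step reduces its distance to the target landmark: r_t = D(p_{t-1}, target) - D(p_t, target). The sketch below shows such a reward together with a coarse-to-fine schedule of step sizes. To stay short, it replaces the Deep Q-Network with a greedy oracle that evaluates the reward of every action, something a real agent cannot do because the target is unknown at test time; the step sizes and function names are hypothetical.

```python
import math

ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def reward(prev, new, target):
    """Positive when the step moves the point closer to the target landmark."""
    return math.dist(prev, target) - math.dist(new, target)


def greedy_multiscale_search(start, target, step_sizes=(9, 3, 1)):
    """Coarse-to-fine search with an oracle policy standing in for a trained DQN."""
    pos = tuple(start)
    for step in step_sizes:
        while True:
            candidates = [tuple(p + step * a for p, a in zip(pos, act)) for act in ACTIONS]
            best = max(candidates, key=lambda c: reward(pos, c, target))
            if reward(pos, best, target) <= 0:
                break  # no action helps at this scale: refine the step size
            pos = best
    return pos


print(greedy_multiscale_search((0, 0, 0), (10, 4, 7)))  # (10, 4, 7)
```

Large steps cover most of the distance quickly and small steps provide precision, which is the motivation for hierarchical action steps; a trained agent replaces the oracle with Q-value estimates learned from experience.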
Most recent research on neural networks in the field of computer vision has focused on improving the accuracy of point predictions by developing various network architectures or learning algorithms. Uncertainty quantification accompanying point estimation can lead to more informed decisions, and the quality of prediction can be improved. In medical imaging applications, assessment of uncertainty could potentially reduce untoward outcomes due to suboptimal decisions. In this paper, we invoke a Bayesian neural network and propose a natural way to quantify uncertainty in classification problems by decomposing predictive uncertainty into two parts, aleatoric and epistemic uncertainty. The proposed method takes into account the discrete nature of the outcome, yielding a correct interpretation of each uncertainty. We demonstrate that the proposed uncertainty quantification method provides additional insight into the point prediction using images from the Ischemic Stroke Lesion Segmentation Challenge.
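For a binary outcome, one such decomposition can be computed from T stochastic forward passes (for example with Monte Carlo dropout, an assumption here) that give foreground probabilities p_1, ..., p_T for a voxel. The aleatoric part is the mean of p_t(1 - p_t), and the epistemic part is the variance of the p_t around their mean p̄; the two add up to p̄(1 - p̄), the variance of the resulting Bernoulli predictive distribution. The abstract does not give the formulas, so the sketch below is a minimal illustration of this binary special case with a hypothetical function name.

```python
def decompose_uncertainty(probs):
    """Split the predictive variance of a binary outcome into aleatoric and epistemic parts.

    probs: foreground probabilities from T stochastic forward passes for one voxel.
    """
    t = len(probs)
    p_mean = sum(probs) / t
    aleatoric = sum(p * (1.0 - p) for p in probs) / t      # expected Bernoulli variance
    epistemic = sum((p - p_mean) ** 2 for p in probs) / t  # spread of the model's predictions
    return aleatoric, epistemic


# The passes disagree strongly, so most of the uncertainty is epistemic.
print(decompose_uncertainty([0.1, 0.9]))
```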
Deep neural networks (DNNs) have revolutionized medical image analysis and disease diagnosis. Despite their impressive increase in performance, it is difficult to generate well-calibrated probabilistic outputs for such networks, so that state-of-the-art networks fail to provide reliable uncertainty estimates regarding their decisions. We propose a simple but effective method that applies traditional data augmentation methods, such as geometric and color transformations, at test time. This makes it possible to examine how much the network output varies in the vicinity of examples in the input space. Despite its simplicity, our method yields useful estimates of the input-dependent predictive uncertainties of deep neural networks. We showcase the impact of our method on the well-known collection of fundus images from a previous Kaggle competition.
In this paper, we demonstrate that through the use of adversarial training and additional unsupervised costs it is possible to train a multi-class anatomical segmentation algorithm without any ground-truth labels for the data set to be segmented. Specifically, using labels from a different data set of the same anatomy (although potentially in a different modality), we train a model through adversarial learning to synthesise realistic multi-channel label masks from input cardiac images in both CT and MRI. However, as is to be expected, generating realistic mask images is not, on its own, sufficient for the segmentation task: the model can use the input image as a source of noise and synthesise highly realistic segmentation masks that do not necessarily correspond spatially to the input. To overcome this, we introduce additional unsupervised costs and demonstrate that these provide sufficient further guidance to produce good segmentation results. We test our proposed method on both CT and MR data from the multi-modal whole heart segmentation challenge (MM-WHS) and show the effect of our unsupervised costs on improving the segmentation results, in comparison to a variant without them.
Weak supervision, e.g., in the form of partial labels or image tags, is currently attracting significant attention in CNN segmentation as it can mitigate the lack of full and laborious pixel/voxel annotations, a common problem in medical imaging. Embedding high-order (global) inequality constraints on the network output, for instance, on the size of the target region, can leverage unlabeled data, guiding training with domain-specific knowledge. Inequality constraints are very flexible because they do not assume exact prior knowledge. However, constrained Lagrangian optimization has been largely avoided in deep networks, mainly for computational tractability reasons. To the best of our knowledge, the method of Pathak et al. is the only prior work that addresses constrained deep CNNs in weakly supervised segmentation. It uses the constraints to synthesize fully-labeled training masks (proposals) from weak labels, mimicking full supervision and facilitating dual optimization.
We propose to introduce a differentiable term that enforces inequality constraints directly in the loss function, avoiding expensive Lagrangian dual iterates and proposal generation. From a constrained-optimization perspective, our simple approach is not optimal, as there is no guarantee that the constraints are satisfied. However, surprisingly, it yields substantially better results than the proposal-based constrained CNNs in Pathak et al., while reducing the computational demand for training. In the context of cardiac image segmentation, we reached a segmentation performance close to full supervision while using a fraction of the ground-truth labels (0.1% of the pixels of the ground-truth masks) and image-level tags. Our framework can be easily extended to other inequality constraints, e.g., shape moments or region statistics. Therefore, it has the potential to close the gap between weakly and fully supervised learning in semantic medical image segmentation. Our code is publicly available.
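For a size constraint lower <= V_S <= upper, where V_S is the predicted region size (the sum of the network's foreground probabilities over the image), a differentiable penalty of this kind can be zero inside the bounds and quadratic outside. The sketch below computes such a penalty and its gradient with respect to each pixel probability; the quadratic form is an assumption consistent with the description above, and the function name `size_penalty` is hypothetical.

```python
def size_penalty(probs, lower, upper):
    """Quadratic penalty enforcing lower <= sum(probs) <= upper.

    probs: foreground probabilities (softmax outputs) for every pixel.
    Returns (penalty, gradient); the gradient is identical for all pixels because
    the constraint depends only on the total predicted size.
    """
    size = sum(probs)
    if size < lower:
        violation = size - lower
    elif size > upper:
        violation = size - upper
    else:
        violation = 0.0
    penalty = violation ** 2
    grad = [2.0 * violation] * len(probs)
    return penalty, grad


probs = [0.5, 0.5, 0.5, 0.5]  # predicted size 2.0
print(size_penalty(probs, 1, 3))  # (0.0, [0.0, 0.0, 0.0, 0.0])
print(size_penalty(probs, 3, 5))  # (1.0, [-2.0, -2.0, -2.0, -2.0])
```

In training, such a term would be added with a weight to a partial cross-entropy computed only on the few labelled pixels, so the network receives a signal on unlabelled pixels without any Lagrangian dual updates or proposal generation.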
Histopathology image analysis serves as the gold standard for cancer diagnosis and is directly related to the subsequent therapeutic treatment. However, pixel-wise delineated annotations on whole slide images (WSIs) are time-consuming and tedious to produce, which makes it difficult to build a large-scale training dataset. Effectively exploiting WSI-level labels, which are easy to acquire, is therefore an appealing goal for deep learning. The main obstacle is that tissue patterns are heterogeneous at fine magnification levels, whereas only WSI-level labels are provided. Furthermore, a gigapixel-scale WSI cannot be easily analysed because of the enormous computational cost. In this paper, we propose a weakly supervised approach for fast and effective classification of whole slide lung cancer images. Our method takes advantage of a patch-based fully convolutional network for discriminative block retrieval. Furthermore, context-aware feature selection and aggregation strategies are proposed to generate a globally holistic WSI descriptor. Extensive experiments demonstrate that our method outperforms state-of-the-art methods by a large margin, with an accuracy of 97.1%. In addition, we highlight that a small number of available coarse annotations can contribute to further accuracy improvement. We believe that deep learning has great potential to assist pathologists in histology image diagnosis in the near future.
Computer-aided diagnosis (CAD) systems are designed to assist clinicians in various tasks, including highlighting abnormal regions in medical images. Common methods exploit supervised learning using annotated data sets and perform classification at voxel level. However, many pathologies are characterized by subtle lesions that may be located anywhere in the organ of interest and have various shapes, sizes and textures. Acquiring a data set that adequately represents the heterogeneity of such pathologies is therefore a major issue. Moreover, when a lesion is not visually detected on a scan, outlining it accurately is not feasible, so supervised learning on such labeled data would not be reliable. In this study, we consider the problem of detecting subtle epilepsy lesions in multiparametric (T1w, FLAIR) MRI exams considered as normal (MRI-negative). We cast this problem as an outlier detection problem and build on a previously proposed approach that consists of learning a one-class SVM (oc-SVM) model for each voxel in the brain volume using a small number of clinically-guided features. Our goal in this study is to take a step forward by replacing the handcrafted features with representations learnt automatically by neural networks. We propose a novel version of siamese networks trained on patches extracted from healthy patients’ scans only. This network, composed of stacked convolutional autoencoders as subnetworks, is regularized by the reconstruction error of the patches. It is designed to map patches centered at the same spatial location to ‘close’ representations with respect to the chosen metric (i.e. cosine) in a latent space. Finally, the middle-layer representations of the subnetworks are fed into oc-SVM models at voxel level. The model is trained on 75 healthy subjects and validated on 21 patients with confirmed epilepsy lesions (18 of them MRI-negative) and shows promising performance.
The performance of CAD algorithms for histopathology image analysis is affected by variations in the samples, such as the color and intensity of stained images. Stain-color normalization is a well-studied technique for compensating for such effects at the input of CAD systems. In this paper, we introduce unsupervised generative neural networks for performing stain-color normalization. For color normalization of hematoxylin and eosin (H&E) stained images, we present three methods based on three frameworks for deep generative models: variational auto-encoders (VAE), generative adversarial networks (GAN) and deep convolutional Gaussian mixture models (DCGMM). Our contribution is defining color normalization as a learned generative model that is able to generate various color copies of the input image through a nonlinear parametric transformation. In contrast to earlier generative models proposed for stain-color normalization, our approach does not need any labels for data or any other assumptions about the H&E image content. Furthermore, our models learn a parametric transformation during training and can convert the color information of an input image to resemble any arbitrary reference image. This property is essential in time-critical CAD systems when the reference image changes, since our approach, in contrast to other proposed generative models for stain-color normalization, does not need retraining. Experiments on histopathological H&E images with high staining variations, collected from different laboratories, show that our proposed models quantitatively outperform state-of-the-art methods in terms of color constancy by at least 10-15%, while the converted images are visually in agreement with this performance improvement.
Lesion detection in brain Magnetic Resonance Images (MRI) remains a challenging task. State-of-the-art approaches are mostly based on supervised learning that makes use of large annotated datasets. Human beings, on the other hand, even non-experts, can detect most abnormal lesions after seeing a handful of healthy brain images. Replicating this ability to use prior information on the appearance of healthy brain structure to detect lesions could help computers achieve human-level abnormality detection, specifically by reducing the number of labeled examples needed and by generalizing better to previously unseen lesions. To this end, we study the detection of lesion regions in an unsupervised manner by learning the data distribution of brain MRI of healthy subjects with auto-encoder based methods. We hypothesize that one of the main limitations of current models is the lack of consistency in the latent representation. We propose a simple yet effective constraint that helps map an image bearing a lesion close to its corresponding healthy image in the latent space. We use the Human Connectome Project dataset to learn the distribution of healthy-appearing brain MRI and report improved detection, in terms of AUC, of the lesions in the BRATS challenge dataset.
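One way to express a latent-consistency constraint, assumed here purely for illustration: encode an image, decode it, re-encode the reconstruction, and penalize the distance between the two latent codes alongside the usual reconstruction error. At test time, the voxel-wise residual between an image and its reconstruction serves as a lesion score. `encode` and `decode` are hypothetical callables standing in for the trained auto-encoder; the abstract does not specify the exact form of the constraint.

```python
def _mse(u, v):
    """Mean squared error between two equally sized vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def latent_consistency_loss(x, encode, decode, weight=1.0):
    """Reconstruction error plus a penalty for the re-encoded reconstruction
    drifting away from the original latent code (assumed form of the constraint)."""
    z = encode(x)
    x_rec = decode(z)
    z_rec = encode(x_rec)
    return _mse(x, x_rec) + weight * _mse(z, z_rec)


def residual_map(x, encode, decode):
    """Voxel-wise absolute residual used as a lesion score at test time."""
    x_rec = decode(encode(x))
    return [abs(a - b) for a, b in zip(x, x_rec)]
```

With a well-trained model of healthy anatomy, the reconstruction of an image bearing a lesion should look healthy, so the residual is expected to be large mainly inside the lesion.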
NeuroNet is a deep convolutional neural network mimicking multiple popular and state-of-the-art brain segmentation tools including FSL, SPM, and MALPEM. The network is trained on 5,000 T1-weighted brain MRI scans from the UK Biobank Imaging Study that have been automatically segmented into brain tissue and cortical and sub-cortical structures using the standard neuroimaging pipelines. Training a single model from these complementary and partially overlapping label maps yields a powerful new “all-in-one”, multi-output segmentation tool. The processing time for a single subject is reduced by an order of magnitude compared to running each individual software package. We demonstrate very good reproducibility of the original outputs while increasing robustness to variations in the input data. We believe NeuroNet could be an important tool in large-scale population imaging studies and could serve as a new standard in neuroscience by reducing the risk of introducing bias when choosing a specific software package.
Recent advances in cancer immunotherapy have boosted interest in the role played by the immune system in cancer treatment. In particular, the presence of tumor-infiltrating lymphocytes (TILs) has become a central research topic in oncology and pathology. Consequently, a method to automatically detect and quantify immune cells is of great interest. In this paper, we present a comparison of different deep learning (DL) techniques for the detection of lymphocytes in immunohistochemically stained (CD3 and CD8) slides of breast, prostate and colon cancer. The compared methods cover the state of the art in object localization, classification and segmentation: Locality Sensitive Method (LSM), U-net, You Only Look Once (YOLO) and fully convolutional networks (FCNN). A dataset with 109,841 annotated cells from 58 whole-slide images was used for this study. Overall, U-net and YOLO achieved the best results, with an F1-score of 0.78 in regular tissue areas. The U-net approach was more robust to biological and staining variability and could also handle staining and tissue artifacts.
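Detection F1-scores such as the one reported above require a rule for matching predicted cell centres to annotated ones. The sketch below assumes a simple greedy rule in which a prediction counts as a true positive if an unmatched annotation lies within a fixed radius; the matching rule and the radius are assumptions for illustration, not the study's evaluation protocol.

```python
import math


def match_detections(predicted, ground_truth, radius):
    """Greedily match each predicted cell centre to the nearest unmatched annotation within `radius`.

    Returns (true positives, false positives, false negatives).
    """
    unmatched = list(ground_truth)
    tp = 0
    for p in predicted:
        best_index, best_distance = None, radius
        for i, g in enumerate(unmatched):
            d = math.dist(p, g)
            if d <= best_distance:
                best_index, best_distance = i, d
        if best_index is not None:
            unmatched.pop(best_index)  # each annotation can be matched only once
            tp += 1
    return tp, len(predicted) - tp, len(ground_truth) - tp


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```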
Computationally synthesized blood vessels can be used for training and evaluation of medical image analysis applications. We propose a deep generative model to synthesize blood vessel geometries, with an application to coronary arteries in cardiac CT angiography (CCTA).
In the proposed method, a Wasserstein generative adversarial network (GAN) consisting of a generator and a discriminator network is trained. While the generator tries to synthesize realistic blood vessel geometries, the discriminator tries to distinguish synthesized geometries from those of real blood vessels. Both real and synthesized blood vessel geometries are parametrized as 1D signals based on the central vessel axis. The generator can optionally be provided with an attribute vector to synthesize vessels with particular characteristics.
The GAN was optimized using a reference database with parametrizations of 4,412 real coronary artery geometries extracted from CCTA scans. After training, plausible coronary artery geometries could be synthesized based on random vectors sampled from a latent space. A qualitative analysis showed strong similarities between real and synthesized coronary arteries. A detailed analysis of the latent space showed that the diversity present in coronary artery anatomy was accurately captured by the generator.
Results show that Wasserstein generative adversarial networks can be used to synthesize blood vessel geometries.
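The adversarial game described in this abstract can be summarized by the standard Wasserstein GAN objectives. This sketch works on lists of critic scores and assumes the original WGAN weight-clipping scheme for keeping the critic approximately Lipschitz; the abstract does not say which constraint was used (a gradient penalty is a common alternative). Conditioning on an attribute vector is shown as plain concatenation with the latent vector, which is likewise an assumption.

```python
from statistics import mean


def critic_loss(scores_real, scores_fake):
    """Minimized by the critic: it should score real vessels higher than synthesized ones."""
    return mean(scores_fake) - mean(scores_real)


def generator_loss(scores_fake):
    """Minimized by the generator: it tries to raise the critic's score on its samples."""
    return -mean(scores_fake)


def clip_weights(weights, c=0.01):
    """Weight clipping from the original WGAN formulation (assumed here)."""
    return [max(-c, min(c, w)) for w in weights]


def generator_input(latent, attributes=None):
    """Optionally append an attribute vector to the random latent vector."""
    return list(latent) + list(attributes or [])
```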
In this work, we propose an adversarial learning approach to generate high-resolution MRI scans from low-resolution images. The architecture, based on the SRGAN model, adopts 3D convolutions to exploit volumetric information. For the discriminator, the adversarial loss uses least squares to stabilize training. For the generator, the loss function combines a least-squares adversarial loss and a content term based on the mean squared error and image gradients, in order to improve the quality of the generated images. We explore different solutions for the upsampling phase. We present promising results that improve on classical interpolation, showing the potential of the approach for 3D medical image super-resolution.
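The loss terms listed above can be sketched as follows. For brevity the example operates on 1D intensity profiles instead of 3D volumes, uses forward differences as the image-gradient operator, and uses hypothetical weights `adv_weight` and `grad_weight`; none of these choices are taken from the paper.

```python
from statistics import mean


def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss: push real scores to 1 and fake scores to 0."""
    return 0.5 * mean((d - 1.0) ** 2 for d in d_real) + 0.5 * mean(d ** 2 for d in d_fake)


def lsgan_generator_adversarial(d_fake):
    """Least-squares adversarial term for the generator: push fake scores to 1."""
    return 0.5 * mean((d - 1.0) ** 2 for d in d_fake)


def finite_diff(x):
    """Forward differences, a simple stand-in for image gradients."""
    return [b - a for a, b in zip(x, x[1:])]


def content_loss(sr, hr, grad_weight=1.0):
    """MSE between super-resolved and high-resolution signals plus an MSE on their gradients."""
    mse = mean((a - b) ** 2 for a, b in zip(sr, hr))
    grad = mean((a - b) ** 2 for a, b in zip(finite_diff(sr), finite_diff(hr)))
    return mse + grad_weight * grad


def generator_loss(d_fake, sr, hr, adv_weight=1e-3, grad_weight=1.0):
    """Combined generator objective: weighted adversarial term plus content term."""
    return adv_weight * lsgan_generator_adversarial(d_fake) + content_loss(sr, hr, grad_weight)
```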
The well-documented global shortage of radiologists is most acutely manifested in countries where the rapid rise of a middle class has created a new capacity to produce imaging studies at a rate that far outpaces the training of experts capable of interpreting them. This production-to-interpretation gap is seen clearly in the most common imaging study, the chest x-ray, where technicians are increasingly called upon not only to acquire the image but also to interpret it. The dearth of expert radiologists leads to both delayed and inaccurate diagnostic insights. The present study uses a robust radiology database, machine-learning technologies, and rigorous clinical validation to produce expert-level automatic interpretation of routine chest x-rays. Using a convolutional neural network (CNN), we achieve performance slightly higher than that of radiologists in detecting four common chest X-ray (CXR) findings: focal lung opacities, diffuse lung opacity, cardiomegaly, and abnormal hilar prominence. The agreement between the algorithm and radiologists is slightly higher (1-7%) than the agreement among a team of three expert radiologists.
The use of ultrasound guidance in prostate cancer radiotherapy workflows is not widespread. This can be partially attributed to the need for image interpretation by a trained operator during ultrasound image acquisition. In this work, a one-class regressor based on DenseNet and Gaussian processes was implemented to automatically assess the quality of transperineal ultrasound images of the male pelvic region. The implemented deep learning approach achieved a scoring accuracy of 94%, a specificity of 95% and a sensitivity of 93% with respect to the majority vote of three experts, which was comparable to the results of these experts. This is the first step towards a fully automatic workflow, which could potentially remove the need for image interpretation and thereby make ultrasound imaging, which allows real-time volumetric organ tracking in the radiotherapy (RT) environment, more appealing for hospitals.
In this paper, a new deformable image registration method based on a fully connected neural network is proposed. Even though the deformation field describing the point correspondence between fixed and moving images is high-dimensional in nature, we assume that such deformation fields lie on a low-dimensional manifold in many real-world applications. Thus, in our method, a neural network generates an embedding of the deformation field from a low-dimensional vector. This low-dimensional manifold formulation avoids the intractability associated with the high-dimensional search space that most other methods face during image registration. As a result, while most methods rely on explicit, handcrafted regularization of the deformation fields, our algorithm relies on implicit regularization through the network parameters. The proposed method generates deformation fields from the low-dimensional latent space by minimizing a dissimilarity metric between a fixed image and a warped moving image. Our method removes the need for a large dataset to optimize the proposed network. The proposed method is quantitatively evaluated using images from the MICCAI ACDC challenge. The results demonstrate that the proposed method improves performance compared with a moving mesh registration algorithm, and that it correlates well with independent manual segmentations by an expert.
Uncertainty estimates of modern neural networks provide information in addition to the computed predictions and are thus expected to improve the understanding of the underlying model. Reliable uncertainties are particularly interesting for safety-critical computer-assisted applications in medicine, e.g., neurosurgical interventions and radiotherapy planning. We propose an uncertainty-driven sanity check for the identification of segmentation results that need particular expert review. Our method uses a fully convolutional neural network and computes uncertainty estimates by the principle of Monte Carlo dropout. We evaluate the performance of the proposed method on a clinical dataset with 30 postoperative brain tumor images. The method can segment the highly inhomogeneous resection cavities accurately (Dice coefficient 0.792 ± 0.154). Furthermore, the proposed sanity check is able to detect the worst segmentation and three of the four outliers. The results highlight the potential of using the additional information from the model’s parameter uncertainty to validate the segmentation performance of a deep learning model.
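Monte Carlo dropout keeps dropout active at inference, repeats the forward pass several times and treats the spread of the outputs as an uncertainty estimate. The sketch below illustrates this with a toy linear model rather than a segmentation network; the model, the number of samples and the mean-uncertainty threshold used to flag a case for review are all hypothetical, not the authors' sanity-check criterion.

```python
import random
from statistics import mean, pstdev


def dropout_forward(features, weights, p, rng):
    """One stochastic forward pass of a toy linear model with (inverted) dropout on its inputs."""
    kept = [f / (1.0 - p) if rng.random() >= p else 0.0 for f in features]
    return sum(w * f for w, f in zip(weights, kept))


def mc_dropout(features, weights, p=0.5, n_samples=50, seed=0):
    """Keep dropout active at test time and summarize the sampled predictions.

    Returns the mean prediction and its standard deviation, the latter used as
    the uncertainty estimate. Assumes 0 <= p < 1.
    """
    rng = random.Random(seed)
    samples = [dropout_forward(features, weights, p, rng) for _ in range(n_samples)]
    return mean(samples), pstdev(samples)


def needs_review(voxel_uncertainties, threshold):
    """Hypothetical sanity check: flag a case whose mean voxel uncertainty exceeds a threshold."""
    return mean(voxel_uncertainties) > threshold
```

In a segmentation setting, the sampling would be applied to the whole network output at once, and the resulting voxel-wise uncertainty map aggregated before calling `needs_review`.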
Automatic multi-organ segmentation of dual-energy computed tomography (DECT) data is beneficial for biomedical research and clinical applications. Numerous recent studies in medical image processing show the feasibility of using 3-D fully convolutional networks (FCN) for voxel-wise dense prediction in medical images. In this work, three 3D-FCN-based approaches for automatic multi-organ segmentation in DECT are developed, and both the theoretical benefits and the practical performance of these novel deep-learning-based approaches are assessed. The approaches were evaluated on 26 torso DECT datasets acquired with a clinical dual-source CT system. Six thoracic and abdominal organs (left and right lungs, liver, spleen, and left and right kidneys) were evaluated using a cross-validation strategy. Across all tests, the best average Dice coefficients achieved were 98% for the right lung, 97% for the left lung, 93% for the liver, 91% for the spleen, 94% for the right kidney and 92% for the left kidney. Successful tests on special clinical cases demonstrate the high adaptability of our methods in practical applications. The results show that our methods are feasible and promising.
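For reference, the Dice coefficient reported above compares a predicted mask P and a reference mask R as 2|P ∩ R| / (|P| + |R|). The sketch computes it per organ from flattened label volumes; the integer label encoding (e.g. 1 for one organ, 2 for another) is hypothetical.

```python
def dice_per_label(pred, ref, labels):
    """Dice coefficient 2 * |P intersect R| / (|P| + |R|) for each organ label.

    pred, ref: flattened label volumes (one integer label per voxel).
    """
    scores = {}
    for label in labels:
        p = {i for i, v in enumerate(pred) if v == label}
        r = {i for i, v in enumerate(ref) if v == label}
        denominator = len(p) + len(r)
        # Convention (assumed): an organ absent from both volumes counts as a perfect match.
        scores[label] = 2 * len(p & r) / denominator if denominator else 1.0
    return scores
```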
Ki67 is an important biomarker for breast cancer. Classification of positive and negative Ki67 cells in histology slides is a common approach to determining cancer proliferation status. However, there is a lack of generalizable and accurate methods to automate Ki67 scoring in large-scale patient cohorts. In this work, we employed a novel deep learning technique based on hypercolumn descriptors for cell classification in Ki67 images. Specifically, we developed the Simultaneous Detection and Cell Segmentation (DeepSDCS) network to perform cell segmentation and detection. A VGG16 network was trained and fine-tuned on the training data. We extracted the hypercolumn descriptor of each cell, i.e. the vector of activations from specific layers, to capture features at different granularities. Features from these layers that correspond to the same pixel were propagated using a stochastic gradient descent optimizer to yield the detection of the nuclei and the final cell segmentations. Subsequently, seeds generated from the cell segmentation were passed to a spatially constrained convolutional neural network that classifies the cells as stromal, lymphocyte, Ki67-positive cancer cell, or Ki67-negative cancer cell. We validated the accuracy of the method in the context of a large-scale clinical trial of oestrogen-receptor-positive breast cancer. We achieved 99.06% and 89.59% accuracy on two separate test sets of a Ki67-stained breast cancer dataset comprising biopsy and whole-slide images.
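A hypercolumn descriptor stacks, for one pixel, the activations of several layers at that location. The sketch below assumes each layer is given as a grid of channel vectors with a known stride relative to the input and looks up the coarser layers by nearest-neighbour indexing; interpolation-based upsampling is also common in practice, and which VGG16 layers the authors used is not stated in the abstract.

```python
def hypercolumn(feature_maps, y, x):
    """Concatenate the activations of several layers at input pixel (y, x).

    feature_maps: list of (stride, grid) pairs, where grid[i][j] is the list of
    channel activations at position (i, j) of a layer whose resolution is the
    input resolution divided by `stride`. Coarser layers are looked up by
    nearest-neighbour indexing (an assumption).
    """
    column = []
    for stride, grid in feature_maps:
        column.extend(grid[y // stride][x // stride])
    return column
```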
Catheters are commonly inserted life-supporting devices. X-ray images are used to assess the position of a catheter immediately after placement, as serious complications can arise from malpositioned catheters. Previous computer vision approaches to detecting catheters on X-ray images either relied on low-level cues that are not sufficiently robust or were only capable of processing a limited number or type of catheters. With the resurgence of deep learning, supervised training approaches are beginning to show promising results. However, they require dense annotation maps, and the work of a human annotator is hard to scale. In this work, we propose a simple way of synthesizing catheters on X-ray images and a scale recurrent network for catheter detection. Trained on adult chest X-rays, the proposed network exhibits promising detection results on pediatric chest/abdomen X-rays in terms of both precision and recall.
Fast and accurate anatomical landmark detection can benefit many medical image analysis methods. Here, we propose a method to automatically detect anatomical landmarks in medical images. Automatic landmark detection is performed with a patch-based fully convolutional neural network (FCNN) that combines regression and classification. For any given image patch, regression is used to predict the 3D displacement vector from the image patch to the landmark. Simultaneously, classification is used to identify patches that contain the landmark. Under the assumption that patches close to a landmark can determine the landmark location more precisely than patches farther from it, only those patches that contain the landmark according to classification are used to determine the landmark location. The landmark location is obtained by averaging the landmark locations given by the computed 3D displacement vectors. The method is evaluated on the detection of six clinically relevant landmarks in coronary CT angiography (CCTA) scans: the right and left ostium, the bifurcation of the left main coronary artery (LM) into the left anterior descending and the left circumflex artery, and the origin of the right, non-coronary, and left aortic valve commissure. The proposed method achieved an average Euclidean distance error of 2.19 mm and 2.88 mm for the right and left ostium respectively, 3.78 mm for the bifurcation of the LM, and 1.82 mm, 2.10 mm and 1.89 mm for the origin of the right, non-coronary, and left aortic valve commissure respectively, demonstrating accurate performance. The proposed combination of regression and classification can be used to accurately detect landmarks in CCTA scans.
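The voting step described above, in which only patches classified as containing the landmark contribute, can be sketched as follows. The displacement is assumed to point from the patch centre to the landmark, coordinates are in any consistent unit (voxels or mm), and the 0.5 probability threshold is a hypothetical choice.

```python
import math


def locate_landmark(patch_centres, displacements, probabilities, threshold=0.5):
    """Average the landmark positions voted by patches classified as containing the landmark.

    Each vote is patch centre + predicted 3D displacement (patch -> landmark).
    Returns None if no patch passes the (hypothetical) classification threshold.
    """
    votes = [
        tuple(c + d for c, d in zip(centre, disp))
        for centre, disp, prob in zip(patch_centres, displacements, probabilities)
        if prob >= threshold
    ]
    if not votes:
        return None
    return tuple(sum(v[k] for v in votes) / len(votes) for k in range(3))


def landmark_error(predicted, reference):
    """Euclidean distance error, the metric reported in the abstract."""
    return math.dist(predicted, reference)
```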
Head motion during MRI acquisition presents significant problems for subsequent neuroimaging analyses. In this work, we propose to use convolutional neural networks (CNNs) to correct motion-corrupted images, and we investigate a possible improvement from augmenting the L1 loss with an adversarial loss. To obtain a ground truth for training, we first selected a large number of motion-free images from the ABIDE dataset. We then added simulated motion artifacts to these images to produce motion-corrupted data, and a 3D regression CNN was trained to predict the motion-free volume as the output. We tested the CNN on unseen simulated data as well as on real motion-affected data. Quantitative evaluation was carried out using metrics such as the Structural Similarity (SSIM) index, Correlation Coefficient (CC), and Tissue Contrast T-score (TCT). Gaussian smoothing, as a conventional method, did not significantly differ from the uncorrected data in SSIM, CC and RMSE. In contrast, the two CNN models successfully removed the motion-related artifacts: their SSIM and CC increased significantly after correction and the error was reduced. The CNN yielded a significantly larger TCT than the uncorrected images, whereas the adversarial network, while improved, did not show a significantly increased TCT, which may also be explained by its over-enhancement of edges. Our results suggest that the proposed CNN framework generalizes well both to unseen simulated motion artifacts and to real data affected by motion artifacts. The proposed method could easily be adapted to estimate a motion severity score, which could be used as a quality-control score or as a nuisance covariate in subsequent statistical analyses.
Measurement of biometrics from fetal ultrasound (US) images is of key importance in monitoring healthy fetal development. However, under the time constraints of a clinical setting, accurate measurement of relevant anatomical structures, including the abdominal circumference (AC), is subject to large inter-observer variability. To address this, an automated method is proposed that annotates the abdomen in 2D US images and measures AC using a shape-aware, multi-task deep convolutional neural network in a cascaded model framework. The multi-task loss simultaneously optimises both pixel-wise segmentation and shape parameter regression. We also introduce a cascaded shape-based transformation to normalise for position and orientation of the anatomy, improving results further on challenging images. Models were trained using approximately 1700 abdominal images and compared to inter-expert variability on 100 test images. The proposed model performs better than inter-expert variability in terms of mean absolute error for AC measurements (2.60mm vs 5.89mm) and Dice score (0.962 vs 0.955). We also show that on the most challenging test images, the proposed method significantly improves on the baseline model, while running at 8 fps, which could aid clinical workflow.
In this work, we apply an attention-gated network to real-time automated scan plane detection for fetal ultrasound screening. Scan plane detection in fetal ultrasound is a challenging problem due to the poor image quality, which results in low interpretability for both clinicians and automated algorithms. To address this, we propose incorporating self-gated soft-attention mechanisms. A soft-attention mechanism generates a gating signal that is end-to-end trainable, which allows the network to contextualise local information useful for prediction. The proposed attention mechanism is generic and can easily be incorporated into any existing classification architecture, while requiring only a few additional parameters. We show that, when the base network has a high capacity, the incorporated attention mechanism can provide efficient object localisation while improving the overall performance. When the base network has a low capacity, the method greatly outperforms the baseline approach and significantly reduces false positives. Lastly, the generated attention maps allow us to understand the model’s reasoning process and can also be used for weakly supervised object localisation.
The variations in multi-center data in medical imaging studies have created a need for domain adaptation. Despite the advancement of machine learning in automatic segmentation, performance often degrades when algorithms are applied to new data acquired with different scanners or sequences than the training data. Manual annotation is costly and time-consuming if it has to be carried out for every new target domain. In this work, we investigate the automatic selection of suitable subjects to be annotated for supervised domain adaptation using the concept of reverse classification accuracy (RCA). RCA predicts the performance of a trained model on data from the new domain, and we evaluate different strategies for selecting the subjects to include in the adaptation via transfer learning. We perform experiments on a two-center MR database for the task of organ segmentation. We show that subject selection via RCA can reduce the burden of annotating new data for the target domain.
Magnetic Resonance Angiography (MRA) has become an essential MR contrast for imaging and evaluation of vascular anatomy and related diseases. MRA acquisitions are typically ordered for vascular interventions, whereas in routine scenarios MRA sequences may be absent from the patient scans. This motivates the need for a technique that generates missing MRA from existing multi-contrast MR images, which could be a valuable tool in retrospective subject evaluations and imaging studies. In this paper, we present a generative adversarial network (GAN) based technique to generate MRA from T1-weighted and T2-weighted MRI images, to our knowledge for the first time. To better model the representation of vessels, which MRA inherently highlights, we design a loss term dedicated to a faithful reproduction of vascularities. To that end, we incorporate steerable filter responses of the generated and reference images inside a Huber loss term. Extending the well-established generator-discriminator architecture based on the recent PatchGAN model with the addition of the steerable filter loss, we evaluate the proposed steerable GAN (sGAN) method on the large public IXI database. Experimental results show that sGAN outperforms the baseline GAN method in terms of an overlap score with similar PSNR values, while leading to improved visual perceptual quality.
The placenta is a complex organ that plays multiple roles during fetal development. Very little is known about the association between placental morphological abnormalities and fetal physiology. In this work, we present an open-source, computationally tractable deep learning pipeline to analyse placenta histology at the cellular level. By utilising two deep convolutional neural network architectures and transfer learning, we can robustly localise and classify placental cells into five classes with an accuracy of 89%. Furthermore, we learn deep embeddings encoding phenotypic knowledge that are capable of both stratifying five distinct cell populations and learning intraclass phenotypic variance. We envisage that applying this automated pipeline to population-scale studies of placenta histology has the potential to improve our understanding of basic cellular placental biology and its variations, particularly its role in predicting adverse birth outcomes.
Coronary CT angiography has become a preferred technique for the detection and diagnosis of coronary artery disease, but image artifacts due to cardiac motion frequently interfere with evaluation. Several motion compensation approaches have been developed which deal with motion estimation based on 3-D/3-D registration of multiple heart phases. The scan range required for multi-phase reconstruction is a limitation in clinical practice. In this paper, the feasibility of single-phase, image-based motion estimation by convolutional neural networks (CNNs) is investigated. First, the required data for supervised learning is generated by a forward model which introduces simulated axial motion to artifact-free CT cases. Second, regression networks are trained to estimate underlying 2D motion vectors from axial coronary cross-sections. In a phantom study with computer-simulated vessels, CNNs predict the motion direction and the motion strength with average accuracies of 1.08° and 0.06 mm, respectively. Motivated by these results, clinical performance is evaluated based on twelve prospectively ECG-triggered clinical cases and achieves average accuracies of 20.66° and 0.94 mm. Transferability and generalization capabilities are demonstrated by motion estimation and subsequent compensation on six clinical cases with real cardiac motion artifacts.
Responsible for over 50,000 deaths per year in the U.S. alone, colorectal cancer (CRC) is the second leading cause of cancer-related deaths in industrialized nations, with increasing prevalence. Within the scope of personalized medicine, precise estimates of future progression are crucial. We therefore propose a novel deep learning based system using deep convolutional sparse autoencoders to estimate future lesion growth of CRC liver lesions from single-slice CT tumor images for early therapy assessment. Furthermore, we show that our system can be used for one-year survival prediction in CRC patients. While the state-of-the-art treatment assessment (RECIST) is premised on retrospective lesion analysis, our proposed system delivers an estimate of future response, thus prospectively allowing therapy to be adapted before further progression. We compare our system to single-lesion assessment through RECIST diameter and Radiomics. With our approach we achieve a phi coefficient of 40.0% compared to 27.3% / 29.4% and an AUC of 0.784 vs. 0.744 / 0.737 for growth prediction, as well as a phi coefficient of 44.9% vs. 32.1% / 18.0% and an AUC of 0.710 vs. 0.688 / 0.568 for survival prediction.
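The phi coefficient reported above is computed from the 2×2 confusion matrix of a binary prediction (for binary labels it coincides with the Matthews correlation coefficient). A minimal sketch of the standard formula, not the authors' evaluation code:

```python
import math


def phi_coefficient(tp, fp, fn, tn):
    """Phi coefficient of a 2x2 confusion matrix (equal to the Matthews correlation coefficient)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # undefined when a row or column is empty; 0 is a common convention
    return (tp * tn - fp * fn) / denominator
```

Multiplying the result by 100 gives the percentage form used in the abstract.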
We introduce MURA, a large dataset of musculoskeletal radiographs containing 40,562 images from 14,864 studies, where each study is manually labeled by radiologists as either normal or abnormal. On this dataset, we train a 169-layer densely connected convolutional network to detect and localize abnormalities. To evaluate our model robustly and to get an estimate of radiologist performance, we collect additional labels from six board-certified Stanford radiologists on the test set, consisting of 207 musculoskeletal studies. On this test set, the majority vote of a group of three radiologists serves as gold standard. The model achieves an AUROC of 0.929, with an operating point of 0.815 sensitivity and 0.887 specificity. We also compare our model and radiologists on the Cohen’s kappa statistic, which expresses the agreement of our model and of each radiologist with the gold standard. We find that our model achieves performance comparable to that of radiologists. Model performance is comparable to the best radiologist performance in detecting abnormalities on finger and wrist studies. However, model performance is lower than best radiologist performance in detecting abnormalities on elbow, forearm, hand, humerus, and shoulder studies, indicating that the task is a good challenge for future research. To encourage advances, we have made our dataset freely available at http://stanfordmlgroup.github.io/competitions/mura.
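The evaluation protocol above (majority vote of three radiologists as gold standard, agreement measured with Cohen's kappa) can be sketched as follows for binary normal (0) / abnormal (1) labels. This is an illustrative implementation of the standard statistic, not the authors' evaluation code.

```python
def majority_vote(*raters):
    """Binary consensus label per study: 1 (abnormal) if more than half of the raters say so."""
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*raters)]


def cohens_kappa(a, b):
    """Cohen's kappa between two label sequences of equal length."""
    n = len(a)
    categories = set(a) | set(b)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each rater's marginal label frequencies.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)
```

For example, the model's labels would be compared with `majority_vote(r1, r2, r3)` for three radiologists' label lists, and each radiologist would be compared with the same gold standard.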
Segmenting vascular pathologies such as white matter lesions in brain magnetic resonance images (MRIs) requires the acquisition of multiple sequences, such as T1-weighted (T1-w), on which lesions appear hypointense, and fluid attenuated inversion recovery (FLAIR), on which lesions appear hyperintense. However, most existing retrospective datasets do not include FLAIR sequences. Existing missing-modality imputation methods treat imputation and segmentation as separate processes. In this paper, we propose a method that links modality imputation and segmentation using convolutional neural networks. We show that by jointly optimizing the imputation network and the segmentation network, the method not only produces more realistic synthetic FLAIR images from T1-w images, but also improves the segmentation of white matter hyperintensities (WMH) from T1-w images only.
In this paper, we propose a predictive regression model for longitudinal images with missing data, based on large deformation diffeomorphic metric mapping (LDDMM) and deep neural networks. Instead of directly predicting image scans, our model predicts a vector momentum sequence associated with a baseline image. This momentum sequence parameterizes the original image sequence in the LDDMM framework and lies in the tangent space of the baseline image, which is Euclidean. A recurrent network with long short-term memory (LSTM) units encodes the time-varying changes in the vector-momentum sequence, and a convolutional neural network (CNN) encodes the baseline image of the vector momenta. Features extracted by the LSTM and CNN are fed into a decoder network to reconstruct the vector momentum sequence, which is used for image sequence prediction by deforming the baseline image with LDDMM shooting. To handle images missing at some time points, we use a binary mask to ignore their reconstructions in the loss calculation. We evaluate our model on synthetically generated images and on brain MRIs from the OASIS dataset. Experimental results demonstrate promising predictions of the spatiotemporal changes in both datasets, irrespective of whether the changes in the longitudinal image sequences are large or subtle.
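Ignoring missing time points in the loss, as described above, amounts to multiplying each time point's error by a binary mask. This sketch assumes each time point's target (for instance a flattened momentum field) is a list of numbers and normalizes by the number of observed values; the squared-error form and the normalization are assumptions.

```python
def masked_mse(predicted, target, mask):
    """Mean squared error over observed time points only.

    predicted, target: one list of values per time point.
    mask[t] is 1 if the image at time point t exists, 0 if it is missing.
    """
    total, count = 0.0, 0
    for pred_t, targ_t, observed in zip(predicted, target, mask):
        if not observed:
            continue  # missing scan: its reconstruction does not contribute
        total += sum((p - q) ** 2 for p, q in zip(pred_t, targ_t))
        count += len(pred_t)
    return total / count if count else 0.0
```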
Different types of atherosclerotic plaque and varying grades of stenosis lead to different management of patients with obstructive coronary artery disease. Therefore, it is crucial to determine the presence and classify the type of coronary artery plaque, as well as to determine the presence and the degree of a stenosis. The study includes consecutively acquired coronary CT angiography (CCTA) scans of 131 patients. In these, presence and plaque type in the coronary arteries (no plaque, non-calcified, mixed, calcified) as well as presence and anatomical significance of coronary stenosis (no stenosis, non-significant, significant) were manually annotated by identifying the start and end points of the fragment of the artery affected by the plaque. To perform automatic analysis, a multi-task recurrent convolutional neural network is utilized. The network uses CCTA and coronary artery centerline as its inputs, and extracts features from the region defined along the coronary artery centerline using a 3D convolutional neural network. Subsequently, the extracted features are used by a recurrent neural network that performs two simultaneous multi-label classification tasks. In the first task, the network detects and characterizes the type of the coronary artery plaque. In the second task, the network detects and determines the anatomical significance of the coronary artery stenosis. The results demonstrate that automatic characterization of coronary artery plaque and stenosis with high accuracy and reliability is feasible. This may enable automated triage of patients to those without coronary plaque, and those with coronary plaque and stenosis in need of further cardiovascular workup.
We consider the problem of identifying patients who are diagnosed with high-grade prostate cancer from the histopathology of the tumor in a prostate needle biopsy and who are at very high risk of lethal cancer progression. We hypothesize that the morphology of tumor cell nuclei in digital images from the biopsy can be used to predict tumor aggressiveness, and we posit the presence of metastasis as a surrogate for disease-specific mortality. For this purpose, we apply a compositional multi-instance learning approach which encodes images of nuclei through a convolutional neural network, then predicts the presence of metastasis from sets of encoded nuclei. Through experiments on prostate needle biopsies (PNBX) from a patient cohort with known presence (M1 stage, n = 85) or absence (M0 stage, n = 86) of metastatic disease, we obtained an average area under the receiver operating characteristic curve of 0.71 ± 0.08 for predicting metastatic cases. These results support our hypothesis that information related to the metastatic capacity of prostate cancer cells can be obtained through analysis of nuclei, and they establish a baseline for future research aimed at predicting the risk of future metastatic disease at a time when it might be preventable.
We present extraction of tree structures, such as airways, from image data as a graph refinement task. To this end, we propose a graph auto-encoder model that uses an encoder based on graph neural networks (GNNs) to learn embeddings from input node features and a decoder to predict connections between nodes. Performance of the GNN model is compared with mean-field networks in their ability to extract airways from 3D chest CT scans.
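A graph auto-encoder of this kind can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' model: the mean-aggregation encoder, the tanh nonlinearity, the inner-product decoder (the form used in Kipf and Welling's graph auto-encoders) and the 0.5 threshold are all assumptions, and the weights are random rather than trained.

```python
import numpy as np

def gnn_encode(adj, feats, weight):
    """One mean-aggregation message-passing layer (stand-in for the GNN encoder)."""
    a_hat = adj + np.eye(adj.shape[0])              # self-loops keep each node's own features
    deg = a_hat.sum(axis=1, keepdims=True)
    return np.tanh((a_hat / deg) @ feats @ weight)  # average neighbours, project, squash

def decode_edges(z):
    """Inner-product decoder: edge probability for every pair of node embeddings."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

# Toy input: 5 candidate nodes connected in a chain, 3 features per node.
rng = np.random.default_rng(0)
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
feats = rng.normal(size=(5, 3))
weight = rng.normal(size=(3, 4))

edge_probs = decode_edges(gnn_encode(adj, feats, weight))
refined_adj = edge_probs > 0.5  # hypothetical threshold for keeping a connection
```

In a trained model, the refined adjacency would be compared against the reference airway tree; here it only shows the shape of the encode-then-decode step.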
Occasionally even the best automated method fails due to low image quality, artifacts or unexpected behaviour of black box algorithms. Being able to predict segmentation quality in the absence of ground truth is of paramount importance in clinical practice, but also in large-scale studies to avoid the inclusion of invalid data in subsequent analysis.
In this work, we propose two approaches for real-time automated quality control of cardiovascular MR segmentations using deep learning. First, we train a neural network on 12,880 samples to predict Dice Similarity Coefficients (DSC) on a per-case basis. We report a mean absolute error (MAE) of 0.03 on 1,610 test samples and 97% binary classification accuracy for separating low- and high-quality segmentations. Second, in the scenario where no manually annotated data is available, we train a network to predict DSC scores from estimated quality obtained via a reverse testing strategy. We report an MAE of 0.14 and 91% binary classification accuracy for this case. Predictions are obtained in real time, which, when combined with real-time segmentation methods, enables instant feedback on whether an acquired scan is analyzable while the patient is still in the scanner.
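For readers less familiar with the target quantity, the DSC that the network regresses and the two reported evaluation measures (MAE and binary accuracy) can be computed as below. The 0.7 quality threshold is an assumption for illustration only; the abstract does not state which cut-off separates low- and high-quality segmentations.

```python
import numpy as np

def dice(pred, truth):
    """Dice Similarity Coefficient between two binary masks."""
    pred, truth = np.asarray(pred, dtype=bool), np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom

def qc_metrics(predicted_dsc, true_dsc, threshold=0.7):
    """MAE of predicted vs. true DSC, plus accuracy of a low/high-quality split."""
    predicted_dsc = np.asarray(predicted_dsc, dtype=float)
    true_dsc = np.asarray(true_dsc, dtype=float)
    mae = np.mean(np.abs(predicted_dsc - true_dsc))
    # A case counts as "high quality" if its DSC reaches the (assumed) threshold.
    accuracy = np.mean((predicted_dsc >= threshold) == (true_dsc >= threshold))
    return mae, accuracy

seg = np.array([[1, 1, 0], [0, 1, 0]])
ref = np.array([[1, 0, 0], [0, 1, 1]])
print(dice(seg, ref))  # 2*2/(3+3), i.e. about 0.667
print(qc_metrics([0.9, 0.6, 0.8], [0.85, 0.75, 0.8]))
```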
Pulmonary hypertension (PH) is a life-threatening and rapidly progressive disease in which functional adaptation of the right ventricle (RV), as quantified by RV ejection fraction (RVEF), is a key prognostic marker. However, RVEF is largely insensitive to regional or early RV dysfunction, the detection of which may improve prognostication and allow prompt identification of high-risk cases. Cardiac Magnetic Resonance (MR) imaging is a standard modality for quantification of RV function, and can be used to derive anatomically accurate, high-resolution 3D shape models of RV contraction using recently developed computational image analysis techniques. These time-resolved 3D models may offer additional insights beyond simple conventional measures such as RVEF. In this study, we train a deep survival network to predict mortality in PH patients by learning complex RV contraction patterns from 3D shape models of RV motion. To handle right-censored survival time outcomes, the network uses a Cox proportional hazards partial likelihood loss function. The network was trained on imaging and mortality data from 148 PH patients. It yielded improved prediction accuracy and superior risk stratification compared with a multivariable survival model based on RVEF and other conventional parameters of RV function. This study demonstrates the utility of deep learning for identifying prognostic spatio-temporal patterns in 3D models of RV motion.
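The Cox partial likelihood loss mentioned above can be written compactly for a batch of patients. This is a minimal NumPy sketch, assuming no tied event times and normalising by the number of observed deaths (a common choice, not necessarily the authors'): patients are sorted by descending follow-up time so that each patient's risk set is a prefix of the array, and the log of the risk-set sum becomes a running log-sum-exp.

```python
import numpy as np

def cox_loss(log_hazard, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    log_hazard: network output per patient; time: follow-up time;
    event: 1 if death was observed, 0 if right-censored. Assumes no tied times.
    """
    log_hazard = np.asarray(log_hazard, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    order = np.argsort(-time)                    # longest follow-up first
    log_hazard, event = log_hazard[order], event[order]
    # log sum_{j: t_j >= t_i} exp(h_j), i.e. log-sum-exp over each risk set
    log_risk_set = np.logaddexp.accumulate(log_hazard)
    # Censored patients contribute only through other patients' risk sets.
    partial_ll = np.sum(event * (log_hazard - log_risk_set))
    return -partial_ll / max(event.sum(), 1.0)

# Equal predicted risks, three deaths at t = 1, 2, 3: loss = log(6) / 3
print(cox_loss([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [1, 1, 1]))
```

Assigning a higher predicted log-hazard to patients who die earlier lowers the loss, which is what pushes the network towards correct risk ordering.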
Automatic Shadow Detection in 2D Ultrasound. Qingjie Meng, Christian Baumgartner, Matthew Sinclair, James Housden, Martin Rajchl, Alberto Gomez, Benjamin Hou, Nicolas Toussaint, Jeremy Tan, Jacqueline Matthew, Daniel Rueckert, Julia Schnabel, Bernhard Kainz, Imperial College London, UK. paper • abstract
Automatically detecting acoustic shadows is of great importance for automatic 2D ultrasound analysis, ranging from anatomy segmentation to landmark detection. However, variation in shape and similarity in intensity to other structures in the image make shadow detection a very challenging task. In this paper, we propose an automatic shadow detection method that generates a pixel-wise shadow confidence map from weakly labelled annotations. Our method jointly uses (1) a feature attribution map from a Wasserstein GAN and (2) an intensity saliency map from a graph cut model. The proposed method accurately highlights the shadow areas in two 2D ultrasound datasets comprising standard view planes as acquired during fetal screening. Moreover, the proposed method outperforms the state of the art quantitatively and improves failure cases for automatic biometric measurement.
Recent advances in deep learning have been transforming the landscape in many domains, including health care. However, understanding the predictions of a deep network remains a challenge, one that is especially pressing in health care, where interpretability is key.
Techniques that rely on saliency maps, which highlight the regions of an image that most influence the classifier's decision, are often used for that purpose.
However, gradient fluctuations make saliency maps noisy and thus difficult to interpret at a human level. Moreover, models tend to focus on one particular influential region of interest (ROI) in the image, even though other regions might be relevant for the decision.
We propose a new framework that refines these saliency maps to generate segmentation masks over the ROI in the original image. As a second contribution, we apply these masks to the original inputs and then evaluate our classifier on the masked inputs to identify previously unidentified ROIs. This iterative procedure allows us to highlight new regions of interest by extracting meaningful information from the saliency maps.
Hourglass networks such as the U-Net and V-Net are popular neural architectures for medical image segmentation and counting problems. Typical instances of hourglass networks contain shortcut connections between mirroring layers. These shortcut connections improve the performance and it is hypothesized that this is due to mitigating effects on the vanishing gradient problem and the ability of the model to combine feature maps from earlier and later layers. We propose a method for not only combining feature maps of mirroring layers but also feature maps of layers with different spatial dimensions. For instance, the method enables the integration of the bottleneck feature map with those of the reconstruction layers. The proposed approach is applicable to any hourglass architecture. We evaluated the contextual hourglass networks on image segmentation and object counting problems in the medical domain. We achieve competitive results outperforming popular hourglass networks by up to 17 percentage points.
Deep learning has been recently applied to a multitude of computer vision and medical image analysis problems. Although recent research efforts have improved the state of the art, most of the methods cannot be easily accessed, compared or used by either researchers or the general public. Researchers often publish their code and trained models on the internet, but this does not always enable these approaches to be easily used or integrated in stand-alone applications and existing workflows.
In this paper we propose a framework that allows easy deployment of, and access to, deep learning methods for segmentation through a cloud-based architecture.
Our approach comprises three parts: a server, which wraps trained deep learning models and their pre- and post-processing data pipelines and makes them available in the cloud; a client, which interfaces with the server to obtain predictions on user data; and a service registry, which informs clients about the prediction endpoints available in the cloud. These three parts constitute the open-source TOMAAT framework.
Building large medical imaging datasets for image segmentation is a challenging task because of the cost of manual outlining. In this work, we explore the use of stereology to cut the costs of annotation. We train a segmentation model using a coarse point-counting grid as the sole annotation and quantify the impact of this approach on segmentation performance. Results show that dense masks are not a strict requirement for training segmentation models to achieve satisfactory performance. Since deciding whether a small set of grid points overlaps a structure of interest is inherently faster than tracing a dense outline, this method allows volume annotation to be scaled up to large datasets.
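The point-counting idea can be illustrated with a classic stereological estimator: overlay a regular grid of points with spacing d on a mask, count the points that fall inside the structure, and multiply by d² (the area each point represents). This is a hypothetical sketch under that assumption; the boolean hit grid corresponds to the kind of sparse label the abstract describes, but the function name and grid parameters are illustrative only. In unbiased stereology the grid offset is usually randomised; it is fixed here for reproducibility.

```python
import numpy as np

def point_count(mask, spacing, offset=(0, 0)):
    """Point-counting stereology on a 2D mask.

    Returns the area estimate (hits * spacing**2, in pixels) and the boolean
    grid of hits, i.e. the sparse annotation an expert would provide.
    """
    rows = np.arange(offset[0], mask.shape[0], spacing)
    cols = np.arange(offset[1], mask.shape[1], spacing)
    hits = np.asarray(mask, dtype=bool)[np.ix_(rows, cols)]
    return int(hits.sum()) * spacing ** 2, hits

mask = np.zeros((100, 100), dtype=bool)
mask[20:60, 20:60] = True                # a 40 x 40 structure, true area 1600 pixels
area, hits = point_count(mask, spacing=10)
print(area, hits.shape)                  # 1600 (10, 10)
```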
We propose an unsupervised method using self-clustering convolutional adversarial autoencoders to classify prostate tissue as tumor or non-tumor without any labeled training data. The clustering method is integrated into the training of the autoencoder and requires little post-processing. Our network is trained on hematoxylin and eosin (H&E) input patches, and we tested two different reconstruction targets, H&E and immunohistochemistry (IHC). We show that antibody-driven feature learning using IHC helps the network to learn relevant features for the clustering task. Our network achieves an F1 score of 0.62 using only a small set of validation labels to assign classes to clusters.
Lung nodule segmentation can help radiologists’ analysis of nodule risk. Recent deep learning based approaches have shown promising results in the segmentation task. However, a 3D segmentation map necessary for training the algorithms requires an expensive effort from expert radiologists. We propose a new method to train the deep neural network, only utilizing diameter information for each nodule. We validate our model with the LUNA16 dataset, showing competitive results compared to the previous state-of-the-art methods in various evaluation metrics. Our experiments also provide plausible qualitative results comparable to the ground truth segmentation.
The presence of a vein inside white matter lesions was recently proposed as an imaging biomarker that can help in the differential diagnosis of Multiple Sclerosis (MS), potentially narrowing the challenging clinical-radiological gap. Here, we propose a prototype based on an ensemble of small 3D convolutional networks to classify perivenular (P+) and non-perivenular (P-) lesions. Even without prior lesion masking, our approach outperforms imaging filters that are designed specifically to detect blood vessels and that do have access to a lesion mask.
Long-TE gradient recalled-echo (GRE) scans are prone to phase artifacts due to B0 inhomogeneity. We propose a learning-based approach that does not rely on navigator readouts and infers phase error offsets directly from the corrupted data. Our method does not need to be pre-trained on a database of medical images that match the contrast/acquisition protocol of the input image; the only input required is the raw multi-coil spectrum of the image to be corrected. We train a convolutional neural network to predict phase offsets for each k-space line of a 2D image. We synthesize training examples online by reconvolving the corrupted spectrum with point spread functions (PSFs) of the coil sensitivity profiles and superimposing artificial phase errors, which the network learns to predict. We evaluate our approach on in vivo data acquired with a GRE sequence and demonstrate an improvement in image quality after phase error correction.
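The per-line phase error model can be sketched as follows, assuming a single-coil 2D k-space whose rows are the phase-encoding lines: each line is multiplied by exp(iφ), and if the offsets φ are known (here, the ones we injected; in the abstract, the ones the CNN predicts), multiplying by exp(−iφ) restores the image. Coil sensitivities, PSF reconvolution, and the network itself are omitted, and all names are illustrative.

```python
import numpy as np

def add_line_phase_errors(kspace, phase_offsets):
    """Multiply each k-space row (phase-encoding line) by exp(i * phi_row)."""
    return kspace * np.exp(1j * phase_offsets)[:, None]

def remove_line_phase_errors(kspace, phase_offsets):
    """Undo per-line phase offsets, e.g. offsets predicted by a network."""
    return kspace * np.exp(-1j * phase_offsets)[:, None]

rng = np.random.default_rng(0)
image = rng.random((32, 32))
kspace = np.fft.fftshift(np.fft.fft2(image))
phi = rng.uniform(-np.pi / 4, np.pi / 4, size=kspace.shape[0])  # synthetic errors

corrupted = add_line_phase_errors(kspace, phi)
corrupted_image = np.fft.ifft2(np.fft.ifftshift(corrupted))      # shows ghosting artifacts
restored = remove_line_phase_errors(corrupted, phi)
recon = np.fft.ifft2(np.fft.ifftshift(restored))
```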
We propose a semantic segmentation model for histopathology that exploits rotation and reflection symmetries inherent in histopathology images. We demonstrate significant performance gains due to increased weight sharing, as well as improvements in predictive stability. The group-equivariant CNN framework is extended for segmentation by introducing a new (G -> Z2)-convolution that transforms feature maps on a group to planar feature maps. In addition, equivariant transposed convolution is formulated for up-sampling in an encoder-decoder network. We further show the importance of exploiting more symmetries by varying the size of the group.
We developed a deep learning framework that helps to automatically identify and segment lung cancer areas in patients' tissue specimens. The study was based on a cohort of lung cancer patients who underwent surgery at Uppsala University Hospital. The tissues were reviewed by lung pathologists, and the cores were then compiled into tissue micro-arrays (TMAs). For the experiments, hematoxylin-eosin stained slides from 712 patients were scanned and manually annotated. These scans and annotations were then used to train the segmentation models of the framework. The performance of the framework was evaluated on fully annotated TMA cores from 178 patients, reaching a pixel-wise precision of 0.80 and recall of 0.86. Finally, publicly available Stanford TMA cores were used to demonstrate the high performance of the framework qualitatively.
Pose estimation is an omnipresent problem in medical image analysis. Deep learning methods often parameterise a pose with a representation that separates rotation and translation, as commonly available frameworks do not provide means to calculate a loss on a manifold. In this paper, we propose a general Riemannian formulation of the pose estimation problem and train CNNs directly on SE(3) equipped with a left-invariant Riemannian metric. At each training step, the loss is calculated as the Riemannian geodesic distance, with the gradients required for back-propagation calculated with respect to the predicted pose on the tangent space of the manifold SE(3). We thoroughly evaluate the effectiveness of our loss function by comparing its performance with popular, commonly used existing methods, and show that it can improve registration accuracy for image-based 2D to 3D registration.
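As a simplified illustration of a geodesic loss, the rotation part alone can be handled with the standard geodesic distance on SO(3) under its bi-invariant metric, θ = arccos((tr(R₁ᵀR₂) − 1)/2). This is not the full left-invariant SE(3) metric used in the paper, which also accounts for translation; it only shows what measuring error along the manifold (rather than, for example, Euclidean distance between Euler angles) looks like. The helper names are hypothetical.

```python
import numpy as np

def so3_geodesic_distance(r1, r2):
    """Rotation angle (radians) of R1^T R2, i.e. the geodesic distance on SO(3)."""
    r_rel = r1.T @ r2
    # trace(R) = 1 + 2 cos(theta); clipping guards against round-off outside [-1, 1].
    cos_theta = np.clip((np.trace(r_rel) - 1.0) / 2.0, -1.0, 1.0)
    return np.arccos(cos_theta)

def rot_z(angle):
    """Rotation by `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

print(so3_geodesic_distance(rot_z(0.1), rot_z(0.4)))  # about 0.3
```

Note that the derivative of arccos is unbounded as θ approaches 0, so a training loss built directly on this form may need care near zero error.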
Limited capture range and the requirement for high-quality initialization in optimization-based 2D/3D image registration methods can significantly degrade the performance of 3D image reconstruction and motion compensation pipelines. Challenging clinical imaging scenarios that contain significant subject motion, such as fetal in-utero imaging, complicate the 3D image and volume reconstruction process. In this paper we present a learning-based image registration method that uses Convolutional Neural Networks (CNNs) to predict 3D rigid transformations of arbitrarily oriented 2D image slices with respect to a learned canonical atlas co-ordinate system. Only image slice intensity information is used to perform registration and canonical alignment. We extensively evaluate our approach quantitatively on simulated Magnetic Resonance Imaging (MRI) fetal brain data with synthetic motion, and further demonstrate qualitative results on real fetal MRI data, where our method is integrated into a full reconstruction and motion compensation pipeline. Furthermore, we use Monte Carlo Dropout to establish a prediction confidence metric.
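Monte Carlo Dropout keeps dropout active at inference and treats the spread of repeated stochastic predictions as an uncertainty estimate. The toy two-layer regressor below only illustrates that mechanism; the random weights, the feature size, and the six-parameter output are hypothetical placeholders for the registration CNN.

```python
import numpy as np

def mc_dropout_predict(x, w1, w2, p_drop=0.5, n_samples=100, rng=None):
    """Monte Carlo Dropout: keep dropout on at test time and summarise many stochastic passes."""
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ w1)              # hidden layer with ReLU
        keep = rng.random(h.shape) >= p_drop     # a fresh dropout mask on every pass
        h = h * keep / (1.0 - p_drop)            # inverted-dropout rescaling
        samples.append(h @ w2)
    samples = np.stack(samples)
    # The mean is the prediction; the spread across passes is the confidence measure.
    return samples.mean(axis=0), samples.std(axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))       # 2 slices, 8 hypothetical features each
w1 = rng.normal(size=(8, 32))
w2 = rng.normal(size=(32, 6))     # 6 outputs, e.g. rotation + translation parameters
mean, std = mc_dropout_predict(x, w1, w2, rng=rng)
```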
Standard scan plane detection in 3D fetal brain ultrasound (US) is a crucial step in the assessment of fetal brain development. We propose an automatic method for the detection of standard planes in 3D volumes by utilising a convolutional neural network (CNN) to learn the relationship between a 2D plane image and the transformation parameters required to move that plane towards the corresponding standard plane. In addition, we explore the effect of using two different training loss functions which exploit the geometric information and the image data of the extracted plane respectively. When evaluated on 72 subjects, our method achieves a plane detection error of 3.45 mm and 12.4 degrees.
Discrepancies between the chronological age of an individual and the neuroimaging-based, data-driven "brain age" have been shown to be feasible biomarkers associated with a wide range of neurological disorders such as Alzheimer's Disease, traumatic brain injuries or psychiatric disorders. We devised a framework based on Deep Gaussian Processes which achieves state-of-the-art results in terms of global brain age prediction. We also introduced the first ever attempt at predicting brain age at voxel level using context-sensitive Random Forests. The resulting models provide feasible brain-predicted age estimates for younger to middle-aged subjects, with less reliable estimates for older subjects.
The high number of neurons and the complex segregation of the human brain based on cytoarchitecture require an automated analytics approach. Therefore, we are analyzing images of 1 μm resolution histological sections stained for cell bodies using deep learning. The severely limited training data for supervised brain region segmentation represents a challenge for such analysis. We solve it by learning a feature embedding from patches of cortex using a self-supervised Siamese network. The distance between two features corresponds to the distance of the respective image patch locations in the 3D anatomical space. In this contribution, we show that the learned features encode distinctive cytoarchitectonic attributes of the input patches, and form anatomically relevant clusters across the brain.
We discuss how distribution matching losses, such as those used in CycleGAN, can lead to misdiagnosis of medical conditions when used to translate images from one domain to another. It seems appealing to use these methods for image translation from the source domain to the target domain without requiring paired data. However, these models work by matching the distribution of the translated images to that of the target domain. This can cause issues, especially when the proportions of known and unknown labels (e.g. sick and healthy labels) differ between the source and target domains. When the output of the model is an image, current methods do not guarantee that the known and unknown labels have been preserved. Therefore, until alternative solutions are proposed to maintain the accuracy of the translated features, such translated images should not be used for medical interpretation (e.g. by doctors). Nevertheless, some recent papers use these models for exactly this purpose.
You Only Look on Lymphocytes Once. Mart van Rijthoven, Zaneta Swiderska-Chadaj, Katja Seeliger, Jeroen van der Laak, Francesco Ciompi, Radboud University Medical Centre, the Netherlands. paper • abstract
Understanding the role of immune cells is at the core of cancer research. In this paper, we boost the potential of the You Only Look Once (YOLO) architecture applied to automatic detection of lymphocytes in gigapixel histopathology whole-slide images (WSI) stained with immunohistochemistry by (1) tailoring the YOLO architecture to lymphocyte detection in WSI; (2) guiding training data sampling by exploiting prior knowledge on hard negative samples; (3) pairing the proposed sampling strategy with the focal loss technique. The combination of the proposed improvements increases the F1-score of YOLO by 3% with a speed-up of 4.3×.
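The focal loss paired with the hard-negative sampling strategy down-weights well-classified examples so that training concentrates on difficult ones. A minimal binary version in NumPy is shown below. The defaults γ = 2 and α = 0.25 are those suggested in the original focal-loss paper (Lin et al.) and are assumptions here, since the abstract does not report the values used for lymphocyte detection; YOLO's other loss terms (box coordinates, objectness) are not shown.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: cross-entropy scaled by (1 - p_t)**gamma to down-weight easy examples.

    p: predicted probability of the positive class (e.g. "lymphocyte"); y: 0/1 labels.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)              # probability assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check.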
Interpretability of deep neural networks in medical imaging is becoming important for understanding network classification decisions and increasing doctors' trust. Available methods for visual interpretation, however, tend to highlight only the most discriminative areas, which is suboptimal for clinical output. We propose a novel deep visualization framework for improving weakly-supervised lesion localization. The framework applies an iterative approach where, in each step, the interpretation maps focus on different, less discriminative areas of the images that are still important for the final classification, reaching a more refined localization of abnormalities. We evaluate the performance of the method for the localization of diabetic retinopathy lesions in color fundus images. The results show that the obtained visualization maps detect more lesions after the iterative procedure in more severely affected retinas.
Multi-task learning is ideally suited for MR-only radiotherapy planning, as it can jointly simulate a synthetic CT (synCT) scan (a regression task) and an automated contour of organs-at-risk (a segmentation task) from MRI data. We propose to use a probabilistic deep-learning model to estimate both the intrinsic and the parameter uncertainty. Intrinsic uncertainty is estimated through a heteroscedastic noise model, whilst parameter uncertainty is modelled using approximate Bayesian inference. This provides a mechanism for data-driven adaptation of task losses on a voxel-wise basis and, importantly, a measure of uncertainty over the prediction of both tasks. We achieve state-of-the-art performance in the regression and segmentation of prostate cancer scans. We show that automated estimates of uncertainty correlate strongly with areas prone to errors across both tasks, and can therefore be used as a mechanism for quality control in radiotherapy treatment planning.
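A heteroscedastic noise model for a regression task such as synCT is commonly implemented as a Gaussian negative log-likelihood in which the network predicts a per-voxel log-variance alongside the intensity. The sketch below shows that form under this assumption (additive constants dropped); the segmentation-task variant and the Bayesian parameter-uncertainty part are not covered.

```python
import numpy as np

def heteroscedastic_nll(pred, log_var, target):
    """Per-voxel Gaussian NLL with predicted log-variance, averaged (constant term dropped).

    A large predicted variance down-weights the squared error, while the
    0.5 * log_var term stops the network from calling every voxel uncertain.
    """
    pred, log_var, target = (np.asarray(a, dtype=float) for a in (pred, log_var, target))
    return np.mean(0.5 * np.exp(-log_var) * (pred - target) ** 2 + 0.5 * log_var)
```

For a fixed error e, this loss is minimised when the predicted variance equals e², so the variance map learns to be large where errors are large, which is what makes it usable for quality control.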
In this paper we propose a patch sampling strategy based on sequential Monte-Carlo methods for Whole Slide Image classification in the context of Multiple Instance Learning, and show its capability to achieve high generalization performance on the differentiation between sun-exposed and non-sun-exposed pieces of skin tissue.
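One way to picture sequential Monte Carlo patch sampling is as a particle filter over patch locations: particles (candidate patch centres) are weighted by a score, resampled in proportion to their weights, and perturbed. The loop below is a hypothetical sketch of that general scheme, not the authors' algorithm; the score function, number of particles, jitter and iteration count are all assumptions, and in practice the score would come from the MIL model rather than a toy Gaussian.

```python
import numpy as np

def smc_patch_sampling(score_fn, slide_shape, n_particles=64, n_iters=5, jitter=32.0, rng=None):
    """Hypothetical SMC loop: particles are patch centres, weights come from score_fn (> 0)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = slide_shape
    particles = rng.uniform([0, 0], [h, w], size=(n_particles, 2))
    for _ in range(n_iters):
        weights = np.array([score_fn(p) for p in particles], dtype=float)
        weights /= weights.sum()
        # Multinomial resampling: informative locations are duplicated, others dropped.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        # Move step: perturb the resampled particles and keep them inside the slide.
        particles = particles[idx] + rng.normal(scale=jitter, size=(n_particles, 2))
        particles = np.clip(particles, [0, 0], [h - 1, w - 1])
    return particles

centre = np.array([500.0, 500.0])

def toy_score(p):
    # Hypothetical "informativeness" score peaking at the slide centre.
    return np.exp(-np.sum((p - centre) ** 2) / (2 * 100.0 ** 2)) + 1e-12

final = smc_patch_sampling(toy_score, (1000, 1000), rng=np.random.default_rng(0))
```

After a few iterations the particles cluster around high-scoring regions, so subsequent patches are drawn mostly from informative tissue instead of uniformly over the slide.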
A data-efficient Deep Learning method is presented to explore outcome prediction in Ischemic Stroke using full-sized 2D CT images. We show promising results on 3 different prediction tasks, with equal or higher performance than conventional CNNs while reducing the number of model parameters and overfitting on limited datasets.
We propose a framework for rotation and translation covariant deep learning using SE(2) group convolutions. The group product of the special Euclidean motion group SE(2) describes how a concatenation of two roto-translations results in a net roto-translation. We encode this geometric structure into convolutional neural networks (CNNs) via SE(2) group convolutional layers.
We introduce three layers: a lifting layer which lifts a 2D (vector valued) image to an SE(2)-image, i.e., 3D (vector valued) data whose domain is SE(2); a group convolution layer from and to an SE(2)-image; and a projection layer from an SE(2)-image to a 2D image.
The lifting and group convolution layers are SE(2) covariant (the output roto-translates with the input).
The final projection layer, a maximum intensity projection over rotations, makes the full CNN rotation invariant.
We show with three different problems in histopathology, retinal imaging, and electron microscopy that with the proposed group CNNs, state-of-the-art performance can be achieved, without the need for data augmentation by rotation and with increased performance compared to standard CNNs that do rely on augmentation.
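The lifting and projection layers can be illustrated with the simplest discrete case, rotations by multiples of 90 degrees (the group often called p4), where no interpolation is needed. This is a simplified sketch under that assumption: the framework described above works with SE(2) and finer orientation sampling, and the sketch omits the SE(2)-to-SE(2) group convolution layer. The checks verify the two stated properties: rotating the input rotates each lifted plane and cyclically shifts the orientation axis, and after max-projection over orientations the 2D map simply rotates with the input.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate_valid(image, kernel):
    """2D cross-correlation without padding ('valid' output size)."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijab,ab->ij", windows, kernel)

def lift(image, kernel):
    """Lifting layer: one output plane per kernel rotation (0, 90, 180, 270 degrees)."""
    return np.stack([correlate_valid(image, np.rot90(kernel, k)) for k in range(4)])

def project(lifted):
    """Projection layer: maximum over the rotation axis."""
    return lifted.max(axis=0)

rng = np.random.default_rng(0)
image = rng.normal(size=(9, 9))
kernel = rng.normal(size=(3, 3))

lifted = lift(image, kernel)                 # shape (4, 7, 7)
lifted_rot = lift(np.rot90(image), kernel)   # lift the rotated input
```

Because the projection only removes the orientation axis, a global pooling step after it yields a fully rotation-invariant output, which is why no rotation augmentation is needed.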
Deep learning methods have shown impressive results for a variety of medical problems over the last few years. However, datasets tend to be small due to time-consuming annotation. As datasets with different patients are often very heterogeneous, generalization to new patients can be difficult. This is further complicated when image acquisition varies substantially, which is common during intravascular optical coherence tomography for coronary plaque imaging. We address this problem with an adversarial training strategy where we force a part of a deep neural network to learn features that are independent of patient- or acquisition-specific characteristics. We compare our regularization method to typical data augmentation strategies and show that our approach improves performance for a small medical dataset.
Normally, lesions are detected using supervised learning techniques that require labelled training data. We explore the use of Bayesian autoencoders to learn the variability of healthy tissue and detect lesions as unlikely events under the normative model. As a proof of concept, we test our method on registered 2D mid-axial slices from CT imaging data. Our results indicate that our method achieves the best performance in detecting lesions caused by bleeding compared to baselines.
Automatic segmentation of the liver, spleen and both kidneys is an important problem, enabling accurate clinical diagnosis and improving computer-aided decision support systems. This work presents computational methods for the automatic segmentation of the liver, spleen, and left and right kidneys in abdominal CT images using deep convolutional neural networks (CNNs), which allow accurate segmentation in large-scale medical trials. Moreover, this work compares several CNN-based approaches for segmenting the required organs. Validation results on the given dataset show that U-Net based segmentation of the liver, spleen and both kidneys on transaxial slices achieves mean Dice similarity scores (DSC) of 94%, 89% and 88% respectively.
We propose a novel two-step methodology for entire whole-slide image (WSI) classification. First, all tissue patches in a WSI are mapped into vector embeddings using an encoder trained in an unsupervised fashion. The spatial arrangement of these embeddings is maintained with respect to the tissue patches, forming a stack of 2D feature maps representing the WSI. Second, a convolutional neural network is trained on these compact representations to predict weak labels associated with entire WSIs. We investigated several unsupervised schemes to train the encoder model: convolutional autoencoders (CAE), variational autoencoders (VAE), and a novel approach based on contrastive training. We validated the proposed methodology by predicting the existence of tumor metastasis at WSI-level using the Camelyon16 dataset. Our experimental results showed that the proposed methodology can be used to predict weak labels from entire WSIs. Furthermore, the novel contrastive encoder proved to be superior to the CAE and VAE approaches.
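The second step relies on re-assembling patch embeddings into a spatial grid. A minimal sketch, assuming each tissue patch has integer (row, column) coordinates on the patch grid and that background positions are filled with zeros; the encoder itself (CAE, VAE or contrastive) is not shown, and the function name is hypothetical.

```python
import numpy as np

def embeddings_to_feature_map(coords, embeddings, grid_shape):
    """Stack patch embeddings into a (channels, rows, cols) map; non-tissue cells stay zero."""
    coords = np.asarray(coords)
    embeddings = np.asarray(embeddings, dtype=float)
    feature_map = np.zeros((embeddings.shape[1], *grid_shape))
    feature_map[:, coords[:, 0], coords[:, 1]] = embeddings.T
    return feature_map

# Two tissue patches with 2-dimensional embeddings on a 2 x 3 patch grid.
fmap = embeddings_to_feature_map([[0, 0], [1, 2]], [[1.0, 2.0], [3.0, 4.0]], (2, 3))
```

The resulting array is a compact multi-channel image of the whole slide and can be fed to an ordinary CNN trained on slide-level labels.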
To train deep convolutional neural networks, the input data and the intermediate activations need to be kept in memory to calculate the gradient descent step. Given the limited memory available on current-generation accelerator cards, this limits the maximum dimensions of the input data. We demonstrate a method to train convolutional neural networks holding only parts of the image in memory while giving equivalent results. We quantitatively compare this new way of training convolutional neural networks with conventional training. In addition, as a proof of concept, we train a convolutional neural network with 64 megapixel images, which requires 97% less memory than the conventional approach.
We compared the diagnostic performance of a convolutional neural network (CNN) in diagnosing maxillary sinusitis on Waters' view radiographs with that of five radiologists using temporal and geographic external test sets. In the temporal external test set, the area under the receiver operating characteristic curve (AUC) of the CNN was 0.93, comparable with the radiologists' AUCs, which ranged from 0.83 to 0.89. In the geographic external test set, the AUC of the CNN was 0.88, comparable with the radiologists' AUCs, which ranged from 0.75 to 0.84. The CNN can diagnose maxillary sinusitis on Waters' view radiographs as accurately as expert radiologists.
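The AUC values compared above have a simple interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. A direct NumPy sketch of that definition (the Mann-Whitney formulation), adequate for test sets of this size:

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()   # compare every positive/negative pair
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 3 of 4 pairs ordered correctly: 0.75
```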
Medical image segmentation is often constrained by the availability of labelled training data. 'Data augmentation' helps to prevent memorisation of training data and improves the network's performance on data from outside the training set. As such, it is vital in building robust deep learning pipelines. Augmentation in medical imaging typically involves applying small transformations to images during training to create variety. However, it is also possible to use linear combinations of training images and labels to augment the dataset using the recently proposed 'mixup' algorithm. Here, we apply this algorithm to medical image segmentation. We show that it increases performance in segmentation tasks, and also offer a possible theoretical explanation for the efficacy of this technique.
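mixup forms convex combinations of pairs of training examples, with the mixing weight λ drawn from a Beta(α, α) distribution. For segmentation, the same λ can be applied to the two images and to their one-hot label maps, yielding soft per-pixel targets. The sketch below assumes this per-pixel formulation and α = 0.4 for illustration; the abstract does not state the exact variant or hyperparameters used.

```python
import numpy as np

def mixup_segmentation(img_a, seg_a, img_b, seg_b, n_classes, alpha=0.4, rng=None):
    """mixup for segmentation: blend two images and their one-hot label maps with one lambda."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    onehot_a = np.eye(n_classes)[seg_a]   # (H, W) integer labels -> (H, W, C)
    onehot_b = np.eye(n_classes)[seg_b]
    img = lam * img_a + (1.0 - lam) * img_b
    soft_labels = lam * onehot_a + (1.0 - lam) * onehot_b
    return img, soft_labels, lam

rng = np.random.default_rng(0)
img_a, img_b = rng.random((4, 4)), rng.random((4, 4))
seg_a, seg_b = rng.integers(0, 3, (4, 4)), rng.integers(0, 3, (4, 4))
img, soft, lam = mixup_segmentation(img_a, seg_a, img_b, seg_b, n_classes=3, rng=rng)
```

The soft targets still sum to one per pixel, so they can be used directly with a standard cross-entropy loss.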
In this paper, we hypothesize that morphological properties of nuclei are crucial for classifying dysplastic changes. Therefore, we propose to represent a whole histopathology slide as a collection of smaller images containing patches of nuclei and adjacent tissue. For this purpose, we use a deep multiple instance learning approach. Within this framework, we first embed patches in a low-dimensional space using convolutional and fully-connected layers. Next, we combine the low-dimensional embeddings using a multiple instance learning pooling operator, and finally we use fully-connected layers to provide a classification. We evaluate our approach on an esophageal cancer histopathology dataset.
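The embed, pool, classify pipeline can be sketched as follows. Random weights stand in for the trained convolutional and fully-connected layers, and mean or max pooling stands in for the MIL pooling operator (an assumption, since the abstract does not name the operator). The key property is that the bag prediction does not depend on the order of the nuclei patches.

```python
import numpy as np

def mil_predict(patches, embed_w, clf_w, clf_b, pooling="mean"):
    """Embed each instance, pool the bag, and classify it (weights are placeholders)."""
    flat = patches.reshape(len(patches), -1)
    emb = np.maximum(0.0, flat @ embed_w)             # instance embeddings (stand-in for conv + FC)
    bag = emb.max(axis=0) if pooling == "max" else emb.mean(axis=0)  # permutation-invariant pooling
    return 1.0 / (1.0 + np.exp(-(bag @ clf_w + clf_b)))  # bag-level probability

rng = np.random.default_rng(0)
patches = rng.random((10, 8, 8))      # a bag of 10 hypothetical 8x8 nuclei patches
embed_w = rng.normal(size=(64, 16)) / 8.0
clf_w = rng.normal(size=16)
prob = mil_predict(patches, embed_w, clf_w, 0.0)
```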