Machine-learning Support to Individual Diagnosis of Mild Cognitive Impairment Using Multimodal MRI and Cognitive Assessments

Background: Understanding whether the cognitive profile of a patient indicates mild cognitive impairment (MCI) or performance levels within normality is often a clinical challenge. The use of resting-state functional magnetic resonance imaging (RS-fMRI) and machine learning may represent valid aids in clinical settings for the identification of MCI patients. Methods: Machine-learning models were computed to test the classificatory accuracy of cognitive, volumetric [structural magnetic resonance imaging (sMRI)] and blood oxygen level dependent-connectivity (extracted from RS-fMRI) features, in single-modality and mixed classifiers. Results: The best and most significant classifier was the RS-fMRI+Cognitive mixed classifier (94% accuracy), whereas the worst performing was the sMRI classifier (∼80%). The global mixed classifier (sMRI+RS-fMRI+Cognitive) had a slightly lower accuracy (∼90%), although not statistically different from that of the RS-fMRI+Cognitive classifier. The most important cognitive features were indices of declarative memory and semantic processing. The crucial volumetric feature was the hippocampus. The RS-fMRI features selected by the algorithms were heavily based on the connectivity of mediotemporal, left temporal, and other neocortical regions. Conclusion: Feature selection was strongly driven by statistical independence among features. Some features showed no between-group differences, or showed a trend in either direction. This suggests that clinically relevant brain alterations typical of MCI might be subtle and not inferable from group analysis.


Introduction
Mild Cognitive Impairment (MCI) identifies adults who experience impairment in neuropsychological abilities while retaining independence in daily life. The range of possible aetiologies is heterogeneous, with Alzheimer's disease (AD) often being a prime suspect. 1 Non-pathological processes of senescence, however, may also cause a measurable decline in cognitive functioning, 2 and it is not uncommon for healthy adults to complain of declining cognitive abilities. This conceptual overlap is further complicated by additional factors.
First, thresholds of impaired cognitive performance have been operationalized in many ways. 3 Second, variability in the choice of cognitive tests and in their administration procedures generates different diagnostic outcomes. 4 Third, cross-cultural differences exist in test performance, 5 but these are rarely acknowledged. Fourth, raw neuropsychological scores may be skewed, 6 compromising the validity of the descriptive statistics used to set the threshold of "normality". Fifth, high levels of education may mask the presence of cognitive impairment. 7 Recently, revised versions of consensus guidelines have incorporated supporting evidence from neuromolecular imaging and cerebrospinal-fluid biomarkers for diagnosing MCI due to AD. 8 Despite the theoretical robustness of this approach, these techniques are not appropriate for characterising AD burden in asymptomatic adults or in patients with non-progressive/non-persistent MCI. 9 A more viable contribution is that of structural MRI (sMRI) and resting-state functional MRI (RS-fMRI). Both appear useful for describing patients diagnosed with clinically established AD, 10,11 and RS-fMRI in particular is receiving increasing attention from researchers, as it seems to be sensitive to very early pathological alterations. 12 Although significant reductions of regional functional connectivity in MCI have been reported in cross-sectional 13 and longitudinal studies, 14 this evidence is the result of group-level inferential statistics, which are of limited utility for the clinical classification of single individuals.
Multivariate and machine-learning techniques offer the opportunity to build data-driven classificatory models that can predict the group membership of each participant based on MRI features.[21] In this study, we used machine-learning methods to classify participants with a diagnosis of MCI based on features extracted from cognitive performance, sMRI, and RS-fMRI, with a series of single-type and mixed classifiers. No specific hypothesis was formulated for the cognitive classifiers, as diagnostic status was heavily dependent on cognitive performance. We hypothesized that RS-fMRI-based classifiers would be superior to the others (quantitative expectation), and that the selected features would bear meaningful connections to neuropathological models of abnormal aging (qualitative expectation). A major goal was to understand to what extent, and in what way, such a methodology could aid clinical practice.

Participants
One hundred and thirty-nine inhabitants of the Venetian lagoon, older than 50 years and still independent in their daily activities, were considered for inclusion. Candidates were either outpatients referred for neurological examination by their general practitioner because of suspected cognitive decline, or adults willing to take part in research projects out of personal interest and/or subjective cognitive concerns. All underwent a comprehensive medical examination led by an experienced neurologist between May 2011 and November 2014. This was based on anamnestic information, a neurological screening, a clinical MRI protocol (including diffusion-weighted, T1-weighted, T2-weighted, and FLAIR images) inspected by a senior neuroradiologist, and a battery of cognitive tests administered and interpreted by an experienced neuropsychologist. Upon application of the exclusion criteria, participants were allocated to one of two diagnostic categories: healthy adults with no objective cognitive difficulties ("controls") or patients diagnosed with MCI ("patients"). Diagnoses of MCI were established by consensus among clinicians and by clinical follow-up. Diagnostic exclusion criteria were as follows: an MMSE score < 24; ongoing treatments (psychotropic medication, cholinesterase inhibitors, memantine, drugs for research purposes, or drugs with toxic effects on internal organs); a clinically significant disease; history of TIA; diagnosis of severe vascular pathology; baseline structural MRI revealing diagnostic patterns different from those expected in MCI; presence/diagnosis of uncontrolled seizures; peptic ulcer; cardiovascular disease; neuropathy with conduction difficulties; significant disabilities; evidence of abnormal baseline levels of folates, vitamin B12 or thyroid-stimulating hormone. "Technical" exclusion criteria were as follows: more than one missing entry in the database of cognitive scores; presence of relevant signal artefacts or excessive in-scanner motion. Based on these criteria, 50 controls and 50 patients, matched as closely as possible at the group level for age, education and gender ratio, were included. Demographic characteristics of the final sample are reported in Table 1. This study was approved by the Institutional Review Board of the IRCCS Fondazione Ospedale San Camillo (Venice, Italy), protocol number 11/09, version 2.
Informed consent was obtained from all participants.
-Add Table 1 about here -

MRI and cognitive data acquisition
The MRI protocol (1.5 T Philips Achieva), including structural and functional acquisitions, was completed in a single session.Participants were instructed to keep their eyes closed without falling asleep and remain as still as possible for the full duration of the examination.
A neuropsychological battery was designed for clinical purposes, with particular focus on those domains which are most sensitive to aging and early-stage neurodegeneration.
-Add Figure 1 about here -

MRI data preprocessing
T1-weighted images were processed with the FreeSurfer Image Analysis Suite (http://surfer.nmr.mgh.harvard.edu/) following standard segmentation and parcellation procedures. Morphological indices were extracted from cortical and subcortical structures. RS-fMRI images were preprocessed with the CONN toolbox, 22 running on Statistical Parametric Mapping 8 (SPM8; Wellcome Trust Centre for Neuroimaging, London, UK) in a Matlab R2012a environment (Mathworks Inc., UK). Images were realigned to estimate head-motion parameters; slice-timing corrected to account for differences in acquisition time across the slices of each volume; co-registered to the participant's T1-weighted image; normalized to the EPI template; smoothed with a 6 mm full-width at half-maximum Gaussian kernel to minimize noise and residual anatomical discrepancies; denoised by regressing out the confounding signal captured by the top 5 orthogonal (principal) components estimated from the white-matter and cerebrospinal-fluid maps (aCompCor procedure); 23 and band-pass filtered (0.008-0.09 Hz).
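The last preprocessing step can be illustrated with a minimal pure-Python sketch of an ideal band-pass filter applied to a single ROI time series: every discrete Fourier transform (DFT) bin outside 0.008-0.09 Hz is zeroed and the signal is transformed back. The function name `bandpass` and the repetition time of 2 s used in the example are hypothetical (the TR is not reported here), the aCompCor regression is not shown, and CONN's own filtering implementation may differ in detail.

```python
import cmath
import math


def bandpass(signal, tr, low=0.008, high=0.09):
    """Ideal band-pass filter: zero every DFT bin outside [low, high] Hz.

    A naive O(n^2) DFT keeps the sketch dependency-free; `tr` is in seconds.
    """
    n = len(signal)
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k carry the same absolute (positive/negative) frequency.
        freq = min(k, n - k) / (n * tr)
        if not low <= freq <= high:
            spectrum[k] = 0
    # Inverse DFT; the imaginary part is numerically zero for a real input.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


if __name__ == "__main__":
    tr, n = 2.0, 100  # hypothetical TR (s) and number of volumes
    slow = [math.sin(2 * math.pi * 0.05 * t * tr) for t in range(n)]  # in band
    fast = [math.sin(2 * math.pi * 0.20 * t * tr) for t in range(n)]  # out of band
    filtered = bandpass([3.0 + s + f for s, f in zip(slow, fast)], tr)
    print(max(abs(a - b) for a, b in zip(filtered, slow)))  # residual error
```

Because both the constant offset (0 Hz) and the 0.2 Hz component fall outside the pass band, only the 0.05 Hz oscillation survives. An ideal filter is the simplest choice for illustration; smoother filter designs reduce ringing at the band edges.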

Feature definition
A large number of candidate indices were defined from demographic/clinical variables and neurostructural/neurofunctional maps (Figure 1). Basic demographic information and raw cognitive scores (extracted from clinical neuropsychological tests) were included in this list.
Neuroanatomical volumetric indices were extracted from the segmentation and parcellation output. ROI-to-ROI (R2R) indices of functional connectivity were computed from RS-fMRI runs as part of the CONN pipeline. These ROIs were defined based on the automated anatomical labeling (AAL) atlas. 24 Each R2R index quantified the connectivity between a pair of AAL ROIs. To minimize potential selection bias while limiting the number of regions, the cerebellum was excluded from the model, since it shows little AD pathology 25 and is usually considered a reference region in PET-based studies. Primary sensorimotor areas were also excluded because of their prolonged preservation in AD. 26 Orbitofrontal and temporopolar regions affected by signal dropout were excluded too, to avoid spurious estimates. In total, 2122 indices were extracted: demographics, 3; cognition, 19; sMRI, 84; RS-fMRI, 2016.
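The R2R features can be sketched as the Pearson correlation between the mean time series of every unordered pair of ROIs, converted to Fisher z. This is a minimal sketch: the function names and ROI labels are hypothetical, and the Fisher z transform is an assumption here, although the z values mentioned in the post hoc analyses suggest it. The sketch also checks the feature bookkeeping: 2016 is exactly the number of unordered pairs among 64 ROIs (64 × 63 / 2), which is consistent with, though not stated as, 64 retained AAL regions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r2r_features(roi_series):
    """Map each unordered ROI pair to the Fisher z (atanh) of its correlation.

    roi_series: dict mapping a (hypothetical) ROI label to its mean BOLD series.
    Perfectly correlated series (r = +/-1) would make atanh undefined.
    """
    features = {}
    for a, b in combinations(sorted(roi_series), 2):
        features[(a, b)] = math.atanh(pearson(roi_series[a], roi_series[b]))
    return features


# Feature bookkeeping from the text; 64 ROIs is inferred, not stated.
N_ROIS = 64
counts = {
    "demographics": 3,
    "cognition": 19,
    "sMRI": 84,
    "RS-fMRI": math.comb(N_ROIS, 2),
}
```

Keeping each unordered pair once reflects the symmetry of the correlation matrix: the connectivity of A with B is the same index as that of B with A.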

Feature selection
Two machine-learning algorithms were considered: linear and quadratic Fisher's discriminant analyses (LDA and QDA, respectively), 27 chosen because they are applicable to multiple research contexts, including small-sample scenarios. 28,29 Both classifiers were fitted for each set of features and, to maximize classificatory accuracy, the more accurate of the two was retained each time. A feature-selection analysis was then run by testing the performance of the chosen classifier as a function of groups of indices, using a cost function. 27 The complete dataset was subdivided into training and testing subsets using 10-fold Monte Carlo cross-validation.
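The cost function is not specified here, so the following is only a minimal sketch under stated assumptions: greedy forward selection that minimizes 1 − mean test accuracy, estimated by Monte Carlo cross-validation (repeated random, stratified train/test splits). To stay dependency-free, a diagonal LDA (class means with a pooled per-feature variance, ignoring covariances) stands in for the full LDA/QDA used in the study. All function names, the number of splits, the test fraction and the seed are hypothetical.

```python
import random
import statistics


def fit_diag_lda(X, y):
    """Class means plus pooled within-class variance per feature (diagonal LDA)."""
    classes = sorted(set(y))
    n_feat = len(X[0])
    means = {
        c: [statistics.fmean(x[j] for x, lab in zip(X, y) if lab == c) for j in range(n_feat)]
        for c in classes
    }
    variances = []
    for j in range(n_feat):
        ss = sum((x[j] - means[lab][j]) ** 2 for x, lab in zip(X, y))
        variances.append(max(ss / (len(X) - len(classes)), 1e-9))  # floor avoids /0
    return means, variances


def predict_diag_lda(model, x):
    """Equal priors: pick the class with the smallest variance-scaled distance."""
    means, variances = model
    return min(
        means,
        key=lambda c: sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[c], variances)),
    )


def monte_carlo_accuracy(X, y, features, n_splits=10, test_frac=0.25, seed=0):
    """Mean test accuracy over repeated random stratified train/test splits."""
    rng = random.Random(seed)  # same seed, same splits for every candidate set
    accuracies = []
    for _ in range(n_splits):
        test = set()
        for c in sorted(set(y)):
            idx = [i for i, lab in enumerate(y) if lab == c]
            test.update(rng.sample(idx, max(1, round(test_frac * len(idx)))))
        train = [i for i in range(len(y)) if i not in test]
        model = fit_diag_lda([[X[i][j] for j in features] for i in train], [y[i] for i in train])
        hits = sum(predict_diag_lda(model, [X[i][j] for j in features]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return statistics.fmean(accuracies)


def forward_select(X, y, max_features=5):
    """Greedy forward selection minimising the cost 1 - mean CV accuracy."""
    selected, curve = [], []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        best = max(remaining, key=lambda j: monte_carlo_accuracy(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
        curve.append(monte_carlo_accuracy(X, y, selected))
    return selected, curve
```

The returned `curve` is the accuracy-versus-number-of-features profile, analogous to the accuracy plateaus reported for each classifier. Because each step adds the feature that best complements those already chosen, a feature that is redundant with an earlier one tends not to be selected.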
The performance of each classifier was finally evaluated by computing accuracy, area under the receiver-operating-characteristic curve, and sensitivity. Within the structural classifier, a significant between-group difference was found solely for the volume of the right hippocampus; the volumes of the left and right hippocampi, and those of the left and right caudate nuclei, were highly correlated (p = 4.90e-30 and p = 4.20e-52, respectively).
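The three performance metrics can be written down explicitly. This minimal sketch assumes that patients are coded as 1 (the positive class) and controls as 0; the area under the ROC curve is computed with its pairwise (Mann-Whitney) formulation, i.e., the probability that a randomly chosen patient receives a higher classifier score than a randomly chosen control, with ties counted as one half. The function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of participants assigned to the correct diagnostic group."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, positive=1):
    """Fraction of true patients that the classifier labels as patients."""
    preds_for_patients = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_patients) / len(preds_for_patients)


def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of a patient > score of a control); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]           # two patients, two controls (toy data)
    scores = [0.9, 0.4, 0.6, 0.1]   # hypothetical classifier scores
    y_pred = [int(s >= 0.5) for s in scores]
    print(accuracy(y_true, y_pred), sensitivity(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and sensitivity depend on the chosen decision threshold, whereas the AUC summarizes ranking quality across all thresholds.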
Since most of the RS-fMRI R2R indices involved the connectivity of mediotemporal areas, their between-group directionality was explored. Connectivity tended to be stronger in controls for some of the features (e.g., the second feature: right parahippocampal gyrus to left putamen R2R connectivity), and stronger in patients for others (e.g., the third feature: left hippocampus to right superior temporal gyrus R2R connectivity).
In the mixed sMRI+RS-fMRI classifier, the first two R2R features were explored further. For the first, temporo-parietal feature, connectivity was 0.2 z lower in patients. By contrast, the correlation between the posterior cingulate cortex and the left pars opercularis was close to zero in both groups, but showed a larger dispersion in the patient group.
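The two kinds of group difference described above, a shift in mean connectivity and a difference in spread around a near-zero mean, can be summarized with two simple descriptive statistics per feature. This is a minimal sketch: the function name and the toy Fisher z values are hypothetical and are not the study's data, and no inferential test is shown.

```python
import statistics


def group_contrast(controls, patients):
    """Return (mean difference patients - controls, SD ratio patients / controls)."""
    mean_diff = statistics.fmean(patients) - statistics.fmean(controls)
    sd_ratio = statistics.stdev(patients) / statistics.stdev(controls)
    return mean_diff, sd_ratio


if __name__ == "__main__":
    # Toy Fisher z values for one R2R feature: both groups centred on zero,
    # but patients spread five times more widely.
    controls = [0.10, -0.10, 0.00, 0.05, -0.05]
    patients = [0.50, -0.50, 0.00, 0.25, -0.25]
    print(group_contrast(controls, patients))
```

A feature like this can help a classifier such as QDA, which models each group's variance separately, even though a comparison of group means would show no effect.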
Finally, Pearson's correlations were run to explore the associations among the top features within each classifier. Results varied: cognitive and sMRI features showed significant mutual correlations, whereas RS-fMRI indices tended to be statistically independent of each other (Figure 2).
-Add Figure 4 about here -

Discussion
AD triggers a large number of alterations to brain structure, brain connectivity and cognitive function. In part, this is the result of a global process of decline that homogeneously affects a large number of regions, circuits, and cognitive domains (i.e., global atrophy and ventricular enlargement, global loss of network connectivity and regional isolation, global cognitive decline). What looks like a general trend, however, can be broken down into separate processes. In AD, studies have highlighted that disease progression follows a number of separate routes. For instance, loss of posteromedial metabolism and atrophy of the mediotemporal complex seem to be driven by distinct mechanisms. 30-32 The identification of independent disease mechanisms can be helpful in clinical settings. For example, some studies highlight the importance of assessing both declarative and semantic memory for an early diagnosis of AD, as semantic processing is severely impaired in AD but not significantly disrupted by normal aging. 33 In this respect, machine-learning classification is a well-suited approach to clarify the diagnostic importance of features extracted from structural and functional neuroimaging. As discussed below, a distinctive property of this approach is that it eliminates the redundancy among features that are significantly correlated with one another. The resulting combination of variables therefore captures distinct aspects of classification and, thus, of disease.

RS-fMRI improves classification
A look at the quantitative aspects of classificatory performance reveals that the sMRI classifier was the least accurate. This suggests that morphometric biomarkers are less effective than RS-fMRI or cognitive features at detecting abnormalities associated with MCI.
We argue that, since hippocampal and brain volumes are also influenced by non-pathological aging, 34 they may lack classificatory specificity.
Classifiers based on cognitive features performed very well. This is expected, because the standard of truth (i.e., "patient" or "control") was heavily based on the presence of cognitive impairment measured with cognitive tests.
The most accurate classifications were obtained when RS-fMRI features were included in the feature-selection process. In fact, the performance of the RS-fMRI classifier did not differ from that of the Cognitive classifier. In addition, RS-fMRI features improved the classification based on both sMRI and cognitive features. One possible reason for such good performance is the large number (2016) of available RS-fMRI features. This should be seen as an advantage of RS-fMRI (rather than a methodological imbalance), as RS-fMRI makes it possible to explore properties of the BOLD signal that are not absolute (i.e., related only to a specific voxel or ROI) but relative (i.e., reflecting the relationship between two voxels or ROIs). These dynamic characteristics are closely tied to the basic processes of brain functioning, as task performance is supported by the interactive co-activation/co-deactivation of multiple structures.

Each classifier as informant of distinct mechanisms
A closer, qualitative look at each classifier clarifies: 1) how useful machine-learning algorithms are for extracting classificatory information, and 2) how this method helps in understanding the various types of mechanisms that may separate patients from controls.
As for the Cognitive classifier, the first feature was a measure of declarative memory (the delayed recall of the Rey-Osterrieth Figure), a domain well known to be severely affected in AD. Although the cognitive assessment included a second measure of long-term declarative memory (the delayed recall of the Prose Memory Test), this variable was not selected as part of the classifier. We argue that the significant correlation between the two memory tests translates into comparable classificatory information, which makes including both unnecessary. On the other hand, performance on a measure of semantic processing (the Category Fluency Test) accounted for an exclusive and relevant amount of variability.
Declining semantic processing is one of the major features of various forms of neurodegeneration, and results from compromised circuits sustained by regions that are anatomically distinct from those supporting declarative memory. 35 By the same argument, we speculate that the global classifier did not include both the delayed recall of the Rey-Osterrieth Figure and the volume of the right hippocampus (the "top" cognitive and MRI-based features, respectively) because of a conceptual association between the two variables. 36
The sMRI classifier relied heavily on the right hippocampus in our sample, while the left hippocampus, presumably because of a very high inter-hemispheric correlation, was not included. The second volumetric feature was the left caudate nucleus (presumably representing both caudate nuclei, given their high inter-hemispheric correlation). While the volume of the right hippocampus was significantly smaller in the patient group, no significant between-group difference emerged for the left caudate. It is interesting to note that features with no between-group differences may still carry classificatory relevance. We argue that some structures may be subject to minor morphometric changes that are nevertheless more distinctively related to cognitive impairment than more extensive morphometric changes located elsewhere. On this note, studies of human and primate brains show that neuronal and synaptic densities are not homogeneous across the cortex. 37,38 Small group differences in a region with high cell density, or sustaining a crucial function, might have profound biological implications. Dopaminergic neurons in Parkinson's disease exemplify this mechanism: they make up a minimal portion of the total number of nerve cells, yet they serve paramount functions. In this respect, evidence shows that the caudate undergoes volumetric shrinkage in AD, 39 whereas this does not occur in healthy aging. 34 These findings suggest that the caudate alterations seen in patients, albeit not reaching statistical significance in any specific direction, seem to be independent of mediotemporal modifications, yet conceptually relevant for the diagnosis of MCI.
The RS-fMRI classifier was largely based on the connectivity of the left and right hippocampal formations. The first feature represented the R2R pathway accounting for the single largest portion of variability in our sample. The subsequent four features all captured independent aspects of mediotemporal connectivity. Since the earliest histopathological descriptions, AD has been described as a disease that causes a computational isolation of the hippocampus. 40 Loss of hippocampal and parahippocampal connectivity would be the in vivo equivalent of this process. Additionally, one of the R2R features showed a trend in the opposite direction, with patients having increased hippocampal-temporal connectivity. In line with evidence of increased hippocampal metabolism during the MCI stage, 32 we hypothesize that up-regulated connectivity in patients may result from neuroplastic modifications triggered by the early stages of hippocampal disconnection, and that the RS-fMRI classifier can capture disease mechanisms as well as neuroplastic responses. Such responses would in all likelihood not be detectable with morphometric acquisitions, which instead reflect gross anatomy, well known to be more resistant to neuroplastic change.
We then tested a mixed sMRI+RS-fMRI classifier to understand whether the information extracted from an MRI protocol alone could be exploited clinically. Hippocampal volume was confirmed as the most informative feature. Decreased connectivity (a 0.2 average drop in the correlation coefficient) between temporal and parietal regions improved this classification. Interestingly, for the third feature (posterior cingulate to Broca's area), the r coefficient was close to zero in both groups (indicating no association). In this case, the two groups differed in their dispersion, suggesting that the informative aspect of this pathway might be the presence of an association (regardless of its direction) where an association would normally not exist.
The mixed sMRI+Cognitive classifier was built on the combination of features that are usually at the clinician's disposal (a cognitive assessment and an anatomical brain scan). The results are in line with the typical pattern of clinical features that drives a diagnosis of early-stage neurodegeneration, as the selected features are measures of declarative memory and mediotemporal volumes.
The RS-fMRI+Cognitive classifier was the top-performing one. When the analysis of declarative memory is flanked by measures of connectivity, classification approaches optimal levels (accuracy ≈ 94%) and outperforms the support provided by sMRI. The superior performance of this classifier might reflect the qualitatively different disruption caused by AD neurodegeneration on brain function, often leading to compensatory change in controls and maladaptive alteration in the early stage of neurodegeneration. 31
Finally, a close look at the global classifier indicates that the characterization of cognitive profiles (presence of declarative-memory and semantic-processing deficits) was by far the most accurate predictive formula for classifying patients. R2R features further improved accuracy by highlighting the role played by various aspects of the limbic system and by temporo-occipital areas.

Limitations
Despite the protection against bias offered by a data-driven approach and a sample of comparable or larger size than those of other studies, [16][17][18][19][20] the outcome still depends on how features and algorithms were defined. Although we selected "standard" cognitive tests, segmentation/parcellation atlases, and two basic machine-learning algorithms, we cannot rule out that other methodological choices might have yielded slightly different patterns of findings. This, however, would not undermine the core findings and interpretations. Moreover, the sets of cognitive, neuroanatomical and neurofunctional variables are qualitatively different from one another, e.g., in their number, in the presence of a numerical ceiling, or in their directionality (patients may show either decreased or increased RS-fMRI connectivity, but only an impoverishment of cognition and brain structure; see Table 2 for the most distinctive anatomical and R2R indices). Inevitably, feature selection will be affected by these different properties. As a consequence, comparisons of classifiers are meaningful as far as quantitative performance is concerned, but any analysis comparing different types of features has to be interpreted with caution.
Post hoc inter-feature correlations are in line with the presence of such qualitative differences: for instance, most cognitive features (fewer in number) were mutually correlated, implying a certain degree of collinearity, whereas RS-fMRI indices (many in number) were unrelated to one another.
-Add Table 2 about here -

Clinical usefulness of machine-learning methods
In conclusion, these findings indicate that RS-fMRI R2R connectivity improves diagnostic classification of patients with MCI and outperforms the accuracy of sMRI, which relied heavily on hippocampal volumes. A careful look at each classifier revealed that machine-learning approaches, by circumventing feature-to-feature statistical redundancy, generate classifiers in which each feature accounts for an independent portion of classificatory accuracy, presumably reflecting separate disease mechanisms.
These might manifest as a decrease or increase in R2R correlation (differences that are often very small and not significant), or as the presence of a correlation between two otherwise uncorrelated areas. Additionally, between-group volumetric differences do not seem to scale to a common denominator, as minimal differences in specific structures might be more relevant than larger differences elsewhere. These alterations might represent an important source of clinical information and need to be further explored before being implemented in neurological settings. The nature of these findings suggests that clinically relevant alterations in the brain function of MCI patients might be quite subtle and not inferable from group-based analyses.

Figure 4. Selected post-hoc analyses. Between-group comparisons (t test statistics) were run to explore the group-level differences of the main features included in the classifiers. MCI patients had significantly smaller volumes of the right hippocampus but no difference in the left caudate nucleus. Moreover, patterns of connectivity showed a trend in either direction and, as exemplified by the association between the left posterior cingulate and the left pars opercularis, differed only in their dispersion.

Seven classifiers were tested: three basic "single-modality" classifiers (a, Cognitive; b, sMRI; c, RS-fMRI) and four "multiple-modality" classifiers (d, sMRI+RS-fMRI; e, sMRI+Cognitive; f, RS-fMRI+Cognitive; g, sMRI+RS-fMRI+Cognitive). Demographic features were included in all classificatory models. Bonferroni-corrected post-hoc Kruskal-Wallis statistics tested inter-classifier differences in accuracy. 29

Results
The Cognitive classifier (Figure 2a; LDA) was driven by a test of declarative memory (Rey-Osterrieth Figure, Delayed Recall) and a measure of semantic processing (Category Fluency Test). These two features yielded a classificatory accuracy of about 83%. Further tests improved the accuracy by an additional 5%. The first volumetric feature selected by the sMRI classifier (Figure 2b; LDA) was the right hippocampus, followed by the left caudate and the left orbital gyrus. These three features approached 77% accuracy, reaching 80% with additional indices. The RS-fMRI classifier (Figure 2c; QDA) exceeded an 85% accuracy plateau after 5 indices. These were patterns of R2R connectivity spread across various regions of the brain, but heavily hinging upon mediotemporal regions (3 out of 5 indices). The mixed sMRI+RS-fMRI classifier (Figure 2d; QDA) reached 85% accuracy after 5 indices. The volume of the right hippocampus was selected as the most accurate feature, followed by R2R connectivity of various associative (prefrontal, parietal and temporal) cortices. As with the Cognitive classifier, the remaining three mixed classifiers relied on declarative memory and semantic processing as the two leading features. In the sMRI+Cognitive classifier (Figure 2e; LDA), these two indices reached 83% accuracy, marginally improved by volumetric properties of the left mediotemporal complex. In the RS-fMRI+Cognitive classifier (Figure 2f; QDA) and in the global sMRI+RS-fMRI+Cognitive classifier (Figure 2g; QDA), the two tests reached an accuracy of over 85%, further improved by additional R2R indices. In the global classifier, accuracy rose to 90% with the addition of indices characterising left temporal connectivity. Conversely, in the RS-fMRI+Cognitive classifier, accuracy was further enhanced up to 94% with the contribution of two indices of widespread connectivity.
-Add Figure 2 about here -
The comparison between classifiers revealed that the RS-fMRI+Cognitive classifier was by far the most accurate ensemble, yielding significantly more accurate classification than five of the other classifiers. Conversely, the sMRI classifier was the least accurate, performing significantly worse than any other classifier (Figure 3). For each feature set, the less accurate of the two classification methods (LDA or QDA) yielded accuracy rates 2-3% lower than those described above. Nonetheless, these relied on comparable sets of features (the delayed recall of the Rey-Osterrieth Complex Figure and Category Fluency, the volume of the right hippocampus, and the connectivity of mediotemporal regions).
-Add Figure 3 about here -
Selected post hoc analyses were run to explore the clinical importance of these features and to support possible interpretational frameworks (Figure 4). Within the Cognitive classifier, a significant difference existed between the two diagnostic groups in the delayed recall scores on the Rey-Osterrieth Figure and on the Category Fluency Test (both p values < 0.001). No significant difference was present, however, between the two groups on the two subsequent tests (Digit Span forward and the Similarities subtest of the WAIS). Moreover, a significant correlation was found between the delayed recall scores on the Rey-Osterrieth Figure and those on the Prose Memory Test (partial correlation correcting for age and education levels, p = 0.000006), and between the delayed recall scores on the Rey-Osterrieth Figure and the volume of the right hippocampus (p = 0.00013).
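A partial correlation "correcting for age and education levels" can be computed from the ordinary correlation matrix with the standard recursive formula, removing one control variable at a time. This is a minimal sketch: the function names are hypothetical, the column order (e.g., Rey-Osterrieth delayed recall, Prose Memory delayed recall, age, education) is an illustrative assumption, and the p value (which would use a t distribution with n − 2 − k degrees of freedom for k control variables) is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def correlation_matrix(columns):
    """Full Pearson correlation matrix for a list of variables (columns)."""
    return [[pearson(a, b) for b in columns] for a in columns]


def partial_corr(R, i, j, controls):
    """Partial correlation of variables i and j given `controls`.

    Uses r_ij.S = (r_ij.S' - r_ik.S' * r_jk.S') / sqrt((1 - r_ik.S'^2)(1 - r_jk.S'^2)),
    where k is one control variable and S' are the remaining controls.
    """
    if not controls:
        return R[i][j]
    k, rest = controls[0], controls[1:]
    r_ij = partial_corr(R, i, j, rest)
    r_ik = partial_corr(R, i, k, rest)
    r_jk = partial_corr(R, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))
```

With hypothetical columns `[rey_recall, prose_recall, age, education]`, the adjusted association would be `partial_corr(correlation_matrix(columns), 0, 1, [2, 3])`.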


Table 1. Demographic and neuropsychological characteristics of the sample. Between-group differences in cognitive performance were analyzed both with Mann-Whitney tests and with ANOVAs, correcting for age and years of education. A Bonferroni-corrected p threshold of 0.002 was adopted as the appropriate significance level. There were only three missing data points: two participants missing their Token Test score (one control and one patient) and one participant (a patient) missing their Paired Associates Test score.

Table 2. Distinctive neuroanatomical and neurofunctional characteristics of the two diagnostic groups. Between-group differences in sMRI and RS-fMRI indices were analyzed with ANOVAs, correcting for age. Since Bonferroni correction was judged too conservative for such a large number of statistical comparisons (n = 2100), a still relatively strict p threshold of 0.005 was used. Of the entire set of indices, only 4 sMRI and 27 RS-fMRI indices survived this threshold; these are reported in the table.
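For reference, the full Bonferroni threshold that was judged too conservative can be worked out from the feature counts (84 sMRI plus 2016 RS-fMRI indices), assuming a familywise α of 0.05 (an assumption; the nominal α is not stated in this caption):

```latex
\alpha_{\text{Bonf}} = \frac{\alpha}{m} = \frac{0.05}{84 + 2016} = \frac{0.05}{2100} \approx 2.4 \times 10^{-5}
```

The threshold adopted instead, p = 0.005, is therefore roughly 210 times more lenient than the Bonferroni bound.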