Tensor electrical impedance myography identifies clinically relevant features in amyotrophic lateral sclerosis

Objective. Electrical impedance myography (EIM) shows promise as an effective biomarker in amyotrophic lateral sclerosis (ALS). EIM applies multiple input frequencies to characterise muscle properties, often via multiple electrode configurations. Herein, we assess if non-negative tensor factorisation (NTF) can provide a framework for identifying clinically relevant features within a high dimensional EIM dataset. Approach. EIM data were recorded from the tongue of healthy and ALS diseased individuals. Resistivity and reactivity measurements were made for 14 frequencies, in three electrode configurations. This gives 84 (2 × 14 × 3) distinct data points per participant. NTF was applied to the dataset for dimensionality reduction, termed tensor EIM. Significance tests, symptom correlation and classification approaches were explored to compare NTF to using all raw data and feature selection. Main Results. Tensor EIM provides highly significant differentiation between healthy and ALS patients (p < 0.001, AUROC = 0.78). Similarly tensor EIM differentiates between mild and severe disease states (p < 0.001, AUROC = 0.75) and significantly correlates with symptoms (ρ = 0.7, p < 0.001). A trend of centre frequency shifting to the right was identified in diseased spectra, which is in line with the electrical changes expected following muscle atrophy. Significance. Tensor EIM provides clinically relevant metrics for identifying ALS-related muscle disease. This procedure has the advantage of using the whole spectral dataset, with reduced risk of overfitting. The process identifies spectral shapes specific to disease allowing for a deeper clinical interpretation.


Introduction
A number of studies have demonstrated the potential of electrical impedance myography (EIM) as a biomarker for neuromuscular disease. A range of outputs have been reported for identifying disease-related change. These include phase measurements at single frequencies (Rutkove et al 2007, Mcilduff et al 2017, phase ratios (Schwartz et al 2015, Li et al 2016 and multiple frequency phase values incorporated through machine learning algorithms including support vector machine (Srivastava et al 2012) and wrapper feature selection methods (Alix et al 2020, Schooling et al 2020). Recent developments have included spectral recordings over a range of input frequencies and increasingly complex electrode arrays capable of interrogating tissue with multiple electrode configurations (Alix et al 2020, Schooling et al 2020, Luo et al 2020. The resulting data can achieve a more detailed assessment of muscle disease but are complex and high dimensional. Having a simple quantitative output is key for progression in the development of EIM as an effective biomarker (Sanchez et al 2020).
Amyotrophic lateral sclerosis (ALS), which causes progressive muscle weakness, is one example of a neuromuscular condition in need of an effective, simple to use bedside biomarker of disease. This is particularly true for the muscles of speech and swallowing (the so-called bulbar muscles) (Rutkove 2015, Benatar et al 2016. Bulbar involvement is a relatively specific feature of ALS (Jenkins et al 2016) and is associated with a poor prognosis. The ALSFRS-R (ALS Functional Rating Scale-revised) bulbar subscore and measurements of tongue strength are the main tools used for monitoring bulbar progression. However, they lack sensitivity in slower disease progression as well as in the earlier and later stages of disease (Rutkove 2015). Furthermore the ALSFRS-R is subjective (Rutkove 2015), while strength testing is effort dependent and does not evaluate bulbar weakness effectively (Rutkove 2015). Hence, the development of a new objective and sensitive biomarker would be of great benefit.
The tongue is an extraordinarily complex muscle, with fibres running through multiple planes Napadow 2005, Gaige et al 2007). Thus, assessment of muscle impedance in a number of different directions using multiple frequencies should encapsulate maximal information. However, a potential problem lies in interpreting this large amount of data to give an objective measure of disease. Thus, decomposition of the data is required in order to draw out the most important features and facilitate interpretation/graphical representation. Our previous work (Alix et al , Schooling et al 2020 has used wrapper methods for feature selection in order to determine the specific frequencies to select for use in classification analysis. However, this has the disadvantage of being computationally expensive (Chandrashekar and Sahin 2014), particularly if we want to consider all dimensions in the dataset.
This study explores the use of an alternative analysis technique, non-negative tensor factorisation (NTF), on the same EIM tongue dataset. This approach involves computing a parts-based representation of the data, resulting in low-rank approximations. NTF is effective at exposing the latent structures in high dimensional data and can extract the essential features (Chandrashekar and Sahin 2014). This study is the first to present the application of NTF in EIM data, however promise has previously been shown in its application to several biomedical signal studies including EEG (Escudero et al 2015), mass spectrometry imaging (Aram et al 2015) and surface EMG (Xie and Song 2013, Ebied et al 2017, Akmal et al 2019. The advantage of considering the data as contributions of different spectral shapes is the increased insight into the clinical interpretation of the spectral contributions between the different disease states. Here, the non-negativity constraint is key for providing physically interpretable outputs. Alternative dimensionality reduction techniques such as principal component analysis, the L2 Euclidean norm and feature selection do not have the same physical interpretability. A further advantage to NTF is the capability of dealing with missing data for example, it has been demonstrated that a Parallel Factor Analysis (PARAFAC) model can be correctly determined even when up to 70% of the data are missing (Tomasi and Bro 2005).
In this study our aims were to assess if dimension reduction through NTF could deliver features to discriminate between patients and healthy participants and to explore their correlation to patient symptoms. Our hypothesis was that NTF will provide successful classification, and provide a comprehensive analysis of how the spectral pattern varies with disease severity, allowing a clinical interpretation. Results were compared to the performance of single frequency raw features and the L2 norm and the limitations of each method are discussed. For simplicity we will refer to this process of dimensionality reduction through NTF as tensor EIM. 2. Methods 2.1. Patient symptom score Tongue strength (TS) recordings were undertaken in each participant using quantitative muscle testing system (Averil Medical), coupled to the Iowa Oral Performance Instrument , Easterling et al 2013. Additionally, all patients had their ALSFRS-R bulbar subscore taken, as well as a clinical score (CS) based on a senior clinician's observation of tongue wasting, weakness and fasciculations (see figure 1).
In order to provide a comprehensive assessment of the overall severity of each patient's symptoms, all three measures of disease (TS, ALSFRS-R and CS) were combined to create an overall symptom score. This was calculated by normalising both the tongue strength (TS → TS N ) and clinical score (CS → CS N ) to give a value distributed between 0 and 12, with a lower value representing more severe disease.
The overall symptom score was then calculated as the sum (ALSFRS-R + TS N + CS N ), which is distributed between 0 and 36. Through the analysis patients are split into the categories of mild and severe symptom, where an overall symptom score 22 determines mild symptoms (20 patients) and <22 determines severe symptoms (13 patients).

Raw data description
This paper details further analysis of a tongue impedance dataset previously described in Alix et al (2020), Schooling et al (2020). In the present work, we utilised data obtained from placement of the novel tongue EIM probe in the midline of the tongue ('central placement') and restrict our analyses to 3D electrode configurations (see figure 2(a)), since they achieve the highest classification performance (Alix et al 2020, Schooling et al 2020). The present data are thus obtained from 33 patients with ALS (mean age 61, 58% male) and 28 healthy volunteers (mean age 53, 50% male).
Each impedance measurement (Z) was taken for 14 frequencies ( f ) starting at 76 Hz and doubling each step up to a maximum of 625 kHz. This frequency range has been utilised in other mucosal surface based applications of impedance spectroscopy (Jokhi et al 2009, Hillary et al 2020, Anumba et al 2021. Impedance is a complex quantity and is separated into real and imaginary parts as Z( f ) = R( f ) + jX( f ), with R( f ) being the resistance and X( f ) being the reactance. The impedance values are calibrated to return the volume independent impedivity (such named volume conduction properties (VCPs) in Sanchez et al (2020), ). The resistivity (ρ) and reactivity (χ) are calculated as: Figure 2. (a) Current injection and voltage read-out electrodes used for each 3D electrode configuration. (b) Median impedance spectra for the healthy cohort, mild ALS group and severe ALS group. It is clear that on average there is little discrepancy between the healthy and mild spectral patterns, while the severe spectra demonstrates a distinct shift to the right, with an increase in magnitude at the low-frequency end and a decrease in phase at the high-frequency end.

NTF procedure
NTF is a process of approximating an N-order tensor as sum of R rank-1 tensors formed by the outer products of N vectors. Including our chosen recording paradigms, each participant dataset contains 84 data points (14 resistivities and 14 reactivities, for the 3 electrode configurations).

Rank selection
The rank choice was determined by comparing the error on the reconstructed data with the variability between repeat measurements. The frequency dependent 95th percentile in resistance and reactance figure A1) was used as the acceptable variation at each frequency when assessing the acceptability of the reconstructed data.
For each reconstructed spectra the root mean square deviation (RMSD) between the original data and its reconstruction was calculated, normalised by the 95th percentile variation: define the reconstructed dataset. Acceptable variation for the reconstructed spectra is defined by RMSD 1. The probability that a given reconstructed spectra will fall in the acceptable range for a given rank R (γ(R)) is characterised by the proportion of spectra to be acceptably reconstructed. The rank was selected as the lowest value satisfying γ(R) = 1, i.e. 100% of spectra reconstructed inside the acceptable range.

Data transformation
Before applying the tensor decomposition the original dataset was transformed. This must be done to meet the non-negativity requirement of NTF. Additionally, we normalise the data to give an equal contribution from each frequency.
The reactivity dataset, {χ( f )} is mostly negative with a small number of positive values in the lower frequency range. To transform the reactivity dataset all data were shifted by a set value at each frequency so that the maximum reactivity value becomes zero, and then divided by the median value at that frequency. This returns a normalised non-negative output. The resistivity dataset, {ρ( f )} is by definition already non-negative, however for consistency the same transformation was applied here. All data were shifted so that the minimum resistivity value becomes zero, and subsequently divided by the median value at that frequency The minimum, maximum and median values are calculated over the entire dataset at each frequency. Following the tensor factorisation the inverse transformations are applied to the output modes. This returns the complex impedivity, comprising positive resistivity but negative reactivity when inverse transformation adds a negative subtracted value.

Optimisation algorithm
The tensor factorisations are approximated using CANDECOMP/PARAFAC (Carroll andChang 1970, Harshman 1970) (or CP) decomposition via alternating least squares (Kolda and Bader 2009). Let us define tensor  , where the three dimensions describe the 28 features of each measurement (I = 28), the 3 configurations (J = 3) and the 33 + 28 participants (K = 61). The decomposition is: The factor matrices refer to the combination of vectors from rank-one components, i.e. A = [a 1 a 2 L a r ] and likewise for B and C. Alternatively, (6) can be written using the notation for the Kruskal Operator (Kruskal 1977, Kolda 2006: The goal is to compute a CP decomposition with R components that best approximates our original input tensor. This is done by minimising the reconstruction error,  : The alternating least squares procedure begins by initialising all the factor matrices with random values. Considering the third-order example the approach fixes B and C to solve for A, then fixes A and C to solve for B, then fixes A and B to solve for C. This procedure was repeated until the convergence criterion was met, where the absolute difference between the current error and previous error is less than 10 −10 .

Significance testing, correlation analysis and classification
The data are not parametrically distributed, therefore the Mann-Whiteney U test was applied for significance testing. False discovery rate (FDR) correction was applied when multiple features are compared using the Benjamini-Hochberg procedure. The assessment of correlation was undertaken using the Spearman rank-order correlation coefficient (ρ), which is a nonparametric measure of the monotonicity of the linear relationship between two datasets. The p-value indicates an estimated value for the probability of an uncorrelated system returning datasets with a correlation at least as extreme as the one determined. Significance testing and correlation assessment was undertaken using python with SciPy statistics (Bressert 2012).
Classification assessment was made for three different classifiers; k-nearest neighbours (KNN) (Cunningham and Delany 2020) with k set to 3; Support Vector Machine (SVM) (Suthaharan 2016) with radial basis function kernel; and Random Forest (RF) (Breiman 2001) with 10 trees. To minimise overfitting 4-fold cross-validation was applied. When applying feature selection a forward selection wrapper method was used

Results
Median spectra from patients and healthy volunteers demonstrate clear differences in the overall spectral shapes (see figure 2(b)). Tensor EIM was applied to quantify this difference in spectral shape between participants. Following the rank selection procedure a rank of 3 was selected, this gives a compression ratio of 19:1. The three spectral modes output from this decomposition are presented in figure 3(a).

Patient versus healthy assessment
The contributions in the patient cohort and healthy cohort are assessed for the three spectral modes (c 1 , c 2 , c 3 ) as well as for a combined metric, termed tensor EIM, which is defined by 0.2 × c 1 − 0.9 × c 2 + 0.1 × c 3 . Boxplots for these distributions are shown in figure 3(b).
It is clear that the first and third mode generally contribute most significantly in the patient cohort, while the second mode contributes most in the healthy cohort. This agrees with the general trend identified when the raw data median spectral patterns are assessed (see figure 2(b)). The significance in the healthy/patient split for each metric was calculated using the Mann-Whitney U test, and the corresponding p-values are reported in table 1. These are compared to the raw data L2-norm and the best performing single frequency across all frequencies and electrode configurations. The tensor EIM metric comprises all the spectral information and performs better than the previously used L2 norm. A single frequency separates the patient and healthy groups with similar high significance, however NTF outputs increase sensitivity when comparing healthy with mild patients.
The patient diagnosis classification performance is summarised in table 2, which incorporates results from three classifiers (KNN, SVM and RF). This was repeated for all tensor EIM features, all raw data features and with wrapper feature selection applied to the raw data features. Performance is optimal when wrapper feature selection was applied. Yet when all features of the dataset were included, the tensor EIM process demonstrated an improvement (best with KNN, AUROC = 0.78).
EIM recordings from limb muscles generally began at approximately 10 kHz due to electrode artefacts relating to keratinised skin (Martinsen et al 1997). Mucosal surface recordings, like those from the tongue, are reported at lower frequencies, in ranges similar to those presented here (Lackovic and Stare 2007, Murdoch et al 2014. In order to assess the robustness of the low frequency EIM data, these analyses were repeated after removing the bottom two frequencies. These results are presented in appendix, figure A2. Comparing the NTF spectral modes to the full frequency range modes in figure 3(a) it is clear that the shapes of the spectra are completely maintained. The significance of splitting of the parameters between healthy and disease is also comparable with tensor EIM consistently giving highly significant p-values. The patient classification performance are also very similar (AUROC = 0.77 truncated, 0.78 full range). Overall, this consistency in the pattern of spectral modes and general performance indicates that the lower power spectra are neither dominant nor unreliable.

Patient severity assessment
EIM can be used to help predict the severity of a patient's disease. This was assessed by looking at the ability of different approaches to data analysis to differentiate between the severe and mild patient cohort, as well as in the correlation performance with the overall symptom score. Figure 4(a) demonstrates a highly significant correlation of the tensor EIM metric with patient symptoms. This is the best performance when compared to both the individual NTF factors and the raw data metrics ( figure 4(b)). Correlation performance with the ALSFRS-R bulbar subscore is shown in appendix, table A2 and was similar. Further, the Mann-Whitney U test significance values for the separation between severe and mild patients are highly significant for tensor EIM and individual factor c 2 . These outperform both the raw data L2 norm and the best single frequency.
An analogous classification assessment to that presented in section 3.1 was made for the classification between mild and severe patients (table 3). Here, the best performance occurs for tensor EIM features (RF, AUROC = 0.75).
Again this analysis is repeated for the truncated frequency range as shown in appendix, figure A3. The results show comparable performance in terms of correlation with symptoms (ρ = −0.66 truncated, −0.70 full range) as well as classification power between mild and severe patients (AUROC = 0.73 truncated, 0.75 full range).

Clinical interpretation
Two spectral modes can be identified from the tensor EIM metric, which can be thought of the ALS patient contribution and the healthy volunteer contribution. These are derived from the combination of spectral modes which contribute positively in the calculation of the tensor EIM metric (red spectra in figure 5(a)), and the modes which contribute negatively (blue spectra in figure 5(a)). On the whole, patients see a larger contribution from the red spectra in their decomposition, while healthy participants see a larger contribution from the blue. Likewise, the correlation seen in 4(a) demonstrates that as patient disease severity worsens, their contribution from the red spectra will increase and blue spectra will lessen.
In order to gain a clinical interpretation of what EIM data are measuring, these spectral patterns are fit to a lumped circuit model using the LT-Spice simulator. Figure 5(b) shows this circuit, which was built to represent a Mann-Whitney U test pvalue for separation between mild and severe patients; Spearman correlation results for correlation with the overall symptom score. Shown for the three NTF modes, the combined tensor EIM and the raw data L2 norm and best single frequency. simplified muscle cell structure. Both the healthy impedance spectra and the ALS impedance spectra are fit to this model (see appendix, figure A4 for model fit and parameters). The circuit parameter changes observed when moving from the healthy to diseased mode are characterised by an increase in extracellular resistance and a decrease in intracellular resistance and capacitance. These changes are in keeping with the known effects of muscle atrophy (see discussion).

Discussion
In this paper we have developed an approach to analysing high dimensional EIM data, which we term tensor EIM. The tensor EIM metric both separates patients from healthy participants and correlates well with clinical symptoms. Tensor EIM encapsulates all the information from the impedance dataset and captures the intrinsic  symmetry of the impedance data, the compression ratio of 19:1 implies a high amount of redundancy in the dataset. Hence, NTF can significantly reduce the dimensionality of the data without losing important information. Further, the capacity of NTF to deal with missing data makes it opportune for the use of analysis in clinical trials, where it is possible that some recordings will be unsuccessful. While the classification performance of wrapper feature selection outperforms tensor EIM in the patient versus healthy classifier, it is important to consider the limitations of applying an exhaustive search feature selection on the dataset. These include it being a computationally expensive process and having a tendency for overfitting (Loughrey and Cunningham 2004) or missing potentially insightful features of the dataset. Despite using a nested cross validation to reduce overfitting, analysis of the features selected (see appendix, table A3) shows a large amount of inconsistency in the features, with most only appearing 25% of the time (i.e. in one of the four folds of the cross-validation). This suggests the feature selection is generally overfitting to each fold of the data and overall does not make a definitive decision on which features are best. Similarly selection of a best frequency for significance testing is also prone to overfitting, even when FDR correction is applied. By contrast, the tensor EIM process incorporates the information of the full dataset and consistently outperforms classification with the full set of raw features. In addition, unlike wrapper feature selection, tensor EIM is an unsupervised method, which identifies patterns in the dataset without bias.
This analysis considers a high dimensional EIM dataset over a broader range of frequencies than most previous limb studies (Rutkove et al 2007, 2017, Schwartz et al 2015. In the case of limb impedance recordings low frequency measurements can be unreliable and corrupted by artefacts, since keratinised skin causes measurement errors below 10 kHz. However, on mucosal surfaces such as the tongue, where there is no keratinised epithelial layer, low frequency impedance assessment can be implemented successfully (Lackovic and Stare 2007, Lackovic and Stare 2007, Murdoch et al 2014. The use of low and high frequency EIM data in tandem has advantages such as capturing information of the sizes of both the intra-cellular and extra-cellular fluid spaces (Dittmar 2003). By repeating the analysis on a truncated frequency range using data 305 Hz we have demonstrated that our low frequency data are reliable and do not dominate or skew results. Comparison of the performances between using the full frequency range (sections 3.1 and 3.2) and the truncated range (appendix, figures A2 and A3) suggests a small improvement when including the lowest frequencies. This could be tested on a larger patient cohort for validation, in order to make a full assessment of the additional value of including the low frequency data. In addition to the effects on performance, using a larger range of frequency inputs provides more information to improve the accuracy of lumped circuit models.
One considerable benefit to this type of analysis is the improved interpretability, since the procedure details the contributions of different spectral shapes across the cohort, which can be fed into lumped circuit models. The consistent trend when moving from healthy to diseased spectra was identified as a shift in the centre frequency to the right, an increase in magnitude at low frequencies and a smaller phase at high frequencies.
Using lumped circuit modelling these spectral changes are consistent with the histological changes associated with muscle atrophy and denervation. These include an increase in extracellular resistance due to remodelling of the extracellular space with connective tissue and fat (Jenkins et al 2018), a reduction in membrane capacitance due to muscle cell atrophy (Al-Sarraj et al 2014), and a smaller decrease in intracellular resistance due to increased cell membrane permeability and intracellular ionic content (Cisterna et al 2020).
In bulbar disease in ALS, a mixture of clinical signs related to both upper and lower motor neurone loss are typically seen (Tomik and Guiloff). At present there are only a few studies which examine EIM changes in upper motor neurone conditions. EIM assessment in stroke patients demonstrate a reduction in phase angle at 50 kHz (Li et al 2017) and 100 kHz (Hu et al 2019), this is in agreement with changes observed in ALS (Rutkove et al 2007). Further, studies of stroke related skeletal muscle changes report myofiber atrophy, which include a reduction in muscle area and volume, decreasing bilateral muscle fibre size and a reduction in the number of motor units . This suggests that perhaps both upper and lower motor neurone degeneration cause similar changes to a patientʼs EIM, however further work is needed to assess these changes in the multi-frequency EIM spectral pattern.
When assessing the correlation of our impedance metrics with the overall symptom score we must appreciate that current biomarkers for disease are subjective and lacking in sensitivity (Rutkove 2015). So, while a perfect correlation is not expected, the optimal correlation found with tensor EIM is promising. When comparing the correlations with the overall symptom score (figures 4(a)) to the correlations with the ALSFRS-R bulbar subscore (appendix, table A2), it is clear that the overall symptom score generally correlates better. Combining all known symptom information into one score should arrive at a more holistic appreciation of the disease state; the better correlation of EIM features with such a score offers encouragement that these are capturing a more complete representation of the disease state.
Identifying the spectral differences between healthy and patient muscle is key to understanding impedance changes with progressing disease. The use of impedivity, also termed VPCs (Sanchez et al 2020, , means that the disease associated spectral patterns may be applicable to other impedance systems and, perhaps, other muscles. As a starting point, an examination of longitudinal changes will determine if similar patterns emerge in the change of the impedance spectra with time, with the potential for determining a metric which sensitively monitors disease progression.

Conclusion
In this paper we demonstrate that the features output from tensor EIM can yield a high performance in disease classification and show significant correlation with disease severity. The procedure identifies the common spectral shapes in the healthy cohort and in patients, which allows the key spectral characteristics of disease to be determined. Applying lumped-circuit modelling demonstrates changes to the electrical properties that are consistent with what one would expect following muscle atrophy. Tensor EIM gives the best correlation with symptoms and has benefits including reduced likelihood of overfitting, the ability to deal with missing data and physically interpretable outputs. Overall, tensor EIM captures the spectral structures associated with disease and provides a simple quantitative output that may be useful in further developing EIM as a biomarker for ALS.    Figure A4. LT-Spice lumped circuit fit results for the healthy spectral pattern and ALS disease spectral pattern.