Unraveling Amazon tree community assembly using Maximum Information Entropy: a quantitative analysis of tropical forest ecology

In a time of rapid global change, the question of what determines patterns in species abundance distribution remains a priority for understanding the complex dynamics of ecosystems. The constrained maximization of information entropy provides a framework for the understanding of such complex systems dynamics by a quantitative analysis of important constraints via predictions using least biased probability distributions. We apply it to over two thousand hectares of Amazonian tree inventories across seven forest types and thirteen functional traits, representing major global axes of plant strategies. Results show that constraints formed by regional relative abundances of genera explain eight times more of local relative abundances than constraints based on directional selection for specific functional traits, although the latter does show clear signals of environmental dependency. These results provide a quantitative insight by inference from large-scale data using cross-disciplinary methods, furthering our understanding of ecological dynamics.

and almost everything in between (e.g. near-neutral, continuum or emergentneutral [1,2]). Most models are based on prior assumptions of processes that drive community dynamics. The Maximum Entropy Formalism (hereafter called MEF), however, makes no such a-priori assumptions in generating predictions of distributions, including those of species abundances [3]. It is a useful construct to infer processes driving community dynamics given the constraints imposed by prior knowledge (e.g. functional traits or summed regional abundances), as it does not include any bias introduced by potentially unjustified assumptions. Quantifying the relative importance of these distinct constraints can thus provide additional answers to understand the complexity of community dynamics (see Supporting Materials SM: boxes S1-S3). This is especially so because, although many different tests are available that link variation in taxon abundances to 1) trait variation, 2) taxon turnover between habitats or environments and 3) the distance decay of similarities between samples, none quantify the importance of these relative to each other. The MEF as applied here, however, is capable of and designed to do exactly this by decomposing variation to separate information explained by each of these aspects in a four-step model ( Fig. 1 and Box S2). Its application to an unprecedented large tree inventory database on genus level taxonomy consisting of > 2,000 1-ha plots distributed over Amazonia [4] and a genus trait database of 13 key functional traits representing global axes of plant strategies [5] allows us to advance the study of Amazonian tree community dynamics from a new cross-disciplinary perspective.

Results
Principles from information theory [6][7][8] can be used in an ecological setting to predict the most likely abundance state for each taxon while simultaneously maximizing entropy based on constraints. Maximization of entropy allows quantifying the information yield for each constraint and therefor identifies which constraints reduce entropy the most. Here we specifically use Shipley's mathematical framework (CATS) for the MEF calculations, similar to earlier studies [9][10][11].

Predictive power of the four-step model
Using a uniform prior and both CWM (Community Weighted Means) and CWV (Community Weighted Variance) constraints accounted for 23% on average of total deviance between observed and predicted relative abundances (measured by R 2 KL values, see Box S2 equation 5). Filtered by forest type this was 36% for podzol forests, várzea 25%, igapó 23%, swamp forests 34%, 24% and 21% for Guyana Shield and Pebas terra firme respectively and 20% for Brazilian Shield terra firme forests (see Table S1 for detailed decomposition). Using observed metacommunity relative abundances as prior regardless of CWM or CWV values accounted on average 58% for the combined dataset with all forest types between 50 and 60%, except for the Guyana Shield terra firme with 63%. Including both trait constraints and the metacommunity prior performed slightly better for the combined dataset (average 62%), with a minimum of 56% for igapó forests and a maximum of 66% for the Guyana Shield terra firme forests. To compensate for spurious relationships between regional abundances and local trait constraints, regardless of selection, explanatory power was regarded relative to model bias yielding the pure trait and metacommunity effects (Box S3, Fig. 2 and Table S1). This lowered the proportion of information accounted for and yielded average pure metacommunity effects of 43% for the overall dataset ranging between 30 and 48% for each forest type separately with pure trait effects explaining only 5% of information for the combined dataset on average with for each forest type between 3 and 8%. Although the latter was lowered substantially, the explanatory power did appear to be strongly dependent on forest type. The online supplementary material provides additional results relating to the predictive power of each model as well as the spatial gradient of the pure trait and metacommunity effects (Figs. S2-3).

Direction and strength of selection of trait-based constraints
Each trait showed significant differences in lambda when compared between forest types (

Effect of regional metacommunity prior
There was a remarkable similar mean 21% decrease of the information explained purely by the metacommunity prior for each forest type (Fig. S3). It should be noted there is an obvious risk that when sampling size is increased, this also includes more environmental heterogeneity as samples are coming from a variety of localities potentially leading to changing composition. If this were the case, however, the regional prior (qi from Fig. 1 and Box S2) would also change, as taxa might be abundant in some places but rare or absent in others. As the metacommunity effect is the explained information that remains relative to any trait effects (i.e. information unique to the neutral prior) and the pure trait effects are the explained information remaining after correcting for pure metacommunity effects (Box S3) this effect should then be accompanied by an increase in pure trait effect for each sample. This was not observed, not even within the different forest types. Instead, the trait effect gradually went up and then remained constant (Fig. S4).

Discussion
From an ecological point of view, the MEF can be used to quantify the relative association between directional or stabilizing selection for functional traits versus the importance of relative regional abundance regardless of these traits by imposing these as constraints. Our results show that pure trait effects, on average, explained only 5% of the information when all forest types were taken together whereas the pure metacommunity effect, however, explained almost ten times more with an average value of 43%. Greater trait dissimilarity was positively associated with higher pure trait effects, indicating trait-based selection, although the assumed influence of dispersal regardless of these traits appeared to confer more information explaining Despite showing clear patterns in environmental selection and dispersal effects, there was a large proportion of information left unexplained (44% on average). Potentially, local demographic stochasticity could weaken any link between functional traits measured and regional abundances of genera. This would, however, mean that almost half of the information contained in relative abundances are the result of random population dynamics and are not structurally governed. Alternatively, this could be due to functional traits reflective of processes not taken into account in this study, such as traits reflective of interactions between trophic levels (e.g. specific plantpathogen interactions). Another and at least equally likely hypothesis for (local) unexplained information is that when scaling up, the ratio of genus richness to total abundance decreases rapidly initially but stabilizes again as relatively nonoverlapping habitats are included in the regional abundance distributions and more genera are included again due to the different habitats. This would result in a change of the regional abundance distribution (i.e. the prior) to which each local community is compared, resulting in higher local unexplained information. Further study into these aspects could provide additional insight, though the data necessary for these scales is lacking for Amazonian trees.
Although the initial explanatory power of the metacommunity prior differed between forest types, the decay pattern was very similar. As the effects of either traits or the metacommunity are measured in the goodness-of-fit predictions on local relative abundances, this implies that at small spatial scales the surrounding regional abundances provide better estimators than functional traits, while at larger spatial scales this shifts to the traits. The ecological translation would be that on small spatial scales, local communities share similar environmental conditions leaving dispersal and drift acting predominantly in changing community composition, at least for genus level taxonomy. As the potential regional pool is increased, more and more environmental heterogeneity and non-overlapping regions are likely to be introduced.
The more gradual decline of terra firme forests can then arguably be attributed to these forests having the largest relative surface area of Amazonia (even for the separate subregions), potentially giving these forests an almost continuous metacommunity without gaps, resulting in a more gradual transition from metacommunity to trait relative importance. The fact that metacommunity effects do not change anymore after certain distances would indicate the effect of dispersal potentially occurs over very large distances. It should be noted that as these calculations are done at community and genus level, they do not measure single dispersal events but rather the effect of dispersal on community composition much deeper in time. In other words, this effect suggests more than a dispersal event every now and then. Instead, it argues for prolonged mixing of forests on large geographical and temporal scales, supported by recent findings demonstrating a lack of geographical phylogenetic structure of lineages for Amazonian tree genera [12].
Using an unprecedented scale of data and applying the Maximum Entropy Formalism from information theory we show that constraints formed by regional relative abundances of genera explain almost ten times more of local relative abundances then constraints based on either directional or stabilizing selection for specific functional traits, although the latter does show clear signals of environmental dependency. There is, however, still much to be explored due to the large unexplained effects and analyses on finer taxonomic (i.e. species level) and environmental (e.g. microhabitat) scales could resolve these issues. The relatively large effects of the regional pool of genera over great distances does suggest an important role for long term dispersal and mixing of Amazonian trees, especially for the Amazonian interior.

Empirical data
The ATDN [4] consists of over 2000 tree inventory plots distributed over the Amazon basin and the Guiana Shield, collectively referred to as Amazonia (a map of all current plots can be found at https://atdn.myspecies.info/). Only those plots with trees  [15] for details regarding these forest types). Trait data were extracted from several sources. Wood density was mostly derived from [16]. Traits related to leaf characteristics mostly came from four large datasets [17][18][19][20][21][22] , including additional data from other sources [23][24][25] [26][27][28] as well as different flora's and tree guides. As this particular trait can vary over several orders of magnitude, this was included on a log-scale [27,29].
Traits involved in aluminum accumulation were based on [33,34] and references therein. For binary traits (yes/no), a genus was considered having a certain trait only when >50% of the genus was positive for that specific trait.

Functional traits and trait imputation
Constraints were formed by community weighted genus means (CWM) and genus variance (CWV) of functional traits (Table 1), related to key ecological life history aspects on which natural selection potentially operates. According to principles of natural selection, CWM values will be biased towards favourable trait values for that particular environment in the case of directional selection, as taxa with these traits will be more abundant due to environmental selection while stabilizing selection would decrease CWV values within genera. For many traits it has been shown earlier that the interspecific variability was larger than the intraspecific variability, allowing the use of data from different sources to at least calculate a mean species trait value.
Genus trait values were computed as genus-level mean or variance of species values if known within the genus and considered constant for each genus. Genus level of taxonomy was used as the available trait database had the most information on this taxonomic level (see Table 1). Unknown values for traits were estimated by Multiple Imputation with Chained Equations (MICE, see [15]) by delta adjustment, subtracting a fixed amount (delta), with sensitivity of this adjustment to the imputations of the observed versus imputed data analysed using density plots (Fig. S8) and a linear regression model. This procedure was done using the mice package [35], available on the R repository, under predictive mean matching (pmm setting, 50 iterations). Results showed imputations were stable and showed near identical patterns with each imputation scenario (see Figs S5-6 and Table S2).
After imputation, all trait values were transformed to Community Weighted Means with ra the relative abundance of the i th genus in the k th plot, using either genus trait mean or variance of traits over all species within the genus for the CWM and CWV respectively. Figure 1 provides a schematic procedure overview, box S1 provides an overview of important terms and Boxes S2-3 further mathematical details. Initially, a maximally uninformative prior is specified, where qi (Box S1 equation 1) equals 1/S, indicative of each species having equal abundances, and trait constraints are randomly permuted multiple times among genera to test whether inclusion of specified constraints significantly changes derived probability distributions (see also [37]). Subsequently, the same prior is used but now observed trait CWM or CWV belonging to specific genera are used as constraints (following earlier applications using simulated communities [11]). Third, observed regional abundances are used as prior with permutated trait constraints and finally both observed regional abundances and observed trait CWM/CWV are used as prior and constraints. Maxent2, an updated version of the maxent function currently in the FD library of R provided the computational platform. Proportions of uncertainty explained by each model are given by the Kullback-Leibler divergence R 2 KL, a generalization of the classic R 2 goodness of fit. In contrast with standard linear regression models having squared goodness-offits measurements, the R 2 KL is much more related to the concept of relative entropy, quantifying the information lost when one distribution is compared to another by means of quantifying the statistical distance between two distributions [38]. Pure trait, pure metacommunity, joint metacommunity-trait and unexplained effects are calculated as proportions of total biologically relevant information (Box S1 and Box S2). Data was rarefied to smallest sample size (swamp forests; 28) and calculations bootstrapped 25 times. Results indicated no significant change compared to using all data, hence the total dataset was used for all analyses.

Strength and direction of selection
Predictions of genus relative abundances are computed as a function of traits reflected in the CWM or CWV values and a series of constants (λjk: the Lagrange Multipliers).
Each multiplier quantifies the association between a unit of change for a particular trait j and a proportional change in predicted relative abundance pik (the i th genus in the k th community) considering all other traits are constant, formally described as:

Estimation of metacommunity size
Iteratively increasing the regional species pool considered as prior in concentric circles of a fixed radius of 50 km allows estimating the spatial effect of metacommunity size. The relationship between pure metacommunity effect and radius of metacommunity size was fitted using a smoothing loess regression (function loess and predict; R-package stats with span set at 0.1). Fits subsequently were used to predict values of metacommunity effect based on geographical distance to visualize general patterns for each forest type. Exponential decay of pure metacommunity effect was described using a self-start asymptotic regression function (SSasymp) of the form y(t)∼yf+(y0−yf)e −exp(log(α))t (nls from stats, [39]. A list of all packages used in R in addition to those preloaded can be found in the supplementary online material (SA2).     Whiskers extends from hinge to largest or smallest value no further than 1.5 * IQR from hinge. Points beyond this range are plotted individually.

BOX S1
Box S1. Different ingredients necessary for analyses using MEF. Definitions of the most important terms used in the MEF analyses and throughout the main text to provide the necessary framework of understanding, adapted from [48].

Entities:
The basic unit of the MEF model which can exist in different states. Here, this constitutes a collection of genera existing at a site, hence each entity can be considered a single genus.

States:
Classification of different ways any entity can exist. Within the system, states of each entity describe their specific abundance at that site. Microstates constitute the spatial and temporal composition for the states of the entities in the system. Macrostates depict the entities of a system independent of the spatial or temporal composition, e.g., the overall relative abundance distribution but not including processes leading to this distribution (such as dispersal).

Traits or properties:
The

Community-weighted mean or variance:
The mean or variance of genus-level trait value over all constituent present species (for each entity) weighted by the relative abundance of each entity at a specific site.
The Maximum Entropy Formalism as applied here works based on a conceptual model called CATS (Community Assembly by Trait Selection [9][10][11]) and makes use of three inputs: i) A trait matrix containing the measured functional traits of each of the S total genera in the total regional pool, these can be of either discrete or continuous form.
ii) A vector of n community weighted trait values, estimating the average trait value over all individuals in the local community for each of the traits iii) A prior probability distribution specifying the regional abundance distribution, quantifying potential contributions of the regional pool of recruits to the structure of local communities.
Using these three sources of information, the model predicts relative abundances (pi) in the form of Bayesian probabilities for each genus in each local community without assuming any a priori relations or processes. This is achieved by finding the vector of relative abundances maximizing entropy: with qi the regional species pool abundance of species i and RE (Relative Entropy) subject to the known constraints for j traits and i species.: The solution is a generalized exponential distribution where the λ values measure the importance of each trait when all other traits are constant: Note that when all λ values are zero, i.e. there is no trait based selection, pi = qi The final step is to measure the proportion of total deviance accounted for between observed and predicted relative abundances for each of the fourstep solution. These are the R 2 KL values, a generalization of the classic R 2 index of maximum likelihood estimation using the Kullback-Leibler index [20]: i) 02 KL(u): fit of model bias, the model null hypotheses given a uniform prior (i.e. equal distribution in the regional pool of recruits).
ii) R 2 KL(u, t): fit using again a uniform prior but including traits as constraints.
iii) 02 KL(m): fit using the metacommunity prior but excluding traits as constraints iv) R 2 KL(m, t): fit using the metacommunity prior and including traits as constraints The general form of the R 2 KL divergence is calculated by:

With the following parameters:
Oik as the observed relative abundances of the i th genus in the k th community, Pik the accompanying predicted values for the specific model of the four solution step as described in the main text and, Qi,0 the predicted relative abundances given only the maximum uninformative prior.
Further details on the calculation of all separate R 2 KL values and accompanying pure trait, pure metacommunity, joint information and biologically unexplained information can be found Box S3.

Box S2. Mathematical description of the Maximum Entropy Formalism for the four-step solution. Left panel shows necessary ingredients and formulation of the Maximum Entropy Formalism.
Right side panel shows decomposition of the proportion of total deviance accounted for between observed and predicted relative abundances for each of the four-step solution, adapted from [9].
The purpose of using MEF is to decompose the deviance between observed and predicted relative abundances using the four-step solution as described in the main text. The values generated are described below. The R 2 KL value is a generalization of the classic R 2 index of maximum likelihood estimation using the Kullback-Leibler index for a non-linear regression including a multinomial error structure [9,10,38]. In essence, it is a way of measuring the proportion of total deviance accounted for by that specific model from one of the four steps: '2 KL(u): fit of model bias, the model null hypotheses given a uniform prior and permuted traits R 2 KL(u, t): fit using a uniform prior but including observed traits as constraints '2 KL(m): fit using the metacommunity prior but excluding observed traits as constraints R 2 KL(m, t): fit using the metacommunity prior and including observed traits as constraints 1) The increase in the explained deviance due to traits can be calculated either by Increase in explained deviance due to traits beyond that due solely to model bias or ΛR 2 KL(t|m) = R 2 KL(m, t) -'2 KL(m) Increase in explained deviance due to traits beyond contributions made by the meta-community 2) The increase in explained deviance due dispersal mass effects via the metacommunity can be calculated by either: Increase in explained deviance (if any) due to the metacommunity beyond that due to model bias or ΛR 2 KL(m|t) = R 2 KL(m, t) -R 2 KL(u, t) Increase in explained deviance due to the meta-community given traits, relative to the explained deviance due only to the traits: i.e. information unique to neutral prior  Pebas terra firme (TFPB) and várzea (VA). Differences were tested with a one way analysis of variance with significance levels corresponding to: ns non-significant, * p < .05, ** p < .01 and *** p < .001. Traits used were wood density (WD), seed mass class (SMC), specific leaf area (SLA), nitrogen (N), phosphorus (P) and carbon (C) leaf content with the prefix of VAR for the variance of continuous traits. Latex, Resin, Nodules, Ectomycorrhiza (EctoMyco), the ability to accumulate aluminium (AlAcc), and the presence/absence of fleshy fruits (Fleshy) and winged seeds (Wings) are all binary traits.     of total information reflective of variation in local relative abundance explained for by the various models. Middle rows indicate the specific information gain from any one of the used models relative to the model bias. Bottom rows show the actual effects of traits, the metacommunity and the joint information relative to the model bias.

S-A Ecological interpretation of the MEF results
A number of functional traits associated with low nutrient conditions (e.g. ectomycorrhiza) and life history strategies suited for protection against herbivores (e.g. latex and high leaf C content) were clearly positively associated with abundance in nutrient poor environments (podzols) in terms of community weighted mean values, indicated by the positive lambda values. In contrast, community weighted means for fleshy fruits and high leaf N and P content were negatively associated with abundance on these soils. Nodulation was also negatively associated with abundance on poor soils, supporting earlier results [49]. The ability to accumulate aluminium was positively associated with abundance on soils commonly associated with higher aluminium content such as igapó (strong positive effects) and terra firme soils (a minor, yet positive effect). In contrast, it was strongly negatively associated with abundance for podzol, várzea and swamp forests. Traits such as SLA or winged fruits also showed strong patterns dependent on forest type.
Signals of quantitative environmental selection were found to be highest for podzol forests, whereas its counterpart in the form of the dispersal mass effect from the regional pool of genera had the lowest value. Podzol forests, having extremely nutrient poor soils could reflect a much stronger selective environment than any of the other forest types. Terra firme forests, presumably reflective of a less strong selective environment in terms of resource availability, showed the opposite, with less than half of the pure trait effect in comparison with podzol forests (even when rarefied to accommodate for different sample sizes). Traits associated with protection against herbivores such as latex [43] and high leaf carbon content showed higher values associated with greater abundance and overall lower variance on podzol soils, whereas traits indicative of investment in growth and photosynthetic ability such as high foliar concentrations of P and N [50] showed strong negative associations on nutrient poor soils for both community weighted means and variance. The ability to accumulate aluminium was also strongly positively associated with relative abundance on the more nutrient but also often aluminium enriched soils of terra firme and in some cases aluminium rich igapó forests. Lambda values also showed strong negative lambda values for wood density in swamp and forests in both community weighted means and variance, fitting high tree mortality and many individuals belonging to pioneer species in especially the western Amazonian swamp forests. Várzea and Pebas terra firme forests showed a similar response. As the Pebas consists mainly of Andean sediments it has higher nutrient content, promoting lower wood density, supported by our results whereas várzea forests are also often flooded. There were also traits that showed no specific (strong) signal of selection on certain forest types (either positive or negative), such as latex on igapó and ectomycorrhiza on várzea (see Fig. S1 for all lambda values). Plotting lambda values for CWM and CWV constraints of the continuous traits showed WD and C were both strongly positively correlated indicating strong directional selection for lower trait values accompanied by a reduction in trait variance. SMC, SLA and leaf P content, however, showed a negative correlation with higher lambda values for CWM values associated with lower trait variance ( Figure S10). None of the traits showed a reduction in variance without a change in the CWM, suggesting directional selection is more likely than stabilizing selection, even though the overall information yield remains low. Interestingly, terra firme forests in general showed the smallest lambda values overall (positive or negative).
This may be indicative of either more pronounced demographic stochasticity or ecological drift eliminating the association between traits and relative abundance. Lower effects of selection in general or more (random) variation due to the larger species pool in comparison with other forest types, however, could also be the result of mixing heterogeneous microenvironments into a single environmental class. Support for such heterogeneity within terra firme forests having influence on distribution of functional traits on valleys or plateaus has recently been found [51]. In addition, natural but also anthropogenic [52] disturbance history affects biotic community composition and can lead to changes in tree community through time, blurring relationships between traits and relative abundances. It should further be noted that, although for terra firme forests we were able to make a distinction by subregion, true within forest type heterogeneity was not taken into account. This might cause an underestimation of the deterministic effect but as of yet cannot be corrected for on this scale and is worth to be investigated in future studies. In addition, podzol forests have a smaller connected surface area and accompanying smaller number of genera in comparison with terra firme forests, adding to the calculated stronger trait effects [53,54]. When more detailed understanding and knowledge of these functional traits would be provided, this would most likely increase the explanatory power of the MEF. The fact, however, that we do not have a very specific knowledge of these interactions and specific traits is precisely the reason why the MEF can provide additional insight.
It should be noted that for species level analyses any micro environmental gradients might prove to also show (stronger) selection at local scales [55,56], as it has been shown that most variation in community composition, due to selection in regard to habitat filtering and niche conservatism, is found at lower taxonomic levels, such as between species within genera [57,58]. In contrast, theoretically it has been shown and tested that immigration numbers are actually very robust across taxonomic scales [59], validating our results of the metacommunity importance using genus level taxonomy. Spatial patterns of metacommunity effects showed shallowest declines in the centre, supporting the suggestion that high diversity of the Amazonian interior could be explained by influx of recruits due to large (overlapping) ranges. This middomain effect [60], however, would also predict lower species richness for the edges due to lower range overlap, assuming a closed community. This is not the case, as there is a strong species richness gradient from West (rich) to Eastern Amazonian forests (poor) [61]. The lower metacommunity effect for the edges then is most likely not due to less absolute influx of genera, but rather less influx from the Amazonian tree community. Influx from the species-rich Andes could account for the high diversity [62], yet low Amazonian metacommunity effect for Western Amazonian forests. In contrast, Southeastern parts of Amazonia receive influx from tree speciespoor biomes (i.e. the Cerrado) resulting in lower diversity but also low metacommunity effect for Amazonian trees in this region.