Assessing the predictive power of step selection functions: How social and environmental interactions affect animal space use

The ability to predict animal space use patterns is a fundamental concern in changing environments. Such predictions require a detailed understanding of the movement mechanisms from which spatial distributions emerge. However, these are typically complex, multifaceted, and therefore difficult to uncover. Here, we provide a methodological framework for uncovering the movement mechanisms necessary for building predictive models of animal space use. Our procedure begins by parametrising a movement model of each individual in a population using step selection analysis, from which we build an individual‐based model (IBM) of interacting individuals, derive predicted broad‐scale space use patterns from the IBM and then compare the predicted and empirical patterns. Importantly, discrepancies between these predicted and empirical patterns are used to formulate new hypotheses about the drivers of animal movement decisions and thus iteratively improve the model's predictive power. We demonstrate our method on a population of feral pigs in Mississippi, USA. Our technique incorporates both social interactions between individuals and environmental drivers of movement. At each iteration of model construction, we were able to identify missing features to improve model prediction by analysing the IBM output. These include overuse‐avoidance effects of self‐attractive mechanisms (i.e. attraction to previously visited sites becomes repulsion if there have been multiple visits in quick succession), which were vital for ensuring predicted occurrence distributions do not become vanishingly small. Overall, we have provided a general method for iteratively improving the predictive power of step selection models. This will enable future researchers to maximise the information obtained from step selection analyses and to highlight potentially missing data for uncovering the drivers of movement decisions and emergent space use patterns. 
Ultimately, this provides a fundamental step towards the general aim of constructing predictive models of animal space use.


| INTRODUCTION
Understanding the space use of animals is a primary concern of ecological research (Franklin, 2010), giving insights into both behavioural features of individuals (Vázquez Diosdado et al., 2018) and the demographic dynamics of populations (Pagel & Schurr, 2012). From an applied perspective, space use underpins important management actions such as designing conservation areas (Macdonald & Rushton, 2003) and mitigating the effects of biological invasions. Indeed, such applications require not just an understanding of space use but also tools for predicting spatial distributions under hypothesised future scenarios (Wood et al., 2018).
Often, such predictions make use of statistical modelling tools such as species distribution models (SDMs; Zimmermann et al., 2010) and resource selection functions (RSFs; Manly et al., 2002), which aim to uncover the environmental features that co-vary with animal space use. One can then use these correlations to predict future space use either in a different part of the globe (e.g. when assessing the possibility of a future biological invasion) or in the same location as the environmental features change (e.g. when making conservation decisions). A great deal of recent research effort has gone into improving both the quality of inference and the predictive power of SDMs and RSFs, but despite these efforts, the predictive power of such statistical models remains variable (Hao et al., 2019).
A shortcoming of SDMs and RSFs is that they do not usually explicitly model the requirement of animals to move from their past distributions, which are used to parametrise the model, to their predicted future distributions. Whilst there has been increasing effort in the SDM literature to account for dynamic aspects of landscapes, SDMs that include animal movement are rarer (Holloway & Miller, 2017). Furthermore, they usually only model movement implicitly through summary statistics such as dispersal limitations or migration rates (Miller & Holloway, 2015). In reality, the detailed movement mechanisms that give rise to broader-scale features, such as home ranges, dispersal, or migration, may have a large effect on the ultimate relocation of animals adapting to environmental change (Nathan et al., 2008). For example, there may be physical barriers to movement between past and potential future locations (Beyer et al., 2016). Moreover, even without physical barriers, there may be a time-lag while animals adapt and reposition themselves in space to make best use of their new environment. In other words, anthropogenic changes can cause ecosystems to be in a transient state (Hastings et al., 2018). Indeed, there is growing evidence that transients are the norm in the Anthropocene (Morozov et al., 2020).
Accounting for ecological transience thus requires a dynamic approach via mechanistic modelling (Francis et al., 2021).
Mechanistic models are those that operate with (at least) two levels of description. The first is the mechanism (or process), from which the model is built; in our case, these are the movement decisions of animals. The second is the pattern, which describes the emergent features of the model; in our case, this is any summary statistic that describes animal space use over some extended period of time (e.g. a week, month, season, or year). Since such mechanistic models are dynamic, they can account for transient space use patterns (Morozov et al., 2020). Moreover, their dynamism enables the modeller to incorporate the effects of non-linear feedbacks between the locations of different individuals and populations (e.g. due to social interactions), which may have a non-trivial effect on the distribution of species (Potts & Lewis, 2019). Consequently, mechanistic modelling of animal space use, that is, using movement models to understand spatial distributions, is becoming increasingly popular (Merkle et al., 2017; Michelot et al., 2019; Signer et al., 2017). Some of the earliest efforts involved modelling home range and territory formation using partial differential equations (PDEs) built from movement processes, which were shown to have good predictive power for changes in territorial structure. Whilst initial efforts focused on canids, these models have since been extended to other territorial (Bateman et al., 2015) and non-territorial species (Ellison et al., 2020). These models have proven useful for uncovering a variety of drivers of space use patterns, including social interactions (Moorcroft et al., 1999), kin-relatedness (Ellison et al., 2020), food distribution (Bateman et al., 2015), and topographical effects.
One way of parametrising such mechanistic models is to fit the emergent space use distributions to relocation data. However, this implicitly assumes that the locations are independent samples from a steady-state distribution. This can be problematic for two reasons: (a) locations are never actually independent, and ascertaining the effect of this assumption can be tricky (Ellison et al., 2020); and (b) space use distributions may not be in a steady state (Bateman et al., 2015). Consequently, many recent attempts at mechanistic modelling of space use have focused on parametrising models from the movements between successive locations, often known as the 'step' between one measured location and the next (Potts & Schlägel, 2020; Signer et al., 2017). This replaces the independence assumption with a Markov assumption, such that each location is assumed to depend on either the previous location (to incorporate locational autocorrelation) or the previous two locations (incorporating both locational and directional autocorrelation). Such parametrisation can be performed using step selection analysis (SSA) or one of its variants (Avgar et al., 2016; Forester et al., 2009; Fortin et al., 2005; Munden et al., 2021). This technique is both well established and relatively user-friendly, especially owing to R packages such as amt, which simplify the workflow (Fieberg et al., 2021; Signer et al., 2019). It therefore seems a promising way forward for parametrising mechanistic models of space use (Potts & Schlägel, 2020).

KEYWORDS: animal movement, home range, individual-based model, movement ecology, resource selection, spatial ecology, step selection, utilisation distribution
Instead of PDE models, many studies have found it favourable to parametrise either integro-difference equations (IDEs) or stochastic individual-based models (IBMs) (Potts, Bastille-Rousseau, et al., 2014; Signer et al., 2017). The main attraction of PDEs is the wealth of analytic tools for understanding their pattern formation properties, which are not available for IDEs or IBMs. These analytic tools enable one to ascertain qualitative features of space use without recourse to extensive simulation analysis (Potts & Lewis, 2019). However, constructing PDE models of movement often requires continuum limits, moment closure assumptions and/or mean field assumptions, which may not be reasonable in all circumstances (Wang & Potts, 2017). Furthermore, solving PDEs numerically can be technically challenging, being a research field in its own right (Ames, 2014). Approaches via IDEs and IBMs are therefore attractive, especially for constructing mechanistic models of space use that have strong predictive power.
Here, we build on the general approach of using SSA to parametrise mechanistic space use models, but we focus on maximising the ability of our models to capture broad-scale spatial patterns, and we use an IBM approach for simulation. Our framework uses an iterative procedure of model improvement whereby the emergent space use patterns from an empirically parametrised IBM are compared with data to uncover missing aspects of the model (Figure 1). We focus initially on modelling social interactions between individuals, but also show how to incorporate environmental interactions. We demonstrate our framework via application to a population of feral pigs (Sus scrofa) in northwest Mississippi, USA. Feral pigs are a highly invasive, gregarious, omnivorous generalist species occurring in a wide variety of ecosystems and landscape types worldwide (McClure et al., 2015). Their diet is flexible and can include a variety of naturally occurring herbaceous vegetation as well as hard and soft mast (Quercus spp.; Ballari & Barrios-García, 2014). They also opportunistically consume vertebrate and invertebrate fauna when available (Ballari & Barrios-García, 2014). Feral pigs also consume, trample and damage agricultural crops (e.g. corn, soybean, potato and rice; Paolini et al., 2018), and prey on deer and livestock neonates when available, causing significant financial loss and providing motivation for feral pig management. These pulsed food resources, both natural and anthropogenic, lead to seasonal effects in feral pig resource selection (Paolini et al., 2018), with consequent downstream effects on seasonal home range size and structure (Paolini et al., 2019). This flexibility across seasons and landscape structures makes pigs an ideal candidate for evaluating the performance of our mechanistic framework.
Our broad aim is to make inroads into providing a general method for building predictive, mechanistic models of animal space use from a combination of animal tracking and environmental data. There are two steps to this. First, one needs to ascertain whether models parametrised from point-to-point movement data can capture broader-scale space use patterns in the same dataset. This shows how good the model is at predicting the occurrence distribution at future points in time from data on the occurrence distribution at a previous time. Second, one needs to ascertain the extent to which these models can capture space use patterns in novel situations, using data different from those used to parametrise the model. Here, for simplicity, the analysis of our dataset focuses purely on the first question. However, the same techniques that we develop here could also be used to address the second question, given the right data. In Section 4, we explain how this second step might be done.
Overall, our framework provides both a measure of the predictive capability of models fitted via step selection techniques and methods for ascertaining what might need to be included in these models to improve their predictive power. Thus, as well as having the potential to predict future space use patterns, our methods can also be used to generate new hypotheses about the drivers of animal movement decisions, thereby enabling researchers to extract more information from their existing datasets.

… Vectronic Aerospace GmbH (n = 13)] and collected data at 2-hr refresh rates through January 2017. We removed all animal relocations with horizontal dilution of precision (HDOP) < 10 for a location accuracy of 3-7 m (Rempel & Rodgers, 1997), as well as all relocations within 24 hr of collar deployment or mortality. Additionally, we used an algorithmic cleaning procedure to remove aberrant relocations based on the known movement capabilities of feral pigs. Feral pigs typically travel at 5 km/hr but are known to sprint at up to 50 km/hr (Mayer & Brisbin, 2009); thus, if the average velocity between consecutive fixes exceeded 20 km/hr, we removed the later location from the dataset, on the basis that such speeds cannot realistically be maintained over that period. This procedure was repeated until every pair of consecutive fixes returned an average velocity ≤ 20 km/hr (Paolini et al., 2018). Of the 16 tagged pigs, we selected 12 for this analysis based on whether they came within proximity of another collared pig (i.e. their ranges overlapped). Since no other collared pigs' home ranges overlap these, we refer to this area henceforth as the 'study area' and to the pigs therein as the 'study population'.

FIGURE 1 Framework. Flow-chart demonstrating our framework for building predictive models of animal movement. The blue letters refer to the steps labelled in Section 2.2. The process starts from the green box, in which one determines which variables to use in building the initial model.
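The speed-based cleaning rule described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes fixes are stored as (time in hours, x, y) tuples in a projected coordinate system with metre units and strictly increasing times. The function name `speed_filter` is hypothetical.

```python
import math


def speed_filter(fixes, max_kmh=20.0):
    """Iteratively drop the later fix of any pair whose implied speed exceeds max_kmh.

    fixes: list of (time_hours, x_m, y_m) tuples in a projected (metric) CRS.
    Returns a new list; the input is not modified.
    """
    kept = sorted(fixes)  # order by time
    changed = True
    while changed:  # repeat until every consecutive pair passes the check
        changed = False
        for k in range(1, len(kept)):
            t0, x0, y0 = kept[k - 1]
            t1, x1, y1 = kept[k]
            dist_km = math.hypot(x1 - x0, y1 - y0) / 1000.0
            speed = dist_km / (t1 - t0)  # assumes strictly increasing times
            if speed > max_kmh:
                del kept[k]  # remove the *later* location of the offending pair
                changed = True
                break  # re-scan from the start after each removal
    return kept


track = [(0, 0, 0), (2, 10_000, 0), (4, 100_000, 0), (6, 12_000, 0)]
print(speed_filter(track))  # [(0, 0, 0), (2, 10000, 0), (6, 12000, 0)]
```

Re-scanning after each removal matters. Once a fix is dropped, the next fix is compared with the last retained one. In the example, the 2-hr to 6-hr pair covers only 2 km and is kept.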
We obtained land-

| Methodological framework
Our methodological framework involves a work process of iterative model improvement (Figure 1). We first (A) use step selection analysis (SSA) to parametrise an individual-based model (IBM) of animal movements and social interactions from data on co-moving animals. We then (B) simulate this model to produce predicted spatial occurrence distributions (ODs), which estimate where an animal is located over a particular time interval (Fleming et al., 2015).
Next, we (C) compare these ODs to the data to identify movement processes that may be missing from the IBM, then (D) modify the model based on the hypothesised missing movement processes identified in step (C). Finally, we return to step (A) to test whether there is significant evidence for the presence of these new hypothesised movement processes, and the entire process repeats. This broadly follows the 'Strong Inference' philosophy, dating back to Platt (1964), and still highly relevant in the context of ecological modelling (Burnham & Anderson, 2001; Ganusov, 2016). We give full details of each of these steps below, together with how we applied them to the pig data.
Step A: Estimate movement parameters from data using SSA.
Here, we parametrise a movement kernel describing the probability density function of animal i ∈ {1, … , N} moving from location x to location z during a single time-step of length τ (where τ is constant; for the pig data, τ = 2 hr). It has the following functional form:

$$p_{i,\tau}(z|x,t) = \frac{\rho_{i,\tau}(|z-x|)}{K_{i,\tau}(x,t)}\exp\left[\sum_{j=1}^{n}\beta_{i,j}Z_{i,j}(z,t) + \beta_{i,n+1}|z-x|\right] \tag{1}$$

$$K_{i,\tau}(x,t) = \int_{\Omega}\rho_{i,\tau}(|z'-x|)\exp\left[\sum_{j=1}^{n}\beta_{i,j}Z_{i,j}(z',t) + \beta_{i,n+1}|z'-x|\right]dz' \tag{2}$$

Here, $\rho_{i,\tau}(|z-x|)$ is the 'empirical step length distribution' (Forester et al., 2009). This is a step length distribution obtained by fitting a predetermined functional form (e.g. exponential or gamma) to the empirical step lengths, and |z − x| is the Euclidean distance between z and x. The vector $Z_i(z,t) = (Z_{i,1}(z,t), \dots, Z_{i,n}(z,t))$ consists of features that are hypothesised to co-vary with the animal's choice of next location. The vector $\beta_i = (\beta_{i,1}, \dots, \beta_{i,n})$ denotes the strength of the effect of each feature. The function $K_{i,\tau}(x,t)$ is a normalising function, ensuring that $p_{i,\tau}(z|x,t)$ integrates to 1 (so is a genuine probability density function), and Ω is the study area. Finally, we choose an exponential step length distribution, $\rho_{i,\tau}(d) = \lambda_i e^{-\lambda_i d}$, where $1/\lambda_i$ is the mean step length of animal i, although in theory one could pick any $\rho_{i,\tau}$ (Avgar et al., 2016; Fieberg et al., 2021). The $\beta_{i,n+1}|z-x|$ term in Equation (1) then corrects for any discrepancy between the empirical step length distribution and the resource-independent step length distribution, as detailed in Forester et al. (2009).
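To make the structure of the movement kernel in Equation (1) concrete, the following sketch evaluates it on a finite set of candidate end points, replacing the integral over Ω by a sum. This is an illustrative discretisation, not the authors' implementation. It assumes the candidates lie on a regular grid, so equal cell areas cancel on normalisation. The function and argument names are hypothetical.

```python
import math


def kernel_probs(start, candidates, covariates, beta, beta_step, lam):
    """Discrete approximation of Eq. (1): probability of each candidate end point.

    start: (x, y) start of the step; candidates: list of (x, y) end points
    covariates: list of covariate vectors Z(z, t), one per candidate
    beta: coefficients for the covariates; beta_step: coefficient on step length
    lam: rate of the exponential step-length density (mean step = 1 / lam)
    """
    weights = []
    for z, cov in zip(candidates, covariates):
        d = math.dist(start, z)                # Euclidean step length |z - x|
        rho = lam * math.exp(-lam * d)         # exponential step-length density
        linear = sum(b * c for b, c in zip(beta, cov)) + beta_step * d
        weights.append(rho * math.exp(linear))
    total = sum(weights)                       # discrete stand-in for K(x, t)
    return [w / total for w in weights]


probs = kernel_probs((0, 0), [(1, 0), (0, 1)], [[1.0], [0.0]],
                     beta=[0.5], beta_step=0.0, lam=1.0)
print(probs)  # both candidates are 1 unit away, so only the covariate differs
```

With equal step lengths, the two probabilities differ only by the covariate term. Their ratio is therefore exp(0.5).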
So far, this is a relatively standard description of the movement kernel for SSA (Avgar et al., 2016; Forester et al., 2009; Fortin et al., 2005), so the next task is to specify the functions $Z_{i,j}(z,t)$. Our aim is to understand the effect of the occurrence distribution (OD) of each animal on both itself and each other animal. We therefore set $Z_{i,j}(z,t)$ to be the OD of individual j ∈ {1, … ,N} at time t (so that n = N), measured in units of km$^{-2}$. However, this requires specifying both a time-scale over which we measure the OD and a method for constructing an OD from locational data. Helpfully, the method of Schlägel et al. (2019) provides a way of constructing an OD for this very purpose, using a kriging technique. This technique finds the best-fit movement model from the suite of continuous-time movement models defined in the ctmm R package (Calabrese et al., 2016), and then constructs the OD via the occurrence function from that package. For the purpose of demonstrating our methods, we set T = 30 days to be the time interval over which each occurrence distribution is calculated. (Note that this assumes a very simple memory process whereby the cognitive map of an animal simply consists of its OD over the last 30 days. For better realism one could include more complex processes, e.g. Merkle et al. (2019), but this is a simple starting point for demonstrating our methods.) To fit the movement kernel in Equation (1) to our pig data, we follow the standard case-control design used in most SSA studies (Avgar et al., 2016; Forester et al., 2009; Fortin et al., 2005). For this, we sample from the kernel $\rho_{i,\tau}$ 10 times for each step (i.e. a 1:10 use:availability ratio) to give control locations. We then compare these to the measured locations to test for a significant effect of the variables $Z_{i,j}$ for i, j ∈ {1, … ,N}. This procedure of constructing the ODs using kriging and testing for their effect on animal movement using SSA is identical to that of Schlägel et al. (2019).
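The case-control structure can be sketched as follows. Each observed step forms a stratum containing the used end point plus 10 control end points drawn from the exponential step-length distribution. Drawing directions uniformly is an assumption of this sketch, since the text does not specify a turning-angle distribution. The name `build_strata` is hypothetical.

```python
import math
import random


def build_strata(track, lam, n_controls=10, seed=0):
    """Case-control rows for SSA: each observed step plus n_controls random steps.

    Step lengths are drawn from an exponential density with rate lam; turning
    directions are drawn uniformly (an assumption made for this sketch).
    Returns rows of (stratum_id, is_case, start, end).
    """
    rng = random.Random(seed)
    rows = []
    for k in range(len(track) - 1):
        start, observed_end = track[k], track[k + 1]
        rows.append((k, 1, start, observed_end))      # the used step
        for _ in range(n_controls):                     # the available steps
            length = rng.expovariate(lam)               # mean length 1 / lam
            angle = rng.uniform(0.0, 2.0 * math.pi)
            end = (start[0] + length * math.cos(angle),
                   start[1] + length * math.sin(angle))
            rows.append((k, 0, start, end))
    return rows


rows = build_strata([(0.0, 0.0), (0.3, 0.1), (0.5, 0.4)], lam=2.0)
print(len(rows))  # 2 steps x (1 case + 10 controls) = 22
```

Covariates such as the $Z_{i,j}$ values would then be extracted at each end point. The rows would be fitted by conditional logistic regression stratified on `stratum_id` (e.g. `survival::clogit` in R). The amt package's `random_steps()` offers a ready-made way to generate the control steps.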
Step B. Implement the IBM using parameter values from Step A. The movement kernel, $p_{i,\tau}$, for animal i depends upon the prior locations of all the animals (both animal i and the others). Equation (1) therefore describes a system of coupled step selection functions. These can be analysed using a stochastic IBM of interacting agents, one agent for each animal.
Naïvely, one could simply simulate the model in Equation (1) for each animal simultaneously. However, this would require constructing the kriged OD for each animal at each time-step, since each animal's movement at each time step is determined by the ODs of every individual. Such a method would be highly computationally intensive and would make analysis prohibitively time-consuming, so we need a more efficient simulation framework.
For this, we use a nearest-neighbour movement process on a square grid with a relatively large lattice spacing of Δx = 100 m.
The advantage of this is that the OD at time t can be defined simply as the distribution of lattice points that have been visited in the time period from t − T to t, similar to the model of Giuggioli et al. (2011). This requires no repeated calculation of a kriged OD, and so no need to fit a continuous-time movement model for each individual at each time step. Moreover, we construct this nearest-neighbour model so that it is formally related to Equation (1), inasmuch as the drift and diffusion functions are identical in the continuum limit. In this model, the probability of animal i moving from a lattice site at location x to another at location z during a time-step Δt is given by Equation (4). Here, $C_i(x,t)$ can be thought of as a normalising function, ensuring that probabilities sum to 1, and $d_i$ encodes the diffusivity of the animal. For Equation (4) to describe genuine probabilities, $f_{i,\Delta t}(z|x,t)$ must always be positive, so we need to choose Δt such that $d_i < 1$. Notice that there are two time scales in operation here. The first is τ, defined prior to Equation (1), which represents the time step between consecutive locations in the data and is used for step selection inference. The second is Δt, which represents the time step of the IBM.
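The visit-count OD can be sketched with a small lattice simulation. The exact form of Equation (4) is not reproduced here, so the update rule below is a hypothetical stand-in with the same ingredients. With probability 1 − d an agent stays put. Otherwise it moves to one of its four nearest neighbours with weight exp(Σ_j β_ij Z_j), where Z_j is agent j's OD evaluated at that neighbour. Each OD is the fraction of the last `window` steps (i.e. T/Δt) spent in each cell. Dividing by the cell area (0.01 km² for Δx = 100 m) would convert it to km⁻². Class and function names are illustrative only.

```python
import math
import random
from collections import Counter, deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # nearest-neighbour moves on the lattice


class LatticeAgent:
    """One simulated animal on a square lattice (positions are integer cell indices)."""

    def __init__(self, start, window, rng):
        self.pos = start
        self.history = deque([start], maxlen=window)  # cells occupied in the last `window` steps
        self.rng = rng

    def od(self):
        """Occurrence distribution: fraction of recent steps spent in each cell."""
        counts = Counter(self.history)
        n = len(self.history)
        return {site: c / n for site, c in counts.items()}


def step(agents, beta, d):
    """Synchronously update all agents by one IBM time step.

    beta[i][j] weights agent j's OD in agent i's movement decision (hypothetical
    weighting); d is the probability of attempting a move (must be < 1).
    """
    ods = [a.od() for a in agents]  # freeze every OD at time t before anyone moves
    for i, a in enumerate(agents):
        if a.rng.random() >= d:      # stay put with probability 1 - d
            a.history.append(a.pos)
            continue
        x, y = a.pos
        sites = [(x + dx, y + dy) for dx, dy in NEIGHBOURS]
        weights = [math.exp(sum(beta[i][j] * ods[j].get(s, 0.0) for j in range(len(agents))))
                   for s in sites]
        a.pos = a.rng.choices(sites, weights=weights)[0]
        a.history.append(a.pos)


rng = random.Random(1)
agents = [LatticeAgent((0, 0), 10, rng), LatticeAgent((5, 5), 10, rng)]
beta = [[0.5, 0.2], [0.2, 0.5]]  # illustrative self- and cross-attraction coefficients
for _ in range(50):
    step(agents, beta, d=0.8)
```

No kriging is needed at any step, because each agent's OD is read directly from its recent history. With a large positive `beta[i][i]`, an agent's own recent cells dominate its weights, which shows how self-attraction can pin a simulated individual to a few cells.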
To parametrise the IBM in Equation (4), we first discard, to avoid over-fitting, any of the $Z_{i,j}$ variables whose corresponding $\hat\beta_{i,j}$ value was not significantly different from 0 (p > 0.01) in our step selection analysis from Step A. We then set $\beta_{i,j} = \hat\beta_{i,j}$ whenever $\hat\beta_{i,j}$ is significantly different from 0; otherwise, we set $\beta_{i,j} = 0$. (However, to check whether our results are sensitive to the effect of discarding parameters, we also repeat our analysis but … Supplementary Table ST2.)

Step C. Assess model fit and identify gaps. Next, we compare the resulting OD from each simulated pig (from 5th May to 3rd June 2016 inclusive) with the corresponding OD from the data for the same time period (inferred using the ctmm procedure described above), using Bhattacharyya's Affinity (BA). This is generally considered the best way to assess overlap of space use in an ecological setting (Fieberg & Kochanny, 2005). The BA takes values between 0 and 1, where 1 corresponds to perfect agreement and 0 to no overlap.
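Both the significance screening of Step A's coefficients and the BA comparison of Step C are short computations. The sketch below assumes each OD has already been rasterised onto a common grid and stored as a dict from cell to probability mass. For continuous densities f and g, BA is the integral of √(f g); on a grid it becomes a sum over cells. Function names are hypothetical.

```python
import math


def thresholded_betas(beta_hat, p_values, alpha=0.01):
    """Keep an SSA coefficient only if it is significant at level alpha; else set it to 0."""
    return [b if p < alpha else 0.0 for b, p in zip(beta_hat, p_values)]


def bhattacharyya_affinity(p, q):
    """BA between two gridded ODs given as {cell: probability mass} dicts summing to 1."""
    shared = p.keys() & q.keys()  # cells absent from either OD contribute zero
    return sum(math.sqrt(p[c] * q[c]) for c in shared)


print(thresholded_betas([0.8, -0.1, 0.3], [0.001, 0.2, 0.004]))  # [0.8, 0.0, 0.3]
print(bhattacharyya_affinity({"a": 1.0}, {"a": 0.25, "b": 0.75}))  # 0.5
```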
Since the interaction model is stochastic (as with all our IBMs), it will return a different set of ODs each time it is run. Thus, we simulate our stochastic model 1,000 times and calculate the average BA for each pig. We perform the same stochastic simulation procedure on the null model, and compare the resulting BA-values of the null and interaction models, to check whether the interaction model is better at predicting space use than a random walk. When running repeated stochastic simulations, we always ensured that we averaged over sufficiently many stochastic realisations so that any claim we make of an increase (respectively, decrease) in mean BA has a <0.01 probability of being incorrect [i.e. of actually being a decrease (respectively, increase) in mean BA].
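The text does not state which test was used to guarantee the < 0.01 error probability for claims about mean BA. One simple possibility, shown here purely as an assumption, is a one-sided normal-approximation test. It compares the mean BA of two models across independent replicate simulations, and one would add replicates until the p-value falls below 0.01. The function name is hypothetical.

```python
import math
import statistics


def mean_increase_test(ba_new, ba_old):
    """Compare replicate BA values from two stochastic models.

    Returns (difference in mean BA, one-sided p-value for 'new > old') using a
    normal approximation to the difference of two independent sample means.
    """
    diff = statistics.mean(ba_new) - statistics.mean(ba_old)
    se = math.sqrt(statistics.variance(ba_new) / len(ba_new)
                   + statistics.variance(ba_old) / len(ba_old))
    z = diff / se
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) under N(0, 1)
    return diff, p_one_sided


# Illustrative replicate BA values (synthetic, not from the pig data).
new = [0.30 + 0.01 * ((k % 5) - 2) for k in range(1000)]
old = [0.29 + 0.01 * ((k % 5) - 2) for k in range(1000)]
diff, p = mean_increase_test(new, old)
```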
As well as the BA, we can gain insight into discrepancies between each model and the data by comparing the sizes of each OD.
Since an OD is a probability density function, the variance of the OD provides a standard way to measure its size (Wasserman, 2004). We denote by $V_{i,E}$ the variance of the empirical OD from 5th May to 3rd June (inclusive) for pig i, by $V_{i,I}$ the corresponding variance for the interaction model, and by $V_{i,N}$ the corresponding variance for the null model. To compare magnitudes, we take the ratio of the variances, $V_{i,M}/V_{i,E}$, where M denotes the model (e.g. M = I or M = N). In our examples, it turns out that these ratios can vary by multiple orders of magnitude. Therefore, we take the logarithm of this ratio, defining $Q_{i,M} = \log_{10}(V_{i,M}/V_{i,E})$. A positive value of $Q_{i,M}$ suggests that the model is over-estimating the spatial scale over which pig i roams. One might therefore consider incorporating into future models a movement process that is likely to lead to more confined movement. Conversely, a negative value of $Q_{i,M}$ suggests that the existing confinement processes in the model are too strong, so they need to be counterbalanced by a parameter that causes wider exploration of space by individuals. It is also valuable to examine the mean of $Q_{i,M}$ over all i, which we denote by $Q_M$.
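The variance-ratio diagnostic can be computed directly from gridded ODs. This sketch rests on two assumptions about details not given here. First, 'variance' is taken to mean total variance, Var[x] + Var[y] (the trace of the covariance matrix). Second, it uses a base-10 logarithm, consistent with the Results reading Q ≈ −1.5 as 1.5 orders of magnitude. The function names are hypothetical.

```python
import math


def od_variance(od, cell_km=0.1):
    """Total variance Var[x] + Var[y] (km^2) of a gridded OD.

    od: {(col, row): probability mass} with masses summing to 1;
    cell_km: lattice spacing (0.1 km matches the 100 m IBM lattice).
    """
    mx = sum(p * c for (c, r), p in od.items())
    my = sum(p * r for (c, r), p in od.items())
    var_cells = sum(p * ((c - mx) ** 2 + (r - my) ** 2) for (c, r), p in od.items())
    return var_cells * cell_km ** 2


def q_value(v_model, v_empirical):
    """Q = log10(V_model / V_empirical); negative means the model's OD is too small."""
    return math.log10(v_model / v_empirical)


print(od_variance({(0, 0): 0.5, (2, 0): 0.5}))  # 1 cell^2, i.e. about 0.01 km^2
print(q_value(1.0, 100.0))                       # -2.0
```

$Q_M$ is then the ordinary mean of the per-pig `q_value` results.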
Repeating the process. In Sections 2.3 and 2.4, we will give details for our specific example of how we constructed new models that we hypothesised might improve on the interaction model in terms of space use prediction. The general procedure is to use the IBM techniques of Steps B and C to ascertain whether or not these new models are better at predicting space use patterns (using BA as a yard-stick). This then enables us to generate further hypotheses about what might still be missing from all of the models so far constructed, so going back through the whole process (steps A-D) once more (Platt, 1964). In principle, one can continue this iterative process until the models being produced are no longer giving better predictions. However, for the purposes of demonstrating our method, we only go through three iterations: first analysis of the initial models (the interaction model and null model), and then two stages of subsequent modelling, each stage aiming to construct models that improve predictive power over any previously constructed model.

| Assessing different interaction mechanisms
So far, our interaction model assumes that an animal responds to an OD by moving either up or down the OD's gradient. This can be problematic in the case of self-attraction, since it can cause a positive feedback mechanism whereby simulated individuals immediately move back to the location they have just visited, and therefore end up pinned in an arbitrarily small area.
To mitigate this, we propose three alternative models. In the first, we simply remove self-attraction, setting $\beta_{i,i} = 0$; we call this the no self-attraction model. In the second, we replace self-attraction with attraction to a central place, $x_{i,cp}$ (for i ∈ {1, … ,12}). This is defined as the centre of mass of the OD of the initial condition for each pig (i.e. the OD from 5th April 2016 to 4th May 2016 inclusive). This requires incorporating an additional variable into the step selection function (Equation 1). We call this model the central-place attraction model. Note that this is qualitatively similar to Ornstein–Uhlenbeck models (Wang et al., 2019), as the strength of attraction increases with distance from the central point, but it is not mathematically identical.
The third model incorporates a quadratic self-attraction term, denoted by $Z_{i,Q}(x,z,t)$, alongside all the parameters from the interaction model (full functional forms of all models are given in Supplementary Appendix B). The rationale is that if the coefficients of the quadratic terms turned out to be negative, they would counterbalance the pinning effect by making extremely highly used lattice sites repellent rather than attractive. Biologically, this would mean that pigs are attracted to familiar areas up to a point, but avoid areas that have been overused in the recent past. We call this model the overuse avoidance model.
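A one-variable calculation shows why a negative quadratic coefficient counteracts pinning. Assume, for illustration only, that the quadratic term is the square of the self-OD value, Z_{i,Q} = Z_{i,i}²; the actual forms are in Supplementary Appendix B. A cell's log-weight contribution is then β₁Z + β₂Z². With β₁ > 0 and β₂ < 0, this is positive (attractive) for 0 < Z < −β₁/β₂ and negative (repellent) beyond it. The function names are hypothetical.

```python
def self_attraction_effect(z_self, beta_lin, beta_quad):
    """Log-scale contribution of a cell's own-OD value z_self to the step weight.

    Hypothetical form: beta_lin * Z + beta_quad * Z**2, i.e. Z_{i,Q} taken as Z_{i,i}**2.
    Positive values favour the cell relative to an unvisited one; negative values repel.
    """
    return beta_lin * z_self + beta_quad * z_self ** 2


def repulsion_threshold(beta_lin, beta_quad):
    """OD value above which a cell turns from attractive to repellent."""
    if not (beta_lin > 0 and beta_quad < 0):
        raise ValueError("needs a positive linear and a negative quadratic coefficient")
    return -beta_lin / beta_quad


print(repulsion_threshold(2.0, -0.5))          # 4.0
print(self_attraction_effect(2.0, 2.0, -0.5))  # 2.0  (attractive)
print(self_attraction_effect(5.0, 2.0, -0.5))  # -2.5 (repellent)
```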

| Incorporating environmental features
As well as social interactions and self-attraction, animal space use is shaped by the environment. The presence and juxtaposition of food, cover and water ultimately determine the suitability of a landscape for feral pigs and their occupancy of it. This creates a complex web of environmental features that may determine pig movement decisions. However, to give a simple demonstration of how to incorporate environmental features into our framework, we focus here on just one feature. Since pigs cannot sweat, they rely on shade and water to thermoregulate, both of which are associated with forested habitat, which pigs are known to favour (Paolini et al., 2018). We therefore focus simply on the presence or absence of forest cover. We also provide a further demonstration of incorporating environmental effects (specifically the presence of corn crops) in Supplementary Appendix C.
To account for forest cover, for each pig i = 1, … , N, we define $Z_F(z,x,t) = 1$ if location z (at the end of the step) is in forest and $Z_F(z,x,t) = 0$ otherwise (using the notation from Equation 1). Since $Z_F(z,x,t)$ is independent of both the start of the step, x, and time, t, we write $Z_F(z) = Z_F(z,x,t)$ to ease notation. Figure 4a shows the distribution of forest in our study area, and Figure 4b superimposes on this the empirical ODs for the period from 5th May to 3rd June 2016. There is an area of forest around 20 km east and 20 km north in which all pigs (except Pig 7, shown in red) spent most of their time during this period. It is worth keeping this in mind when assessing the predictive capability of models with forests.
We construct two models that incorporate forest data.

| RESULTS
Using SSA to parametrise the interaction model resulted in 144 values of $\hat\beta_{i,j}$, one for each pair i, j ∈ {1, … ,12}. Of these, 43 were significantly different from 0 (p < 0.01; Table 1). Of these 43, two were negative and the other 41 were positive. Recall that $\hat\beta_{i,j}$ represents the tendency for pig i to be attracted to (respectively, avoid) pig j if $\hat\beta_{i,j} > 0$ (respectively, $\hat\beta_{i,j} < 0$). Therefore, 41 of these 43 interactions represented attraction to pigs' recent ODs. The two that did not belonged to pigs 10 and 11, which had a slight tendency to avoid the OD of pig 3 (Table 1).
Simulations of the interaction model revealed that this model often predicted ODs that were far smaller than those seen in the data (compare Figure 2b with Figure … Table 2; results for each individual are in Supplementary Table ST1). However, this still does not represent a very good agreement between model and data, likely due to the very small ODs often predicted by the interaction model (Figure 2e,f).
The no self-attraction model was marginally better than the interaction model at predicting space use (Table 2), with an average BA of 0.188 between the model and the data as compared with 0.170.
Regarding the predicted OD sizes, we found that the Q-value for the no self-attraction model was $Q_{NSA} \approx -1.5$. This is an improvement on the interaction model ($Q_I \approx -3.1$), but it still suggests that the predicted ODs tend to be 1.5 orders of magnitude (about 30 times) smaller in area than the measured ODs. That said, visual output reveals that some predicted ODs are considerably bigger than the empirical ODs (Figure 2a,b), and for some pigs, $Q_{i,NSA}$ is positive (e.g. …).

On fitting the central-place attraction model, we found that 8 of the 12 pigs showed a significant attractive tendency towards the central place (Table 3). Simulations revealed that this model tended to be better at predicting space use than the previous three, with an average BA of 0.235 between the model and the data. However, for this model, $Q_{CPA} \approx -1.6$, marginally worse than the no self-attraction model but better than the interaction model. This decrease in OD size can also be seen visually: compare Figure 3c …

The quadratic self-attraction term, $Z_{i,Q}(x,z,t)$, turned out to be highly significant for all pigs (Table 3). Furthermore, the resulting overuse avoidance model was better at predicting space use than the previous four, with an average BA of 0.297 between the model and the data. The predicted OD sizes are also closer to the empirical ODs than for the other models, with $Q_{OA} \approx -0.63$. Nonetheless, this still means that predicted ODs tend to underestimate the OD area by a factor of about four.
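The fold-differences quoted above follow from inverting the base-10 log ratio. Because $Q_M$ is a mean of per-pig logarithms, $10^{-Q_M}$ is the geometric mean over pigs of the empirical-to-model variance ratio. The arithmetic for each model is:

$$\frac{V_{i,E}}{V_{i,M}} = 10^{-Q_{i,M}}$$
$$Q_I \approx -3.1 \;\Rightarrow\; 10^{3.1} \approx 1.3 \times 10^{3}$$
$$Q_{NSA} \approx -1.5 \;\Rightarrow\; 10^{1.5} \approx 31.6$$
$$Q_{CPA} \approx -1.6 \;\Rightarrow\; 10^{1.6} \approx 39.8$$
$$Q_{OA} \approx -0.63 \;\Rightarrow\; 10^{0.63} \approx 4.3$$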
The forest model was very marginally better at predicting space use than the overuse avoidance model, with an average BA between model and data of 0.303 compared to 0.297 (Table 2). Whilst we ran sufficiently many simulations to ensure that this increase is significant (p < 0.01), the effect size is clearly small. This suggests that the effect of forests on space use is minimal compared to the effect of social interactions for our study population. Indeed, for the just forest model, the BA between model and data is only 0.162 (Table 2). This implies worse predictive power than all of the models except the null model (which, recall, is just a random walk). This suggests that social interactions are a more important factor in shaping space use than the presence of forest for our study population. An alternative explanation may be that there is another environmental feature that we have not tested here.
Consistent with this, for the forest model only four of the pigs showed a significant effect of forest presence on movement when controlling for social interactions (Table 3), and all four show a positive (i.e. attractive) effect. When we do not control for social interactions (the just forest model), however, eight of the pigs show a significant preference for forest, and the coefficients for attraction to forest are larger. This suggests that apparent preference for forest may sometimes be an artefact of social interactions with pigs that themselves prefer forest, or vice versa.
Regarding predicted OD size, the Q-value for the forest model … This is a feature that cannot be seen in any of the previous models.
Table 1. Best-fit parameter values for the interaction model, indexed by $i, j \in \{1, \dots, 12\}$. Columns (respectively, rows) correspond to the j-values (respectively, i-values). The labels P1-P12 refer to pigs 1-12, respectively. The numbers in bold are significantly different from 0 (p < 0.01), whereas other numbers are not significant.
Table 2 (partial row): Just forest | Attraction to forest but no social/self interactions | BA 0.162
However, the just forest model produces quite poor predictions of space use patterns, consistent with the inflated OD sizes indicated by $Q_{\mathrm{JF}}$.

DISCUSSION
We have described a methodological framework for assessing the power of parametrised step selection models to predict space use patterns. We have shown how this framework can be used to uncover mechanisms missing from step selection models, and that these models can incorporate both social interactions between animals and the effects of environmental features on movement.
Our work provides a generic framework for iterative construction of predictive, mechanistic models of animal space use.
To demonstrate our framework empirically, we used data on concurrent movements of feral pigs in Mississippi, USA. Our approach was to parametrise our step selection model from the whole dataset and then ascertain the extent to which the model could reproduce the broad-scale space use patterns in the same dataset. To do this, we set the initial conditions of our model to be the empirical ODs at one point in time, then measured how well the model predicted the empirical ODs at future points in time. Because the model is parametrised and tested on the same data (albeit at different spatio-temporal scales), any discrepancy between prediction and observation suggests a missing feature in the model. Whilst the main purpose of our application to pig data was to introduce the method, rather than to study pig biology per se, our results nevertheless suggest some interesting features of pig movement, at least for our particular study population.
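The step from a fitted step selection function (SSF) to an individual-based model can be illustrated with a minimal sketch, assuming the usual exponential SSF form and candidate steps drawn from an independent movement kernel. The callables `covariate_fn` and `step_sampler` are hypothetical placeholders; the authors' IBM additionally updates time-varying covariates such as other animals' ODs, which this sketch leaves to `covariate_fn`.

```python
import math
import random


def ssf_step(position, coeffs, covariate_fn, step_sampler, n_candidates=100, rng=random):
    """Draw one step of an individual-based model from a fitted SSF.

    position:     current (x, y) location of the animal
    coeffs:       fitted SSF coefficients, one per covariate
    covariate_fn: hypothetical callable mapping a candidate end point to a list
                  of covariate values (e.g. forest presence, other animals' ODs)
    step_sampler: hypothetical callable returning a (dx, dy) step drawn from the
                  movement kernel, given a random number generator
    """
    candidates, weights = [], []
    for _ in range(n_candidates):
        dx, dy = step_sampler(rng)
        end = (position[0] + dx, position[1] + dy)
        z = covariate_fn(end)
        # Exponential SSF: weight proportional to exp(sum_k beta_k * Z_k(end))
        weights.append(math.exp(sum(b * zk for b, zk in zip(coeffs, z))))
        candidates.append(end)
    # Choose one candidate with probability proportional to its weight
    return rng.choices(candidates, weights=weights, k=1)[0]


# Usage: a toy walker that prefers x > 0 (illustrative covariate, not pig data)
rng = random.Random(0)
pos = (0.0, 0.0)
for _ in range(10):
    pos = ssf_step(
        pos,
        coeffs=[5.0],
        covariate_fn=lambda p: [1.0 if p[0] > 0 else 0.0],
        step_sampler=lambda r: (r.gauss(0.0, 1.0), r.gauss(0.0, 1.0)),
        rng=rng,
    )
```

Repeating such steps for every individual, starting from positions drawn from the empirical ODs, yields simulated ODs that can then be compared with later empirical ODs.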
First, having parametrised a model of social interactions, we found a significant signal of self-attraction for each pig (i.e. attraction to its own OD, a.k.a. philopatry). Yet, when we simulated this mechanism, the predicted future ODs were typically several orders of magnitude smaller than the empirical ODs. This shows that some counter-balancing mechanism, absent from the model, must allow more realistically sized ODs to emerge. We found that incorporating a quadratic self-repulsion term, so that pigs avoid over-used areas, was statistically significant for all pigs and caused larger ODs to emerge ($Q_{\mathrm{OA}} \approx -0.63$, compared with $Q_{\mathrm{I}} \approx -3.1$ for the interaction model). This is in keeping with the marginal value theorem (Charnov, 1976), whereby it is optimal for animals to leave areas that have been over-used. However, the predicted ODs still tend to be smaller than the empirical ODs, suggesting that it may be worth searching for other mechanisms that allow the simulated ODs to grow to a size more consistent with the empirical ODs (one hypothesis might be to incorporate quadratic terms into the between-pig interactions).
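To see why a negative quadratic term counteracts self-attraction, here is a minimal sketch assuming the linear and quadratic self-attraction terms enter the SSF exponent additively (our reading of the quadratic term $Z_{i,Q}$; the exact covariate definitions are not restated in this section). The input `z` is a hypothetical measure of recent use, and the coefficient values in the usage are illustrative only.

```python
import math


def self_attraction_weight(z, beta_lin, beta_quad):
    """Relative SSF weight from self-attraction plus overuse avoidance.

    z:         hypothetical measure of the animal's recent use of a location
               (e.g. its own OD evaluated there); z = 0 means unvisited
    beta_lin:  linear coefficient; > 0 means attraction to visited places
    beta_quad: quadratic coefficient; < 0 penalises heavily used places
    """
    return math.exp(beta_lin * z + beta_quad * z * z)


def turning_point(beta_lin, beta_quad):
    """Use level at which extra visits stop increasing the weight (beta_quad < 0)."""
    return -beta_lin / (2.0 * beta_quad)


def repulsion_threshold(beta_lin, beta_quad):
    """Use level beyond which a place is less attractive than an unvisited one."""
    return -beta_lin / beta_quad


# Illustrative coefficients: attraction peaks at z = 2 and becomes repulsion for z > 4
for z in [0.0, 1.0, 2.0, 4.0, 6.0]:
    print(z, round(self_attraction_weight(z, 2.0, -0.5), 4))
```

Without the quadratic term the weight grows without bound with use, so simulated animals keep returning to the same few cells and ODs shrink; with it, heavily used cells are eventually avoided, letting ODs spread out.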
A second interesting outcome from the application to our study population is that, whilst attraction to forest is statistically significant for most of the pigs when social interactions are ignored, this attraction becomes less apparent when controlling for social interactions and has very little effect on the predictive power of the resulting models. This suggests that social interactions and self-attraction drive space use to a much greater extent than attraction to forest in our … (Calabrese et al., 2018).
Indeed, the possibility that untested covariates are driving space use is always an issue in resource and step selection analyses, as there will inevitably be drivers of movement on which researchers do not have data (Fieberg et al., 2018). The methods here help identify such missing drivers of space use, which in turn may improve both inference and predictive power. To this end, we are currently using our methodological advancements to build a more detailed predictive model of the pig system, which will be the subject of future work.
Any technique for building models requires a measure of goodness-of-fit between model and data. However, in studies of movement and habitat selection, the goodness-of-fit question is rarely addressed (Potts, Auger-Méthé, et al., 2014), partly owing to a paucity of techniques (Fieberg et al., 2018). Fieberg et al. (2018) proposed use-calibration plots to address this question in the context of resource selection. These examine the distribution of explanatory variables, that is, the environmental variables used to explain the locations of animals. Whilst this is a valuable technique where there is a clear demarcation between explanatory and response variables, in the case of social interactions there are feedbacks between the ODs of animals, meaning one cannot say that one OD is always the explanatory variable and the other the response variable.
Figure 3. Example simulations of occurrence distributions. Panels (a and b) were constructed using the No Self-Attraction Model; panels (c and d) used the Central-Place Attraction Model; panels (e and f) display output from the Overuse Avoidance Model. As in Figure 3, the contours were drawn at a height of $0.001\,\mathrm{km}^{-2}$.
This is a key difference between statistical and mechanistic modelling, so we require different techniques for assessing goodness-of-fit. Here, we have proposed two: comparing OD overlaps using BA, and comparing OD sizes via Q. Such a comparison is similar in flavour to pattern-oriented modelling, where one tunes an IBM so that specific summary statistics match the data (Grimm & Railsback, 2012), which in turn has similarities to approximate Bayesian computation (Sunnåker et al., 2013). Here, however, we parametrise our models directly from movement data rather than by matching patterns, and use the patterns (BA and OD size) to assess both goodness-of-fit and predictive power.
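The two pattern-based measures can be sketched on gridded ODs, assuming BA denotes Bhattacharyya's affinity and Q the base-10 log ratio of predicted to empirical OD area (consistent with the "orders of magnitude" reading of Q in the results). How OD size is measured is not stated in this section, so `od_area`, which counts cells above a density threshold, is a hypothetical stand-in.

```python
import math


def bhattacharyya_affinity(p, q):
    """Overlap between two occurrence distributions on a common grid.

    p, q: per-cell probabilities, each summing to 1.
    Returns sum_i sqrt(p_i * q_i): 1 for identical ODs, 0 for disjoint ones.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same grid")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def od_area(p, cell_area, density_threshold):
    """Hypothetical OD size: total area of cells whose density exceeds a threshold."""
    return cell_area * sum(1 for pi in p if pi / cell_area > density_threshold)


def q_value(predicted_area, empirical_area):
    """Assumed form of Q: log10 of predicted over empirical OD area."""
    return math.log10(predicted_area / empirical_area)


# Usage on a toy 4-cell grid (illustrative numbers, not pig data)
predicted = [0.7, 0.3, 0.0, 0.0]
empirical = [0.25, 0.25, 0.25, 0.25]
ba = bhattacharyya_affinity(predicted, empirical)
q = q_value(od_area(predicted, 1.0, 0.1), od_area(empirical, 1.0, 0.1))
```

In this toy case the predicted OD occupies half the empirical area, so Q is negative; per-pig values of Q and BA would then be averaged across individuals to summarise a model.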
Many of the technical advances in our approach arose from dealing with social interactions, particularly in finding a way to move from an SSF to an IBM. If one is purely interested in interactions with more-or-less static environmental covariates, there are existing techniques that may be employed instead (Fieberg et al., 2021; Signer et al., 2017), and these can be combined with BA and Q to assess predictive power. However, whilst environmental predictors feature in the bulk of SSA studies (as they are simpler to characterise), an increasing number of studies are uncovering the effects of social interactions (Latombe et al., 2014; Schlägel et al., 2019; Swanson et al., 2016; Vanak et al., 2013). Understanding how such interactions affect space use patterns is therefore an important research frontier, and our methods should help researchers advance it.
A key challenge when modelling social interactions is that often not every interacting individual is tagged contemporaneously. Such questions have been addressed outside the SSA context (Calabrese et al., 2018; Niu et al., 2016), and these methods may provide valuable fodder for future advances in this direction.
If one has several animals with roughly similar home ranges then it may not matter greatly if a small subset are missing from movement data (one could check this by subsampling the data, but the effort of redoing the IBM analysis for every subsample would not be trivial).
A more critical situation might be where animals are territorial, and a missing animal might mean that there is a gap in the terrain that appears (in the data) not to be occupied but in fact is occupied. Here, one possible way forward might be to take advantage of the fact that we have an IBM, by using this to simulate data for the missing individual (perhaps by using parameter values averaged across the other individuals). After all, an advantage of a mechanistic model is that you can simulate things for which you have no data. However, we expect that exactly how to account for missing individuals will vary depending on the study system, and cracking this problem will require significant future work.
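The suggestion of simulating a missing individual with parameter values averaged across the tagged individuals can be sketched as follows. This is a minimal illustration under stated assumptions: coefficients are stored per individual as a dictionary keyed by covariate name (a hypothetical layout), and a plain unweighted mean is used, ignoring estimation uncertainty.

```python
from statistics import mean


def averaged_coefficients(fitted):
    """Coefficients for an untagged animal, averaged over tagged individuals.

    fitted: dict mapping individual id -> {covariate name: fitted coefficient}
    Each covariate is averaged over the individuals for which it was estimated.
    """
    names = sorted({name for coeffs in fitted.values() for name in coeffs})
    return {
        name: mean(coeffs[name] for coeffs in fitted.values() if name in coeffs)
        for name in names
    }


# Usage with hypothetical covariate names and values (not the paper's estimates)
fitted = {
    "P1": {"forest": 1.0, "self_lin": 2.0},
    "P2": {"forest": 3.0, "self_lin": 4.0},
    "P3": {"forest": 2.0},
}
missing_pig = averaged_coefficients(fitted)
```

The averaged coefficients could then drive a simulated individual in the IBM. Note that the tagged animals' responses to the missing animal would also need values, which this sketch does not address; weighting by the inverse of each estimate's variance is one alternative to the plain mean.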
In conclusion, we have proposed an iterative model development cycle for (a) building models that are as predictive as possible given a dataset and (b) using the concept of prediction to inform model improvement. Whilst our case study was kept deliberately simple to elucidate the methodology, the framework provides a fundamental tool for determining the requisite building blocks of detailed, predictive models of animal space use.
… Service/Wildlife Services. We thank two anonymous reviewers and an associate editor for constructive comments that helped improve our manuscript.

CONFLICT OF INTEREST
None of the authors have a conflict of interest to declare.

DATA AVAILABILITY STATEMENT
The Cropland Data Layer is publicly available (https://nassgeodata.gmu.edu/CropScape/). The pig relocation data are available on Figshare (Potts et al., 2022).