Evidence Accumulation Account of Human Operators' Decisions in Intermittent Control During Inverted Pendulum Balancing

Human operators often employ intermittent, discontinuous control strategies in a variety of tasks. A typical intermittent controller monitors control error and generates corrective action when the deviation of the controlled system from the desired state becomes too large to ignore. Most contemporary models of human intermittent control employ simple, threshold-based trigger mechanism to model the process of control activation. However, recent experimental studies demonstrate that the control activation patterns produced by human operators do not support threshold-based models, and provide evidence for more complex activation mechanisms. In this paper, we investigate whether intermittent control activation in humans can be modeled as a decision-making process. We utilize an established drift-diffusion model, which treats decision making as an evidence accumulation process, and study it in simple numerical simulations. We demonstrate that this model robustly replicates the control activation patterns (distributions of control error at movement onset) produced by human operators in previously conducted experiments on virtual inverted pendulum balancing. Our results provide support to the hypothesis that intermittent control activation in human operators can be treated as an evidence accumulation process.


I. INTRODUCTION
In a variety of situations, ranging from postural balance to car driving to aircraft landing, human operators engage in motor behavior aimed at maintaining a system under control near a target state or trajectory. There is increasing evidence that in such tasks human operators use complex, non-linear control strategies, which have been subject to intensive modeling efforts (see [1]- [3] for review). A common thread running through much of the recent empirical and theoretical research is that human control is intermittent: an operator observes the state of the controlled system continuously, but applies corrective effort only intermittently. Opposing the traditional notion of continuous optimal control, the studies on intermittent control emphasize discrete, ballistic corrective movements separated by periods of inactivity (Fig. 1). Intermittent control is observed in a variety of laboratory and real-life tasks, including visuomotor tracking [4], inverted pendulum balancing [5], [6], postural balance [7], [8], and car driving [9], [10]. Such intermittent control strategies can be more efficient than continuous feedback in the presence of sensorimotor delays and neural noise [11]. However, despite much interest from both applied and basic research communities, the theory of human intermittent control is still in its initial stages of development. The combination of key mechanisms in play during intermittent control is generally agreed upon, and includes delays, noise, prediction, open-loop control adjustments, and event-driven control activation [1], [2], [12]. However, the details of these mechanisms remain obscure and adequate computational models are often missing. The overarching aim of this paper is to shed light on one of these key mechanisms: control activation.
The majority of human intermittent control models rely on the assumption that corrective movements are triggered when the control error exceeds a fixed threshold(e.g. [1], [13]). This is normally attributed to sensory deadzones, i.e., the lack of operator's awareness of the small control errors. Empirical observations, however, indicate that human operators often ignore deviations which significantly exceed the perception threshold [9], [14]. Indeed, due to the difficulty of handling small deviations and metabolic costs of high-frequency control, human operators can choose to ignore acceptable deviations as long as this does not threaten the overall goal of the control process [6], [15]. In this case, transitions from passive to active control phase can no longer be reduced to threshold-driven triggering, and therefore require more advanced models.
The problem of control activation modeling has been treated previously using two related but distinct approaches. One class of models, noise-driven activation, provided phenomenological description of the activation process using, e.g., the notion of double-well attractor borrowed from physics [14], [16]. These simple models closely reproduce human operators' behavior observed experimentally in virtual inverted pendulum balancing. These models were validated specifically against the key characteristic of control activation, the distribution of action points, i.e., the magnitudes of control error triggering the corrective movements. However, noise-driven activation models only emphasize the intrinsic stochasticity of the process, but do not provide any insights into psychophysiological mechanisms of control activation, which suggests the need for  [14]. The action points are the values of control error (in this case, inverted pendulum tilt angle) triggering the onset of corrective movement (marked by orange circles). Figure file is available under CC-BY [19]. more cognitively plausible models.
Another set of models approaches control activation as a decision-making process [10], [12]. These models implicate stochastic evidence accumulation mechanisms in control activation. The evidence accumulation mechanisms are supported by much neuroscientific work on decision-making [17], [18], thereby providing a biologically grounded interpretation of the control onset emergence. Moreover, a general intermittent model involving as a part an evidence accumulation mechanism was shown to reproduce amplitudes and timing of steering adjustments exhibited by human drivers in a lanekeeping task [12]. However, this general model encapsulates many assumptions regarding other control mechanisms (e.g., prediction, movement generation, delays), which precludes one from disentangling the effects of these mechanisms on control activation. Furthermore, unlike the noise-driven activation model, the evidence accumulation model has not yet been confronted to human action points data, which limits its validity as a general control activation framework.
In this paper we aim to bridge gap between the two above approaches. We propose a decision-making model of control activation based on noisy bounded evidence accumulation and demonstrate that it reproduces in details the activation patterns observed in human operators in a virtual balancing task (Fig. 2). Our results provide support to the hypothesis that intermittent control activation in human operators can be treated as an evidence accumulation process.

Background
The currently predominant view on human decision making is that it operates as an evidence accumulation process, which is usually described by different variants of drift-diffusion Fig. 2: Inverted pendulum on a cart. This simple mechanical system was previously used to investigate control activation patterns in virtual balancing task [14].  model (DDM) [21], [22]. This model posits that, in a simple stimulus detection task, a decision maker continuously samples and accumulates the evidence in favor of the stimulus being present. The rate of accumulation depends on the stimulus intensity. In addition, integration of evidence over time is necessarily affected by noise in neural firing rates. When the amount of accumulated evidence exceeds a pre-defined boundary, the decision is made that the stimulus is present (Fig. 3).
It has been shown that DDM and its numerous flavors can remarkably well capture the error rates and reaction time distributions exhibited by human subjects in a variety of perceptual tasks (see [18] for a recent review). Moreover, neural recordings of brain activity in non-human primates suggest that similar accumulation-to-boundary mechanisms are implemented in lateral intraparietal cortex associated with simple perceptual decisions [17], [24], [25]. Similar accumulative processes have also been directly traced in human neuroimaging studies [26], [27]. However, despite its simplicity and cognitive plausibility, the applications of this model to human control are still scarce, which warrants more detailed investigations.

Model
Mathematically, DDM is described by a stochastic differential equation where x(t) is the evidence accumulated at time t, A is the drift rate parameter associated with the stimulus strength, c is the diffusion rate, or noise intensity, and ξ is white Gaussian noise. Finally, the boundary parameter b defines the critical amount of evidence at which the decision is made.
In the context of intermittent control, we assume that during the passive control phase, a human operator continuously observes the control error and accumulates evidence in favor of activating the control according to Eq. (1). This suggests that the drift rate A may be dependent on control error, making the control error analogous to the "stimulus strength" in conventional DDM paradigms [21], [26]. In our model, we simply assume where θ is control error. When the accumulated evidence reaches the decision boundary, the corrective action is launched. We exemplify the model using a simple balancing task, in which a human operator maintains an overdamped inverted pendulum upright by adjusting the position of the moving cart connected with the pendulum via a pivot (Fig. 2). Control activation patterns in non-expert human operators were investigated previously using a virtual version of this task, where the operators controlled the cart velocity by moving a computer mouse [14], [28]. In this case, control error θ is represented simply by pendulum tilt angle. The dynamics of the task is described by the equation where θ is the pendulum tilt angle, and the control variable υ is the velocity of the moving cart controlled directly by human operator (here the linear coefficients specifying the temporal scale of the system dynamics are all assumed to be equal to 1 without any loss of generality). In the passive control phase (υ = 0), θ increases exponentially until the pendulum falls (for any non-zero initial disturbance). Therefore, in the model (1), where t is the time passed since the start of each passive phase, and θ 0 is the value of control error at the start of that passive phase.
The previously reported analysis of human performance in the above task revealed that for every subject the values of action points (pendulum tilt angles at which control was activated, see Fig. 1) followed exponential distribution with a peak at the angle close to zero [14]. This implies that the operators often ignored much larger control errors than suggested by a threshold-based activation model. In what follows we investigate whether the DDM-like model can reproduce this exponential distribution of action points.

III. MODEL SIMULATION
In the simplest case A ≡ const, the model (1) admits analytical solution [22]. However, for time-dependent A derivation of the closed-form solution is non-trivial and deserves individual consideration. For this reason, here we study the properties of the model only via numerical simulation.
As mentioned above, our main focus here is the control activation mechanism; modeling corrective adjustments in the active control phase is outside the scope of this paper. Therefore, for the present purposes it is sufficient to perform openloop simulations of the model (1) in a sequence of independent control activation trials, rather than embedding the model in a closed-loop simulation of the whole balancing process. In the beginning of each trial, we reset the control error θ to a baseline value, and then simulate evidence accumulation over time according to Eqs. Note that, after controlling for the scale of pendulum fluctuations which varied with subjects' balancing skill, the action point distribution was universal for all participants [14]. For this reason, in the model we aimed to reproduce the z-scored action points, i.e., measured not in radian but in standard deviations of the pendulum angle θ.
First, we analyzed the model's behavior for the setup when in the beginning of each trial, the pendulum tilt angle θ was set to a fixed baseline value, θ 0 = 0.5. The results for b, c ∈ {0.2, 0.4, 0.6, 0.8} are represented in Fig. 4. In general, the model failed to explain the action point distribution observed in human operators. The closest fit was observed around b = 0.8, c = 0.8 (fourth row, fourth column in Fig. 4). Still, the best-fit model distribution differs considerably from the experimental one. Local optimization using a quasi-Newton method did not substantially improve the fit we obtained by grid search.
The failure of the model to capture human control activation pattern suggested that either 1) the DDM is not an adequate model of control activation, or 2) some factors other than evidence accumulation affect the action point distribution exhibited by human operators. To clarify this, we adopted Open-loop numerical simulations were run, with the initial control error fixed at θ = 0.5 in each trial. Black lines indicate the pdf produced by human subjects in the previously conducted experiment on virtual inverted pendulum balancing [14]. Here action points are dimensionless, i.e., measured not in radian but in standard deviations of θ, following Ref. [14]. Values of the diffusion rate c and the decision boundary parameters are indicated in each panel. An approximate measure of fitness is ∆, mean difference between the log-scaled experimental and model pdf's over the range of action point values. We did not calculate the fitness (∆ = N/A) in case the range of the model action point values covered less than 50% of the experimentally obtained range. Figure file is available under CC-BY [23]. an additional assumption: the action point distribution is also affected by initial control error, i.e., the starting position of the inverted pendulum in the beginning of passive phase. In the second numerical simulation, the initial value of the pendulum tilt angle at each trial was not fixed, but instead randomly drawn from the distribution of actual starting positions observed previously in human operators 1 . The results indicate that with this additional assumption, the DDM model explains the human control activation pattern observed experimentally (Fig. 5).
In the present paper we did not aim to find the bestfitting parameter values; more important was to demonstrate the overall plausibility of the model over a range of possible 1 The experimentally obtained distribution of pendulum tilt angle in the beginning of each passive phase was also exponential, but was narrower than the action point distribution parameter values. Nevertheless, it is worth noting that the model reproduced the experimental distribution best at the intermediate values of the boundary parameter b and the diffusion rate parameter c being slightly greater than b. The best fit was achieved at c = 0.8, b = 0.6 (fourth row, third column in Fig. 5) 2 . The fact that the model matches the experimental data robustly for a range of possible parameter values suggests that human control activation can be described as a decision-making process based on the evidence accumulation mechanism (at least in the considered task).

IV. DISCUSSION
In this study we aimed to investigate whether control activation in human operators can be treated as a decision-making process. The model based on the established notion of evidence accumulation could reproduce the pattern of control activation observed in human operators, in particular, the distribution of control error triggering corrective adjustments (action points). These preliminary results may have implications for models of human performance in a wide variety of processes where human control is intermittent.
Our study is not the first one to use a drift-diffusion model in an intermittent control model. The predecessor studies included evidence accumulation as one of the building blocks in a general framework of sustained sensorimotor control, exemplified by a model of lane keeping while driving a car [10], [12]. However, these studies did not assess specifically control activation patterns, focusing instead on reproducing the characteristics of corrective adjustments (including, among others, the joint distributions of their amplitudes and interadjustment times). Our work addresses this issue and thereby reinforces the notion of evidence accumulation as an integral part of future intermittent control models.
On the other hand, a class of noise-driven activation models has been previously shown to accurately reproduce in detail the control activation patterns in inverted pendulum balancing [14], [16]. This suggests that, under some conditions, double-well attractor model and drift-diffusion model might be equivalent on the computational level. This has potential implications for wider areas of decision-making research: attractor models are increasingly popular in cognitive science [29]- [31], but their connection to the evidence accumulation models remains unexplored. This is an important avenue for future studies.
In this paper, we only make on of the first steps in investigating the complex relationship between evidence accumulation and control activation. An important limitation of the present work is that the human-operated overdamped inverted pendulum is a relatively simple control system. This has allowed us to directly measure action points both in the model and human data, but may potentially limit direct applicability of our results to more sophisticated control systems, where the effects of neural delays, prediction, and motor noise are more essential. Designing a DDM-based control activation model for a more complex human-controlled process will necessarily require meticulous search for a perceptual quantity treated as as a control error by the operators. Even in simple inverted pendulum balancing, changing the perceptual cue available to the operator can drastically change the distribution of action points [28]. This can be captured on a phenomenological level by an abstract attractor model [32]; further studies should investigate if and how the decision-making account of control activation can mechanistically explain such changes.
Another potentially fruitful direction for future work is the extension of the behavioral study of human control with concurrently recorded neuroimaging data, since the evidence accumulation account can be leveraged to make direct predictions about brain activation as well [26], [27].
Despite their simplicity and cognitive plausibility, evidence accumulation models are rarely used in applied studies on human performance. We believe that empirical validation of these models in processes controlled by human operators will be beneficial to the wider field of human-machine interaction.