General-Purpose Methods for Simulating Survival Data for Expected Value of Sample Information Calculations

Background. Expected value of sample information (EVSI) quantifies the expected value to a decision maker of reducing uncertainty by collecting additional data. EVSI calculations require simulating plausible data sets, typically achieved by evaluating quantile functions at random uniform numbers using standard inverse transform sampling (ITS). This is straightforward when closed-form expressions for the quantile function are available, such as for standard parametric survival models, but these are often unavailable when assuming treatment effect waning and for flexible survival models. In these circumstances, the standard ITS method could be implemented by numerically evaluating the quantile functions at each iteration in a probabilistic analysis, but this greatly increases the computational burden. Thus, our study aims to develop general-purpose methods that standardize and reduce the computational burden of the EVSI data-simulation step for survival data.

Methods. We developed a discrete sampling method and an interpolated ITS method for simulating survival data from a probabilistic sample of survival probabilities over discrete time units. We compared the general-purpose and standard ITS methods using an illustrative partitioned survival model with and without adjustment for treatment effect waning.

Results. The discrete sampling and interpolated ITS methods agree closely with the standard ITS method, with the added benefit of a greatly reduced computational cost in the scenario with adjustment for treatment effect waning.

Conclusions. We present general-purpose methods for simulating survival data from a probabilistic sample of survival probabilities that greatly reduce the computational burden of the EVSI data-simulation step when we assume treatment effect waning or use flexible survival models. The implementation of our data-simulation methods is identical across all possible survival models and can easily be automated from standard probabilistic decision analyses.

Highlights

- Expected value of sample information (EVSI) quantifies the expected value to a decision maker of reducing uncertainty through a given data collection exercise, such as a randomized clinical trial.
- In this article, we address the problem of computing EVSI when we assume treatment effect waning or use flexible survival models, by developing general-purpose methods that standardize and reduce the computational burden of the EVSI data-generation step for survival data.
- We developed 2 methods for simulating survival data from a probabilistic sample of survival probabilities over discrete time units, a discrete sampling method and an interpolated inverse transform sampling method, which can be combined with a recently proposed nonparametric EVSI method to accurately estimate EVSI for collecting survival data.
- Our general-purpose data-simulation methods greatly reduce the computational burden of the EVSI data-simulation step when we assume treatment effect waning or use flexible survival models.
- The implementation of our data-simulation methods is identical across all possible survival models and can therefore easily be automated from standard probabilistic decision analyses.


Appendix A - Methods for Simulating Survival Times
In this appendix, we illustrate how to simulate survival times in R using the 3 methods discussed in the paper: standard inverse transform sampling (standard ITS), interpolated inverse transform sampling (interpolated ITS), and discrete sampling.
We first install (if not already installed) and load the flexsurv, tidyverse, and knitr packages, which we will use in the examples.
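This setup step could, for example, be performed as follows (a minimal sketch; the install-if-missing pattern is one common convention, not prescribed by the paper):

```r
# install the packages if they are not already installed, then load them
pkgs <- c("flexsurv", "tidyverse", "knitr")
new_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(new_pkgs) > 0) install.packages(new_pkgs)

# load the packages
invisible(lapply(pkgs, library, character.only = TRUE))
```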

Probabilistic Analysis
We start by running a probabilistic analysis in which we sample k = 1, . . . , K values θ (k) from the prior distribution of the model parameters, p(θ), which in this example is the rate parameter of an exponential model that describes overall survival for a single treatment arm.
set.seed(1) # set the seed for reproducibility
K <- 1000 # number of simulations
theta <- rgamma(K, 35, 950) # prior distribution for the exponential rate

We then construct K vectors of survival probabilities by evaluating the exponential survivor function, given the sampled values θ (k), at discrete time cycles.
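This construction step could look as follows (a sketch; the 60-month horizon, monthly cycle length, and the names times and surv_probs are our own illustrative choices):

```r
set.seed(1)                  # reproduce the probabilistic sample
K <- 1000                    # number of simulations
theta <- rgamma(K, 35, 950)  # prior distribution for the exponential rate

times <- seq(0, 60, by = 1)  # discrete cycle times (months; horizon assumed)

# K columns of survival probabilities S(t | theta^(k)) at the cycle times
surv_probs <- sapply(theta, function(x) pexp(times, rate = x, lower.tail = FALSE))

dim(surv_probs)  # 61 rows (cycle times) by 1000 columns (simulations)
```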

Simulating Survival Data for a New Study
We can simulate survival times for a new study using the standard inverse transform sampling method by evaluating the quantile function at random uniform samples between 0 and 1. We can implement this using the rexp function in base R, which requires 2 arguments: the number of observations (e.g. number of patients to be included in the new study) and a value for the exponential rate parameter.
n <- 100 # number of patients/observations
# simulate n survival times given theta
times_std1 <- sapply(theta, function(x) rexp(n, x))

The output is a matrix of simulated survival times, where rows index the n simulated survival times and columns index the K simulations. The first 10 simulated survival times for the first 5 simulations are given by:
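That corner of the matrix can be inspected as follows (rerunning the probabilistic analysis and simulation above so the snippet is self-contained; the rounding to 2 decimals is our own choice for display):

```r
set.seed(1)
K <- 1000
theta <- rgamma(K, 35, 950)  # prior sample for the exponential rate

n <- 100
times_std1 <- sapply(theta, function(x) rexp(n, x))

# first 10 simulated survival times for the first 5 simulations
round(times_std1[1:10, 1:5], 2)
```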

Simulating Survival Data for an Ongoing Study
The rexp function simulates survival times that can take any value between 0 and infinity. If we want to simulate survival times for an ongoing study, we need to sample survival times that are greater than the observed follow-up time t_obs. We can do this by sampling from a conditional quantile function that is left-truncated at t_obs. We can implement this in R using the following code (we will record the computation time using the Sys.time() function):
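A sketch of this conditional sampling step is given below. For the exponential model, a time left-truncated at t_obs satisfies S(t) = u S(t_obs) for a uniform u, so we can invert the closed-form quantile function at u * S(t_obs). The follow-up time t_obs = 12 months is an assumption for illustration:

```r
set.seed(1)
K <- 1000
theta <- rgamma(K, 35, 950)  # prior sample for the exponential rate

n <- 100
t_obs <- 12  # assumed observed follow-up time (months)

start <- Sys.time()
times_std2 <- sapply(theta, function(x) {
  u <- runif(n)
  # invert S(t) = u * S(t_obs), which left-truncates the distribution at t_obs
  qexp(u * pexp(t_obs, rate = x, lower.tail = FALSE),
       rate = x, lower.tail = FALSE)
})
Sys.time() - start  # computation time

min(times_std2)  # all simulated times exceed t_obs
```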

Numerical Methods
When closed-form expressions for the quantile function, such as the qexp function, are unavailable, we could numerically evaluate the integrals and function inverses to produce a quantile function.
In the following code, we define a new exponential survivor function f_surv by first defining the hazard function and then integrating the hazard to produce the cumulative hazard. We then compute the survival probabilities by exponentiating the negative cumulative hazards. The f_surv(q, theta) function numerically approximates the analytic exponential survivor function pexp(q, rate = theta, lower.tail = FALSE) in base R.
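This construction could be sketched as follows. The function names f_haz, f_cumhaz, and f_quantile, and the root-finding interval in uniroot, are our own illustrative choices; only f_surv(q, theta) is named in the text above:

```r
# hazard function of the exponential model (constant hazard theta)
f_haz <- function(t, theta) rep(theta, length(t))

# cumulative hazard: numerically integrate the hazard from 0 to q
f_cumhaz <- function(q, theta) {
  sapply(q, function(qq) integrate(f_haz, lower = 0, upper = qq, theta = theta)$value)
}

# survivor function: exponentiate the negative cumulative hazard
f_surv <- function(q, theta) exp(-f_cumhaz(q, theta))

# quantile function: numerically invert the survivor function with uniroot
f_quantile <- function(p, theta) {
  sapply(p, function(pp) {
    uniroot(function(q) f_surv(q, theta) - pp,
            interval = c(0, 1e4), tol = 1e-8)$root
  })
}

# f_surv agrees with the closed-form pexp(..., lower.tail = FALSE)
f_surv(c(1, 5, 20), theta = 0.04)
pexp(c(1, 5, 20), rate = 0.04, lower.tail = FALSE)
```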

Interpolated Inverse Transform Sampling
We can also use inverse transform sampling to simulate survival times from the vectors of survival probabilities over discrete time units that we generated in the probabilistic analysis. As in the standard inverse transform sampling method, we first sample random uniform numbers. We then interpolate the survival probabilities at the sampled numbers using monotone cubic splines and record the interpolated cycle times as the simulated survival times.
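A sketch of the interpolated ITS step, using the monotone ("hyman") cubic spline in base R's spline function. The 100-month horizon is an assumption chosen so that little probability mass falls beyond the last cycle; uniform draws below the final survival probability are extrapolated beyond the horizon:

```r
set.seed(1)
K <- 1000
theta <- rgamma(K, 35, 950)  # prior sample for the exponential rate

n <- 100
times <- seq(0, 100, by = 1)  # discrete cycle times (horizon assumed)
surv_probs <- sapply(theta, function(x) pexp(times, rate = x, lower.tail = FALSE))

times_iits <- sapply(1:K, function(k) {
  u <- runif(n)  # random uniform numbers
  # monotone cubic spline through the (survival probability, cycle time)
  # pairs, evaluated at u to obtain the interpolated cycle times
  spline(x = surv_probs[, k], y = times, xout = u, method = "hyman")$y
})
```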

Discrete Sampling
An alternative approach for simulating survival times from the vectors of survival probabilities over discrete time units is to sample discrete 'bins' of cycle times with probability equal to the cumulative density of each bin. We can estimate the cumulative density of each bin by computing the successive differences between the survival probabilities at each discrete cycle time, which we then use as probabilities to sample the half-cycle times.
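This discrete sampling step could be sketched as follows (the 100-month horizon is again an assumption; note that sample() renormalizes the bin probabilities, so any residual mass beyond the horizon is ignored in this sketch):

```r
set.seed(1)
K <- 1000
theta <- rgamma(K, 35, 950)  # prior sample for the exponential rate

n <- 100
times <- seq(0, 100, by = 1)  # discrete cycle times (horizon assumed)
surv_probs <- sapply(theta, function(x) pexp(times, rate = x, lower.tail = FALSE))

# half-cycle times: the midpoints of the bins between successive cycle times
half_cycle_times <- head(times, -1) + diff(times) / 2

times_disc <- sapply(1:K, function(k) {
  # cumulative density within each bin: successive differences
  # of the survival probabilities
  p_bin <- -diff(surv_probs[, k])
  sample(half_cycle_times, size = n, replace = TRUE, prob = p_bin)
})
```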

Kernel Density Plot
The kernel density plot in Figure A1 shows that the distribution of the simulated survival times is very similar across the different methods.

Computation Time
The computation time for the different data-simulation procedures is given in Table A1. The standard inverse transform sampling scheme that uses numerical methods for evaluating the quantile function is several orders of magnitude slower than the other methods.

Appendix B - Synthetic Case Study Dataset
We generated a synthetic dataset using evenly spaced OS and PFS times for each d given the 4 Weibull distributions, taking the 0.005th, 0.015th, ..., 0.985th, and 0.995th quantiles from each distribution. This avoids generating implausible datasets where PFS exceeds OS or OS for standard care exceeds OS for the new treatment, which could occur by randomly sampling survival times. We enrolled all patients at time zero and right-censored the survival times at 24 months. We did not apply any other type of censoring. The parameters of the Weibull distributions that we used to generate the synthetic case study dataset are shown in Table B1, and the Kaplan-Meier plot of the synthetic case study dataset is given in Figure B1.

Appendix C - Net Benefit Functions for the Synthetic Case Study
The net benefit function for the new treatment is and the net benefit for standard care is where S(t_c, θ_d) is the Weibull survivor function evaluated at cycle time t_c,