Optimal rate of convergence for approximations of SPDEs with non-regular drift

A fully discrete finite difference scheme for stochastic reaction-diffusion equations driven by a $1+1$-dimensional white noise is studied. The optimal strong rate of convergence is proved without imposing any regularity assumption on the non-linear reaction term. The proof relies on stochastic sewing techniques.


Introduction
Consider the stochastic partial differential equation (SPDE)

    ∂_t u = ∆u + b(u) + ξ,    u_0 = ψ,  (1.1)

posed on R_+ × T. Here the unknown u is a random space-time process in 1 + 1 dimensions, ξ is a space-time white noise, and b : R → R is a given function. The spatial domain is the 1-dimensional torus T = R/Z; in other words, we consider the equation with periodic boundary conditions.

Owing to the regularising property of the noise, equation (1.1) is well-posed even with merely bounded and measurable b, as classical results of Gyöngy and Pardoux [GP93a,GP93b] show.
For a far-reaching generalisation of these results we refer to the recent work [ABLM20].
The error analysis of stochastic reaction-diffusion equations of the form (1.1) under various regularity assumptions on the drift b goes back to the early days of numerical analysis of SPDEs. In what was the first study of a fully discrete numerical scheme for SPDEs, Gyöngy [Gyö99] showed that the space-time finite difference approximation of the above equation (A) strongly converges to the true solution if b is a bounded measurable function, and (B) converges with strong rate 1/4 w.r.t. time and 1/2 w.r.t. space if b is a Lipschitz continuous function. This rate was in fact shown to be sharp by Davie and Gaines [DG01], who proved matching lower bounds even in the linear case b ≡ 0. Despite a rapidly growing literature on the numerics of SPDEs in the two decades since, the "gap" between (A) and (B) has remained, and no rate of convergence has been known even if b is just shy of Lipschitz: say, b ∈ C^α with α < 1.
The aim of this paper is to resolve this question and derive the optimal rate of convergence (up to a loss of arbitrarily small ε) without any regularity assumption on b. The main result can be informally summarized as follows; for the precise statement we refer to Theorem 1.3.2.

Theorem 1.0.1. For any ε ∈ (0, 1/2), bounded and measurable b, and any initial condition of class C^{1/2−ε}(T), the forward Euler finite difference approximation of (1.1) converges strongly with rate 1/4 − ε/2 w.r.t. time and 1/2 − ε w.r.t. space.
The strategy of the proof is quite different from previous works. In [Gyö99], the method for the bounded b case crucially relies on the Gyöngy-Krylov lemma [GK96, Lem. 1.1] and is therefore inherently not quantitative. As for the methods in the Lipschitz (or one-sided Lipschitz) b case (see below for some references), they build on the analysis of the corresponding deterministic problem. Such an approach is out of the question for b ∈ C^α, α < 1, since without the noise the PDE is in general not even well-posed. Instead, our strategy uses stochastic sewing, initiated in [Lê20] and further developed in the numerical analytic direction in [BDG21, DGL21].

Literature
As mentioned above, quantitative results have so far remained out of reach when b is even slightly irregular, i.e. not at least one-sided Lipschitz. Qualitative results, further to the works of Gyöngy [Gyö98, Gyö99], were obtained in the case of bounded measurable b in Pettersson and Signahl [PS05] (convergence in the nondegenerate multiplicative case) and in Anton, Cohen, and Quer-Sardanyons [ACQ20] (convergence for an exponential integrator scheme). Needless to say, in the case of regular coefficients the rate of convergence of various discretisations of SPDEs is extensively studied. Even just in the context of space-time white noise driven reaction-diffusion equations the literature is rich; see, among others, [BG19, BGJK22, Deb10, JK08, LQ19, Pri01, Sha99, Wan20]. A wider overview can be found for example in the above mentioned work [ACQ20, Sec. 1] or in Da Prato-Zabczyk [DPZ92, Sec. 14.1.10]. The interested reader is also referred to the monographs [JK11, Kru13].
In contrast to SPDEs, which can be seen as infinite dimensional SDEs, the question of the rate of convergence for finite dimensional SDEs with irregular drift coefficient is far better studied. As a small sample, we mention some of the most recent works [BDG21, NS20, MY20, JM21, Tag20, Yar21, LL21]. The developments of recent years are discussed in more detail in the survey [Szö21]. However, we mention that even in the finite dimensional case, the optimal strong convergence rate without any regularity assumptions has only been proved quite recently [DGL21]. (Note that [Gyö99] considers (1.1) with Dirichlet boundary conditions instead of periodic ones.)

Notation
For a metric space (X, d) we define the following spaces of R-valued functions. The space of bounded and Borel-measurable functions is denoted by B(X) and is equipped with the norm ‖f‖_{B(X)} = sup_x |f(x)|. The space of continuous functions is denoted by C(X). For α ∈ (0, 1] we denote by C^α(X) the space of bounded functions f that satisfy

    [f]_{C^α(X)} := sup_{x ≠ y} |f(x) − f(y)| / d(x, y)^α < ∞.

We equip C^α(X) with the norm ‖f‖_{C^α(X)} = ‖f‖_{B(X)} + [f]_{C^α(X)}. By convention, we set C^0(X) := B(X) (and not C(X)!).
We fix a probability space (Ω, F, P). The white noise ξ on [0, ∞) × T is a mapping from B_b([0, ∞) × T), the bounded Borel sets of [0, ∞) × T, to L_2(Ω) such that for any collection A_1, ..., A_k of elements of B_b([0, ∞) × T), the vector (ξ(A_1), ..., ξ(A_k)) is Gaussian with mean 0 and covariance E(ξ(A_i)ξ(A_j)) = |A_i ∩ A_j|, where |·| denotes the Lebesgue measure. We also fix a filtration F = (F_t)_{t≥0} such that for each t ≥ 0, A ∈ B_b([0, t] × T), and B ∈ B_b([t, ∞) × T), the random variable ξ(A) is F_t-measurable, ξ(B) is independent of F_t, and F_t is P-complete. For example, we may (but do not necessarily have to) take F to be the completion of the filtration generated by ξ. The predictable σ-algebra on Ω × [0, 1] is denoted by P. The conditional expectation given F_t is denoted by E_t. The space-time stochastic integrals with respect to ξ are denoted by ∫∫ f_r(y) ξ(dy, dr). Most of the time the integrand f will be deterministic, in which case the stochastic integral can simply be defined as the continuous linear extension of the mapping 1_A ↦ ξ(A), which is in fact an isometry. More generally, we might consider P ⊗ B(T)-measurable integrands; their stochastic integration can be found in e.g. [DPZ92].
We denote the convolution operator by

    (f ∗ g)(x) = ∫ f(x − y) g(y) dy.

This notation is used both when the domain of integration is R and when it is T. Since in a typical situation f will be a heat kernel either on R or on T, the context will make it clear which convolution is meant.
In proofs of theorems/lemmas/propositions we use the shorthand f ≲ g to mean that there exists a constant N such that f ≤ N g, and that N does not depend on any parameters other than the ones specified in the theorem/lemma/proposition. Moreover, when we explicitly write f ≤ N g, again the constant N depends only on the parameters stated in the corresponding theorem/lemma/proposition and might change from line to line.

Formulation
We consider the finite difference, forward Euler approximation of (1.1). To this end, we introduce the space and time grids: for each n ∈ N = {1, 2, 3, ...},

    Π_n = {0, (2n)^{−1}, 2(2n)^{−1}, ..., (2n − 1)(2n)^{−1}} ⊂ T,    Λ_n = {0, c(2n)^{−2}, 2c(2n)^{−2}, ...},

where c is a constant satisfying the condition c ∈ (0, 1/2), commonly known as the Courant-Friedrichs-Lewy (CFL) condition in the present context.
Remark 1.3.1.The restriction to look at spatial grids with even number of points (i.e. the choice of 2n) is purely for convenience, otherwise the even and odd cases would require some notational distinction later on.The choice of focusing on the even case is motivated by the computational practice of using nested grids of mesh sizes 2 −k , k = 1, 2, . . ., N , up to some threshold N .
Note also that on Π_n, just like on T, addition is understood in a periodic way, i.e. (2n − 1)(2n)^{−1} + (2n)^{−1} is identified with 0. To ease notation, we also denote h = c(2n)^{−2}. Hence, setting N_0 = N ∪ {0}, one has Λ_n = hN_0. Take an approximate initial condition ψ^n : Ω × T → R. The approximation scheme is defined by setting u^n_0(x) = ψ^n(x) for x ∈ Π_n and then inductively, for t ∈ Λ_n and x ∈ Π_n,

    u^n_{t+h}(x) = u^n_t(x) + h ∆_n u^n_t(x) + h b(u^n_t(x)) + ξ^n_t(x),  (1.2)

where the discrete Laplacian is defined as

    ∆_n f(x) = (2n)^2 ( f(x + (2n)^{−1}) − 2f(x) + f(x − (2n)^{−1}) )

and the discrete noise term is given by

    ξ^n_t(x) = 2n ξ([t, t + h] × [x, x + (2n)^{−1}]).

Recall that (1.1) admits a unique mild solution (see Definition 2.1.1 below), which will be denoted by u. The main result of the article reads as follows.
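To fix ideas, the scheme can be sketched in a few lines of code. The following is an illustration only, not the authors' implementation: the function name `simulate` and all parameter choices are ours, and we assume the standard central second difference Laplacian together with independent Gaussian white-noise cell increments.

```python
import numpy as np

# Illustration only (not the authors' code): the fully discrete scheme,
# forward Euler in time with the central second difference Laplacian,
# driven by independent Gaussian white-noise cell increments.

def simulate(n, b, psi, T=1.0, c=0.25, rng=None):
    """One sample path of u^n on the grid Pi_n x Lambda_n; returns (x, u_T)."""
    rng = np.random.default_rng(rng)
    m = 2 * n                       # number of spatial points on the torus
    dx = 1.0 / m                    # spatial mesh (2n)^{-1}
    h = c * dx**2                   # temporal mesh; CFL: c in (0, 1/2)
    x = np.arange(m) * dx
    u = psi(x).astype(float)
    for _ in range(int(T / h)):
        lap = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2   # Delta_n
        # xi([t,t+h] x [x,x+dx]) ~ N(0, h*dx); discrete noise = (1/dx) * xi
        noise = (1.0 / dx) * rng.normal(0.0, np.sqrt(h * dx), size=m)
        u = u + h * (lap + b(u)) + noise
    return x, u

x, u = simulate(n=16, b=np.sign, psi=lambda y: np.sin(2 * np.pi * y), rng=0)
```

Note that b may be merely bounded and measurable (here b = sign); with c < 1/2 the linear part of each step is a lazy-random-walk averaging, so the iteration is stable despite the 1/dx² factor.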
Theorem 1.3.2. Let p ≥ 2, ε ∈ (0, 1/4), and let b be bounded and measurable. Assume that the initial conditions ψ, ψ^n are F_0-measurable C^{1/2−ε}(T)-valued random variables such that for a constant K < ∞ they satisfy ‖ψ‖_{L_p(Ω; C^{1/2−ε}(T))} ≤ K and ‖ψ^n‖_{L_p(Ω; C^{1/2−ε}(T))} ≤ K. Then there exists a constant N depending only on the parameters c, p, ε, K, ‖b‖_{B(R)} such that for all n ∈ N the following bound holds:

    sup_{(t,x) ∈ (Λ_n ∩ [0,1]) × Π_n} ‖u_t(x) − u^n_t(x)‖_{L_p(Ω)} ≤ N ( ‖ψ − ψ^n‖_{L_p(Ω; B(T))} + n^{−1/2+ε} ).  (1.3)

As will follow from the proof, with an appropriate extension of u^n from the gridpoints Λ_n × Π_n to the whole of R_+ × T, the supremum on the left-hand side of (1.3) can be taken over (t, x) ∈ [0, 1] × T.
Remark 1.3.4. The freedom of allowing a different initial condition for the approximation is not a particularly important feature of the statement, but it is convenient for the proof. Indeed, it allows one to easily deduce the general case from the case of short times, that is, when the supremum runs only over t ≤ S, where S is a (small) constant depending only on the parameters of the problem.
Remark 1.3.5. There are several natural and interesting directions in which to generalise the results of the present article: for example, equations driven by spatially coloured, multiplicative, or Lévy noises, distributional drifts, or different approximation schemes. We leave these for future work.

Acknowledgments.
The authors thank the referees for numerous constructive suggestions. MG was funded in part by the Austrian Science Fund (FWF) Stand-Alone programme P 34992. For the purpose of open access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. We further gratefully acknowledge the financial support of the International Office at TU Wien.

Estimates on heat kernels and stochastic convolutions
The evolution of the true and of the approximate solutions is very different even in the linear case b = 0. This is one of the main challenges compared to the finite dimensional case, where with vanishing drift the two processes are simply given by the noise process and in particular the error is 0 (see Section 3.1 for a more detailed comparison to the finite dimensional case). In infinite dimensions, the error of the linear problem propagates in a nontrivial way through the error analysis of the case of irregular b. The aim of this section is therefore to derive various estimates for the continuous and discrete heat kernels and the associated Ornstein-Uhlenbeck processes (i.e. the solutions of (1.1) and (1.2) in the case ψ = 0, b = 0).

Definitions
We encounter three different heat kernels in the article: the continuum heat kernel on R, the continuum heat kernel on T, and the discrete heat kernel on T. The first two are defined by

    p^R_t(x) = (2πt)^{−1/2} e^{−x²/(2t)},    p_t(x) = Σ_{k∈Z} (4πt)^{−1/2} e^{−(x+k)²/(4t)} = Σ_{j∈Z} e^{λ_j t} e_j(x),

the last equality being obtained by Poisson summation. The difference in scaling comes from the fact that the first is chosen to be the density function in x of a centered normal random variable with variance t, while the latter is chosen to be the Green's function of the heat operator on R × T. For the sake of convenience we use separate notations for the action of the heat kernels via convolution: for f ∈ B(T), we denote P_t f := p_t ∗ f. We define P^R_t analogously. The continuum heat kernels form a semigroup, that is, P_t(P_s f) = P_{t+s} f, and similarly for P^R. The periodic heat kernel is used in the definition of the mild solution of (1.1).
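The Poisson summation identity for the periodic heat kernel is easy to check numerically. The following sketch is ours (not part of the paper); it uses e_j(x) = e^{i2πjx} and λ_j = −4π²j² as defined below, and compares the Gaussian-sum and Fourier-sum forms of p_t at a few points.

```python
import numpy as np

# Illustration only: the two forms of the periodic heat kernel p_t agree,
#   sum_k (4*pi*t)^{-1/2} exp(-(x+k)^2/(4t)) = sum_j exp(lambda_j t) e_j(x),
# with e_j(x) = exp(2*pi*i*j*x), lambda_j = -4*pi^2*j^2 (Poisson summation).

def p_gaussian(t, x, K=50):
    k = np.arange(-K, K + 1)
    return float(np.sum(np.exp(-(x + k) ** 2 / (4 * t)) / np.sqrt(4 * np.pi * t)))

def p_fourier(t, x, J=50):
    jj = np.arange(-J, J + 1)
    return float(np.real(np.sum(np.exp(-4 * np.pi**2 * jj**2 * t)
                                * np.exp(2j * np.pi * jj * x))))

diff = max(abs(p_gaussian(t, x) - p_fourier(t, x))
           for t in (0.01, 0.1, 1.0) for x in (0.0, 0.3, 0.77))
```

The truncation levels K, J are arbitrary choices; both sums converge extremely fast for t bounded away from 0.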
Definition 2.1.1. A mild solution of (1.1) is a P ⊗ B(T)-measurable map u : Ω × [0, 1] × T → R which is continuous in (t, x), such that almost surely for all (t, x) ∈ [0, 1] × T the following equality holds:

    u_t(x) = P_t ψ(x) + ∫_0^t P_{t−s} b(u_s)(x) ds + ∫_0^t ∫_T p_{t−s}(x, y) ξ(dy, ds).  (2.1)

The setup of the discrete heat kernels is more complicated. The formulation below follows along the lines of [Gyö99], but for the convenience of the reader and due to various small differences we prefer to give the full details. From now on, the conjugate of a complex number z ∈ C is denoted by z̄. Consider the functions e_j(x) = e^{i2πjx} for j ∈ Z. They are eigenfunctions of ∆ with eigenvalues λ_j = −4π²j². It is well-known that (e_j)_{j∈Z} forms an orthonormal basis of L_2(T; C). In the next proposition we prove a discrete analogue: for −n ≤ j, ℓ ≤ n − 1, one has ∆_n e^n_j = λ^n_j e^n_j on Π_n with λ^n_j := −16n² sin²(πj/(2n)) (2.3), as well as the orthonormality relation (2n)^{−1} Σ_{x∈Π_n} e^n_j(x) \overline{e^n_ℓ(x)} = 1_{j=ℓ} (2.4). It will be convenient to use the piecewise linear extension of the restriction of e_j to Π_n: for −n ≤ j ≤ n − 1, we let e^n_j(x) = e_j(x) for x ∈ Π_n, and define e^n_j between gridpoints by linear interpolation. (2.2)

Proof. We start with (2.3). For j ∈ Z and x = k/2n, we have

    ∆_n e_j(x) = (2n)² ( e_j(x + (2n)^{−1}) + e_j(x − (2n)^{−1}) − 2e_j(x) )
               = 8n² e_j(x) ( cos(2πj/2n) − 1 ) = −16n² e_j(x) sin²(πj/2n) = λ^n_j e_j(x).
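The eigenvalue computation above can be verified numerically; the following sketch (ours, purely illustrative) checks ∆_n e_j = λ^n_j e_j on Π_n with λ^n_j = −16n² sin²(πj/2n) for all admissible j.

```python
import numpy as np

# Illustration only: on the grid Pi_n, e_j(x) = exp(2*pi*i*j*x) satisfies
# Delta_n e_j = lambda^n_j e_j with lambda^n_j = -16 n^2 sin^2(pi j / (2n)).

n = 8
m = 2 * n
x = np.arange(m) / m                               # the grid Pi_n
max_err = 0.0
for j in range(-n, n):
    e = np.exp(2j * np.pi * j * x)                 # e_j restricted to Pi_n
    lap = m**2 * (np.roll(e, -1) - 2 * e + np.roll(e, 1))   # periodic Delta_n
    lam = -16 * n**2 * np.sin(np.pi * j / (2 * n)) ** 2
    max_err = max(max_err, float(np.max(np.abs(lap - lam * e))))
```

The periodic `np.roll` implements the torus addition on Π_n, so the identity holds exactly up to floating-point error.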
Remark 2.1.4. Although we do not discuss purely spatial discretisations, it is worth remarking that the above setup would already be enough to define the heat kernel for the spatially discretised operator ∂_t − ∆_n in its spectral representation, which would take the form

    Σ_{j=−n}^{n−1} e^{t λ^n_j} e^n_j(x) \overline{e^n_j(y)}.  (2.8)

It remains to encode the temporal discretisation in the discrete heat kernel. Naturally, at the temporal gridpoints t = kh the factor e^{t λ^n_j} in (2.8) is simply replaced by (1 + hλ^n_j)^k. Between the gridpoints, we again interpolate linearly. More precisely, for j = −n, ..., n − 1, we set µ^n_j(t) = (1 + hλ^n_j)^{t/h} for t ∈ Λ_n, and define µ^n_j between gridpoints by linear interpolation. (2.9) The following, which is based on the CFL condition, will be used frequently.
We can now define the discrete heat kernel and rewrite the approximation scheme (1.2) in a mild form. Denote by κ_n(t) = ⌊th^{−1}⌋h and ρ_n(x) = ⌊2nx⌋(2n)^{−1} the closest gridpoint to the left of t in Λ_n and of x in Π_n, respectively. We then set

    p^n_t(x, y) = Σ_{j=−n}^{n−1} µ^n_j(t) e^n_j(x) \overline{e^n_j(ρ_n(y))},  (2.11)

which is now a function of t ≥ 0, x, y ∈ T.
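The projections κ_n and ρ_n are straightforward to implement; a minimal sketch (ours, with hypothetical helper names `kappa` and `rho`):

```python
import math

# Illustration only: the grid projections used throughout,
#   kappa_n(t) = floor(t/h) * h     (closest temporal gridpoint <= t),
#   rho_n(x)   = floor(2n x)/(2n)   (closest spatial gridpoint <= x).

def kappa(t, h):
    return math.floor(t / h) * h

def rho(x, n):
    return math.floor(2 * n * x) / (2 * n)
```

For instance, with h = 0.25 one has kappa(0.8, 0.25) = 0.75, and with n = 4 one has rho(0.3, 4) = 0.25.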
Remark 2.1.6. Although each e^n_j is a C-valued function, p^n_t itself is R-valued for all t ≥ 0. Indeed, first consider x ∈ Π_n, y ∈ T. Since λ^n_j = λ^n_{−j} and therefore µ^n_j(t) = µ^n_{−j}(t), one sees that the terms with indices j and −j combine into a real expression. In addition, the restriction of e^n_{−n} to Π_n takes only the values ±1, so e^n_{−n}(x) \overline{e^n_{−n}(ρ_n(y))} ∈ R, which combined with the above shows that p^n_t(x, y) ∈ R for x ∈ Π_n, y ∈ T. The same is true for all x, y ∈ T, since p^n_t(·, y) is given by linear interpolation between its values on Π_n.

Let us introduce the discrete convolution

    (f ∗_n g)(x) = (2n)^{−1} Σ_{z∈Π_n} f(x − z) g(z).  (2.12)

Analogously to P, we then define the linear operators P^n by setting P^n_t f := p^n_t ∗_n f. Most of the time we understand P^n_t as an operator on B(T), but it can be seen as an operator on B(Π_n) as well; in the latter case, the corresponding identity holds. The inductive step (1.2) of the finite difference scheme can therefore be written as

    u^n_{t+h}(x) = P^n_h u^n_t(x) + h b(u^n_t(x)) + ξ^n_t(x).  (2.14)

To arrive at a form similar to (2.1), it remains to show the following simple property.
Proposition 2.1.7. For s, t ∈ Λ_n, x, y ∈ T, we have

    (2n)^{−1} Σ_{z∈Π_n} p^n_t(x, z) p^n_s(z, y) = p^n_{t+s}(x, y).  (2.15)

Proof. This follows by writing out the product via the definitions (2.11)-(2.12) and applying the orthogonality relation (2.4). It follows that for t ∈ Λ_n, x ∈ Π_n, (1.2) can be equivalently written as

    u^n_t(x) = P^n_t ψ^n(x) + ∫_0^t P^n_{κ_n(t−s)} b(u^n_{κ_n(s)})(x) ds + ∫_0^t ∫_T p^n_{κ_n(t−s)}(x, y) ξ(dy, ds).

Indeed, this clearly holds for t = 0, and for 0 < t ∈ Λ_n it follows inductively from (2.14) and (2.15). Recalling that p^n is defined for any space-time point, not just the ones on the grid, we then define an extension of u^n to the whole of [0, 1] × T by setting

    u^n_t(x) = P^n_t ψ^n(x) + ∫_0^t P^n_{κ_n(t−s)} b(u^n_{κ_n(s)})(x) ds + ∫_0^t ∫_T p^n_{κ_n(t−s)}(x, y) ξ(dy, ds).
(2.16)

Remark 2.1.8. It is useful to note an alternative representation of P^n as the transition kernel of a random walk indexed by Λ_n. Let X_1, X_2, ... be i.i.d. random variables with distribution

    P(X_i = 1) = P(X_i = −1) = c,    P(X_i = 0) = 1 − 2c.

One can observe that the condition c ≤ 1/2 is necessary in order for the above to be a probability distribution, while our stronger condition c < 1/2 guarantees that the random walk is "lazy". We then define S_k = Σ_{i=1}^{k} X_i and, for t ∈ Λ_n, set S^n_t = (2n)^{−1} S_{h^{−1}t}. Then P^n is the transition semigroup of S^n: for any function f : Π_n → R and any x ∈ Π_n,

    P^n_t f(x) = E f̃(x + S^n_t),  (2.17)

where f̃ is the 1-periodic extension of f from Π_n to (2n)^{−1}Z.
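The random walk representation can be sanity-checked numerically: one forward Euler step of the linear part, (I + h∆_n)f, should coincide with averaging f under the lazy step distribution above. A minimal check (ours):

```python
import numpy as np

# Illustration only: one step of the linear part of the scheme,
# (I + h*Delta_n) f, equals the lazy-random-walk transition with
# P(X = +-1) = c and P(X = 0) = 1 - 2c (steps of size 1/(2n)).

n, c = 8, 0.25
m, dx = 2 * n, 1.0 / (2 * n)
h = c * dx**2                                  # CFL relation: h * (2n)^2 = c
f = np.random.default_rng(0).normal(size=m)    # arbitrary grid function
lap = (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / dx**2
one_step = f + h * lap                         # (I + h*Delta_n) f
walk_step = (1 - 2 * c) * f + c * np.roll(f, -1) + c * np.roll(f, 1)
gap = float(np.max(np.abs(one_step - walk_step)))
```

This also makes the role of the CFL condition transparent: the weights (c, 1 − 2c, c) are a probability vector precisely when c ≤ 1/2.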
Note however that P n 0 as an operator on B(T) does not equal the identity.

Discrete and continuous heat kernel bounds
We start with three classical heat kernel bounds.
Lemma 2.2.1. (i) For all α, β ∈ [0, 1] with α ≤ β, there exists a constant N = N(α, β) such that for all f ∈ C^α(D), 0 ≤ s ≤ t ≤ 1, and x, y ∈ D one has (2.18) and (2.19). (iii) There exists a constant N such that for all f ∈ B(D), 0 < s ≤ t ≤ 1, and x, y ∈ D, one has (2.20).

The estimate in (2.18) is very standard. A proof of it and of its more general variants can be found for example in [BDG21, Appendix A].
For (2.19), notice that by the fundamental theorem of calculus we have the corresponding integral representation, from which we get the required bound. From (2.18) with β = 1, it follows that (2.22) holds. Applying (2.22) twice (the first time with the choice α = 0), we get (2.23). Finally, from the fundamental theorem of calculus, the identity ∂_t S = ∆S, (2.18) with β = 1, and (2.23) with α = 0, we get the desired estimate. This finishes the proof.
Before moving on, we record a simple bound that will be used frequently.
Proposition 2.2.2. For any λ > 0 and γ ≥ 0 there exists an N = N(λ, γ) such that for all t ∈ (0, 1] the following bound holds:

    Σ_{k∈Z} |k|^γ e^{−λk²t} ≤ N t^{−(1+γ)/2}.  (2.24)

Proof. Bounding the sum by the corresponding integral and applying the change of variables x → t^{−1/2}x, we get the claim. As a toy example, one can see that with some absolute constant N > 0 one has, for all s ∈ (0, 1],

    N^{−1} s^{−1/2} ≤ Σ_{k∈Z} e^{λ_k s} ≤ N s^{−1/2},  (2.25)

applying Proposition 2.2.2 with γ = 0 for the second inequality. The first inequality in (2.25) follows from bounding the sum from below by its restriction to |k| ≤ s^{−1/2}, so that each of the 1 + ⌊s^{−1/2}⌋ terms is bounded from below by e^{−4π²}.
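The two-sided bound in the toy example is easy to observe numerically. In the following sketch (ours, not from the paper) the normalised sums stabilise at a constant as s → 0; assuming λ_k = −4π²k², the limiting constant is 1/(2√π) ≈ 0.282, by comparison with a Gaussian integral.

```python
import numpy as np

# Illustration only: sqrt(s) * sum_k exp(lambda_k s), lambda_k = -4 pi^2 k^2,
# stays bounded above and below and stabilises near 1/(2 sqrt(pi)) as s -> 0.

def theta_sum(s, K=2000):
    k = np.arange(-K, K + 1)
    return float(np.sum(np.exp(-4 * np.pi**2 * k**2 * s)))

ratios = [theta_sum(s) * np.sqrt(s) for s in (1e-1, 1e-2, 1e-3, 1e-4)]
```

The cutoff K = 2000 is an arbitrary choice, large enough for the smallest s considered.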
Remark 2.2.3. Another useful tool that will be used repeatedly is interpolation. It is known that there exists an absolute constant N such that for all n ∈ N and α ∈ (0, 1) one has

    ‖f‖_{C^α(Π_n)} ≤ N ‖f‖_{B(Π_n)}^{1−α} ‖f‖_{C^1(Π_n)}^{α};

see [Lun18, 1.8] for a proof that carries through in our setting (equivalently, simply bound the infimum by the choice in which f_1 is the convolution of f with the indicator of a suitably short interval). It is a well-known consequence of the above bound (see e.g. [Lun18, Theorem 1.1.6]) that if a linear operator T is bounded from B to B(Π_n) and from C^1 to C^1(Π_n), then the corresponding bound from C^α to C^α(Π_n) also holds.

Heat kernel estimates for the discrete heat kernels P^n are less established. Since they are piecewise linear in time on each interval between neighbouring gridpoints, most estimates will be stated only for t ∈ Λ_n. For the initial time we only need the straightforward property (2.26). For gridpoints after the initial time we recover almost the usual heat kernel bounds, at the cost of a log factor.
Lemma 2.2.4. For all α, β ∈ [0, 1], f ∈ C^α(T), and 0 < t ∈ Λ_n, the following bound holds: (2.27)

Proof. First assume α = 0 and β = 1. Note that in this case it suffices to consider neighbouring points x, z ∈ Π_n (by virtue of the triangle inequality), and in fact only the case x = 0 and z = 1/2n (by virtue of the translation invariance of the Hölder norm). For a function g : Π_n → R we then have the corresponding representation; using (2.10) and then Proposition 2.2.2 (with λ = δ and γ = 1), we get (2.28). Set K = 2 log(2n)/c and consider two cases, namely 2√t K ≥ 1/4 and 2√t K < 1/4. In the first case, the claim follows directly by applying (2.28) to our given function f. We now focus on the case 2√t K < 1/4. As a brief detour, take some K > 0 and recall the notations X_i, S_n, S^n_t from Remark 2.1.8 and the representation (2.17). Denote furthermore by N_n the number of nonzero elements in {X_1, ..., X_n}. Then, conditionally on N_n = ℓ, (S_n + ℓ)/2 has binomial distribution with parameters ℓ and 1/2. Hoeffding's inequality implies a tail bound, and since K and n were arbitrary, this implies (2.29). Without loss of generality, we can assume that n ≥ 3, in which case we have K > c^{−1/2}. For f : Π_n → R, using the representation (2.17), we obtain the desired estimate, where for the third inequality we have used that t ≥ h and that K > c^{−1/2}, and for the last step we have used (2.29). The same bound holds for P(S^n_t ∈ A^c_{K,t}), so that (2.30) holds. On the other hand, to estimate the first term on the right-hand side of (2.30), we can use (2.28). Consequently, by (2.30) and the above, we get a bound which, upon recalling that K = 2 log(2n)/c, gives the claim. As mentioned at the beginning of the proof, this yields (2.27) in the case α = 0, β = 1. The case β = 1, α = 1 follows from the trivial bound, and the case β = 1, α ∈ [0, 1] then follows by interpolation, see Remark 2.2.3. Finally, the case β ∈ [0, 1) follows by interpolating between (2.27) with β = 1 and the trivial bound. This finishes the proof.
Lemma 2.2.5. Let α ∈ [0, 1]. There exists a constant N(α) such that for all ψ ∈ C^α(T) and t ∈ [0, 1] we have the bound below.

Proof. One can easily see the sup-norm estimate, so we focus on proving that the corresponding difference bound holds for all t ∈ [0, 1] and x, y ∈ T. Let us first prove the claim with α = 1. In addition, let us assume for now that t ∈ Λ_n. There are three cases.

Case 1: |x − y| < 1/2n and ρ_n(x) = ρ_n(y). In this case, by (2.2) the claimed bound follows, where we have used the representation (2.17).
Case 2: |x − y| < 1/2n and ρ_n(x) ≠ ρ_n(y). In this case, let us assume without loss of generality that ρ_n(y) = ρ_n(x) + 1/2n. By (2.2), we obtain the first bound, which implies the second. We also have a matching bound for the remaining term. Consequently, the claim follows in this case.

Case 3: |x − y| ≥ 1/2n. In this case, the claimed bound holds, where we have used the results from the case |x − y| < 1/2n for the first and the third terms, the representation (2.17) for the second term, and of course the fact that |x − y| ≥ 1/2n.

Hence, we have proved the claim for t ∈ Λ_n. For general t ≥ 0, the claim then follows from the case t ∈ Λ_n by virtue of the linear interpolation identity, see (2.9).
To summarise, we have shown the desired inequality with α = 1 for all t ≥ 0 and x, y ∈ T. From (2.32), it also follows that for all t ≥ 0 and x, y ∈ T, which is the claim for α = 0. Finally, the case α ∈ (0, 1) follows by interpolation, see Remark 2.2.3.
Lemma 2.2.6. The following hold:
1. There exists a constant N such that for all t ∈ [h, 1] one has (2.33).
2. There exists a constant N such that for all t ∈ [0, h] one has (2.34).

Proof. We have a decomposition into several terms. For the second term, keeping in mind the definition of e^n in (2.2), the required bound follows, where we have used Lemma 2.2.4 for the inequality. Similarly, by (2.9), the remaining term obeys the same bound. The proof of (2.34) is more straightforward and is left to the reader.
The following lemma, a key error estimate between the continuous and the discrete heat kernels, is similar to [Gyö99, Lemma 3.3], where an estimate of the form (2.36) is proved in the Dirichlet setting. Our version is a bit more flexible in the range of exponents allowed.

Lemma 2.2.7. Let β ∈ [0, 2). Then there exists a constant N(β, c) such that for all t ∈ [h, 1] and x ∈ T one has the bound (2.36).

Proof. Writing out the difference of the kernels, we start with an estimate for the first term on the right-hand side of (2.37), which we write as a sum of terms I^i_t(x). We now show that each I^i_t(x) is bounded by the right-hand side of (2.36). By the orthonormality of the e_j we get (2.38), whose right-hand side is of order Σ_j j² e^{2λ_j t}. Since λ_j = −4π²j², we can use Proposition 2.2.2 to conclude the claimed bound. Next, recalling that λ_j ≤ λ^n_j ≤ 0 and that λ^n_j ≤ −16j² by (2.6), we get an intermediate bound; by (2.7) and Proposition 2.2.2 we then get (2.39). Using n^{−2} ≲ t again, we get the required bound. Next, for I^3_t(x) we can use a pointwise bound; therefore we obtain (2.40). As usual, Proposition 2.2.2 implies the claimed bound. Before estimating I^4_t(x), we claim that for j, ℓ ∈ {−n, ..., n − 1} with j ≠ ℓ, the functions e_j − e_j ∘ ρ_n and e_ℓ − e_ℓ ∘ ρ_n are orthogonal in L_2(T). Indeed, by the orthogonality of (e_j)_{j∈Z} and the orthogonality of (e^n_ℓ)_{−n≤ℓ≤n−1}, it suffices to see that for j, ℓ ∈ {−n, ..., n − 1} with j ≠ ℓ (and j ≠ 0, the case j = 0 being immediate) we have

    ∫_T e_j(y) \overline{e_ℓ(ρ_n(y))} dy = Σ_{k=0}^{2n−1} e^{−i2πℓk/2n} ∫_{k/2n}^{(k+1)/2n} e^{i2πjy} dy = (i2πj)^{−1} (e^{i2πj/2n} − 1) Σ_{k=0}^{2n−1} e^{i2π(j−ℓ)k/2n} = 0,

which shows the claim. Consequently, we obtain (2.41), giving the claimed bound as before. Putting (2.38)-(2.41) together, we conclude (2.42). Next we turn to (2.43). As before, we show that both terms are bounded by the right-hand side of (2.36). Since t ≥ h implies κ_n(t) ≥ t/2, we have (2.44). Proposition 2.2.2 and n^{−2} ≲ t yield the claimed bound. For the other term, we argue using κ_n(t) ≥ t/2 as before and, similarly, (2.10). For the last term on the right-hand side of (2.45) we use the elementary inequalities 0 ≤ x − ln(1 + x) ≤ x² for all x ∈ [−1/2, 0], and |1 − e^y| ≤ |y| for all y ≤ 0.
Therefore, as before, this gives the desired bound, and we conclude that (2.46) holds.

Lemma 2.2.9. For any α ∈ [0, 1] there exists a constant N = N(α, c) such that for all ψ ∈ C^α(T), t ∈ [0, 1], and y ∈ T, we have (2.47).

Proof. We first show the desired inequality for α = 1. We use the reformulation of the discrete heat kernel as the transition kernel of a discrete time random walk, see Remark 2.1.8. Further, we assume for now that t ∈ Λ_n and y ∈ Π_n. Recall that in this case, by (2.17), we have a representation in which W_t is a Brownian motion; by applying Lemma 2.2.8 we obtain the claimed comparison. For t ∈ [0, 1] and y ∈ T, we then obtain the general bound, where for the first term we have used Lemma 2.2.6 ((2.33) for t ≥ h and (2.34) for t ∈ [0, h]) and for the last term we have used classical heat kernel estimates (for example (2.18) with α = β = 1). This proves the claim for α = 1. For α = 0 the claim trivially holds. Finally, the claim for α ∈ (0, 1) follows by interpolation, see Remark 2.2.3.

Discrete and continuous Ornstein-Uhlenbeck processes
The last integral in (2.1) is also called the infinite dimensional Ornstein-Uhlenbeck process. In the trivial case ψ = 0, b = 0 it is simply the solution process, and it also plays an important role in the general case. Let us therefore introduce a separate notation for it:

    O_t(x) := ∫_0^t ∫_T p_{t−s}(x, y) ξ(dy, ds).

Let (s, t) ∈ [0, 1]_<. Thanks to the semigroup property of P, the Ornstein-Uhlenbeck process satisfies (2.48). The second term on the right-hand side of (2.48) is a Gaussian random variable that is independent of F_s. Its variance is given by (2.49). Therefore, for any bounded measurable g : R → R and F_s-measurable random variable Y one has the almost sure equality (2.50). Moreover, from (2.25) one gets, with an absolute constant N > 0, for all r ∈ (0, 1] the bound (2.51). The following is well-known. We similarly define the discrete Ornstein-Uhlenbeck process by setting

    O^n_t(x) := ∫_0^t ∫_T p^n_{κ_n(t−r)}(x, y) ξ(dy, dr),    (t, x) ∈ [0, 1] × T.
Clearly, O^n_{s,t}(x) is F_s-measurable, while the second term on the right-hand side is a Gaussian random variable that is independent of F_s; its variance admits an explicit formula analogous to (2.49). Therefore, for any continuous function g and F_s-measurable random variable Y one has the almost sure equality (2.52). In the next two statements we compare the expressions in (2.52) to the corresponding ones in (2.50).

By (2.56) (below), we have
Combining the two estimates above gives (2.54). In the case t ∈ (h, 1] we use Lemma 2.2.7 and the above estimates for Q and Q^n to conclude. This finishes the proof. Since r ≤ h implies rn ≲ r^{1/2}, the second term dominates. One similarly gets r^{1/2} ≲ r^{1/2−β/4} n^{−β/2} for any β ≥ 0, yielding (2.55). Moving on to the case r ≥ h, one obtains an estimate whose last line is an integral of n^{−β/2} s^{−(β+1)/4} s^{−1/4} ds, using (2.36) to get the last line. As long as β < 2, the last integral is finite and is of the required order. Finally, the condition r ≥ h implies n^{−1} ≲ n^{−β/2} r^{1/2−β/4}, finishing the proof.
For very short times, that is, r ∈ [0, h], one has from (2.26) (2.56) Otherwise, we have the following control on Q n .
Lemma 2.3.4. For any β ∈ [0, 1] there exists a constant N > 0 depending only on β and c such that for any r ∈ [h, 1] and r′ ∈ [r, 1] we have (2.57) and (2.58).

Proof. First we show (2.57). Define the auxiliary quantity Q̃^n; since Q̃^n ≤ Q^n, it suffices to bound Q̃^n. From (2.25) one gets, for all r ∈ (0, 1], a bound with some absolute constant N_1 > 0. By the triangle inequality one can then write the difference conveniently, and therefore obtain the required comparison. Choosing N′ sufficiently large, the bound (2.57) indeed follows for r ≥ N′h; for the remaining r, by the monotonicity of Q^n and the fact that Q^n(h) = 2nh (see (2.56)), we get the claim. We continue with (2.58). By (2.10) and Proposition 2.2.2 we get the desired estimate, which combined with (2.61) finishes the proof.
Since p and θ are arbitrary, by Kolmogorov's continuity criterion we get and the claim follows since the bound does not depend on t or n.
Proof. Let us denote v^n_t = P^n_t ψ^n + O^n_t. The conclusion of the lemma with u^n replaced by v^n is an immediate consequence of Lemma 2.2.5 and Lemma 2.3.5.
From Girsanov's theorem (see e.g. [DPZ92, Thm 10.14] for a sufficiently general version) one has that, under the measure P̃ defined via the density ρ, the corresponding shifted mapping defines a white noise. In particular, the law of u^n under P̃ and the law of v^n under P coincide. It is also an easy exercise that Eρ^{−1} ≤ N(‖b‖_B) < ∞. Therefore, the claimed estimate follows. This finishes the proof.

The sewing strategy
As already discussed, one of the main tools for proving our main results is the stochastic sewing lemma. First we give an outline of the strategy and identify the various terms to be bounded, and then we carry out the estimates.

Overview
Here we give a brief overview of the strategy of the proof. For reference, we will compare with the 1-dimensional additive SDE dX_t = f(X_t) dt + dW_t driven by a standard Wiener process W. Let us assume f ∈ C^α(R) for some α ∈ (0, 1). The Euler-Maruyama approximation of the SDE reads as

    X^n_t = X^n_0 + ∫_0^t f(X^n_{κ_n(s)}) ds + W_t,

where we briefly use the notation κ_n(t) = ⌊nt⌋n^{−1}. Assuming identical initial conditions, one can decompose the error as

    X_t − X^n_t = ∫_0^t ( f(X_s) − f(X^n_s) ) ds + ∫_0^t ( f(X^n_s) − f(X^n_{κ_n(s)}) ) ds.  (3.1)

One then aims to bound the first term by |X − X^n| in some norm |·| and the second by a negative power of n, which can in fact be n^{−(1+α)/2}. If one furthermore achieves a small constant (say, less than 1/2) in the first bound, then the inequality buckles and the error itself is bounded by n^{−(1+α)/2}.
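The finite dimensional scheme above is easy to experiment with. The following sketch is ours (not from the paper): `euler_frozen` and all parameters are hypothetical choices. It runs the Euler-Maruyama scheme with the bounded, non-Lipschitz drift f = sign, with the drift frozen at the points κ_n(s), and compares coarse approximations against a fine proxy for X driven by the same Brownian increments.

```python
import numpy as np

# Illustration only: Euler-Maruyama for dX = f(X) dt + dW on [0,1] with the
# bounded measurable drift f = sign. The drift is evaluated at kappa_n(s),
# i.e. frozen on blocks of the fine partition; identical Brownian increments
# drive both the coarse scheme and the fine reference.

def euler_frozen(f, x0, dW, block):
    """Euler-Maruyama; drift re-evaluated every `block` fine steps."""
    dt = 1.0 / len(dW)
    x = x0
    frozen = f(x0)
    for i, dw in enumerate(dW):
        if i % block == 0:
            frozen = f(x)          # drift frozen at a coarse gridpoint
        x = x + frozen * dt + dw
    return x

rng = np.random.default_rng(1)
n_fine, paths = 512, 100
errs = {}
for block in (8, 64):              # coarse step sizes block / n_fine
    e = []
    for _ in range(paths):
        dW = rng.normal(0.0, np.sqrt(1.0 / n_fine), size=n_fine)
        x_ref = euler_frozen(np.sign, 0.0, dW, 1)      # fine proxy for X
        x_n = euler_frozen(np.sign, 0.0, dW, block)    # coarse approximation
        e.append(abs(x_ref - x_n))
    errs[block] = float(np.mean(e))
```

Since |sign| ≤ 1, the pathwise error over [0, 1] is deterministically at most 2; one expects the mean error to shrink as the block size decreases, consistent with a positive strong rate even for this irregular drift.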

Of course, neither of these tasks is really obvious, since simply bounding the integrals by bringing the absolute value inside gives the bounds t‖X − X^n‖^α_{L_∞([0,t])} and n^{−α}, respectively. The former is particularly problematic, since buckling arguments (or, equivalently, Gronwall-type lemmas) fail for powers strictly less than 1. In [BDG21] this issue is overcome by the stochastic sewing approach, which however requires working with a stronger norm, a particular choice of which suffices for this purpose. On the one hand, this has the advantage of providing the final error estimates in a strong norm; the drawback is that instead of (3.1) one has to control the increments of the error as well.
In infinite dimensions there are several issues with this strategy. First, the quantity u_t − u_s does not have a natural form as an integral from s to t. Second, even if one considers the "mild" increments u_t − P_{t−s} u_s, there is no nice analogous increment for the approximate solution. Instead, we study the quantity

    E^n_{s,t}(x) := ∫_s^t ( P_{t−r} b(u_r)(x) − P^n_{κ_n(t−r)} b(u^n_{κ_n(r)})(x) ) dr.

The above is not an increment (not even a mild one); however, it is an analogue, in the infinite dimensional case, of the increments of the right-hand side of (3.1), and it serves the same purpose.
We will use the decomposition

    E^n_{s,t}(x) = E^{n,1}_{s,t}(x) + E^{n,2}_{s,t}(x) + E^{n,3}_{s,t}(x).  (3.3)

Our goal will be to estimate the term E^{n,1} in terms of E^n, which will lead to a buckling for E^n, and to estimate the remaining terms E^{n,2}, E^{n,3} by some power of n. Both of these steps will be achieved by the stochastic sewing lemma. Finally, notice that the above procedure will give an estimate for E^n and not for u − u^n itself. The reason that we follow this route will become clearer later, see Remark 3.2.2. We now recall the stochastic sewing lemma; the notation [s′, t′]_< stands for the set of pairs (s, t) with s′ ≤ s ≤ t ≤ t′. (iii) There exist constants K_1, K_2 > 0 such that for all (s, t) ∈ [s′, t′]_< we have (3.7). In addition, there exists a constant K > 0 depending only on ε_1, ε_2 and p, such that for all (s, t) ∈ [s′, t′]_< we have the corresponding bound.
Remark 3.2.2. Notice that the right-hand side contains a term involving the exponent τ, where τ > 1/4. This is the reason that we aim to buckle for E^n and not for u − u^n itself: the latter has no more than 1/4 regularity in time, due to the term O − O^n.
Proof.By linearity in f , we may and will assume f B(R) = 1.We first assume that f is in addition Lipschitz, derive the bound (3.9) that does not depend on its Lipschitz norm, and then T conclude with a standard approximation argument.We fix x ∈ T, and for (s, t) ∈ [s ′ , t ′ ] < we define We aim to verify the conditions of Theorem 3.
First consider the case t ≥ s + h, and decompose into terms I_1 and I_2. For I_1 we have a trivial estimate. As for I_2, by applying (2.18) we get the required bound. Next, using (2.53) with β = 1 − 2ε gives (3.12). By using (2.47) with α = 1/2 − ε and the assumption of the theorem, we get

    ‖φ_{s,r}(y) − φ^n_{s,r}(y)‖_{L_p(Ω)} ≤ ‖P^n_r ψ^n(y) − P_r ψ(y)‖_{L_p(Ω)} + ...  (3.13)

Moreover, by using (2.55) with β = 2 − 4ε we get a further bound. This in turn implies that (3.4) is satisfied with ε_1 = 1/4. Next, let us bound the term ‖E_s δA_{s,u,t}(x)‖_{L_p(Ω)}. A simple calculation gives an expression which, similarly to before, we write in terms of D_r(y). Let us start with a rough estimate when |r − u| ≤ h: we pair up the first and third, and the second and fourth terms in D_r(y) and apply (2.18) (with β = 1 and α = 0); combined with (2.51) and (2.56), this gives the rough bound. Notice that for all X, Y ∈ L_∞(Ω) with Y being F_s-measurable, by the triangle inequality, the conditional Jensen's inequality, and the monotonicity of the conditional expectation, we have a conditional comparison. By using this, we obtain a bound of order |r − s| (3.18), and similarly the analogous estimate. Let us now first deal with the case t ∈ [u, u + h). Putting the above bound into (3.17) we get the claim in this case. For I_1 we may use (3.20) again. As for I_2, we decompose the integrand as D_r = D^1_r + D^2_r. For D^1_r we use (2.20). From (3.12), (3.13), and (3.18) we get the bound for D^2_r; moreover, similarly to the argument for (3.18), we get a further estimate. Therefore, by the above estimates and (2.51), we get the required bound, where we used that τ < 3/4. Integrating the bounds (3.22) and (3.25) with respect to r, we conclude that (3.5) is satisfied with ε_2 = 3/4 + τ − 1 > 0, where we used that τ > 1/4. Therefore Theorem 3.1.1 applies. We claim that the map A : [s′, t′] → L_p(Ω) constructed in Theorem 3.1.1 coincides with the process in question. First of all, it is obvious that A_t(x) is F_t-measurable for each t ∈ [s′, t′] and that A_{s′} = 0.
We aim to verify the conditions of the stochastic sewing lemma. By the tower property of conditional expectation, it is easy to see that E_s δA_{s,u,t}(x) = 0. This shows that (3.5) is satisfied with C_2 = 0.
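One mechanism behind the vanishing of (3.5) can be sketched as follows: if the germ has the form A_{s,t} = E_s B_{s,t} for some additive functional B, i.e. B_{s,t} = B_{s,u} + B_{u,t} (an assumption made here only for illustration), then the tower property gives

```latex
% Tower property: E_s E_u = E_s for s <= u, and additivity of B.
\mathbb{E}_s\,\delta A_{s,u,t}
= \mathbb{E}_s\big(\mathbb{E}_s B_{s,t}-\mathbb{E}_s B_{s,u}-\mathbb{E}_u B_{u,t}\big)
= \mathbb{E}_s\big(B_{s,t}-B_{s,u}-B_{u,t}\big)=0,
```

so that (3.5) indeed holds with C_2 = 0.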
Moving on to (3.4), we separate two cases. When t ≥ s + 2h, we write the corresponding decomposition. For r ∈ [s + 2h, t] we have κ_n(r) ≥ s; therefore, by (2.52) we obtain the inner bound. Applying (2.18) for the outer heat kernels with α = 0 and β = 1, it remains to integrate with respect to y and r. The term I_1 is trivial: using the boundedness of g, we get the required estimate. Notice that s′, t′ were arbitrary, as was x ∈ T; hence the claim follows.
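The role of estimates such as (2.18) is that of standard parabolic smoothing bounds for the heat semigroup. A generic instance with α = 0 and β = 1 (a sketch of the standard bound, not the precise form of (2.18)) reads:

```latex
% Standard heat semigroup smoothing on the torus:
% one spatial derivative is gained at the price of t^{-1/2}.
\|\partial_x P_t f\|_{L^\infty(\mathbb{T})}
\;\lesssim\; t^{-1/2}\,\|f\|_{L^\infty(\mathbb{T})},
\qquad t\in(0,1].
```

The factor t^{−1/2} is what produces, after integration in r, the exponents appearing in the bounds above.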

Proof of Theorem 1.3.2
As indicated in Section 3.1, we first aim to derive a buckling inequality for E^n. In the decomposition (3.3) the only term not treated so far is E^{n,3}, for which, however, it is easy to see the almost sure bound

sup_{x∈T} |E^{n,3}_{s,t}(x)| ≲ n^{−1/2} |t − s|^{1/2}. (3.34)

Indeed, when |t − s| ≤ h, simply using the boundedness of b yields a bound of order |t − s|, which even implies a bound of order n^{−1} |t − s|^{1/2}. In the regime |t − s| ≥ h we split the integral into two as usual and use the trivial estimates. Returning to the main error, we have

u_t(x) − u^n_t(x) = (P_t ψ(x) − P_t ψ^n(x)) + (P_t ψ^n(x) − P^n_t ψ^n(x)) + E^n_{0,t}(x) + (O_t(x) − O^n_t(x)).

By iterating the argument at most 2/N_0 times and recalling that N_0 does not depend on n, the proof is finished.
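The buckling step alluded to above can be sketched as follows (the constants and the exponent θ > 0 are illustrative, standing in for the quantities produced by the preceding estimates):

```latex
% Schematic buckling/absorption argument on a short interval [s',t']:
\mathcal{E}^n \;\le\; C\,n^{-1/2+\varepsilon} + C\,(t'-s')^{\theta}\,\mathcal{E}^n
\;\Longrightarrow\;
\mathcal{E}^n \;\le\; 2C\,n^{-1/2+\varepsilon}
\quad\text{whenever } C\,(t'-s')^{\theta}\le\tfrac12,
```

and the estimate on the whole time interval follows by iterating over boundedly many (at most 2/N_0) subintervals.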
For the proof of Lemma 2.2.9 below, let us recall the following result, a nice consequence of Stein's method of normal approximation (see, for example, [CGS11, Corollary 4.2, p. 68]).
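In the form typically produced by Stein's method, the result is a Berry–Esseen-type bound. Schematically (this is the classical shape, stated here for orientation rather than as the precise statement of [CGS11, Corollary 4.2]): for independent, mean-zero random variables ξ_1, …, ξ_k with Var(ξ_1 + ⋯ + ξ_k) = 1 and W = ξ_1 + ⋯ + ξ_k,

```latex
% Classical Berry-Esseen-type bound; Phi is the standard normal cdf.
\sup_{z\in\mathbb{R}} \big|\,\mathbb{P}(W\le z)-\Phi(z)\,\big|
\;\lesssim\; \sum_{i=1}^{k}\mathbb{E}|\xi_i|^{3}.
```

Bounds of this type quantify the distance of a discretised stochastic integral from its Gaussian limit.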
(2.46) Combining (2.37), (2.42), (2.43), (2.44), and (2.46) brings the proof to an end.

We will assume that p = 2, since the general case follows by the equivalence of Gaussian moments. Notice that for t − s ≥ h, the estimate (2.53) with β = 2 follows directly by Itô's isometry and Lemma 2.2.7. Since it holds for β = 2, it also holds for any β ∈ [0, 2], since 1/n ≤ 2c^{−1/2} |t − s|^{1/2}. As for (2.54), there are two cases. First, assume that t ∈ [0, h]. By Itô's isometry and (2.51), we have the bound (2.61). On the other hand, we can estimate the difference |Q_n(r) − Q_n(r′)| term by term, recalling (2.55) and (2.51).

For s′ ≤ s ≤ u ≤ t ≤ t′, set δA_{s,u,t} := A_{s,t} − A_{s,u} − A_{u,t}. Then there exists a unique map A : [s′, t′] → L_p(Ω) with the following three properties: (i) With probability one, A_{s′} = 0. (ii) A_t is F_t-measurable for all t ∈ [s′, t′]. (iii) ‖A_t − A_s − A_{s,t}‖_{L_p(Ω)} ≲ C_1 |t − s|^{1/2+ε_1} + C_2 |t − s|^{1+ε_2}.

Remark 3.1.2. It can sometimes be convenient to incorporate the semigroup of the (linear part of the) equation into the formulation of the stochastic sewing lemma, as in [LS22]. This is indeed how the first version of the present paper proceeded, but, as noted by a referee, it is easier to reduce the argument to the original stochastic sewing lemma, similarly to [ABLM20].
We start by verifying (3.4), that is, by obtaining an estimate for ‖A_{s,t}(x)‖_{L_p(Ω)}. First of all, notice that we can interchange the action of E_s and P_{t′−r}, and therefore by (2.50) and (2.52) one can write the conditional expectation explicitly for (s, t) ∈ [s′, t′]_<. Since by (2.57) we also have Q_n(r − s) ≲ |r − s|^{1/2} for r ≥ s + h, we conclude (3.15). Hence, in the regime t ≥ s + h we have from (3.10) and (3.15) that

‖A_{s,t}(x)‖_{L_p(Ω)} ≲ |t − s|^{3/4} (n^{−1/2+ε} + sup …).

If t ∈ [s, s + h), we can simply use a trivial bound, where for the second inequality we have used (2.51), (2.57), and (2.55) (the latter with β = 1).
It clearly suffices to prove the claim for p ≥ 2 and ‖g‖_{B(R)} = 1. Let us fix x ∈ T and (s′, t′). Hence, we only have to check that A_•(x) satisfies (3.6)–(3.7) with some constants K_1 and K_2.

Proof.