Parallel Black-Box Complexity with Tail Bounds

We propose a new black-box complexity model for search algorithms evaluating $\lambda$ search points in parallel. The parallel unary unbiased black-box complexity gives lower bounds on the number of function evaluations every parallel unary unbiased black-box algorithm needs to optimise a given problem. It captures the inertia caused by offspring populations in evolutionary algorithms and the total computational effort in parallel metaheuristics. We present complexity results for LeadingOnes and OneMax. Our main result is a general performance limit: we prove that on every function every $\lambda$-parallel unary unbiased algorithm needs at least $\Omega(\frac{\lambda n}{\ln \lambda} + n \log n)$ evaluations to find any desired target set of up to exponential size, with an overwhelming probability. This yields lower bounds for the typical optimisation time on unimodal and multimodal problems, for the time to find any local optimum, and for the time to even get close to any optimum. The power and versatility of this approach is shown for a wide range of illustrative problems from combinatorial optimisation. Our performance limits can guide parameter choice and algorithm design; we demonstrate the latter by presenting an optimal $\lambda$-parallel algorithm for OneMax that uses parallelism most effectively.


Per Kristian Lehre and Dirk Sudholt

I. INTRODUCTION
Black-box optimization describes a challenging realm of problems where no algebraic model or gradient information is available. The problem is regarded as a black box, and knowledge about the problem at hand can only be obtained by evaluating candidate solutions. General-purpose metaheuristics such as evolutionary algorithms (EAs), simulated annealing, ant colony optimizers, tabu search, and particle swarm optimizers are well suited for black-box optimization, as they generally work well without any problem-dependent knowledge.
A lot of research has focused on designing powerful metaheuristics, yet it is often unclear which search paradigm works best for a particular problem class, and whether and how better performance can be obtained by tailoring a search paradigm to the problem class at hand.
Black-box complexity is a powerful tool that describes limits on the efficiency of black-box algorithms. The black-box complexity of a problem class captures its difficulty in black-box optimization: it describes the minimum number of function evaluations that every black-box algorithm needs to make to optimize a problem from the class. It provides a rigorous theoretical foundation by capturing limits to the efficiency of all black-box search algorithms, providing a baseline for performance comparisons across all known and future metaheuristics as well as tailored black-box algorithms. It also prevents algorithm designers from wasting effort on trying to achieve impossible performance.
Many different models of black-box complexity have been developed. The first black-box complexity model by Droste et al. [28] makes no restriction on the black-box algorithm. This leads to some unrealistic results, such as polynomial black-box complexities of NP-hard problems [28]. Subsequent research introduced refined models that restrict the power of black-box algorithms, leading to more realistic results [18], [20], [21], [28], [57]: models where black-box algorithms can only query the relative order of function values of search points [20], [57], models with memory restrictions [21], [28], and models restricting which search points may be stored [23]–[25]. Lehre and Witt [45] introduced the unbiased black-box model, where black-box algorithms may only use operators without a search bias (see Section II). This model initially considered unary operators (such as mutation) and was later extended to higher-arity operators (such as crossover) [16] and more general search spaces [53]. It also led to the discovery of more efficient EA variants [11]. For further details, we refer to the comprehensive survey by Doerr [22].
A shortcoming of the above models is that they do not capture the implicit or explicit parallelism at the heart of many common search algorithms. EAs such as (μ + λ) EAs or (μ, λ) EAs generate λ offspring in parallel. In many cases, using a large offspring population can decrease the number of generations needed to find an optimal solution. However, the number of function evaluations may increase, as evolution can only act on information from the previous generation. A large offspring population can lead to inertia that slows down the optimization process. Existing black-box models are unable to capture this inertia, as they assume that all search points are created in sequence.
The same goes for parallel metaheuristics such as island models evolving multiple populations in parallel (see [47]). Parallelization can decrease the number of generations, or parallel time. But the overall computational effort, the number of function evaluations across all islands, may increase. Lässig and Sudholt [44] used the following notion. Let $T_\lambda$ be the random number of generations an island model with λ islands (each creating one offspring) needs to find a global optimum for a given problem. If using λ islands decreases the parallel time by a factor of order λ compared to just one island, that is, $\lambda \cdot \mathbb{E}(T_\lambda) = O(\mathbb{E}(T_1))$, this is called a linear speedup (with regard to the parallel time, the number of generations). In other words, a linear speedup means that the total number of function evaluations, $\lambda \cdot \mathbb{E}(T_\lambda)$, does not increase beyond a constant factor.
Previous work [43], [44], [48] considered illustrative problems from pseudo-Boolean optimization and combinatorial optimization, showing sufficient conditions for linear speedups. However, the absence of matching lower bounds makes it impossible to determine exactly for which parameters λ linear speedups are achieved.
We provide a parallel black-box model that captures and quantifies the inertia caused by offspring populations of size λ and by parallel EAs evaluating λ search points in parallel. We present lower bounds on the black-box complexity for the well-known LEADINGONES (LO) problem and for the general class of functions with a unique optimum, revealing how the number of function evaluations increases with the problem size n and the degree of parallelism λ. The results complement existing upper bounds [44], allowing us to characterize the realm of linear speedups, where parallelization is effective.
Our lower bound for functions with a unique optimum is asymptotically tight: for the ONEMAX problem, we present a simple (1 + λ) EA with an adaptive mutation rate that achieves asymptotically optimal performance among all parallel unary unbiased black-box algorithms. Our adaptive mutation rates decrease the expected running time by a factor of order ln ln λ, compared to the (1 + λ) EA with the standard mutation rate 1/n [17].
This article extends a previous conference paper [1] that contained parts of the results. A major novelty in this manuscript is the introduction of black-box complexity results with tail bounds. Existing black-box complexity results only make statements about the expected number of evaluations it takes to find a global optimum. However, it is often not clear whether the expectation is a good reflection of the performance observed in practice. We provide black-box complexity lower bounds that apply with an overwhelming probability, that is, with probability $1 - 2^{-\Omega(n^{\varepsilon})}$ for some constant $\varepsilon > 0$. More precisely, using the notation $\ln^+ x := \max(1, \ln x)$ whenever the argument can be smaller than the logarithm's base, we show that for every target search point $x^*$ we may choose, every λ-parallel unary unbiased black-box algorithm needs at least
$$\max\left\{\frac{c\lambda n}{\ln^+\lambda},\ (1-\delta)n\ln n\right\} = \Omega\left(\frac{\lambda n}{\ln^+\lambda} + n\log n\right) \tag{1}$$
function evaluations to find $x^*$, with an overwhelming probability, where c is a constant with $c \ge 1/60$. The leading constant $1 - \delta$ in the $n\ln n$ term can be chosen arbitrarily close to 1; the precise result contains a tradeoff between the leading constant and the exponent of the overwhelming probability formula (see Theorem 5). This means that it is practically impossible for any λ-parallel unary unbiased black-box algorithm to find a designated target with fewer than $c\lambda n/\ln^+\lambda$ or fewer than $(1-\delta)n\ln n$ evaluations. The latter bound applies to parallel and nonparallel unary unbiased algorithms. In addition, if the probability of finding a single target $x^*$ in the stated time is exponentially small, then by a union bound the probability of finding any one of many target points is still exponentially small. This simple union bound argument opens up a range of opportunities for obtaining stronger results that are much more relevant to practice than the state of the art. Our method is powerful and versatile since we can choose any set of target search points, up to an exponential size. This allows for different applications.
1) Considering global optimization, our lower bound (1) applies to highly multimodal functions, even allowing for up to exponentially many optima. Apart from results tailored to specific problem classes [18], the only generic black-box complexity lower bounds apply to functions with one unique global optimum. Our lower bound yields a general baseline that applies to all unary unbiased black-box algorithms and a wide range of problems.
2) Choosing all local optima as target search points, we also get that for functions with up to exponentially many local optima, every λ-parallel unary unbiased algorithm needs at least the stated time (1) to find any local optimum.
3) Since we can have exponentially many target search points, we can even afford to consider all search points within an almost linear Hamming distance to any local optimum as targets. Then our results imply that even the time to get close to any local or global optimum is bounded from below by (1).
We demonstrate the applicability and versatility of our main result by deriving the first black-box complexity lower bounds for a wide range of illustrative function classes, from synthetic problems (TWOMAX, H-IFF, JUMP$_k$, and CLIFF) that are very popular in the evolutionary computation literature to classes of benchmark functions [41] and important problems from combinatorial optimization, such as VERTEX COLORING, MINCUT, PARTITION, KNAPSACK, and MAXSAT.
In addition to providing a solid unifying theoretical foundation for black-box algorithms, we believe that our results are of immediate relevance to practice. Our black-box complexity with tail bounds gives hard limits on the capabilities of (unary unbiased) black-box algorithms. These limits can be used to set stopping criteria appropriately, avoiding stopping an algorithm before it has had a chance to come close to local or global optima. They are also useful for setting parameters such as the offspring population size λ: if we have a limited computational budget of T evaluations, (1) implies that we must choose λ satisfying $\lambda/\ln^+\lambda \le T/(cn)$, since for larger values of λ the budget T is lower than (1), meaning that every λ-parallel unary unbiased black-box algorithm fails badly with overwhelming probability. Moreover, our lower bounds can serve as a baseline in performance comparisons across various algorithms. And, last but not least, knowing what is impossible is vital for guiding the search for the best possible algorithm. The feasibility of this approach is demonstrated in this article as we present an optimal λ-parallel algorithm for ONEMAX that uses parallelism most effectively.
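The parameter-choice rule above can be turned into a small calculator. The sketch below is illustrative only: the helper names `lower_bound_1` and `largest_admissible_lambda` are hypothetical, the constant c is set to 1/60 (the value appearing in Theorem 5), and δ = 0.01 is an arbitrary example value. It evaluates the bound (1) and finds the largest λ with $\lambda/\ln^+\lambda \le T/(cn)$ for a given budget T.

```python
import math


def ln_plus(x: float) -> float:
    """ln^+ x := max(1, ln x), the truncated logarithm used in bound (1)."""
    return max(1.0, math.log(x))


def lower_bound_1(n: int, lam: int, c: float = 1 / 60, delta: float = 0.01) -> float:
    """Evaluate max{c*lam*n / ln^+ lam, (1 - delta) * n * ln n}, the quantity in (1)."""
    return max(c * lam * n / ln_plus(lam), (1 - delta) * n * math.log(n))


def largest_admissible_lambda(n: int, budget: float, c: float = 1 / 60) -> int:
    """Largest lam >= 1 with lam / ln^+ lam <= budget / (c * n), or 0 if none exists.

    lam / ln^+ lam is nondecreasing in lam, so an exponential search followed
    by a binary search finds the threshold."""
    target = budget / (c * n)
    if 1 > target:  # lam = 1 gives 1 / ln^+ 1 = 1, which already exceeds the budget
        return 0
    lo, hi = 1, 2
    while hi / ln_plus(hi) <= target:  # grow hi until it violates the condition
        lo, hi = hi, 2 * hi
    while hi - lo > 1:  # invariant: lo satisfies the condition, hi does not
        mid = (lo + hi) // 2
        if mid / ln_plus(mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo
```

For n = 1000 and λ = 1000, the $(1-\delta)n\ln n$ term dominates, whereas for very large λ the $\lambda n/\ln^+\lambda$ term takes over; any λ beyond `largest_admissible_lambda(n, T)` makes the budget T smaller than (1) under these assumed constants.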

II. PARALLEL BLACK-BOX MODEL
Following Lehre and Witt [45], we only use unary unbiased variation operators, i.e., operators creating a new search point out of one search point. This includes local search and mutation in EAs, but it does not include recombination.
A unary variation operator can be formally described as a conditional probability distribution $p(\cdot \mid \cdot)$, where for any pair of bitstrings $x, y \in \{0,1\}^n$, $p(y \mid x)$ is the probability that the variation operator produces an "offspring" y from the "parent" x. A unary variation operator is called unbiased (see [45], [53]) if for all bitstrings $x, y, z \in \{0,1\}^n$ and permutations $\sigma: [n] \to [n]$
$$p(y \mid x) = p(y \oplus z \mid x \oplus z) \quad\text{and}\quad p(y \mid x) = p(\sigma_b(y) \mid \sigma_b(x)).$$
Here, ⊕ is the XOR operator, and the function $\sigma_b(x)$ is the permutation over the bit-positions, defined by $\sigma_b(x) := x_{\sigma(1)} x_{\sigma(2)} \cdots x_{\sigma(n)}$. Informally, unbiasedness means that there is no bias toward particular regions of the search space; unbiased operators over $\{0,1\}^n$ must treat all bit values 0, 1 and all bit positions 1, ..., n symmetrically. This is the case for many common variation operators such as standard bit mutation.
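As an illustration of the definition, standard bit mutation with rate p has $p(y \mid x) = p^{H(x,y)}(1-p)^{n-H(x,y)}$, where $H$ is the Hamming distance, and both XOR with z and $\sigma_b$ preserve Hamming distances. The sketch below spot-checks the two conditions on random inputs. The helper names are hypothetical, and `asymmetric_mutation_prob` is a made-up biased operator (ones flip with probability p, zeros with p/2) used only as a contrast; it is not the operator from [39].

```python
import math
import random


def standard_bit_mutation_prob(x, y, p):
    """p(y | x) for standard bit mutation: every bit flips independently with prob. p."""
    d = sum(a != b for a, b in zip(x, y))  # Hamming distance H(x, y)
    return p ** d * (1 - p) ** (len(x) - d)


def asymmetric_mutation_prob(x, y, p):
    """Hypothetical biased operator: ones flip with prob. p, zeros with prob. p / 2."""
    prob = 1.0
    for a, b in zip(x, y):
        q = p if a == 1 else p / 2
        prob *= q if a != b else 1 - q
    return prob


def xor(x, z):
    return tuple(a ^ b for a, b in zip(x, z))


def sigma_b(x, sigma):
    """sigma_b(x) = x_{sigma(1)} ... x_{sigma(n)}, with 0-based indices."""
    return tuple(x[i] for i in sigma)


def looks_unbiased(prob, n, p, trials=200, seed=1):
    """Spot-check both unbiasedness conditions on random x, y, z and sigma."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(3))
        sigma = rng.sample(range(n), n)
        base = prob(x, y, p)
        if not math.isclose(base, prob(xor(x, z), xor(y, z), p)):
            return False
        if not math.isclose(base, prob(sigma_b(x, sigma), sigma_b(y, sigma), p)):
            return False
    return True
```

A passing spot-check is of course not a proof; for standard bit mutation the invariance follows directly from the Hamming-distance formula, while the biased operator already fails the XOR condition.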
Throughout this article, we only deal with unbiased algorithms, as the performance of biased algorithms may depend on the particular encoding used. For example, the (1 + 1) EA with the asymmetric mutation operator defined in [39] flips zeros and ones with different probabilities. This leads to improved expected times of $O(n)$ and $O(n^{3/2})$ on ONEMAX and LO, respectively, but this advantage disappears when the fitness function is transformed with operators ⊕ or $\sigma_b$ [39]. Unbiased algorithms show the same performance on all possible transformations ⊕, $\sigma_b$ of a fitness function.
Unbiased black-box algorithms query new search points based on the past history of function values, using unbiased variation operators. We define a λ-parallel unbiased black-box algorithm in the same way, with the restriction that in each round λ queries are made in parallel (see Algorithm 1). We use the abbreviation uar for uniformly at random. These λ queries only have access to the history of evaluations from previous rounds; they cannot access information from queries made in the same round. We refer to these λ search points as offspring to indicate search points created in the same round.

5: Choose $z$ uar from $\arg\max\{f(y_1), \dots, f(y_\lambda)\}$.
6: if $f(z) \ge f(x)$ then $x = z$
7: until termination condition met
[…] differences in the initialization). It can further model parallel EAs, such as cellular EAs with λ cells, or island models with λ islands, each of which generates one offspring in each generation.
The (1 + λ) EA maintains the current best search point x and creates λ offspring by flipping each bit in x independently with probability p (with default p = 1/n). The best offspring replaces its parent if it has fitness at least f (x).
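The (1 + λ) EA just described can be sketched as follows on ONEMAX, counting function evaluations as in the λ-parallel model (one initial evaluation plus λ per round). This is a minimal illustration, assuming a uniformly random initial search point and termination as soon as the optimum $1^n$ is found; the function name is hypothetical.

```python
import random


def one_plus_lambda_ea_onemax(n, lam, p=None, seed=0):
    """(1 + lambda) EA on OneMax following the description above.

    Returns the final search point and the number of function evaluations,
    counting the initial search point and lam evaluations per generation."""
    rng = random.Random(seed)
    p = 1 / n if p is None else p  # default mutation rate 1/n
    x = [rng.randint(0, 1) for _ in range(n)]
    fx, evals = sum(x), 1  # OneMax(x) = number of ones
    while fx < n:  # terminate once the optimum 1^n is found
        offspring = []
        for _ in range(lam):
            y = [1 - b if rng.random() < p else b for b in x]  # standard bit mutation
            offspring.append((sum(y), y))
        evals += lam  # all lam offspring of a round are evaluated
        best = max(fy for fy, _ in offspring)
        fz, z = rng.choice([o for o in offspring if o[0] == best])  # ties broken uar
        if fz >= fx:  # best offspring replaces the parent if it is not worse
            x, fx = z, fz
    return x, evals
```

Because all λ offspring of a round are created from the same parent, information gained in a round can only be used in the next round, which is exactly the inertia the parallel model captures.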

A. Parallel Black-Box Complexity
The optimization time is commonly defined as the number of function evaluations made before a global optimum is found for the first time. The unbiased black-box complexity (uBBC) of a function class F is the minimum worst-case optimization time among all unbiased black-box algorithms [45] (equivalent to Algorithm 1 with λ = 1). The unbiased λ-parallel black-box complexity (λ-upBBC) of a function class F is defined as the minimum worst-case number of function evaluations among all unbiased λ-parallel algorithms satisfying the framework of Algorithm 1.
With increasing λ, access to previous queries becomes more and more restricted. It is therefore not surprising that the black-box complexity is nondecreasing with growing λ. For every family of function classes $F_n$ and all $\lambda \in \mathbb{N}$,
$$\mathrm{uBBC}(F_n) \le \lambda\text{-upBBC}(F_n) \le \lambda \cdot \mathrm{uBBC}(F_n), \tag{2}$$
as every λ-parallel unbiased algorithm is in particular an unbiased algorithm, and any unbiased algorithm can be simulated by a λ-parallel unbiased black-box algorithm using one query in each round. Also note that the unary uBBC can be regarded as the 1-parallel unary uBBC, $\mathrm{uBBC}(F_n) = 1\text{-upBBC}(F_n)$.
The following lemma shows that the parallel black-box complexity increases with the degree of parallelism, modulo possible rounding issues.
Note that cut-off points are not unique: if $\lambda^*$ is a cut-off point, then every $\lambda = \Theta(\lambda^*)$ is also a cut-off point.
A cut-off point determines the realm of linear speedups [44], where parallelization is most effective. Below the cut-off, for an optimal parallel black-box algorithm the number of function evaluations does not increase (beyond constant factors), but the number of rounds decreases by a factor of $\Theta(\lambda)$. The number of rounds corresponds to the parallel time if all λ evaluations are performed on parallel processors. Hence, below the cut-off it is possible to reduce the parallel time proportionally to the number of processors, without increasing the total computational effort (by more than a constant factor).

III. PARALLEL BLACK-BOX COMPLEXITY OF LEADINGONES
We consider the function $\mathrm{LO}(x) := \sum_{i=1}^{n}\prod_{j=1}^{i} x_j$, counting the number of leading ones in x. It is an example of a unimodal function where a specific bit needs to be flipped to increase the fitness. Similarly, LZ(x) counts the number of leading zeros in x. We first provide a tool for estimating the progress made by λ trials, which may or may not be independent. It is based on moment-generating functions (mgf).
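The two fitness functions can be written down directly; the sketch below (with hypothetical helper names) implements LO as a prefix scan, LZ as LO of the complement, and checks the scan against a literal transcription of the sum-of-products formula on all bitstrings of a small length.

```python
import itertools
import math


def leading_ones(x):
    """LO(x): number of leading one-bits, i.e. the length of the all-ones prefix."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def leading_zeros(x):
    """LZ(x): number of leading zero-bits, i.e. LO of the complement of x."""
    return leading_ones([1 - b for b in x])


def leading_ones_formula(x):
    """Literal transcription of LO(x) = sum_{i=1}^n prod_{j=1}^i x_j."""
    return sum(math.prod(x[:i]) for i in range(1, len(x) + 1))


def formulas_agree(n):
    """Check the scan and the sum-of-products definition on all of {0,1}^n."""
    return all(
        leading_ones(x) == leading_ones_formula(x)
        for x in itertools.product((0, 1), repeat=n)
    )
```

For example, `leading_ones([1, 1, 0, 1])` is 2: the product $\prod_{j=1}^{i} x_j$ drops to 0 at the first zero bit and stays 0 afterwards.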
Lemma 2: Given λ random variables $X_1, \dots, X_\lambda \in \mathbb{N}$, not necessarily independent, let $X^{(\lambda)} := \max_{i\in[\lambda]} X_i$ […]
Proof: Note first that for any $i \in [\lambda]$ and $j \in \mathbb{N}$, it follows from Markov's inequality that $\Pr(X_i \ge j) = \Pr(e^{\eta X_i} \ge e^{\eta j}) \le e^{-\eta j}\,\mathbb{E}(e^{\eta X_i}) \le e^{-\eta j} D$. Now, let $k := \ln(D\lambda)/\eta$. Recall that the expectation of any non-negative, integer-valued random variable N can be written as $\mathbb{E}(N) = \sum_{j=1}^{\infty}\Pr(N \ge j)$. From this and a union bound, we get […]
We now state the λ-parallel black-box complexity of LO.
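Theorem 1: The λ-parallel unary unbiased black-box complexity of LO is $\Omega\left(\frac{\lambda n}{\ln^+(\lambda/n)} + n^2\right)$ and $O(\lambda n + n^2)$. The cut-off point is $\lambda^*_{\mathrm{LO}} = n$. The parallel time for an optimal algorithm is $\Omega\left(\frac{n}{\ln^+(\lambda/n)} + \frac{n^2}{\lambda}\right)$ and $O\left(n + \frac{n^2}{\lambda}\right)$. This result solves an open problem from Lässig and Sudholt [44], confirming that the analysis of the realm of linear speedups for LO from Lässig and Sudholt [44] is tight.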
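As a worked check of the cut-off, one can plug the two bounds established in the proof that follows, $O(\lambda n + n^2)$ and $\Omega(\lambda n/\ln^+(\lambda/n) + n^2)$, into the notion of linear speedups, using that the unary unbiased black-box complexity of LO is $\Theta(n^2)$ (the lower bound from [45], the upper bound from the (1 + 1) EA). The parameterization $\lambda = g\,n$ is introduced here only for illustration.

$$
\begin{aligned}
&\lambda = O(n): && O(\lambda n + n^2) = O(n^2) = O\big(\mathrm{uBBC}(\mathrm{LO})\big),\\
&\lambda = g\,n,\ g = \omega(1): && \Omega\!\left(\frac{\lambda n}{\ln^+(\lambda/n)}\right) = \Omega\!\left(n^2\cdot\frac{g}{\ln g}\right) = \omega(n^2).
\end{aligned}
$$

So the total effort stays within a constant factor of the sequential complexity exactly up to $\lambda = \Theta(n)$, consistent with $\lambda^*_{\mathrm{LO}} = n$.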
Proof: The upper bound of $O(\lambda n + n^2)$ follows easily from an upper bound of $O(n + n^2/\lambda)$ on the number of generations for a (1 + λ) EA from Lässig and Sudholt [43, Th. 1]. The intuition behind this bound is that λ parallel queries can lead to a speedup of a factor of $\Theta(\lambda)$, compared to the expected time of $\Theta(n^2)$ for the (1 + 1) EA. The upper bound also contains an additive term of n for the number of nonoptimal fitness values. This term limits the possible speedups that can be proven using the cited theorem.
A lower bound $\Omega(n^2)$ follows from the unary uBBC of LO [45], which by (2) is a lower bound on the λ-parallel unary uBBC. Hence, the statement holds for the case $\lambda = O(n)$. Thus, we only need to consider the case $\lambda = \omega(n)$ and to prove a lower bound of $\Omega(\lambda n/\ln^+(\lambda/n)) = \Omega(\lambda n/\ln(\lambda/n))$ for this case.
We proceed by drift analysis. Let the "potential" of a search point x be $\max\{\mathrm{LO}(x), \mathrm{LZ}(x)\}$, and define the potential of the algorithm, $P_t$ at time t, to be the highest potential of all search points produced until time t.
Assume that the potential in generation t is $P_t = k$. In any generation t, let $X_i$ for $i \in [\lambda]$ be the indicator variable for the event that all of the first k + 1 bit-positions in individual i are 1 bits (or all are 0 bits). Furthermore, let $Y_i$ be the number of consecutive 1 bits (or 0 bits) from position k + 2 onwards, i.e., the number of "free riders." To bound the progress in potential, we now bound the expectation of $\max_{i\in[\lambda]} X_i Y_i$. We first claim that $\Pr(X_i = 1) = O(1/n)$, recalling arguments from Lehre and Witt [45, proof of Th. 2]. For any previously generated search point x, the number s of 0 bits (or 1 bits) in the first k + 1 positions satisfies $1 \le s \le k + 1$. Assume that the algorithm creates a new search point $x'$ by flipping r bits chosen uniformly at random in the selected search point x. Clearly, for the offspring $x'$ to have only 1 bits (or only 0 bits) in the first k + 1 bit-positions, it is necessary that $r \ge s$. Focusing only on the first k + 1 bit-positions, the algorithm must flip exactly the s 0 bits in the first k + 1 positions, and no 1 bits. Optimistically assuming that the algorithm flips exactly s bit-positions within the first k + 1 positions, the algorithm needs to choose the correct s bits out of k + 1 bit-positions. Therefore, the probability that the first k + 1 bits in $x'$ are only 1 bits (or only 0 bits) is no more than $1/\binom{k+1}{s}$. The claim now follows by a union bound, taking into account the probability of having all 0 bits or all 1 bits in the first k + 1 bit-positions.
Since the algorithm uses unary unbiased variation operators, Lehre and Witt [45, Lemma 1] implies that each random variable $Y_i$, $i \in [\lambda]$, is stochastically dominated by a geometric random variable $Z_i$ with parameter 1/2. The expected progress in potential is therefore bounded in terms of $\mathbb{E}(\max_{i\in[\lambda]} X_i Z_i)$. The mgf of the geometric random variable $Z_i$ is $M_{Z_i}(\eta) = 1/(2 - e^{\eta})$, which equals 2 for $\eta = \ln(3/2)$. The tower property of the expectation and Lemma 2 with $\eta := \ln(3/2)$ and $D := 2$, applied to the offspring with $X_i = 1$ (whose expected number is $O(\lambda/n)$), bound the expected progress per round by $O(\ln(\lambda/n))$, using Jensen's inequality and $\ln(\lambda/n) = \omega(1)$ in the final steps. With overwhelmingly high probability, the initial potential is at most n/2. Hence, by classical additive drift theorems [36], the expected number of rounds to reach the optimum is $\Omega(n/\ln^+(\lambda/n))$. Multiplying by λ gives the number of function evaluations.
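The role of the mgf in this argument can be illustrated numerically. The sketch below samples geometric variables with $\Pr(Z = k) = 2^{-(k+1)}$ (so $\mathbb{E}(e^{\eta Z}) = 1/(2 - e^{\eta}) = 2$ for $\eta = \ln(3/2)$) and compares the empirical expected maximum of λ copies with the union-bound estimate $\ln(D\lambda)/\eta + 1/(1 - e^{-\eta})$. That estimate is our own spelling-out of a union bound of the kind used in the proof of Lemma 2, not necessarily the lemma's exact constant; the sketch also uses independent copies, whereas Lemma 2 does not require independence. All function names are hypothetical.

```python
import math
import random


def geometric_half(rng):
    """Sample Z with Pr(Z = k) = 2^-(k+1), k = 0, 1, 2, ...; its mgf is 1 / (2 - e^eta)."""
    k = 0
    while rng.random() < 0.5:
        k += 1
    return k


def union_bound_estimate(lam, eta=math.log(1.5), D=2.0):
    """ln(D lam)/eta + 1/(1 - e^-eta): a union-bound estimate of E[max_i Z_i]
    when E[e^(eta Z_i)] <= D for each i (here D = 1 / (2 - 3/2) = 2)."""
    return math.log(D * lam) / eta + 1 / (1 - math.exp(-eta))


def empirical_expected_max(lam, runs=2000, seed=0):
    """Monte Carlo estimate of E[max of lam independent copies of Z]."""
    rng = random.Random(seed)
    total = sum(max(geometric_half(rng) for _ in range(lam)) for _ in range(runs))
    return total / runs
```

Both quantities grow logarithmically in λ, which is why λ parallel trials can only buy a logarithmic amount of extra progress per round.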

IV. PARALLEL BLACK-BOX COMPLEXITY OF FUNCTIONS WITH ONE UNIQUE OPTIMUM
Jansen et al. [38] considered the (1 + λ) EA and established a cut-off point for λ where the running time increases from $\Theta(n\log n)$ to $\omega(n\log n)$:
$$\lambda^*_{(1+\lambda)\text{ EA on ONEMAX}} = \Theta\left(\frac{(\ln n)(\ln\ln n)}{\ln\ln\ln n}\right). \tag{4}$$
Doerr and Künnemann [17] presented the following tight bounds for a bounded range of λ.
Theorem 2 (Adapted From [17]): The expected optimization time of the (1 + λ) EA on ONEMAX is
$$\Theta\left(n\log n + \frac{\lambda n \ln^+\ln^+\lambda}{\ln^+\lambda}\right),$$
where the upper bound holds for $\lambda = O(n^{1-\varepsilon})$ and the lower bound holds for $\lambda = O(n)$. We show that the parallel black-box complexity is lower than the bound from Theorem 2 for large λ by a factor of order $\ln^+\ln^+\lambda$.
Theorem 3: For any $\lambda \le e^{\sqrt{n}}$, the λ-parallel unbiased unary black-box complexity for any function with a unique optimum is at least
$$\Omega\left(\frac{\lambda n}{\ln^+\lambda} + n\log n\right).$$
The corresponding parallel time for an optimal algorithm is $\Omega\left(\frac{n}{\ln^+\lambda} + \frac{n\log n}{\lambda}\right)$. We will show in the next section that this bound is tight for ONEMAX. Consequently, the cut-off point for ONEMAX is $\lambda^*_{\mathrm{ONEMAX}} = \Theta((\ln n)(\ln\ln n))$. This is higher than the cut-off point for the (1 + λ) EA with the standard mutation rate p = 1/n from (4) and [38].
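The cut-off arises where the two terms of the bound balance, i.e., where $\lambda n/\ln^+\lambda$ catches up with $n \ln n$, which is equivalent to $\lambda/\ln^+\lambda \approx \ln n$. The sketch below finds this crossover numerically; it sets all constants hidden in the asymptotic notation to 1, which is an assumption made purely for illustration, and the function name is hypothetical.

```python
import math


def ln_plus(x):
    """ln^+ x := max(1, ln x)."""
    return max(1.0, math.log(x))


def onemax_crossover(n):
    """Smallest lam with lam * n / ln^+ lam >= n ln n, i.e. lam / ln^+ lam >= ln n.

    Constants hidden in the asymptotic notation are set to 1 (illustration only)."""
    target = math.log(n)
    lam = 1
    while lam / ln_plus(lam) < target:  # lam / ln^+ lam is nondecreasing in lam
        lam += 1
    return lam
```

For $n = 10^6$, `onemax_crossover(10**6)` returns 56, about 1.5 times $(\ln n)(\ln\ln n) \approx 36.3$; the ratio tends to 1 only slowly, as expected for a $\Theta(\cdot)$ statement.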
To prove Theorem 3, we consider the progress made during a round of λ variations in terms of a potential function defined below. These definitions and arguments, including several of the following lemmas, will also be used in Section VI to prove lower bounds that hold with overwhelming probability.
Without loss of generality, we assume that the search point $1^n$ is the optimum. Following Lehre and Witt [45], we assume a "mirrored" sampling process, where every time a bit string x is queried (including in the initial generation), the algorithm queries the complement bit string $\bar{x}$ for "free." This is necessary because a black-box algorithm can try to locate the complement of the global optimum, and it then just needs to flip all bits to find the optimum. Thus, we have to consider the progress toward the global optimum as well as the progress toward its complement.
Definition 2: Define the 0-potential $s_0^t$ as the minimum number of zeros in all search points queried in all steps up to time t. For all $s_0^t \le m \le n - s_0^t$ and $r \in \{0, \dots, n\}$ we define the random variable
$$\Delta_0(s_0^t, m, r) := \max\{0, s_0^t - |y|_0\},$$
where $|y|_0$ is the number of zeros in a random search point y obtained by applying unbiased variation with radius r to a search point with m zeros. Define the 1-potential $s_1^t$ and $\Delta_1$ symmetrically with respect to the number of ones.
Due to mirrored sampling, we always have $s_0^t = s_1^t$; hence we simply write $s^t$, or just s if we refer to the current point in time. Then we define the progress in terms of the potential as $\Delta(s, m, r) = \max\{\Delta_0(s, m, r), \Delta_1(s, m, r)\}$. Note in particular that for all $z \in \mathbb{N}$ we have
$$\Pr(\Delta(s, m, r) \ge z) \le \Pr(\Delta_0(s, m, r) \ge z) + \Pr(\Delta_1(s, m, r) \ge z). \tag{5}$$
Also note that by the symmetry of zeros and ones, $\Delta_0(s, m, r)$ has the same distribution as $\Delta_1(s, n - m, r)$; hence it suffices to study the distribution of $\Delta_0$. We also have, for all s, m, r with $s \le m \le n - s$,
$$\Delta_0(s, m, r) = \Delta_0(s, n - m, n - r) \tag{6}$$
(in distribution), as flipping all bits (in the transition from m to n − m) and then flipping all but r bits in the variation has the same effect as flipping r bits in the first place. Hence, it suffices to consider $\Delta_0(s, m, r)$ for $s \le m \le n/2$. Now, consider the progress $\Delta_0(s, m, r)$. Let Z be the number of 0 bits that are flipped to 1; then there are r − Z new 0 bits that were originally 1. Therefore, the number of 0 bits in the newly generated search point is m − Z + (r − Z), where Z follows the hypergeometric distribution with parameters n, m, and r. We only make progress if the number of 0 bits in the new search point is less than s. Hence, the progress (decrease in 0-potential) is
$$\Delta_0(s, m, r) = \max\{0, s - (m + r - 2Z)\}.$$
We show a tail inequality for hypergeometric variables and use it to derive a progress bound for the 0-potential.
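The formula for $\Delta_0$ can be evaluated exactly for small n, which makes identity (6) easy to check. The sketch below (hypothetical helper names; exact rational arithmetic via `fractions.Fraction` and `math.comb`) computes the distribution of the number of zeros after flipping r uniformly chosen bits and the resulting distribution of $\Delta_0(s, m, r)$.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(n, m, r, z):
    """Pr(Z = z) for Z ~ Hyp(n balls, m red, r drawn), as an exact fraction."""
    if z < 0 or z > m or z > r:
        return Fraction(0)
    # math.comb returns 0 when r - z > n - m, covering the rest of the support
    return Fraction(comb(m, z) * comb(n - m, r - z), comb(n, r))


def zeros_after_variation(n, m, r):
    """Distribution of |y|_0 = m - Z + (r - Z) when r uniformly chosen bits of a
    search point with m zeros are flipped; returns {zeros: probability}."""
    dist = {}
    for z in range(min(m, r) + 1):
        pz = hypergeom_pmf(n, m, r, z)
        if pz:
            k = m + r - 2 * z
            dist[k] = dist.get(k, Fraction(0)) + pz
    return dist


def progress_distribution(n, s, m, r):
    """Distribution of Delta_0(s, m, r) = max{0, s - |y|_0}; returns {progress: probability}."""
    dist = {}
    for k, pk in zeros_after_variation(n, m, r).items():
        d = max(0, s - k)
        dist[d] = dist.get(d, Fraction(0)) + pk
    return dist
```

For instance, with n = 12 and s = 3, both `progress_distribution(12, 3, 3, 5)` and `progress_distribution(12, 3, 9, 7)` give progress 1 with probability 1/22 and progress 0 otherwise, as (6) predicts.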
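Lemma 3: Let Z be a hypergeometrically distributed random variable with parameters n (number of balls), m (number of red balls), and r (number of balls drawn). For all $z \in \mathbb{N}_0$,
$$\Pr(Z = z) \le \binom{r}{z}\left(\frac{m}{n}\right)^{z},$$
and if additionally $z \ge r/2$, then $\Pr(Z = z) \le (4m/n)^z$.
Proof: We assume $z \le m$ and $z \le r$, as otherwise $\Pr(Z = z) = 0$. We further assume $z \ge 1$, as for z = 0 the probability bound is 1 and the statement is trivial. Now, writing $(a)_k := a(a-1)\cdots(a-k+1)$ for the falling factorial,
$$\Pr(Z = z) = \frac{\binom{m}{z}\binom{n-m}{r-z}}{\binom{n}{r}} = \binom{r}{z}\frac{(m)_z\,(n-m)_{r-z}}{(n)_r}. \tag{7}$$
The fraction can be written as
$$\frac{(m)_z\,(n-m)_{r-z}}{(n)_r} = \frac{(m)_z}{(n)_z}\cdot\frac{(n-m)_{r-z}}{(n-z)_{r-z}}.$$
Since $z \le m$, the second fraction above is at most 1. The first fraction is at most $m^z/n^z$, as $(m-i)/(n-i) \le m/n$ for all $i \in \mathbb{N}_0$ and $m \le n$. Plugging this into (7) yields $\Pr(Z = z) \le \binom{r}{z}(m/n)^z$. If $z \ge r/2$, this is at most $(4m/n)^z$, as $\binom{r}{z} \le 2^r \le 2^{2z} = 4^z$. The next lemma shows that for any radius r the probability of having a progress of z decreases exponentially with z.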
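Both inequalities of Lemma 3 can be verified exhaustively for small n using exact arithmetic. The following sketch (hypothetical function names) is a sanity check of the statement, not a substitute for the proof.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(n, m, r, z):
    """Pr(Z = z) for Z ~ Hyp(n balls, m red, r drawn), as an exact fraction."""
    if z < 0 or z > m or z > r:
        return Fraction(0)
    return Fraction(comb(m, z) * comb(n - m, r - z), comb(n, r))


def lemma3_holds(n):
    """Exhaustively check both bounds of Lemma 3 for all m, r in {0, ..., n}."""
    for m in range(n + 1):
        for r in range(n + 1):
            for z in range(r + 1):
                p = hypergeom_pmf(n, m, r, z)
                if p > comb(r, z) * Fraction(m, n) ** z:
                    return False  # first bound violated
                if 2 * z >= r and p > Fraction(4 * m, n) ** z:
                    return False  # second bound (z >= r/2) violated
    return True
```

The second bound is only useful when m is a small fraction of n, e.g., m ≤ n/8 as in Lemma 4, where $4m/n \le 1/2$ makes it decay geometrically in z.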
Lemma 4: Let s denote the current 0-potential. If $s \le m \le n/8$, then for all $z \in \mathbb{N}$ and $r \in \{1, \dots, n\}$ […]
Proof: Applying Lemma 3 to the hypergeometric random variable Z with parameters n, m, and r, we have, for all $z \in \mathbb{N}_0$, […]
The following lemma gives another tail bound that will be used to exclude steps where a search point of potential $m \gg s$ is chosen for variation. The probability of having a positive progress decreases rapidly with growing m − s.
Putting all lemmas together shows that the expected progress is at most logarithmic in λ.
Lemma 6: Let […] This means that the probability of making any progress is exponentially small, for any $r_i$. Thus, in the following we assume that $m_i \le n/8$ for all i.

Proof of Theorem 3:
The lower bound $\Omega(n\log n)$ follows from the unbiased unary black-box complexity [45]. Hence, it suffices to prove the lower bound $\Omega(\lambda n/\ln^+\lambda)$.
Consider any λ-parallel unary unbiased black-box algorithm. We grant the algorithm an advantage by revealing all search points with Hamming distance at least n/16 to both $0^n$ and $1^n$ at no cost. Hence, the potential is always $s \le n/16$. By Chernoff bounds and a union bound over λ trials, the potential after initialization is n/16 with overwhelming probability.
Assuming this is the case, let $\Delta_0^{(\lambda)}$ be the progress due to reduction of the 0-potential in one step, and $\Delta_1^{(\lambda)}$ be the progress due to reduction of the 1-potential. Owing to the symmetry of $\Delta_0$ and $\Delta_1$, Lemma 6 also applies to $\Delta_1^{(\lambda)}$. Hence, the expected change in potential per round is at most $\mathbb{E}(\Delta_0^{(\lambda)}) + \mathbb{E}(\Delta_1^{(\lambda)}) = O(\ln^+\lambda)$. Therefore, by the additive drift theorem [36], the expected number of rounds until one of the search points $0^n$ or $1^n$ is obtained is $\Omega(n/\ln^+\lambda)$. Multiplying by λ proves the claim.

V. OPTIMAL PARALLEL BLACK-BOX ALGORITHM FOR ONEMAX
The following theorem shows that the lower bound on the black-box complexity from Theorem 3 is tight. We show that the (1 + λ) EA has a better optimization time if the mutation rate is chosen adaptively, according to the current best fitness. This is similar to common ideas from artificial immune systems, particularly the clonal selection algorithm. Adaptive mutation rates for ONEMAX have been studied by Zarges [63]; however, the standard parameters for the clonal selection algorithm were too drastic to even obtain polynomial running times. Better results were obtained when using a population-based adaptation [64].
The following result reveals an optimal choice for the mutation rate of the (1 + λ) EA, depending on n and λ.
Theorem 4: On ONEMAX, for any $\lambda \le e^{\sqrt{n}}$, the expected number of function evaluations of the (1 + λ) EA with the adaptive mutation rate $p_i = \max\{\ln(\lambda)/(n\ln(en/i)), 1/n\}$, where i is the number of zeros in the current search point, is at most $O\left(\frac{\lambda n}{\ln^+\lambda} + n\log n\right)$. The parallel time (number of generations) is $O(n/\ln^+\lambda + (n\log n)/\lambda)$.
Proof: For λ = 1 the algorithm boils down to a (1 + 1) EA with mutation rate 1/n; hence we assume λ ≥ 2, where $\ln^+\lambda = \Theta(\ln\lambda)$. Let i be the current number of zeros and $p_i$ the corresponding mutation rate. The probability of decreasing the number of zeros by any $k \in \mathbb{N}$ with $k \le i$ is at least […]. Then the probability that one of the λ offspring decreases the number of zeros by at least k is at least […]. Hence, for any $k \le i$ the drift is at least […]. For $i > en/\ln\lambda$, which implies $p_i n > 1$, we set $k := p_i n = \ln(\lambda)/\ln(en/i)$. We have $k \le i$ since $k \le \ln\lambda \le \sqrt{n} \le en/\ln\lambda$. We use k := 1 for $i \le en/\ln\lambda$, the realm where $p_i = 1/n$. This results in the following drift function h: […]. The first terms are at most $en + \lambda + \lambda en/\ln\lambda$. The second integral is bounded using […]. This gives the upper bound $(4 + e)\lambda n/\ln(\lambda) + en\cdot(2 + \ln n)$.
Note that the optimal mutation rate $p_i = \max\{\ln(\lambda)/(n\ln(en/i)), 1/n\}$, in particular the functional relationship between the mutation rate and the current fitness i, is quite hard to guess through experimentation and was only revealed through the present theoretical analysis. After the result from Theorem 4 was first published [1], Doerr et al. [13] presented a self-adjusting scheme for choosing the mutation rate in the (1 + λ) EA and showed that it matches the upper bound from Theorem 4 without knowing the functional relationship between the mutation rate and the current fitness.
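The fitness-dependent rate of Theorem 4 is straightforward to implement. The sketch below plugs $p_i$ into the (1 + λ) EA on ONEMAX. It is an illustrative sketch, not the authors' code: the function names are hypothetical, the initial search point is assumed to be uniformly random, and ties among the best offspring are broken uniformly at random.

```python
import math
import random


def adaptive_rate(i, n, lam):
    """p_i = max{ln(lam) / (n ln(e n / i)), 1/n}, where i >= 1 is the number of zeros."""
    return max(math.log(lam) / (n * math.log(math.e * n / i)), 1 / n)


def adaptive_one_plus_lambda_ea(n, lam, seed=0):
    """(1 + lambda) EA on OneMax with the fitness-dependent rate p_i.

    Returns the final search point and the number of function evaluations
    (initial point plus lam per generation)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx, evals = sum(x), 1
    while fx < n:
        p = adaptive_rate(n - fx, n, lam)  # i = n - OneMax(x) zeros
        offspring = []
        for _ in range(lam):
            y = [1 - b if rng.random() < p else b for b in x]
            offspring.append((sum(y), y))
        evals += lam
        best = max(fy for fy, _ in offspring)
        fz, z = rng.choice([o for o in offspring if o[0] == best])  # ties uar
        if fz >= fx:
            x, fx = z, fz
    return x, evals
```

Far from the optimum (i close to n) the rate is about $\ln(\lambda)/n$, so each offspring flips roughly ln λ bits; close to the optimum, where $\ln(en/i)$ becomes large, the rate falls back to the standard 1/n.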

VI. TAIL BOUNDS
In this section, we show that the lower bound for all λ-parallel unbiased unary black-box algorithms from Theorem 3 holds with overwhelming probability. In particular, it also applies to (nonparallel) unbiased unary black-box algorithms, for which only lower bounds on the expectation were known before [45]. Our main result is as follows.
Theorem 5: For every fitness function $f: \{0,1\}^n \to \mathbb{R}$, every constant $0 < \delta < 1$, and every set S of up to $\exp(o(n^{\delta}/\log n))$ search points, the following holds. Every unary unbiased λ-parallel black-box algorithm A on f, with probability $1 - \exp(-\Omega(n^{\delta}/\log n))$, does not query any search point from S within time
$$\max\left\{\frac{\lambda n}{60\ln^+\lambda},\ (1-\delta)n\ln n\right\} = \Omega\left(\frac{\lambda n}{\ln^+\lambda} + n\ln n\right).$$
The expected time also satisfies the asymptotic bound. Theorem 5 establishes very general limits to the performance of large classes of algorithms, including mutation-only EAs with standard mutation operators, local search, and simulated annealing. In particular, putting δ := 0.01 (say), Theorem 5 shows that every unary unbiased search algorithm needs to be run for at least 0.99n ln n evaluations, as the probability of finding one of few global optima within 0.99n ln n evaluations is overwhelmingly small. The same holds for λ-parallel unary unbiased algorithms such as mutation-only EAs with offspring populations of size λ. Here, stopping a run before $\lambda n/(60\ln^+\lambda)$ evaluations is futile, as with overwhelming probability no optimum will have been found yet.
In addition, Theorem 5 makes a statement about a target set of up to exponential size. This means that the lower bounds also apply to the optimization time on functions with many global optima, and they can also be used to bound the time to find local optima or any set of high-fitness individuals of size at most $\exp(o(n^\delta/\log n))$. Section VII gives illustrative applications to a broad range of well-known problems.
Theorem 5 will be shown by separately proving lower bounds of $\Omega(\lambda n/\ln^+\lambda)$ and $\Omega(n\log n)$ on the time to locate any fixed target search point $x^*$, both of which hold with overwhelming probability. Then we use a union bound to show that even the probability of finding one of exponentially many target search points within the stated time is still exponentially small. Again, we assume mirrored sampling, i.e., every query of a search point $x$ also evaluates its bitwise complement $\bar{x}$ for free.

A. Lower Bound $\Omega(\lambda n/\ln^+\lambda)$ With Overwhelming Probability
We start with a bound of $\Omega(\lambda n/\ln^+\lambda)$ on the time to find a particular target search point $x^*$, w.l.o.g. $x^* = 1^n$. Recall from Definition 2 that, due to mirrored sampling, we can define the potential as the minimum number of zeros, or equivalently of ones, among all search points queried up to time $t$. We will use [46, Th. 2] for a tail bound on the runtime, which requires a bound on the moment-generating function (mgf) of the progress.
Theorem 6: For every unary unbiased λ-parallel black-box algorithm A, the probability that A finds any fixed target search point $x^*$ within $\lambda n/(60\ln^+\lambda)$ steps is $e^{-\Omega(n)}$.
Proof: Following the proof of Theorem 3, we assume without loss of generality that the search point $1^n$ is the optimum, and let $(X_t)_{t\in\mathbb{N}}$ be the potential as defined before.
The result follows by taking into account that the algorithm makes λ fitness evaluations per iteration, i.e., $T = \lambda T'$, where $T'$ denotes the number of iterations, and that $c > 1/60$.

B. Lower Bound $\Omega(n\log n)$ With Overwhelming Probability
Now, we show a lower bound of $\Omega(n\log n)$ with overwhelming probability. Note that this result is independent of λ and thus unrelated to parallel black-box complexity; it gives limitations for general (parallel or nonparallel) unary unbiased black-box algorithms. Recall that every λ-parallel unary unbiased algorithm is also a unary unbiased algorithm; hence, the result applies to a strictly larger class of algorithms. Previously, only lower bounds on the expectation were known: Lehre and Witt [45] showed an asymptotic bound of $\Omega(n\log n)$, and Doerr et al. [12] presented a more precise lower bound of $n\ln n - O(n)$.
Theorem 7: For every unary unbiased black-box algorithm A and every constant $0 < \delta \le 1$, the probability that A finds any fixed target search point $x^*$ within $(1 - \delta)n\ln n$ steps is $\exp(-\Omega(n^\delta/\log n))$.
Before presenting the proof of Theorem 7, we present the main idea behind the proof, and the challenges to overcome.
The proof will be based on the following well-known "coupon collector" argument, which we first discuss for a simple algorithm such as randomized local search (RLS) or the (1 + 1) EA. For these algorithms, we can argue that with high probability there will be at least $cn$ bits in the initial search point that differ from the optimum, for an appropriate constant $0 < c < 1/2$. Each such bit has a probability of $1/n$ of being flipped in each step of the algorithm. For a time period of $T := (1 - \delta)(n - 1)\ln n$ steps, the probability that any fixed bit is never flipped is at least $(1 - 1/n)^T \ge e^{-(1-\delta)\ln n} = n^{-(1-\delta)}$, using $(1 - 1/n)^{n-1} \ge 1/e$. Now, the probability that there is a bit among the $cn$ incorrect bits that is never flipped is at least $1 - (1 - n^{-(1-\delta)})^{cn} \ge 1 - \exp(-cn^\delta)$. This implies that with the above probability the optimum has not been found in $T = \Omega(n\log n)$ steps. This argument works for RLS and the (1 + 1) EA for the following reasons.
1) The algorithms evolve a single lineage from the initial search point, which allows us to argue with "incorrect" bits that need to be flipped at least once.
2) The same variation operator is applied at all times, which establishes the formula $(1 - 1/n)^T$.
3) All bits are treated independently, which is implicitly used in the derivation of the term $(1 - n^{-(1-\delta)})^{cn}$.
In order to prove Theorem 7, we have to consider all unary unbiased black-box algorithms, for which the above properties do not hold. In particular, algorithms may easily generate several lineages, which makes it unclear how incorrect bits can be defined. Also note that an algorithm might flip many incorrect bits in one step simply by choosing a very large radius, so the simple argument that we need to flip all incorrect bits at least once breaks down. Algorithms may choose different variation operators at different times, possibly depending on fitness values generated so far, which makes it difficult to argue that no variation flips a bit over a period of time. Finally, mutations with a fixed radius $r \ge 2$ may introduce dependencies between bits, which need to be addressed.
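The coupon collector argument for RLS sketched above can be checked empirically. The following Python sketch (helper names hypothetical; $n = 200$, $c = 1/4$ and $\delta = 1/2$ are illustrative choices) lets each of the $T$ steps flip one uniformly random bit, as RLS does, and compares the empirical frequency of "some incorrect bit is never flipped" with the analytic lower bound $1 - (1 - n^{-(1-\delta)})^{cn}$.

```python
import math
import random


def coupon_collector_bound(n: int, c: float, delta: float) -> float:
    """Lower bound 1 - (1 - n^{-(1-delta)})^{cn} on Pr(some incorrect bit is never flipped)."""
    return 1 - (1 - n ** (-(1 - delta))) ** (c * n)


def simulate_rls(n: int, c: float, delta: float, runs: int, seed: int = 1) -> float:
    """Fraction of runs in which some of the c*n incorrect bits is never flipped.

    Each of the T = (1 - delta)(n - 1) ln n steps proposes flipping one uniformly
    random bit, as RLS does; acceptance of the mutant is irrelevant for the argument.
    """
    rng = random.Random(seed)
    steps = int((1 - delta) * (n - 1) * math.log(n))
    incorrect = int(c * n)  # w.l.o.g. the incorrect bits are positions 0 .. incorrect-1
    misses = 0
    for _ in range(runs):
        flipped = set()
        for _ in range(steps):
            i = rng.randrange(n)
            if i < incorrect:
                flipped.add(i)
        if len(flipped) < incorrect:
            misses += 1
    return misses / runs


print(simulate_rls(200, 0.25, 0.5, runs=200), coupon_collector_bound(200, 0.25, 0.5))
```

For RLS the flips of different bits are not independent (exactly one bit flips per step), but the events "bit $i$ is flipped" are negatively correlated, so the product bound still applies.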
We tackle these challenges as follows. Assume w.l.o.g. that $x^* = 1^n$. We give away knowledge of all search points $x$ that have Hamming distance at least $n^* := n/(2^{13}\ln n)$ to both $0^n$ and $1^n$. Hence, we start with a potential of $s = n^*$. Moreover, whenever the algorithm decreases the potential from $s$ to $s' < s$, we grant the algorithm knowledge of all solutions with Hamming distance at least $s'$ from both $0^n$ and $1^n$. This assumption implies that the current knowledge of the algorithm can be fully described by the current potential, and the progress of the algorithm can be bounded by considering the transitions of the potential.
Note that all solutions with the same potential are isomorphic to the algorithm. Pick a set of $n^*$ bit positions, w.l.o.g. the first $n^*$ positions. We define these bits as incorrect bits that need to be set to 1 in order to reach the optimum. Since the behavior of the algorithm is fully determined by the current potential, and the bit positions are irrelevant for transitions between potential values, we may assume w.l.o.g. that whenever the algorithm performs a variation of a search point $x_t$ with $k$ ones, $x_t = 0^{n-k}1^k$. Now variations that decrease the potential by decreasing the number of zeros will fix some of the incorrect bits accordingly. Variations that do not decrease the potential only create search points that are already known and can thus be ignored, as they have no effect. Hence, we require that these incorrect bits are flipped in variations that decrease the potential.
Having laid the foundation for arguing with incorrect bits being fixed, we now show that with overwhelming probability, A does not find $1^n$ within $T := (1 - \delta)(n - 1)\ln n$ steps.
Note that A can choose the radius in each step. We distinguish between single-bit variations, where $r = 1$ (or, symmetrically, $r = n - 1$), and multibit variations, where $2 \le r \le n - 2$. We first show that in at most $T$ steps with multibit variations, not too many incorrect bits are fixed. We then show that at most $T$ single-bit variations are not enough to fix all incorrect bits that are not fixed by multibit variations. Note that the algorithm can interleave single-bit variations and multibit variations arbitrarily. Our arguments work for arbitrary sequences of single-bit and multibit variations; they even hold if the algorithm is allowed to make $T$ single-bit variations and $T$ multibit variations at the cost of $T$ queries.
The following lemma considers multibit variations and bounds transition probabilities of the potential.
Lemma 8: Let $s \le n^*$ for $n^* := n/(2^{13}\ln n)$. Then for every $m \in [s, 2n^*] \cup [n - 2n^*, n - s]$, every radius $2 \le r \le n - 2$ and every $1 \le z \le n$ we have
If $2n^* < m < n - 2n^*$ we have
Proof: Recall that by (6) it suffices to consider the case $m \le n/2$. If $2n^* \le m \le n/2$, then by Lemma 5
Now, assume $s \le m \le 2n^*$. As shown in the proof of Lemma 4
We claim that the above is bounded by $(16n^*/n)^2 \cdot 2^{-z}$ for all $z \ge 1$ and all $r \ge 2$. Note that $\Pr(\Delta_0(s, m, r) = z) = 0$ if $z > r$, or if $z = 1$ and $r = 2$, as the progress must be an even number. For $z = 1$ and $r \ge 3$ we get
For $z = 2$ and all $r \ge 2$ we get
For $z = 3$ and $r = 3$ we get, using (8n
For all $r \ge 4$ we have, using (8n
Using Lemma 8 now allows us to express the progress of any algorithm using stochastic domination and a combination of two simple random variables.
Lemma 9: Let $s \le n^*$ for $n^* := n/(2^{13}\ln n)$. Then for every $s \le m \le n - s$ and every radius $2 \le r \le n - 2$ the progress $\Delta(s, m, r)$ is stochastically dominated by $X_tY_t$, where $X_t \in \{0, 1\}$ is a Bernoulli random variable with $\Pr(X_t = 1) = 2(16n^*/n)^2$ and $Y_t$ is a geometric random variable with parameter $1/2$, with $X_t$ and $Y_t$ independent of each other and of the variables $X_{t'}, Y_{t'}$ at all other time steps $t' \ne t$.
Proof: By Lemma 8 and the definition of $X_t$ and $Y_t$,
for every $z \ge 1$ and all $m \in [s, 2n^*] \cup [n - 2n^*, n - s]$. The same clearly also holds in case $2n^* < m < n - 2n^*$ by the second statement of Lemma 8. This implies $\Pr(\Delta_0(s, m, r) \ge z) \le \Pr(X_tY_t \ge z)/2$ for all $z \ge 1$. The probability bounds for $\Delta_0$ also apply to $\Delta_1$ by the symmetry of zeros and ones, and thus, by the union bound $\Pr(\Delta(s, m, r) \ge z) \le \Pr(\Delta_0(s, m, r) \ge z) + \Pr(\Delta_1(s, m, r) \ge z)$, we get $\Pr(\Delta(s, m, r) \ge z) \le \Pr(X_tY_t \ge z)$ for all $z \ge 1$. The last inequality also holds trivially for $z = 0$, as then both sides are 1. This completes the proof.
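The domination step can be verified with exact rational arithmetic. The Python sketch below (hypothetical helper names; $q$ is an illustrative stand-in for $(16n^*/n)^2$) computes $\Pr(X_tY_t \ge z)$ for the variables of Lemma 9, assuming $Y_t$ takes values in $\{1, 2, \dots\}$, and compares half of it with partial sums of the per-value bound $(16n^*/n)^2 \cdot 2^{-z}$ claimed in the proof of Lemma 8 for one of the two symmetric cases (zeros or ones).

```python
from fractions import Fraction


def product_tail(q: Fraction, z: int) -> Fraction:
    """Pr(X*Y >= z) for z >= 1, X ~ Bernoulli(2q), Y ~ Geometric(1/2) on {1, 2, ...}."""
    # X*Y >= z >= 1 requires X = 1, and Pr(Y >= z) = (1/2)^(z-1).
    return 2 * q * Fraction(1, 2) ** (z - 1)


def lemma8_partial_sum(q: Fraction, z: int, zmax: int) -> Fraction:
    """Sum of the per-value bound q * 2^{-z'} over z' = z, ..., zmax."""
    return sum((q * Fraction(1, 2) ** k for k in range(z, zmax + 1)), Fraction(0))


q = Fraction(1, 1000)  # illustrative value only
for z in (1, 2, 5):
    print(z, product_tail(q, z) / 2, lemma8_partial_sum(q, z, zmax=60))
```

Over all $z' \ge z$ the per-value bounds sum to exactly $2q \cdot 2^{-z}$, which is half of $\Pr(X_tY_t \ge z) = 4q \cdot 2^{-z}$; the truncated sum falls short of this half by exactly $q \cdot 2^{-z_{\max}}$. Adding the symmetric case then gives the full domination.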
We use Lemma 9 to show tail bounds for the progress made in multibit variations. The following lemma shows that at most half of the incorrect bits are fixed by multibit variation steps, even when considering a time span of $n\ln n$ steps instead of $(1 - \delta)n\ln n$.
Proof: We give a tail bound for the sum of the variables $X_tY_t$ defined in Lemma 9; by stochastic domination, the tail bound then also holds for the real progress. Recall that $(X_t)$ and $(Y_t)$ are both sequences of independent and identically distributed (i.i.d.) variables and that all variables are mutually independent.
By Chernoff bounds, with overwhelming probability the number of variables $X_t$ attaining value 1 is at most twice its expectation $2T(16n^*/n)^2$. If $\sum_{t=1}^{T} X_t \le 4T(16n^*/n)^2 =: k$, then there are at most $k$ variables $Y_t$ that contribute to $\sum_{t=1}^{T} X_tY_t$. For ease of notation, we assume that these are the variables $Y_1, \dots, Y_k$.
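As a sanity check on the constants, a short calculation shows why the at most $k$ contributing geometric variables are unlikely to fix $n^*/2$ incorrect bits. It assumes $T \le n\ln n$ (the time span Lemma 10 considers) and $Y_j$ supported on $\{1, 2, \dots\}$, so that $\mathrm{E}[Y_j] = 2$; exceeding $n^*/2$ then requires the sum to exceed its expectation by at least a factor of 2.

```latex
% Assumes T <= n ln n, n^* = n/(2^{13} ln n), and Y_j ~ Geometric(1/2) on {1, 2, ...}.
\left(\frac{16 n^*}{n}\right)^2 = \left(\frac{2^{4}}{2^{13}\ln n}\right)^2 = \frac{2^{-18}}{\ln^2 n},
\qquad
k = 4T\left(\frac{16 n^*}{n}\right)^2 \le \frac{4\, n\ln n}{2^{18}\ln^2 n} = \frac{n}{2^{16}\ln n} = \frac{n^*}{8},
\qquad
\mathrm{E}\Big[\sum_{j=1}^{k} Y_j\Big] = 2k \le \frac{n^*}{4} < \frac{n^*}{2}.
```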
Taking the union bound for the two probabilities $2^{-\Omega(n/\log n)}$ that the typical events do not happen completes the proof.
Now, we are ready to give a proof of Theorem 7.
Proof of Theorem 7: As explained earlier, it suffices to consider $n^*$ incorrect bits and to show that, with the claimed probability, not all of these bits will be fixed within $T$ unbiased variations.
Lemma 10 implies that with overwhelming probability there exist $n^*/2$ incorrect bits that are not fixed by up to $T$ multibit variations. We now use coupon collector arguments (similar to those sketched earlier) to show that, in up to $T$ single-bit variations, with overwhelming probability these $n^*/2$ incorrect bits will not all be fixed.
The probability that any fixed bit $i$ is not flipped in a single-bit variation amongst the first $T$ steps is at least $(1 - 1/n)^T \ge n^{-(1-\delta)}$, using $(1 - 1/n)^{n-1} \ge 1/e$. Hence, the probability that a fixed bit $i$ is flipped in up to $T$ single-bit variations is at most $1 - n^{-(1-\delta)}$. Consequently, the probability that all of the $n^*/2$ incorrect bits are flipped in $T$ steps is at most $(1 - n^{-(1-\delta)})^{n^*/2} \le \exp(-n^\delta n^*/(2n)) = \exp(-\Omega(n^\delta/\log n))$.
Theorems 6 and 7 imply our main result (Theorem 5).
Proof of Theorem 5: Fix a target search point $x^*$ from the target set. By Theorem 6, the probability of finding $x^*$ within $\lambda n/(60\ln^+\lambda)$ steps is $\exp(-\Omega(n))$. Applying Theorem 7 with parameter δ yields that the probability of finding $x^*$ within $(1 - \delta)n\ln n$ steps is $\exp(-\Omega(n^\delta/\log n))$. By the union bound, the probability that one of these lower bounds does not apply is $\exp(-\Omega(n)) + \exp(-\Omega(n^\delta/\log n)) \le 2\exp(-\Omega(n^\delta/\log n))$. Repeating the above arguments for all target search points and using a union bound over at most $\exp(o(n^\delta/\log n))$ search points yields an overall probability bound of
$$\exp\left(o\left(n^\delta/\log n\right)\right) \cdot 2\exp\left(-\Omega\left(n^\delta/\log n\right)\right) = \exp\left(-\Omega\left(n^\delta/\log n\right) + o\left(n^\delta/\log n\right) + \ln 2\right) = \exp\left(-\Omega\left(n^\delta/\log n\right)\right).$$

VII. BLACK-BOX COMPLEXITY RESULTS FOR ILLUSTRATIVE FUNCTION CLASSES
In this section, we give a number of examples of how to exploit the fact that our lower bounds apply to the time for finding an arbitrary target set of up to exponentially many search points. This leads to novel results for functions with many global optima, but can also be used to bound the time for reaching local optima or search points within a certain distance from any local or global optimum.

A. Black-Box Complexity Lower Bounds for Functions With Many Optima
Previous black-box complexity results like Theorem 3 or results on (nonparallel) uBBC [45] were limited to functions with a unique optimum. These results apply to popular test functions like ONEMAX and LO and function classes like linear functions or monotone functions [14]. However, they do not apply when considering functions with more than one optimum. Apart from tailored analyses for specific problem classes (e.g., problems from combinatorial optimization [18]), we are not aware of any generic black-box complexity results that apply to functions with multiple optima.
Theorem 5 overcomes this limitation, yielding novel black-box complexity results for the unary uBBC and its λ-parallel variant across a range of problems with several global optima, including some widely studied problem classes. These black-box complexity results give general limitations that can serve as baselines for performance comparisons and guide the search for the most efficient algorithms, including those using parallelism most effectively (as demonstrated successfully for ONEMAX in Section V).
There are many examples of relevant problem classes to which Theorem 5 applies. The most obvious class is that of all functions with $\exp(o(n^\delta/\log n))$ optima. Note that when choosing, say, $\delta := 0.995$, then $\exp(n^{0.99}) \le \exp(o(n^\delta/\log n))$; the reader may choose to think of the latter expression as $\exp(n^{0.99})$, as this may be easier to digest.
Following Witt [62], the mentioned function class includes problems where all optima have at most $n^\delta/\log^3 n$ ones or at most $n^\delta/\log^3 n$ zeros. This is because the number of such search points is bounded by
where the last step used $n^{n^\delta/\log^3 n} = \exp(\Theta(n^\delta/\log^2 n)) = \exp(o(n^\delta/\log n))$.
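The counting claim can be illustrated numerically. The Python sketch below (hypothetical helper names) counts the search points with at most $k$ ones or at most $k$ zeros exactly, for $k = \lfloor n^\delta/\ln^3 n\rfloor$, and compares the logarithm of this count with $n^\delta/\ln n$. It assumes natural logarithms throughout, whereas the text leaves the base of log open; the base only affects constants.

```python
import math


def count_few_ones_or_zeros(n: int, k: int) -> int:
    """Number of x in {0,1}^n with at most k ones or at most k zeros (assumes k < n/2)."""
    return 2 * sum(math.comb(n, i) for i in range(k + 1))


def log_count_vs_target(n: int, delta: float):
    """Return (k, ln count, n^delta / ln n) for k = floor(n^delta / ln^3 n)."""
    k = int(n ** delta / math.log(n) ** 3)
    return k, math.log(count_few_ones_or_zeros(n, k)), n ** delta / math.log(n)


for n in (10**4, 10**5, 10**6):
    k, log_count, target = log_count_vs_target(n, 0.99)
    print(n, k, round(log_count / target, 3))
```

The printed ratio shrinks as $n$ grows, in line with the count being $\exp(o(n^\delta/\log n))$, although the decay is slow at these sizes.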
In the following we survey a number of illustrative problems that have been studied previously and for which we give the first black-box complexity results. In terms of combinatorial problems, there are a lot of well-studied problems with a property called bit-flip symmetry: flipping all bits gives a solution of the same fitness. This means that there are always at least two global optima. Such problems have been popular as search algorithms need to break the symmetry between good solutions [32].
Well-known examples include the function TWOMAX$(x) := \max\{\sum_{i=1}^{n} x_i, \sum_{i=1}^{n} (1 - x_i)\}$ [32], which has been used as a challenging test bed in theoretical studies of diversity-preserving mechanisms [6], [7], [50]. The function hierarchical if and only if (H-IFF) [59] consists of hierarchical building blocks that need to attain equal values in order to contribute to the fitness. It was studied theoretically [9], [35] and is frequently used in empirical studies (see [33], [58]).
In terms of classical combinatorial problems, the VERTEX COLORING problem asks for an assignment of colors to vertices such that no two adjacent vertices share the same color. For two colors, a natural setting is to use a binary encoding for the colors of all vertices and to maximize the number of bichromatic edges (edges with differently colored end points). A closely related setting is that of simple Ising models, where the goal is to minimize the number of bichromatic edges. For bipartite (that is, 2-colorable) graphs, this is identical to maximizing the number of bichromatic edges as inverting one set of the bipartition turns all monochromatic edges into bichromatic ones and vice versa. Previous theoretical work includes EAs on ring/cycle graphs [30], the Metropolis algorithm on toroids [29], and EAs on binary trees [54].
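The equivalence between minimizing and maximizing the number of bichromatic edges on bipartite graphs can be checked by brute force. The following Python sketch (hypothetical helpers; an even cycle serves as an illustrative bipartite graph) inverts one side of the bipartition. Every edge has exactly one endpoint on that side, so each edge switches between monochromatic and bichromatic.

```python
from itertools import product


def bichromatic(edges, coloring):
    """Number of edges whose two endpoints receive different colors (bit values)."""
    return sum(coloring[u] != coloring[v] for u, v in edges)


def flip_side(coloring, side):
    """Invert the color of every vertex in one side of the bipartition."""
    return tuple(1 - c if v in side else c for v, c in enumerate(coloring))


# Hypothetical example: the 6-cycle, bipartite with sides {0, 2, 4} and {1, 3, 5}.
edges = [(i, (i + 1) % 6) for i in range(6)]
side = {0, 2, 4}
colorings = list(product((0, 1), repeat=6))
best = max(bichromatic(edges, c) for c in colorings)
maximizers = [c for c in colorings if bichromatic(edges, c) == best]
print(best, maximizers)  # the two alternating colorings, complements of each other
```

The same loop also confirms bit-flip symmetry: complementing every color never changes the number of bichromatic edges.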
Other combinatorial problems with bit-flip symmetry include cutting and selection problems. Given an undirected graph, the problems MAXCUT and MINCUT seek to partition the graph into two nonempty sets so as to maximize or minimize, respectively, the number of edges running between those two sets. Using a straightforward binary encoding for all vertices, this results in bit-flip symmetry and multiple optima. Theoretical studies of EAs on cutting problems include [49] and [55]; the latter paper considers a simple instance of two equal-sized cliques that leads to two complementary optima. Concerning selection problems, the well-known NP-hard PARTITION problem asks whether it is possible to schedule a set of $n$ jobs on two identical machines such that both machines have identical loads. An optimization problem is obtained by trying to minimize the load of the fuller machine, also called the makespan. A straightforward encoding is used: every bit indicates which machine the corresponding job should be assigned to. Witt [61] analyzed the performance of the (1 + 1) EA for this problem, including random instance models where job sizes are drawn randomly from a real range, according to a uniform or an exponential distribution, respectively. In both cases such instances will almost surely have two complementary optima.⁹ Wegener and Witt [60] considered monotone polynomials: a sum of monomials (products of variables, e.g., $x_1x_3x_4$) with positive weights. Here $1^n$ is always a global optimum, but more optima can exist if there are variables that do not appear in any monomial: each such variable doubles the number of optima, as it is not relevant for the fitness. Hence, if there are $o(n^\delta/\log n)$ such variables, then there are at most $2^{o(n^\delta/\log n)} \le \exp(o(n^\delta/\log n))$ optima.
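A small Python sketch of the bit-flip symmetry discussed above (all helper names and instances are hypothetical): the cut size and the PARTITION makespan are invariant under complementing all bits, and for a monotone polynomial every variable absent from all monomials doubles the number of optima.

```python
import random
from itertools import product


def complement(x):
    return tuple(1 - b for b in x)


def cut_size(edges, x):
    """MAXCUT/MINCUT objective: number of edges running between the two vertex sets."""
    return sum(x[u] != x[v] for u, v in edges)


def makespan(jobs, x):
    """PARTITION objective: load of the fuller machine (each bit selects a machine)."""
    load1 = sum(p for p, b in zip(jobs, x) if b == 1)
    load0 = sum(p for p, b in zip(jobs, x) if b == 0)
    return max(load0, load1)


def monotone_poly(monomials, x):
    """Sum over (weight, variables) pairs of weight times the product of those variables."""
    return sum(w * all(x[i] for i in variables) for w, variables in monomials)


rng = random.Random(42)
jobs = [rng.randint(1, 10**9) for _ in range(8)]  # hypothetical instance
assignments = list(product((0, 1), repeat=len(jobs)))
best = min(makespan(jobs, x) for x in assignments)
print(sum(makespan(jobs, x) == best for x in assignments))  # typically 2: x and its complement
```

Random integer job sizes stand in here for the continuous distributions of the random instance models; with them, the minimum makespan is typically attained only by one assignment and its complement.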
Jansen and Zarges [41] presented instance classes called nearest peak functions and weighted nearest peak functions. Both are defined with respect to an arbitrary number of peaks: search points with an associated height and slope. For nearest peak functions the fitness of a search point is determined by its closest peak: for the peak itself the fitness is equal to the height of the peak, and for other search points the fitness decreases gradually with the distance from the peak, according to the slope of the peak. Weighted nearest peak functions are defined similarly, but all peaks are considered and higher peaks can dominate shallower peaks. This function class was introduced as a test bed that allows creating an arbitrary number of optima. It is shown in [41] that the set of local optima is a subset of the set of peaks. Hence, the number of peaks is an upper bound on the number of global (and local) optima. The two function classes were named Jansen-Zarges function classes in [7], where they were used as benchmarks for the clearing diversity mechanism.
Finally, we consider random planted MAX-3-SAT instances as a popular benchmark model in both experimental [34] and theoretical studies [3], [19], [56]. The fitness function is the number of satisfied clauses, and each clause contains exactly three literals (negated or non-negated variables from the set $\{x_1, \dots, x_n\}$). In this model, we fix a planted optimum $x^*$ and generate clauses independently such that they are satisfied by $x^*$. This means that at least one literal needs to evaluate to true in $x^*$. The variables for each clause are chosen uniformly at random (with or without replacement) from $\{x_1, \dots, x_n\}$. We may assume that instances are generated by first deciding which of the three literals will match $x^*$ and which will not. In a second step, the indices of variables will be picked. We further assume that there is at least a constant probability $c_1$ of a clause having one matching literal and at least a constant probability $c_3$ of a clause having three matching literals.¹⁰ In this setup, $x^*$ is a global optimum, but there may be more global optima. We argue that the number of optima is bounded if the number of clauses, $m$, is chosen large enough.
⁹More than two optima only exist if there are different combinations of job sizes (beyond symmetries) that add up to the same value. Since each job size is drawn from a continuous range and the number of values that could lead to equal loads is finite, this almost surely never happens.
Consider a solution $x$ with Hamming distance $H := H(x, x^*)$ to $x^*$. We argue that for any clause, the probability that the clause is violated under $x$ is $\Omega(H/n)$. If $H \le n/2$, then with probability $c_1$ we choose one matching literal, and the probability that only the variable of this literal is chosen among the $H$ variables that differ in $x$ and $x^*$ is $\Theta(H(n - H)^2/n^3) = \Theta(H/n)$; in this case the matching literal is false under $x$, while both non-matching literals remain false. Likewise, if $H > n/2$, then with probability $c_3$ we choose three matching literals, and the probability that their variables all differ in $x$ and $x^*$ is $\Theta(H^3/n^3) = \Theta(H/n)$. Now, since all clauses are generated independently, the probability that all $m$ clauses are satisfied under $x$ is at most $(1 - \Omega(H/n))^m \le \exp(-\Omega(Hm/n))$. Hence, for all search points $x$ with $H \ge n^\delta/\log^3 n$, the probability that $x$ is a global optimum is at most $\exp(-\Omega(n^\delta/\log^3 n \cdot m/n)) = \exp(-\Omega(n\log n))$ if the number of clauses is $m = \Omega(n^{2-\delta}\log^4 n)$. In this case, the probability that any such search point is a global optimum is at most $2^n \cdot \exp(-\Omega(n\log n)) = \exp(-\Omega(n\log n))$, a failure probability so small that it can be absorbed in the failure probabilities of our tail bounds. Now, with overwhelming probability, the number of global optima is bounded by the number of search points with Hamming distance less than $n^\delta/\log^3 n$ from $x^*$. By (8), this number is $\exp(o(n^\delta/\log n))$.
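The per-clause estimate can be checked by simulation. The Python sketch below assumes one concrete version of the planted model (hypothetical helper names): the number of literals matching $x^*$ is 1, 2 or 3 with given probabilities, and variable indices are drawn uniformly with replacement. A clause is violated under $x$ exactly when its matching literals sit on variables where $x$ differs from $x^*$ and its non-matching literals sit on variables where $x$ agrees with $x^*$, which gives the closed form in `exact_violation`.

```python
import random


def sample_clause(x_star, probs, rng):
    """One planted 3-SAT clause; literal (v, val) is true under x iff x[v] == val."""
    n = len(x_star)
    matching = rng.choices([1, 2, 3], weights=probs)[0]  # literals true under x_star
    clause = []
    for j in range(3):
        v = rng.randrange(n)  # variable index drawn with replacement
        val = x_star[v] if j < matching else 1 - x_star[v]
        clause.append((v, val))
    return clause


def violated(clause, x):
    return all(x[v] != val for v, val in clause)


def exact_violation(h, probs):
    """Pr(clause violated) when a fraction h of variables differ; probs must sum to 1."""
    c1, c2, c3 = probs
    return c1 * h * (1 - h) ** 2 + c2 * h ** 2 * (1 - h) + c3 * h ** 3


def estimate_violation(n, H, probs, samples, seed=0):
    rng = random.Random(seed)
    x_star = [rng.randint(0, 1) for _ in range(n)]
    x = list(x_star)
    for v in rng.sample(range(n), H):  # flip exactly H positions
        x[v] = 1 - x[v]
    hits = sum(violated(sample_clause(x_star, probs, rng), x) for _ in range(samples))
    return hits / samples


probs = (1 / 3, 1 / 3, 1 / 3)
for H in (10, 50, 90):
    print(H, estimate_violation(100, H, probs, 20000, seed=H), exact_violation(H / 100, probs))
```

For $h = H/n \le 1/2$ the first term is at least $c_1 h/4$, and for $h > 1/2$ the last term is at least $c_3 h/4$, matching the $\Omega(H/n)$ claim.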
The following theorem summarizes all the above.
Theorem 8: Every unary unbiased λ-parallel black-box algorithm A needs more than
$$\max\left\{\frac{\lambda n}{60\ln^+\lambda}, (1 - \delta)n\ln n\right\} = \Omega\left(\frac{\lambda n}{\ln^+\lambda} + n\ln n\right)$$
evaluations, with probability $1 - \exp(-\Omega(n^\delta/\log n))$, to find a global optimum in each of the following settings.
1) All functions with $\exp(o(n^\delta/\log n))$ optima.
2) All functions where all optima have at most $n^\delta/\log^3 n$ ones or at most $n^\delta/\log^3 n$ zeros.
5) Vertex coloring/Ising model problems: maximizing or minimizing the number of bichromatic edges when trying to color a connected bipartite graph with two colors.
6) MINCUT instances with two equal-sized cliques.
7) PARTITION instances having two symmetric optimal solutions (which almost surely applies to random instances).
8) Monotone polynomials with positive weights where all but $o(n^\delta/\log n)$ variables appear in at least one monomial.
9) Jansen-Zarges nearest peak functions and weighted nearest peak functions with $\exp(o(n^\delta/\log n))$ peaks.
10) Random planted MAX-3-SAT instances as described above with at least $m = \Omega(n^{2-\delta}\log^4 n)$ clauses.
The expected time also satisfies the asymptotic bound.
We remark that results on the expectation are tight for some of these problems: for TWOMAX and the mentioned MINCUT instances, the (1 + λ) EA with adaptive mutation rates and appropriate restart schemes can find global optima in expected $O(\lambda n/\ln^+\lambda + n\ln n)$ fitness evaluations (this easily follows from the analysis on ONEMAX). Other function classes from Theorem 8 contain functions with an exponential black-box complexity, for instance the NEEDLE function. Our results should be regarded as a general baseline that applies to all unary unbiased black-box algorithms and a wide range of problems.

B. Lower Bounds on the Time to Reach Local Optima
For many multimodal problems where the lower bounds from Theorem 8 are not tight, there is another significant application of Theorem 5. It can also be applied to bound the time until any unary unbiased black-box algorithm has found a local optimum, or any search point of reasonably high fitness, if the number of such points is bounded.
This includes functions with $\exp(o(n^\delta/\log n))$ local optima, and those where all local optima have at most $n^\delta/\log^3 n$ ones or at most $n^\delta/\log^3 n$ zeros. The latter function class includes the well-known JUMP$_k$ functions [8], [26], where a gap of Hamming distance $k$ has to be "jumped" to reach a global optimum, with parameter $k \le n^\delta/\log^3 n$: here all search points with $k$ zeros are local optima, in addition to the global optimum $1^n$. A similar function class CLIFF$_d$ was used in [5], [37], and [51], where the same holds for $d$ in lieu of $k$; the difference between these two function classes is that in the region "between" local and global optima, JUMP$_k$ has a gradient pointing back toward the local optima, whereas CLIFF$_d$ points toward the global optimum $1^n$.
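The claim about the local optima of JUMP$_k$ and CLIFF$_d$ can be checked by brute force for small $n$. The Python sketch below assumes the standard definitions of both functions from the literature, which this section does not restate, and defines a local optimum as a point without a strictly better Hamming neighbor; the helper names are hypothetical.

```python
from itertools import product
from math import comb


def jump(x, k):
    """Assumed standard JUMP_k: k + |x| if |x| <= n - k or x = 1^n, else n - |x|."""
    n, ones = len(x), sum(x)
    return k + ones if ones <= n - k or ones == n else n - ones


def cliff(x, d):
    """Assumed standard CLIFF_d: |x| if |x| <= n - d, else |x| - d + 1/2."""
    n, ones = len(x), sum(x)
    return ones if ones <= n - d else ones - d + 0.5


def local_optima(f, n):
    """All x in {0,1}^n without a Hamming neighbour of strictly higher fitness."""
    optima = []
    for x in product((0, 1), repeat=n):
        fx = f(x)
        neighbours = (x[:i] + (1 - x[i],) + x[i + 1:] for i in range(n))
        if all(f(y) <= fx for y in neighbours):
            optima.append(x)
    return optima


n, k = 8, 3
jump_opt = local_optima(lambda x: jump(x, k), n)
cliff_opt = local_optima(lambda x: cliff(x, k), n)
print(len(jump_opt), len(cliff_opt), comb(n, k) + 1)
```

For $n = 8$ and $k = d = 3$ both functions have $\binom{8}{3} + 1 = 57$ local optima: the points with exactly 3 zeros plus $1^n$.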
Functions with difficult local optima include a modified version of TWOMAX used in [31]: in TWOMAX$'(x) := \max\{\sum_{i=1}^{n} x_i, \sum_{i=1}^{n} (1 - x_i)\} + \prod_{i=1}^{n} x_i$ the point $1^n$ is the only global optimum and $0^n$ is a local optimum that is very hard to escape from. A combinatorial example of a MAXSAT instance with difficult local optima was studied in the context of EAs in [27], with variables $x_1, \dots, x_n$ and clauses
Here the optimum is again $1^n$, and all $n$ search points with a single 1-bit are local optima. Likewise, the MINCUT instance from Theorem 8 has $O(n)$ local optima as well: all search points with exactly one 1-bit or one 0-bit are locally optimal. Sudholt [55] further presented a hard KNAPSACK instance with $(n + 1)/2$ "small" objects of weight and value $n$ and $(n - 1)/2$ "big" objects of weight and value $n + 1$. The weight limit is set to $n(n + 1)/2$, such that including all small objects yields a global optimum, but selecting all but one big object gives a local optimum. As above, the number of local optima is $O(n)$. Finally, the arguments for Jansen-Zarges function classes also hold with respect to the number of local optima.
The following theorem summarizes all the above.
Theorem 9: Every unary unbiased λ-parallel black-box algorithm A needs more than
$$\max\left\{\frac{\lambda n}{60\ln^+\lambda}, (1 - \delta)n\ln n\right\} = \Omega\left(\frac{\lambda n}{\ln^+\lambda} + n\ln n\right)$$
evaluations, with probability $1 - \exp(-\Omega(n^\delta/\log n))$, to find a local or global optimum in each of the following settings.
1) All functions with $\exp(o(n^\delta/\log n))$ local optima.
2) All functions where all local optima have at most $n^\delta/\log^3 n$ ones or at most $n^\delta/\log^3 n$ zeros.
3) JUMP$_k$ functions with $k \le n^\delta/\log^3 n$.
4) CLIFF$_d$ functions with $d \le n^\delta/\log^3 n$.
5) TWOMAX$(x) := \max\{\sum_{i=1}^{n} x_i, \sum_{i=1}^{n} (1 - x_i)\}$ as well as the modified TWOMAX function TWOMAX$'$.
6) MINCUT instances with two equal-sized cliques.
7) The hard MAXSAT instance from (9).
8) The hard KNAPSACK instance mentioned above.
9) Jansen-Zarges nearest peak functions and weighted nearest peak functions with $\exp(o(n^\delta/\log n))$ peaks.
The expected time also satisfies the asymptotic bound.
We can even push our applications a bit further. Again using (8), there are at most $\exp(o(n^\delta/\log n))$ search points within a Hamming ball of radius $n^\delta/\log^3 n$ around any search point. If there are $\exp(o(n^\delta/\log n))$ global or local optima, then the number of all search points within the union of Hamming balls around all these points is still $\exp(o(n^\delta/\log n)) \cdot \exp(o(n^\delta/\log n)) = \exp(o(n^\delta/\log n))$. Hence, our main result from Theorem 5 still applies when considering the time to get within Hamming distance $n^\delta/\log^3 n$ of any global or local optimum.
Theorem 10: Theorems 8 and 9 still apply when replacing "to find a global optimum" with "to find any search point within Hamming distance $n^\delta/\log^3 n$ of any global optimum" in Theorem 8, and replacing "to find a local or global optimum" with "to find any search point within Hamming distance $n^\delta/\log^3 n$ of any local or global optimum" in Theorem 9.
In particular, this implies that with overwhelming probability no unary unbiased black-box algorithm can find a search point of fitness at least $n - n^\delta/\log^3 n$ for ONEMAX, LO, and TWOMAX within the stated time. In other words, the expected fitness after the stated time is at most $n - n^\delta/\log^3 n + o(1)$ (where the $o(1)$ term accounts for an exponentially small failure probability, in which case the fitness could be as large as $n$). Such results are known as fixed-budget results [15], [40].
This shows that our λ-parallel black-box complexity results with tail bounds can be applied in a large variety of settings.

VIII. CONCLUSION
We have introduced the parallel uBBC to quantify the limits on the performance of parallel search heuristics, including offspring populations, island models, and multistart methods. We proved that every λ-parallel unary unbiased black-box algorithm needs at least $\Omega(\lambda n/\ln^+\lambda + n\ln n)$ function evaluations on every function with a unique optimum, and at least $\Omega(\lambda n/\ln^+(\lambda/n) + n^2)$ function evaluations on LO. The corresponding parallel times are smaller by a factor of λ. For LO and ONEMAX we identified the cut-off point for λ above which the asymptotic number of function evaluations increases compared to nonparallel algorithms (λ = 1). All smaller λ allow for linear speedups with regard to the parallel time. For ONEMAX this cut-off point is higher than that for the standard (1 + λ) EA; optimal performance for all λ is achieved by a (1 + λ) EA with an adaptive mutation rate.
In a novel and more detailed analysis, we have established tail bounds showing that the lower bound $\Omega(\lambda n/\ln^+\lambda + n\ln n)$ holds with overwhelming probability, for parallel and nonparallel algorithms (where λ = 1) and for finding any target set of up to $\exp(o(n^\delta/\log n))$ search points we can choose. This makes it a very general, powerful and versatile statement: we obtain lower bounds on the optimization time on functions with many optima, the time to find a local optimum, and the time to even get close to any local or global optimum. We demonstrated the usefulness of this approach by deriving the first black-box complexity lower bounds for a range of popular and illustrative problems, from synthetic problems (TWOMAX, H-IFF, JUMP$_k$, and CLIFF$_d$) to classes of multimodal benchmark functions [41] and important problems from combinatorial optimization, such as VERTEX COLORING, MINCUT, PARTITION, KNAPSACK, and MAXSAT.
A major open problem for future work is to derive lower bounds for the λ-parallel uBBC when allowing binary operators like crossover, or operators combining many search points as in EDAs or swarm intelligence algorithms. Currently, even in the nonparallel case no nontrivial lower bounds on the binary uBBC are known.