Parameterized Approximation Schemes for Clustering with General Norm Objectives

This paper considers the well-studied algorithmic regime of designing a $(1+\epsilon)$-approximation algorithm for a k-clustering problem that runs in time $f(k,\epsilon)\mathrm{poly}(n)$ (sometimes called an efficient parameterized approximation scheme or EPAS for short¹). Notable results of this kind include EPASes in the high-dimensional Euclidean setting for k-center [Bădoiu, Har-Peled, Indyk; STOC'02] as well as k-median and k-means [Kumar, Sabharwal, Sen; J. ACM 2010]. Our main contribution is a clean and simple EPAS that settles more than ten clustering problems (across multiple well-studied objectives as well as metric spaces) and unifies well-known EPASes. More specifically, our algorithm gives EPASes in the following settings:

• Clustering objectives: k-means, k-center, k-median, priority k-center, $\ell$-centrum, ordered k-median, socially fair k-median (aka robust k-median), or any other objective that can be formulated as minimizing a monotone (not necessarily symmetric!) norm of the distances of the points from the solution (generalizing the symmetric formulation introduced by Chakrabarty and Swamy [STOC'19]).

• Metric spaces: continuous high-dimensional Euclidean spaces, metrics of bounded doubling dimension, bounded treewidth metrics, and planar metrics.

Prior to our results, EPASes were only known for vanilla clustering objectives (k-means, k-median, and k-center), and each such algorithm is tailored to work for the specific input metric and clustering objective (e.g., EPASes for k-means and k-center in $\mathbb{R}^{d}$ are conceptually very different). In contrast, our algorithmic framework is applicable to a wide range of well-studied objective functions in a uniform way, and is (almost) entirely oblivious to any specific metric structures and yet is able to effectively exploit those unknown structures. In particular, our algorithm is not based on the (metric- and objective-specific) technique of coresets.
Key to our analysis is a new concept that we call bounded $\epsilon$-scatter dimension, an intrinsic complexity measure of a metric space that is a relaxation of the standard notion of bounded doubling dimension (often used as a source of algorithmic tractability for geometric problems). Our main technical result shows that two conditions are essentially sufficient for our algorithm to yield an EPAS on the input metric M for any clustering objective: (i) the objective is described by a monotone norm, and (ii) the $\epsilon$-scatter dimension of M is upper bounded by a function of $\epsilon$.

¹ Quick remarks: (i) An EPAS is not comparable to polynomial-time approximation schemes (PTAS), (ii) before the term EPAS was invented, some researchers called this type of approximation scheme a PTAS or simply an approximation scheme (in clustering, it is often assumed that k is small) [1], [2], and (iii) both EPAS and PTAS are implied by the existence of efficient polynomial-time approximation schemes (EPTAS).


Introduction
In the class of k-clustering problems, we are interested in partitioning n data points into k subsets called clusters, each of which is represented by a center. We aim at minimizing a certain objective based on the distances between the data points and their respective cluster centers. This is among the most fundamental optimization problems that arise routinely in both theory and practice, and it has received attention from various research communities, including optimization, data mining, machine learning, and computational geometry. Basic clustering problems such as k-Median, k-Center, and k-Means have been researched for more than half a century and yet remain elusive from many perspectives of computation.
This paper considers a prominent and classic algorithmic regime for k-clustering in which one aims at designing efficient parameterized approximation schemes (EPAS): a (1 + ε)-approximation algorithm that runs in time h(k, ε) · poly(n) for every ε > 0. In a general metric space, obtaining such an approximation scheme is impossible even for basic clustering problems. Past research has therefore focused on designing algorithms that work in structured metric spaces (such as planar graphs or Euclidean spaces). In the continuous high-dimensional Euclidean space, EPASes are arguably the "fastest" approximation schemes one can hope for [Das08, ACKS15], so it is no surprise that research on EPASes for clustering problems has received a lot of attention in the past two decades [HPM04, Mat00, ORSS13, DX20, BJK18, BJKW21, CAPP19].² This paper is inspired by the following meta-question: for a given k-clustering objective and a (structured) metric space, does an EPAS exist?
A systematic understanding of this question has been seriously lacking. While affirmative answers for basic clustering problems such as k-Center, k-Median, and k-Means in the continuous high-dimensional Euclidean space were shown already two decades ago [BHI02, KSS10] (and recently for structured graph metrics [BBH+20, KLP19, FEKS19, BJKW21, CAPP19]), we do not know of any such result for more complex clustering objectives.
This paper makes substantial progress towards a complete understanding of the above meta-question. In particular, we present a unified EPAS that works for a broad class of clustering objectives (encompassing almost all center-based clustering objectives ever considered by the algorithms community, and some new ones that further generalize the existing problems) as well as diverse metric spaces, hence settling many well-studied standalone clustering problems as a by-product.³ In contrast to the existing approaches (where each algorithm is tailored to a specific input metric and clustering objective), our algorithmic framework is (almost) entirely oblivious to any specific metric structures and the objective function, yet is able to effectively exploit those unknown structures.

Efficient Parameterized Approximation Schemes for Norm k-Clustering
As an input to the (general) k-clustering problem, we are given n data points P, candidate centers F, a metric space M = (P ∪ F, δ), a positive integer k, and an objective function f : R^P → R. When a set of k "open" centers X ⊆ F is chosen, this solution induces a cost vector δ(P, X) = (δ(p, X))_{p∈P}, where δ(p, X) = min_{x∈X} δ(p, x) represents the distance from point p to the closest center in X. Our goal is to minimize f(δ(P, X)). We call this problem the k-clustering problem with cost function f. We may think of the function f as "aggregating" the costs incurred by the points. For example, we can formulate the basic k-clustering objectives via the functions f(x) = Σ_{p∈P} x(p) (k-Median), f(x) = Σ_{p∈P} x(p)² (k-Means), and f(x) = max_{p∈P} x(p) (k-Center).

² We remark that PTASes, which are incomparable to EPASes, do not exist for continuous k-means, k-median, and k-center [CAK19, ACKS15].

³ There are variants of clustering problems that enforce constraints on how points can be assigned to open centers (e.g., capacitated and diversity constraints). Our purpose is handling many center-based clustering objectives; handling a broad range of constraints (such as capacities) is beyond the scope of this paper.
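To make the aggregation view concrete, here is a minimal sketch (all names are ours, not the paper's) of the cost vector δ(P, X) and the three basic objectives as aggregation functions f:

```python
def cost_vector(points, centers, dist):
    """delta(P, X): distance from each point to its closest open center."""
    return [min(dist(p, x) for x in centers) for p in points]

# The three basic objectives as aggregation functions f.
def k_median(costs):  return sum(costs)
def k_means(costs):   return sum(c * c for c in costs)
def k_center(costs):  return max(costs)

# Toy instance on the line, with the metric given explicitly.
dist = lambda p, x: abs(p - x)
P = [0.0, 1.0, 5.0, 6.0]
X = [0.5, 5.5]                # k = 2 open centers
costs = cost_vector(P, X, dist)
print(costs)                             # [0.5, 0.5, 0.5, 0.5]
print(k_median(costs), k_center(costs))  # 2.0 0.5
```

Each objective below differs only in which function f is applied to the same cost vector.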
Most natural and well-studied clustering objectives can be modeled using (a generalization of) the concept of norm optimization introduced by Chakrabarty and Swamy [CS19]. More specifically, we are interested in the setting where the objective f is a norm. A norm is a function f : R^n → R_{≥0}, n ∈ N, that satisfies (i) for all x ∈ R^n, f(x) = 0 if and only if x = 0, (ii) ∀x, y ∈ R^n : f(x + y) ≤ f(x) + f(y), and (iii) ∀x ∈ R^n, λ ∈ R : f(λx) = |λ| f(x). We say that f is monotone if f(x) ≤ f(y) whenever x ≤ y. By Norm k-Clustering we refer to the k-clustering problem whose objective f : R^P → R_{≥0} is a monotone norm. While Chakrabarty and Swamy [CS19] further require that f be symmetric⁴, our algorithmic framework applies to all monotone norm cost functions. This family includes the following well-known clustering problems (see Figure 2 for an overview):

• From k-Means, k-Center, and k-Median to (k, z)-Clustering: All the basic clustering problems can be captured by the ℓ_z-norm for z ∈ {1, 2, ∞}. In fact, the (k, z)-clustering problem [HV20, CASS21, CALSS22] (for a constant positive integer z) uses the objective function g(x) = Σ_{p∈P} |x(p)|^z. (This function itself is not a norm, but we can instead consider the ℓ_z-norm f(x) = g(x)^{1/z}.)

• Weighted k-Center (or Priority k-Center): The weighted version of k-Center [LGW03, BCCN21, Ple87] generalizes k-Center so that each data point p ∈ P is associated with a positive weight (or priority) w(p), and the objective is to minimize the (weighted) maximum distance to a center.⁵ This problem can be modelled by the "weighted max" norm f(x) = max_{p∈P} w(p)x(p). One can analogously define the weighted versions of k-Median and k-Means (see, for example, [CGK+19]). We remark that the underlying weighted norms are not symmetric.
• ℓ-Centrum: This problem (sometimes called k-Facility ℓ-Centrum) aims to minimize the sum of the connection costs among the ℓ "most expensive" points (that is, those that are furthest away from the open centers). The problem generalizes both the k-Center (ℓ = 1) and k-Median (ℓ = |P|) problems [Tam01]. (See the books [NP05, LNdG15] for more details on ℓ-Centrum and the more general Ordered k-Median discussed below.) This problem can be modelled by the top-ℓ norm f(x) = Σ_{j=1}^{ℓ} x^↓(j), where x^↓ denotes the reordering of vector x so that the entries appear non-increasingly. The top-ℓ norm is symmetric.
• Ordered k-Median: This problem further generalizes ℓ-Centrum, allowing flexible penalties to be applied to data points that incur the highest connection costs. More formally, the objective is the ordered weighted norm f(x) = v^⊤ x^↓, where v ∈ R^n_{≥0} is a non-increasing cost vector, that is, v(1) ≥ v(2) ≥ … ≥ v(n). ℓ-Centrum corresponds to v = (1, …, 1, 0, …, 0), where the first ℓ entries of v are ones. This problem has already received attention for a few decades [BSS18, CS19, BJKW19]. We remark that f here is a monotone and symmetric norm.
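As a quick illustration of these symmetric norms, the following sketch (function names are ours) implements the top-ℓ norm and the ordered weighted norm, and checks that the vector v = (1, 1, 0, …, 0) recovers ℓ-Centrum with ℓ = 2:

```python
def top_ell_norm(x, ell):
    """ell-Centrum objective: sum of the ell largest entries of x."""
    return sum(sorted(x, reverse=True)[:ell])

def ordered_weighted_norm(x, v):
    """Ordered k-Median objective: <v, x_desc> with v non-increasing."""
    x_desc = sorted(x, reverse=True)
    return sum(vi * xi for vi, xi in zip(v, x_desc))

x = [3.0, 1.0, 4.0, 2.0]
print(top_ell_norm(x, 2))                      # 7.0  (4 + 3)
# v = (1, 1, 0, 0) recovers the top-2 norm:
print(ordered_weighted_norm(x, [1, 1, 0, 0]))  # 7.0
print(top_ell_norm(x, 1), top_ell_norm(x, 4))  # k-Center / k-Median: 4.0 10.0
```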
• Socially Fair k-Median (or Robust k-Median): In Socially Fair k-Median, along with the point set P, we are given m different (not necessarily disjoint) subgroups such that P = ∪_{i∈[m]} P_i. Our goal is to find a set X of centers that incurs fair costs to the groups by minimizing the maximum cost over all the groups; in other words, the objective is f(δ(P, X)) = max_{i∈[m]} Σ_{p∈P_i} δ(p, X). Due to distinct applications in at least two domains, this variant of clustering has recently been studied extensively: (i) in algorithmic fairness [ABV21, GJ23, MV21, GSV22], and (ii) in the robust optimization context, where this problem is known as Robust k-Median, which intends to capture applications where we are uncertain about the actual data scenarios (corresponding to the groups P_i) that may come up [AGGN10, BCMN14, BKK+13].
In particular, one can view the cost function f of Socially Fair k-Median as a "two-level" aggregate cost: first, the cost Σ_{p∈P_i} δ(p, X) incurred by group P_i, i ∈ [m], can be viewed as the weighted ℓ₁-norm w_i^⊤ x, where w_i = 1_{P_i} ∈ {0, 1}^P denotes the characteristic vector of P_i. Second, these group costs are further aggregated through ℓ_∞, that is, f(x) = max(w_1^⊤ x, w_2^⊤ x, …, w_m^⊤ x).
• (z, q)-Fair Clustering: This problem allows arbitrary uses of ℓ_z and ℓ_q norms to aggregate the costs in two levels. The cost function is defined as f(x) = g(h(x)), where g is an ℓ_q-norm function and h(x) = (h_1(x), …, h_m(x)) with each h_i a (weighted) ℓ_z-norm aggregating the costs of the points in group P_i. It is easy to check that f(x) = g(h(x)) is a monotone norm whenever g and the h_i are.
• Beyond the Known Problems: Our (asymmetric) norm formulation allows us to model more complex clustering objectives that might be useful in some application settings and, to our knowledge, have not yet been considered in the algorithms community. One such objective is Priority Ordered k-Median: we have the cost function f(x) = v^⊤ (x_w)^↓, where the weight vector v ∈ R^n_{≥0} and the priority vector w ∈ R^P_{≥0} are given as input, and where x_w = (w(p)x(p))_{p∈P}. This objective generalizes both Priority k-Center and Ordered k-Median. Another natural objective is the (multi-level) Cascaded Norm Clustering, which generalizes (z, q)-Fair Clustering to allow multiple levels of cost aggregation. The cost function f for this problem is described by a directed acyclic graph (DAG) D with one sink node and |P| source nodes (each source corresponds to a point in P). Each non-source node v is associated with a norm ℓ_q for some q, and each edge (u, v) has weight w_{u,v}. Given such a DAG D, the value of f(x) can be evaluated by computing the evaluations at the nodes of V(D) in (topological) order from sources to sink: (i) the evaluation at source p ∈ P is η(p) = x(p), (ii) for any non-source node v ∈ V(D) labelled with the norm ℓ_q, we evaluate η(v) = (Σ_{(u,v)∈E(D)} w_{u,v} η(u)^q)^{1/q}, and (iii) the value of f(x) is the evaluation of the sink. See Figure 1 for an illustration. (z, q)-Fair Clustering is the special case where D has 3 layers, with the middle layer using the same norm. Of course, other basic monotone norms such as top-ℓ or ordered weighted norms could analogously be composed into more complex norms.
Figure 1: The DAG here describes the evaluation of the function f. Node v is labeled with the ℓ_q norm, so the evaluation at node v is η(v) = (w_{1,v} x_1^q + w_{2,v} x_2^q + w_{5,v} x_5^q)^{1/q}.
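The topological evaluation of a cascaded-norm cost function can be sketched as follows (the graph encoding and all names are our illustration, not the paper's notation); the example reproduces the kind of computation done at node v in Figure 1, with unit weights and q = 2:

```python
def evaluate_cascaded_norm(x, dag, q, topo_order, sink):
    """dag[v] = list of (u, w_uv) in-edges of non-source node v;
    q[v] = exponent of the l_q norm at v; sources are indices 0..|P|-1."""
    eta = {p: xp for p, xp in enumerate(x)}   # (i) sources: eta(p) = x(p)
    for v in topo_order:                      # (ii) inner nodes, topologically
        if v in dag:
            eta[v] = sum(w * eta[u] ** q[v] for u, w in dag[v]) ** (1.0 / q[v])
    return eta[sink]                          # (iii) value at the sink

# A node "v" aggregating three sources with unit weights and q = 2,
# feeding a sink with q = 1 (i.e., the identity on one input).
dag = {"v": [(0, 1.0), (1, 1.0), (2, 1.0)], "sink": [("v", 1.0)]}
q = {"v": 2, "sink": 1}
print(evaluate_cascaded_norm([3.0, 4.0, 0.0], dag, q, ["v", "sink"], "sink"))  # 5.0
```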
We remark that asymmetric norms can potentially make the problem substantially harder. For example, a polynomial-time O(1)-approximation algorithm exists for symmetric norms [CS19], but asymmetric norms make the problem Ω(log n / log log n)-hard to approximate even for the special case of Robust k-Median on line metrics [BCMN14].
Our main results are encapsulated in the following theorem.
Theorem 1.1. Let f be an efficiently computable monotone norm cost function. Then the k-clustering problem with cost function f admits an EPAS for the following input metrics: (i) metrics of bounded doubling dimension, (ii) continuous Euclidean spaces of any dimension, (iii) bounded treewidth metrics, and (iv) planar metrics.
By a continuous Euclidean space, we refer to the setting where any point of the space can be chosen as a center. This is in contrast to a discrete Euclidean space, where we restrict the centers to be selected from a specific finite subset of the points. Observe that for fixed d, discrete Euclidean problems in R^d have bounded doubling dimension and are hence covered by our framework. Furthermore, it is not a shortcoming of our result that it does not cover discrete Euclidean spaces of high dimension: in this setting, k-Center is W[1]-hard to approximate within a factor of 3/2 − o(1) (the proof of this will appear elsewhere).
Our result in particular implies the following.
Prior to our results, the existence of EPASes for all these problems was open (except for k-Means, k-Center, and k-Median). Beyond these known problems, we also obtain EPASes for the new, generalized problems introduced above and depicted in Figure 2. Rather surprisingly, in contrast to the polynomial-time approximation regime, the complexities of symmetric and asymmetric norm clustering problems "collapse" in the parameterized approximation regime.

Our Conceptual and Technical Contributions
Our main contributions have two parts: (i) a new concept of metric dimension and (ii) our main technical result showing EPASes for all the aforementioned clustering problems.

Unifying Metric Spaces via Scatter Dimension
Our key conceptual contribution is a new notion of bounded metric space dimension that relaxes the standard requirement of bounded doubling dimension so that the metric spaces mentioned in Theorem 1.1 all "live" in a finite dimension. We first explain why existing notions of dimension are not suitable for this purpose.
There are multiple dimensionality notions that appear in the literature on metric spaces. Most familiar in the algorithmic community is perhaps the doubling dimension (a.k.a. the Assouad dimension). Roughly, the doubling dimension of a metric (M, δ) is O(d) iff at most (1/ε)^{O(d)} balls of radius ε/2 can be packed into a unit ball (this is called an ε-packing). Such a property can often be computationally leveraged, leading to efficient algorithms for many geometric optimization problems (often with the running time depending exponentially on the dimension). However, the doubling dimension (as well as any other popular notion of dimension [Cla06]) is not suitable for us, for the following reasons: (i) the doubling dimension can be as large as Ω(n) in high-dimensional Euclidean space, and (ii) these notions do not "exploit" structured graph metrics very well, i.e., even stars have unbounded dimension.⁷ In sum, algorithms that exploit existing notions of dimension are unlikely to lead to our desired results.
We introduce the notion of ε-scatter dimension. Given a metric M = (P, F, δ), a sequence (x_1, p_1), …, (x_ℓ, p_ℓ) ∈ F × P is said to be an ε-scattering if, whenever (x, p) appears before (x′, p′) in the sequence, both δ(x, p) and δ(x′, p′) are larger than 1 + ε, while δ(x′, p) ≤ 1. The ε-scatter dimension of M is then defined as the length of the longest ε-scattering, minus one.
There are two natural interpretations. The first interpretation is as a game between two players: the center player, who claims she can cover all the points with a unit ball, and the point player, who presents counterexamples. In the first round, the center player picks a center x_1 ∈ F, and the point player refutes the claim by presenting a point p_1 ∈ P which is at least a factor 1 + ε away from the (closed) unit ball around x_1, that is, p_1 ∉ ball(x_1, 1 + ε). The game continues this way: in the i-th round, the center player presents x_i such that {p_1, …, p_{i−1}} ⊆ ball(x_i, 1), and the point player gives p_i ∉ ball(x_i, 1 + ε). Both players are interested in prolonging the game as much as possible. The ε-scatter dimension is the length of the longest possible game. In the second interpretation, one can view such a sequence as a pair of ε-packings that are required to be sufficiently distanced: it is easy to verify (simply using triangle inequalities) that P* = {p_1, p_2, …, p_{ℓ−1}} and F* = {x_2, …, x_ℓ} are ε-packings of the unit (closed) balls around x_ℓ and p_1, respectively. This view immediately implies that the ε-scatter dimension is bounded in a metric of bounded doubling dimension.
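The defining property of an ε-scattering can be checked mechanically; the following sketch (names ours) verifies a candidate sequence against the definition, using unit balls as above:

```python
def is_eps_scattering(seq, dist, eps):
    """Check the eps-scattering property: for every (x_i, p_i) appearing
    before (x_j, p_j), dist(x_i, p_i) > 1 + eps and dist(x_j, p_i) <= 1
    (every later center covers all earlier points)."""
    n = len(seq)
    for i, (x_i, p_i) in enumerate(seq):
        if dist(x_i, p_i) <= 1 + eps:   # each point refutes its own center
            return False
        for j in range(i + 1, n):
            x_j = seq[j][0]
            if dist(x_j, p_i) > 1:      # x_j must cover every earlier point
                return False
    return True

# Toy example on the line with eps = 0.1.
d = lambda a, b: abs(a - b)
print(is_eps_scattering([(0.0, 1.2), (0.5, 1.7)], d, 0.1))  # True
print(is_eps_scattering([(0.0, 1.05)], d, 0.1))             # False
```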
We proceed to study the ε-scatter dimension of bounded treewidth graphs, showing (Theorem 1.4) that it is bounded whenever the treewidth is bounded. The proof is based on a (delicate) combinatorial argument that, given a graph G, a parameter t, and an ε-scattering sequence of length at least doubly exponential in t, produces a "certificate" of the fact that the treewidth of G is greater than t. The proof can be found in Section 6.2.
Next, we present a tool that allows "bootstrapping" of graph classes having bounded ε-scatter dimension. This is done via a simple connection between ε-scatter dimension and low-treewidth embeddings (an active area of metric embedding) [FL22, FEKS19, CAFKL20]. This connection allows us to reduce the question of bounding the ε-scatter dimension of a certain graph class to that of bounded treewidth graphs (thereby invoking our Theorem 1.4).

Theorem 1.5 (informal, formal statement in Section 6.3). The ε-scatter dimension is bounded for any graph class G that admits an η-additive distortion embedding (error ±η∆, where ∆ is the diameter of the graph) into a graph whose treewidth only depends on η.
Such a connection, combined with the embedding result of [FEKS19], implies the following.
Moreover, further progress in the area of low-treewidth embeddings would lead to even wider classes of graphs having bounded ε-scatter dimension; e.g., it seems plausible that minor-free graphs admit such an embedding [FL22].
Unfortunately, this bounded dimensionality does not hold in the high-dimensional (continuous) Euclidean metric.⁸ To handle the high-dimensional continuous Euclidean setting, we present a stronger version of the ε-scatter dimension, which we call the algorithmic ε-scatter dimension. The setting of the game is the same, except that the center player now optimizes to end the game early, while the point player is interested in prolonging the game indefinitely; that is, they play against each other. A centering strategy is a function σ : 2^P → F that specifies how the center x_i = σ({p_1, …, p_{i−1}}) is chosen by the center player, given the points p_1, …, p_{i−1} played in the preceding rounds. The (σ, ε)-scatter dimension is the maximum number of rounds when the center player always plays strategy σ, and the algorithmic ε-scatter dimension is the minimum (σ, ε)-scatter dimension over all strategies σ. We remark that our actual definition is more involved, as it considers a weighted version of the game.

EPAS for General Norm Clustering: Bypassing Coresets
We are now ready to explain our main technical result, which allows us to obtain an EPAS for all metrics having bounded ε-scatter dimension.
A generic tool whose existence immediately implies an EPAS is an ε-coreset: a "compression" of an input instance (P, F, δ) into a much smaller instance so that the cost of any solution is preserved within a factor of (1 ± ε). The existence of an ε-coreset of size depending only on ε and k would immediately imply an EPAS (but not vice versa): first, use the ε-coreset to compress the instance (P, F, δ) into (P′, F′, δ′) where |P′| ≤ γ(ε, k). Then enumerate all possible partitionings of P′ into k sets P′_1, …, P′_k (there are at most k^{γ(ε,k)} such partitions). For each i ∈ [k], compute the optimal center for P′_i. We choose the partition that gives the lowest total cost. This generic method, unfortunately, faces a serious information-theoretic limitation: even for k-Center, ε-coresets of desirable sizes do not exist in high-dimensional Euclidean spaces [FL11]. Such lower bounds imply that one cannot hope to prove our (unified) results via the coreset route: while coresets are known for (k, z)-Clustering for constant z [CASS21], allowing k-Means and k-Median to be handled in a uniform fashion, it is impossible to extend this approach to k-Center. For more complex clustering objectives, such EPASes were in fact not known even in low dimension. For example, the coreset of Braverman et al. [BJKW19] for Ordered k-Median in R^d has size O_{ε,d}(k² log² n) and therefore does not give an EPAS even in low dimension.
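For concreteness, the generic coreset-to-EPAS enumeration described above can be sketched as follows (a deliberately brute-force illustration; the optimal-center step simply scans the candidate set F, and all names are ours):

```python
from itertools import product

def epas_from_compressed_instance(P_small, F, dist, k, f):
    """Try all k^{|P'|} assignments of compressed points to clusters,
    pick the best candidate center for each cluster, keep the cheapest."""
    best = (float("inf"), None)
    for assignment in product(range(k), repeat=len(P_small)):
        centers = []
        for i in range(k):
            cluster = [p for p, a in zip(P_small, assignment) if a == i]
            # optimal center for this cluster among candidates F
            centers.append(min(F, key=lambda x:
                           f([dist(p, x) for p in cluster]) if cluster else 0.0))
        cost = f([min(dist(p, x) for x in centers) for p in P_small])
        best = min(best, (cost, centers))
    return best

# k-Center (f = max) on a toy compressed instance:
print(epas_from_compressed_instance([0.0, 1.0, 5.0, 6.0],
                                    [0.0, 0.5, 5.5, 6.0],
                                    lambda p, x: abs(p - x), 2, max)[0])  # 0.5
```

The exponential dependence on |P′| is exactly why the coreset must have size depending only on ε and k.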
Badȏiu, Har-Peled, and Indyk [BHI02] presented an EPAS for k-Center in high-dimensional Euclidean spaces (bypassing coresets in the above sense). Therefore, an obvious open question is whether their techniques can be extended to give an EPAS for any other clustering objective. Unfortunately, this is not known even for simple objectives such as Priority k-Center. In fact, even the known EPASes for k-Means [KSS10] and k-Center [BHI02] are conceptually very different; to our knowledge, no approximation scheme handles k-Means and k-Center in a modular way.
Our main technical result is presented in the following theorem.We remark that our techniques do not rely on any coreset constructions (thus bypassing the coreset lower bounds for k-Center).
Theorem 1.8. Let M be a class of metric spaces that is closed under scaling distances by a positive constant. There is a randomized algorithm that, for any Norm k-Clustering instance I = (M, f, k) with metric M = (P, F, δ) ∈ M and any ε ∈ (0, 1), computes with high probability a (1 + ε)-approximate solution if the following two conditions are met.
(i) There is an efficient algorithm evaluating, for any distance vector x ∈ R^P_{≥0}, the objective f(x) in time T(f).
(ii) There exists a function λ : R_+ → R_+ such that, for all ε > 0, the algorithmic ε-scatter dimension of M is at most λ(ε).
The running time of the algorithm is exp(Õ(k · λ(ε/10))) · poly(n) · T(f). Note that the complexity of computing f appears only as a linear factor in the running time. For instance, for Socially Fair k-Median, the number m of groups affects only the computational cost of f, and therefore the running time is polynomial in m.
Our algorithm is clean, simple, and entirely oblivious to both the objective and the structure of the input metric.
The dependency on k in the exponent of our running time is singly exponential (exp(Õ(k))). In terms of k, we therefore match the running time of the fastest known EPAS for the highly restrictive special case of high-dimensional k-Means [KSS10]. Moreover, the dependency on ε in the exponent can be improved by proving better bounds on the ε-scatter dimension of a metric space of interest; e.g., λ(ε) = poly(1/ε) implies an EPAS with running time exp(Õ(k) · poly(1/ε)).

Overview of Techniques
In this section, we give an informal overview of the technical ideas appearing in the paper. The main result will be built step by step: we believe that it is already interesting to understand our main result specialized to Weighted k-Center and Weighted k-Median. Our starting point is the EPAS of Badȏiu et al. [BHI02] for unweighted k-Center in high-dimensional Euclidean spaces. We redesign this algorithm so that it can be presented with a clean division into two parts: a simple branching algorithm and a bound on the abstract concept of (algorithmic) ε-scatter dimension. This way, we obtain a sharp separation between the branching algorithm, which is specific to the objective, and the bound on the ε-scatter dimension, which is specific to the metric. This can be contrasted with techniques based on coresets, which are inherently specific both to a single objective and to a single metric. The main message of the paper is that, with the right combination of additional ideas, this framework can be significantly generalized both in terms of objectives and metric spaces.
This section presents the main algorithmic ideas in three steps.
1. The algorithm for unweighted k-Center can be generalized to Weighted k-Center in a way that is not completely obvious.
2. Building on the algorithm for Weighted k-Center, we can solve Weighted k-Median with a preprocessing and a random selection step.
3. The Weighted k-Median algorithm can be generalized to arbitrary monotone norms by considering infinitely many Weighted k-Median instances defined by the subgradients.
While some of the challenges along the way may at first glance appear to admit other promising approaches, we want to emphasize that it is nontrivial to find a combination of ideas that can be integrated together to obtain our main result. In particular, for Weighted k-Median the initial upper bounds have to be defined carefully, in a way that simultaneously allows an efficient random selection step and a generalization to arbitrary monotone norms.
Weighted k-Center with a Bounded Number of Different Weights. Our starting point is a simple branching algorithm inspired by the EPAS of Badȏiu et al. [BHI02] for unweighted k-Center. Instead of branching, it will be more convenient for us to present it as a randomized algorithm. Furthermore, we consider the more general setting of Weighted k-Center: the objective is to find a set O of k centers that minimizes max_p w(p)δ(p, O). Let us first present the algorithm under the simplifying assumption that the weight function w on the points takes at most τ different values. The unweighted problem corresponds to w(p) = 1 for every p ∈ P and hence τ = 1. It will be convenient to assume that we (approximately) know the value of OPT.
We start with k arbitrarily chosen candidates X = {x_1, …, x_k} for the k centers. We additionally introduce k sets of requests Q_1, …, Q_k, where each request is of the form (p, r) with a point p ∈ P and radius r > 0. For every κ ∈ [k], we impose the cluster constraint requiring that, for every (p, r) ∈ Q_κ, center x_κ should be at distance at most r from p. Initially, we set Q_κ = ∅ for every κ, which means that these conditions are trivially satisfied. If we have max_p w(p)δ(p, X) ≤ (1 + ε)OPT, then we can stop, as we have a (1 + ε)-approximate solution at our hands. Otherwise, we have a point p with δ(p, X) > (1 + ε)OPT/w(p), while it is at distance at most OPT/w(p) from some center of a hypothetical optimum solution O. Thus the algorithm selects a κ ∈ [k] uniformly at random, hoping it to be the index of the center that is at distance at most OPT/w(p) from p in the optimum solution O. Then we introduce the request (p, OPT/w(p)) into the set Q_κ and select x_κ to be a center that satisfies the cluster constraint defined by all the requests in the updated Q_κ. Observe that if every random choice was compatible with the hypothetical optimum solution O, then the algorithm is always able to find such a center, as the requests in Q_κ are always satisfied by the κ-th center of the optimum solution O.
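One run of this randomized branching procedure can be sketched as follows (a simplified illustration with our own names; it assumes a guess of OPT and returns None when some random guess was inconsistent with the optimum, in which case the run is repeated):

```python
import random

def weighted_k_center_branching(P, F, dist, w, k, OPT, eps, max_iters):
    """One run of the randomized branching algorithm: maintain candidate
    centers X and request sets Q_1..Q_k, and refine a random cluster
    whenever some point violates the (1 + eps)-guarantee."""
    X = [F[0]] * k                      # arbitrary initial candidates
    Q = [[] for _ in range(k)]          # request sets Q_1, ..., Q_k
    for _ in range(max_iters):
        # find the point with the largest weighted distance to X
        p_bad = max(P, key=lambda p: w[p] * min(dist(p, x) for x in X))
        if w[p_bad] * min(dist(p_bad, x) for x in X) <= (1 + eps) * OPT:
            return X                    # (1 + eps)-approximate solution
        kappa = random.randrange(k)     # guess the optimal cluster of p_bad
        Q[kappa].append((p_bad, OPT / w[p_bad]))
        # re-pick x_kappa satisfying all cluster constraints of Q[kappa]
        feas = [x for x in F if all(dist(p, x) <= r for p, r in Q[kappa])]
        if not feas:
            return None                 # a wrong guess somewhere: restart
        X = X[:kappa] + [feas[0]] + X[kappa + 1:]
    return None
```

Repeating the run O(1/q) times, where q is the success probability of a single run, boosts the overall success probability to a constant.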
We claim that if the ε-scatter dimension of the metric is bounded, then this algorithm stops after a bounded number of steps, either by finding an approximate solution or by failing to find a center satisfying the cluster constraints of some Q_κ. Fix κ ∈ [k], and let x_κ^{(1)}, x_κ^{(2)}, … be the different candidates for the κ-th center throughout this branch, and let (p_κ^{(i)}, r_κ^{(i)}) be the i-th request introduced into Q_κ. Center x_κ^{(j)} was chosen to be at distance at most r_κ^{(i)} from p_κ^{(i)} for 1 ≤ i < j, but was later found to be at distance more than (1 + ε)r_κ^{(j)} from p_κ^{(j)}. As there are at most τ different weights in the input, at least ℓ′ = ℓ/τ of these requests have the same radius, where ℓ is the total number of requests in Q_κ. That is, there is a subsequence (x_κ^{(s_1)}, p_κ^{(s_1)}), …, (x_κ^{(s_{ℓ′})}, p_κ^{(s_{ℓ′})}) where every r_κ^{(s_j)} for j ∈ [ℓ′] is the same value r ≥ 0. This means that we have a subsequence (x̄_1, p̄_1), …, (x̄_{ℓ′}, p̄_{ℓ′}) with the property that δ(x̄_i, p̄_i) > (1 + ε)r, but δ(x̄_j, p̄_i) ≤ r for every i < j. By scaling down every distance by a factor of r, this is precisely an ε-scattering of length ℓ′. If we consider a class of metrics closed under scaling where the ε-scatter dimension is λ(ε), then this sequence cannot be longer than λ(ε), implying that ℓ ≤ τ · λ(ε). We can conclude that the algorithm introduces at most τ · λ(ε) requests into each Q_κ, hence the algorithm cannot perform more than k · τ · λ(ε) iterations.
If every step of the algorithm randomly chooses an index κ ∈ [k] that is consistent with the optimum solution O, then the only way it can stop is by finding an approximate solution. Therefore, the algorithm is successful with probability at least q = k^{−k·τ·λ(ε)}. The success probability can be boosted to a constant arbitrarily close to 1 by the standard technique of repeating the algorithm O(1/q) times, leading to a running time of k^{O(k·τ·λ(ε))} · poly(n).

Weighted k-Center with Arbitrary Weights. We now show how the algorithm can be extended to work in the weighted setting with arbitrary weights. Let us first observe that if there is no bound on the number τ of different weights, then we cannot bound the number of requests to a given Q_κ, even in very simple metric spaces such as R^1. Suppose, for example, that the requests arriving to Q_κ are (p^{(i)}, (1 + 2ε)^{−i}) for i = 1, 2, …, where every p^{(i)} is at the origin (or within a very small radius of the origin). Then a center x^{(i)} at (1 + 2ε)^{1−i} satisfies the first i − 1 requests, but violates the constraint of the i-th by more than a (1 + ε)-factor. This sequence can be arbitrarily long, and the existence of such a sequence shows that we cannot bound the number of requests arriving to Q_κ if we do not have a bound on the number of different weights. Nevertheless, we show that the number of requests can be bounded if we start the algorithm by carefully seeding the initial requests. Let us remark that we know other simple modifications that achieve such a bound, but the technique described below turns out to be the one that can be extended further to Weighted k-Median and general norms.
The main idea is to bootstrap our algorithm with a constant-factor approximation. A simple greedy 3-approximation can be obtained following the ideas of Plesník [Ple87]. Consider all the balls ball(p, OPT/w(p)) for p ∈ P. Consider these balls in a non-decreasing order of radius, and mark each ball that does not intersect any of the balls marked earlier; let ball(p_κ, OPT/w(p_κ)), 1 ≤ κ ≤ k′, be the marked balls. We must have k′ ≤ k: otherwise, we would have more than k pairwise disjoint balls, each of which has to contain a center of the solution, contradicting the assumption that the value OPT can be achieved with k centers. For 1 ≤ κ ≤ k′, let x_κ be any center in ball(p_κ, OPT/w(p_κ)) and let Q_κ = {(p_κ, OPT/w(p_κ))}. For k′ < κ ≤ k, we choose x_κ arbitrarily and let Q_κ = ∅. Let us observe that with this definition of the Q_κ's, we have δ(p, X) ≤ 3·OPT/w(p) during every iteration of our algorithm. Indeed, if the ball of p was marked, then X always contains a center in ball(p_κ, OPT/w(p_κ)); if the ball of p was unmarked, then it intersects a marked ball of no larger radius that contains a center of X.
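The greedy marking step can be sketched as follows (names ours); it returns the marked balls used to seed the requests:

```python
def greedy_seed(P, w, dist, OPT):
    """Consider the balls ball(p, OPT/w(p)) in non-decreasing order of
    radius; mark a ball if it is disjoint from all balls marked earlier.
    Returns the marked (radius, point) pairs."""
    balls = sorted((OPT / w[p], p) for p in P)
    marked = []
    for r, p in balls:
        # two balls are disjoint iff their centers are further apart
        # than the sum of their radii
        if all(dist(p, q) > r + r_q for r_q, q in marked):
            marked.append((r, p))
    return marked

# Toy instance: the ball of 0.0 overlaps the smaller marked ball of 0.5.
print(greedy_seed([0.0, 0.5, 10.0],
                  {0.0: 1.0, 0.5: 2.0, 10.0: 1.0},
                  lambda p, q: abs(p - q), 1.0))  # [(0.5, 0.5), (1.0, 10.0)]
```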
The main claim is that the ratio between the radii of two requests appearing in Q_κ can be bounded by O(1/ε). Suppose that (p, r) and (p′, r′) are two requests in Q_κ (introduced in any order) and we have r′ < εr/4. A center of the optimum solution satisfies both requests, hence we have δ(p, p′) ≤ r + r′. As shown above, at every step of the algorithm there is a center in X at distance at most 3r′ from p′; let y be such a center at the step when request (p, r) was introduced. Then we have δ(p, y) ≤ δ(p, p′) + δ(p′, y) ≤ r + r′ + 3r′ ≤ (1 + ε)r, contradicting the need for the request (p, r). We can use the standard assumption that every weight is of the form (1 + ε)^i for some integer i: by rounding down every weight to the largest number of this form, we change the objective only by a factor of 1 + ε. If every weight is of the form (1 + ε)^i, then the O(1/ε) bound proved above implies that the requests introduced into Q_κ for any fixed κ ∈ [k] have O(1/ε · log 1/ε) different radii. Therefore, we can bound the total number of requests (and hence the number of iterations) by O(λ(ε) · k/ε · log 1/ε). This leads to a k^{O(λ(ε)·k/ε·log 1/ε)} · poly(n) time randomized algorithm with constant success probability.
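With all weights rounded to powers of (1 + ε), the number of distinct radii that can appear within a range of bounded aspect ratio is easy to count; a small sketch (the function name and interface are ours, not the paper's):

```python
import math

def distinct_radii_bound(eps, aspect_ratio):
    """Radii of the form (1+eps)^(-i) lying in an interval of the given
    aspect ratio: at most log_{1+eps}(aspect_ratio) + 1 many values."""
    return math.floor(math.log(aspect_ratio) / math.log(1.0 + eps)) + 1
```

For an aspect ratio of O(1/ε) this evaluates to the O(1/ε · log 1/ε) bound on distinct radii used above.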
From Weighted k-Center to Weighted k-Median. Towards our goal of understanding general norms, let us now consider the Weighted k-Median problem, where the objective is to find a set O of k centers that minimizes ∑_p w(p)·δ(p, O). We will try to solve this problem by interpreting it as a Weighted k-Center problem on a weighted point set that we dynamically discover during the course of the algorithm.
We would like to turn the linear constraint ∑_p w(p)·δ(p, X) ≤ OPT of Weighted k-Median into a distance constraint: some point p should be at distance at most r to the solution. Let X be the current solution and suppose that ∑_p w(p)·δ(p, X) > (1 + ε)OPT. The intuition is that ∑_p w(p)·δ(p, X) > (1 + ε)·∑_p w(p)·δ(p, O) for an optimum solution O implies that a nontrivial fraction of the points satisfy δ(p, X) > (1 + ε/3)·δ(p, O), that is, their distances to the solution have to be improved by more than a factor of 1 + ε/3. More precisely, an easy averaging argument shows that if we select a point p with probability proportional to w(p)·δ(p, X), then p satisfies δ(p, X) > (1 + ε/3)·δ(p, O) with probability Ω(ε). We call such a point p an ε/3-witness, certifying that the current solution has to be improved.
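The sampling step (pick p with probability proportional to w(p)·δ(p, X)) can be sketched as follows; all names are illustrative, and the Ω(ε) witness guarantee is the property argued above, not something the code checks:

```python
import random

def sample_candidate_witness(points, weights, dist_to_X, rng=random):
    """Sample a point p with probability proportional to w(p) * d(p, X)
    by inverting the cumulative sum of the contributions."""
    scores = [weights[p] * dist_to_X[p] for p in points]
    threshold = rng.uniform(0.0, sum(scores))
    acc = 0.0
    for p, s in zip(points, scores):
        acc += s
        if acc >= threshold:
            return p
    return points[-1]  # numerical safety net
```

If the current cost exceeds (1 + ε)OPT, such a sample is an ε/3-witness with probability Ω(ε) by the averaging argument above.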
Assuming that the sampled point p is indeed an ε/3-witness, we proceed as in the case of Weighted k-Center. We randomly choose an index κ and introduce the request (p, δ(p, X)/(1 + ε/3)) into Q_κ, updating the cluster constraint by requiring that x_κ be closer to p than in the current solution. If there is a center satisfying all the requests in Q_κ, then we update x_κ. These steps are repeated until we arrive at a solution X with ∑_p w(p)·δ(p, X) ≤ (1 + ε)OPT.
In each step, with probability Ω(ε/k), the algorithm chooses an ε/3-witness p and a center index κ that is consistent with some hypothetical optimum solution O. However, it is not clear how to bound the running time of the algorithm. It can happen that the requests arriving to Q_κ have smaller and smaller radii. As we have seen for Weighted k-Center, in such a scenario we cannot bound the number of steps even in ℝ^1. It is crucial to have some control on the sequence of radii that appear in the requests. Therefore, we show next how to ensure that the radii in the requests to center κ stay within a bounded range.

Initial Upper Bounds. For each point p, we compute a weak upper bound u(p) ≥ δ(p, O) on the distance to the optimum solution. Then instead of starting with an arbitrary set of k centers, we bootstrap the algorithm with a solution approximately satisfying all these upper bounds. We argue that this can be done in such a way that the radii appearing in the requests to each center κ stay within a bounded range.
If a point p* has weight w(p*), then u(p*) = OPT/w(p*) is an obvious upper bound on the distance of p* to O: otherwise, we would have ∑_p w(p)·δ(p, O) ≥ w(p*)·δ(p*, O) > OPT. This bound was sufficient for the Weighted k-Center problem, but the nature of Weighted k-Median allows us to get much stronger upper bounds in many cases. For example, if there are c points of the same weight w roughly at the same position, then each of them should be at distance at most OPT/(wc) from O: otherwise, the total contribution of these c points to the sum would be greater than OPT. More generally, if there is a radius r such that the total weight of the points at distance at most r from p is at least OPT/r, then we claim that p is at distance at most 2r from O. Indeed, otherwise all these points would be at distance more than r from O, making their total contribution greater than OPT. Therefore, we can define u(p) = 2r, where r is the smallest radius with the property that the total weight of the points at distance at most r from p is at least OPT/r. Note that u(p) can be determined in polynomial time from the weights of the points and their distance matrix.
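This definition of u(p) can be computed by a single sweep over the points in order of distance from p: once total weight W is available within distance d, the smallest feasible radius is max(d, OPT/W). A sketch with illustrative names:

```python
def initial_upper_bound(p, points, weights, dist, OPT):
    """u(p) = 2r for the smallest radius r such that the total weight
    within distance r of p is at least OPT / r."""
    best = float('inf')
    W = 0.0
    for q in sorted(points, key=lambda q: dist(p, q)):
        W += weights[q]
        # with weight W available within distance dist(p, q), the
        # smallest radius r with W >= OPT / r and r >= dist(p, q)
        # is max(dist(p, q), OPT / W)
        best = min(best, max(dist(p, q), OPT / W))
    return 2.0 * best
```

The sweep runs in O(n log n) time per point, matching the claim that u(p) is polynomial-time computable from the weights and the distance matrix.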
Similarly to our Weighted k-Center algorithm, we start with a 3-approximation of the constraints given by the upper bounds u(p) for p ∈ P. Let us go through the points in nondecreasing order of u(p) and greedily choose a maximal independent set of the balls ball(p, u(p)). We find at most k such balls, as each of them contains a center of O and they are pairwise disjoint. Let us choose a center in each ball; it is easy to see that every point p then has a selected center at distance at most 3u(p) from it. If x_κ was selected to be a center in ball(p, u(p)), then we initialize Q_κ with the request (p, u(p)). This ensures that during every step of the algorithm, it remains true that every point p is at distance at most 3u(p) from the current solution.
We run the algorithm for Weighted k-Median with this initial solution. Before analyzing the algorithm, let us make a nontrivial change in the random selection. We have seen that with probability Ω(ε/k), we select a random point p and κ ∈ [k] such that δ(p, X) ≥ (1 + ε/3)·δ(p, O) for some optimal solution O. A key claim of the proof is that with probability Ω(ε/k), it is also true that u(p) ≤ 2k·δ(p, X)/ε (see Lemma 5.10). Intuitively, the total contribution of the ε/3-witnesses that are too close to some center x_κ ∈ X cannot be very large, because then all of these witnesses would be in a small ball, implying that the upper bound u(p) should be smaller. Note that this is the point in the proof where we crucially use the exact definition of u(p). With this claim at hand, we can modify the algorithm so that we randomly choose a point p satisfying u(p) ≤ 2k·δ(p, X)/ε, with probability proportional to w(p)·δ(p, X). It remains true that p is an ε/3-witness with probability Ω(ε/k).
Let us now analyze the algorithm and bound the number of times a center x_κ is updated. We want to argue that the radii in the requests remain in a bounded range. Suppose that we update cluster κ with requests (p, r) and (p′, r′) (in either order) such that r′ is much smaller than ε²r/k. If the algorithm does not fail, then there is a center x_κ satisfying both requests. By the triangle inequality, this means that δ(p, p′) ≤ r + r′ < r + εr/6. Furthermore, by the constraint u(p′) ≤ 2k·δ(p′, X)/ε = 2k(1 + ε/3)r′/ε on our selection of the random point p′, we have that u(p′) is much smaller than εr/18. At every step of the algorithm, the upper bound u(p′) is 3-approximately satisfied by the current solution X. Thus there should be a center in X much closer than εr/6 to p′. Together with δ(p, p′) < r + εr/6, it follows that there is always a center in X at distance at most (1 + ε/3)r from p, contradicting the need for the request (p, r).
Thus the combination of the two facts that (1) the upper bounds are always approximately satisfied and that (2) the radius in a request is not much smaller than the upper bound implies that the radii in the requests stay within a bounded range. Then we can argue as in the case of the Weighted k-Center problem. If every weight is rounded to a power of (1 + ε), then each cluster is given requests with only a bounded number of different radii. If many requests arrive, then there is a long subsequence of the requests with the same radius. This means that the bound on the ε-scatter dimension can be used to bound the length of this subsequence, and hence the total number of requests to all clusters.
From Weighted k-Median to General Norms Using Subgradients. Next we show how to solve the clustering problem for an arbitrary monotone norm by interpreting it as a collection of Weighted k-Median instances that we need to satisfy simultaneously. We will repeatedly solve such Weighted k-Median instances that are dynamically discovered during the course of the algorithm.
It will be convenient to use the notion of subgradients. For our purposes, it is sufficient to discuss subgradients in the context of a monotone norm f : ℝ^n → ℝ. We say that g is a subgradient of f at point x if f(x) = g·x and f(y) ≥ g·y for every y ∈ ℝ^n. It is known that every monotone norm has a nonnegative subgradient g ≥ 0 at every point x ≥ 0. Checking whether a vector g is a subgradient at x and finding a subgradient at x can be formulated as convex optimization problems, hence can be (approximately) solved using the ellipsoid method if f can be efficiently computed [GLS12].
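As a concrete example of these two conditions, consider the ℓ-centrum (top-ℓ) norm, which sums the ℓ largest coordinates; the indicator vector of the ℓ largest coordinates of x is a nonnegative subgradient at x ≥ 0. A small numpy sketch (our own illustration, not code from the paper):

```python
import numpy as np

def top_l_norm(x, l):
    """The l-centrum norm: the sum of the l largest coordinates."""
    return float(np.sort(x)[-l:].sum())

def top_l_subgradient(x, l):
    """A nonnegative subgradient at x >= 0: the indicator vector of
    the l largest coordinates of x (ties broken arbitrarily)."""
    g = np.zeros(len(x))
    g[np.argsort(x)[-l:]] = 1.0
    return g
```

By construction g·x = f(x), and g·y ≤ f(y) for every y ≥ 0 because g selects only ℓ coordinates, so both subgradient conditions hold.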
Suppose that we have a current solution X and let x ∈ ℝ^P_{≥0} be the vector representing the distances of the points in P to X. Suppose that X is not (approximately) optimal: f(x) > (1 + ε)OPT. Let us compute a subgradient g of f at x; we have g·x = f(x) > (1 + ε)OPT and g·y ≤ f(y) = OPT for the optimum solution y. That is, g·x ≤ OPT is a linear constraint satisfied by the optimum solution and violated by the current solution. Then defining the weights w(p) based on the coordinates of g gives an instance of Weighted k-Median with ∑_p w(p)·δ(p, X) > (1 + ε)OPT for the current solution X. Now we can proceed as above for the Weighted k-Median problem: we randomly choose a point p and cluster κ, introduce a new request into Q_κ, find a new center x_κ, etc., until we arrive at a solution X with ∑_p w(p)·δ(p, X) ≤ (1 + ε)OPT. If this new solution X is still nonoptimal for the original norm problem, that is, f(x) > (1 + ε)OPT, then we can again compute a subgradient and find a violated linear constraint (possibly the same as in the previous step). We repeat this until we find a solution with f(x) ≤ (1 + ε)OPT.
Defining the upper bounds and bootstrapping the algorithm with a solution approximately satisfying them were crucial for the analysis of the Weighted k-Median algorithm. For general norms, we can again define the upper bounds once we have the weights w based on the violated linear constraint g·x ≤ OPT. However, these upper bounds would not be useful for the analysis, as they would depend on the violated linear constraint and hence would change during the algorithm.
Intuitively, we can see the constraint f(x) ≤ OPT as an infinite number of Weighted k-Median instances, corresponding to the linear constraints g·x ≤ OPT for every subgradient g of f. We would like to define u(p) to be the smallest possible upper bound that can be assigned to p among all of these infinitely many Weighted k-Median instances. Determining this value seems to be a difficult task, but actually the answer is very simple. Recall that u(p) was defined as twice the smallest r such that ball(p, r) contains total weight at least OPT/r. Thus to define the upper bound u(p), we need to know the maximum weight of the points in ball(p, r) among the infinitely many instances corresponding to all the subgradients. Let b be the characteristic vector of ball(p, r) (i.e., every coordinate is 1 or 0, depending on whether a point is in the ball or not). Then the question is to determine the maximum of g·b among all subgradients g. It is easy to see that this maximum is exactly f(b): if g is a subgradient at b, then g·b = f(b); if g is a subgradient at an arbitrary point y, then g·b ≤ f(b). Thus we can determine the maximum weight of any ball and define the upper bounds accordingly. With these definitions, the analysis of the Weighted k-Median algorithm goes through for general monotone norms. The two main properties of the upper bounds remain valid: (1) the upper bounds are satisfied by the optimum solution and (2) we can restrict our random choice of p to points where the distance to the solution is not much smaller than u(p).
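The resulting rule for the maximum ball weight is a one-liner; the sketch below (with illustrative names) evaluates the norm at the ball's characteristic vector:

```python
def max_ball_weight(p, r, points, dist, f):
    """The maximum weight that any subgradient-induced Weighted
    k-Median instance assigns to ball(p, r): the norm f evaluated at
    the ball's characteristic vector, as argued above."""
    indicator = [1.0 if dist(p, q) <= r else 0.0 for q in points]
    return f(indicator)
```

For f = ℓ_1 (plain k-median with unit weights), this is simply the number of points in the ball.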
In summary, the final algorithm consists of the following steps (see Figure 3). First we compute the upper bounds u(p) and greedily find a 3-approximate solution satisfying these constraints. Then we repeat the following steps until we reach a solution X for which the distance vector x satisfies f(x) ≤ (1 + ε)OPT. We compute a subgradient g of f at x to obtain a violated linear constraint g·x ≤ OPT. We randomly choose a point p (according to the distribution described above) and require that p be at distance at most δ(p, X)/(1 + ε/3) from the solution, that is, we obtain a violated distance constraint. Then we randomly choose a cluster κ ∈ [k] and require that this distance constraint be satisfied by center x_κ. Thus we put the request (p, δ(p, X)/(1 + ε/3)) into Q_κ and find a new x_κ that satisfies the cluster constraints imposed by the requests in Q_κ, if possible. Our analysis shows that each step is consistent with a hypothetical optimum solution O with probability Ω(ε/k). Moreover, if the ε-scatter dimension is bounded, then the algorithm has to find a solution or fail after a bounded number of iterations.
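The loop just summarized can be sketched schematically; `subgradient`, `sample_point`, and `update_center` below are hypothetical placeholders for the subroutines described above (subgradient oracle, restricted witness sampling, Ball Intersection), not the paper's actual interfaces:

```python
import random

def epas_iteration(X, points, f, dist, OPT, eps, k,
                   subgradient, sample_point, update_center):
    """One round of the main loop: check optimality, extract a violated
    linear constraint, turn it into a distance constraint, and assign
    the constraint to a randomly guessed cluster."""
    x = {p: dist(p, X) for p in points}
    if f(x) <= (1 + eps) * OPT:
        return X, True                      # (1 + eps)-approximate: stop
    g = subgradient(f, x)                   # violated linear constraint
    p = sample_point(points, g, x)          # hopefully an eps/3-witness
    kappa = random.randrange(k)             # guess the cluster of p in O
    request = (p, x[p] / (1 + eps / 3))     # violated distance constraint
    return update_center(X, kappa, request), False
```

Each round is consistent with a hypothetical optimum with probability Ω(ε/k), so the whole loop is repeated enough times to boost the overall success probability.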
(Algorithmic) ε-Scatter Dimension. Beyond the general algorithm capable of handling any monotone norm objective, our second main contribution is bounding the ε-scatter dimension of various classes of metrics (Section 6). In the interest of space, we do not go into the details of these (mostly combinatorial) proofs, but give only a brief overview.
• Bounded Doubling Dimension. As outlined in the introduction, the set of points as well as the set of centers in an ε-scattering both form an ε-packing of a unit ball, implying that any metric of doubling dimension d has ε-scatter dimension (1/ε)^{O(d)}. See Theorem 1.3.
• Bounded-Treewidth Graph Metrics. The ε-scatter dimension bound for the shortest-path metrics of bounded-treewidth graphs is obtained by a delicate combinatorial proof that exploits both the structure of the graph and the properties of the ε-scattering. The bound we obtain is tw^{(1/ε)^{O(tw)}} for graphs of treewidth tw, that is, double exponential in tw for fixed ε. It remains an interesting open question whether this bound can be improved.
• Planar Graph Metrics. As outlined in the introduction, we can employ a known metric embedding result to reduce the problem of bounding the ε-scatter dimension of planar graphs to bounding the ε-scatter dimension of bounded-treewidth graphs. In particular, the result by Fox-Epstein, Klein, and Schild [FEKS19] provides an (approximate) metric embedding of planar metrics into low-treewidth metrics, which can be used to obtain a 2^{2^{poly(1/ε)}} bound on the ε-scatter dimension of planar graph metrics.

• Continuous High-Dimensional Euclidean Space. As mentioned in the introduction, high-dimensional Euclidean space does not have bounded ε-scatter dimension. However, in the continuous Euclidean space, where any point of the space can be a center, we can bound the algorithmic ε-scatter dimension. Towards this, we replace the center player by an algorithmic "player" applying the algorithm by Kumar and Yildirim [KY09] for Weighted 1-Center.
To achieve bounded algorithmic ε-scatter dimension, this algorithm would, however, require a bounded aspect ratio of the radii in the input requests. We therefore prove an aspect-ratio condition (which holds even for general metrics) implying that it suffices for the algorithm to handle instances with aspect ratio O(1/ε). We combine this result with the algorithm by Kumar and Yildirim to prove bounded algorithmic ε-scatter dimension for continuous high-dimensional Euclidean space, that is, Theorem 1.7.

Preliminaries
Classes of Metric Clustering Spaces. A metric clustering space (or metric space for brevity) is a triple M = (P, F, δ) where P is a finite set of n data points, F is a (possibly infinite) set of potential locations of cluster centers, and δ is a metric on P ∪ F. The sets P and F are not necessarily disjoint. (For example, it is natural for clustering problems to have P = F or P ⊆ F.) Given any point u ∈ P ∪ F in the metric space and a radius r ∈ ℝ_+, we denote by ball_δ(u, r) the ball {v ∈ P ∪ F : δ(u, v) ≤ r}; we omit the subscript δ when it is clear from the context. A class M of metric spaces is an (infinite) set of metric spaces. This paper focuses on metric classes that are closed under scaling distances by a constant. We consider the following classes of metric clustering spaces:

• Graph Metric: In the case of a graph metric, we are given a (weighted) graph G = (V, E) and the metric δ_G on V is the shortest-path metric, i.e., δ_G(u, v) is the length of a shortest path connecting u and v. The clustering space (P, F, δ_G) is given such that P, F ⊆ V.
• Continuous Euclidean Spaces: In this case, we are allowed to choose centers from the (high-dimensional) continuous Euclidean space F = ℝ^d. The set P ⊆ ℝ^d is a finite set of points.
• Doubling Metric: The doubling dimension of a metric space (X, δ), denoted by d, is the smallest m > 0 such that every ball of radius r in the metric can be covered by 2^m balls of radius r/2. Note that a d-dimensional Euclidean metric has doubling dimension O(d).
Treewidth. A tree decomposition of a graph G is a pair (T, β), where T is a tree and β : V(T) → 2^{V(G)} (here V(T) and V(G) denote the vertex sets of the tree T and of G, respectively), with the following properties: (i) every vertex of G is contained in β(t) for some t ∈ V(T); (ii) for every edge uv of G, there is some t ∈ V(T) with u, v ∈ β(t); and (iii) for every vertex v of G, the nodes t ∈ V(T) with v ∈ β(t) induce a connected subtree of T.
The width of the tree decomposition (T, β) is max_{t∈V(T)} |β(t)| − 1. The treewidth of a graph G is the minimum width over all tree decompositions of G.

Subgradients of Norms. We state definitions and summarize basic facts about subgradients of norms that we will use throughout the paper.

Fact 3.1. Any norm is a convex function.

Definition 3.2 (Subgradient).
A subgradient of a convex function f : ℝ^n → ℝ at a point x ∈ ℝ^n is any g ∈ ℝ^n such that f(y) ≥ f(x) + g·(y − x) holds for every y ∈ ℝ^n; we denote by ∂f(x) the set of subgradients of f at x.
The following fact summarizes various useful properties of subgradients, specialized to norm functions. Because we apply norm objectives exclusively to non-negative distance vectors, we call (slightly abusing terminology) the restriction of a norm to ℝ^n_{≥0} a norm as well.

Fact 3.3 ([CS19]). Let f : ℝ^n_{≥0} → ℝ_{≥0} be a norm and x ∈ ℝ^n_{≥0}. If g is a subgradient of f at x, then f(x) = g·x and f(y) ≥ g·y for all y ∈ ℝ^n_{≥0}. Further, if f is monotone, there exists a subgradient g ∈ ∂f(x) such that g ≥ 0.
The following observation is an immediate consequence of Fact 3.3.

Observation 3.4. Let ∂f = ⋃_{y ∈ ℝ^n_{≥0}} ∂f(y) be the set of all subgradients of f. Then for any x ∈ ℝ^n_{≥0}, we have f(x) = max_{g ∈ ∂f} g·x.
Definition 3.5 (ε-Approximate Subgradient). Let f : ℝ^n_{≥0} → ℝ_{≥0} be a norm and let ε > 0. We define the set ∂_ε f(x) of ε-approximate subgradients of f at x to contain all g ∈ ℝ^n_{≥0} such that the following two conditions hold: (i) f(y) ≥ g·y for each y ∈ ℝ^n_{≥0}, and (ii) g·x ≥ (1 − ε)f(x).

It is known that approximate subgradients of convex functions can be computed efficiently via an (approximate) value oracle for the function, through reductions shown by Grötschel, Lovász and Schrijver in their classic book [GLS12]. While the reduction in [GLS12] appears to take at least Ω(n^{10}) calls to the oracle, there exist faster methods assuming additional properties of the convex function; for example, see [LV06, LSV18]. Specifically for ℓ_p norms, closed formulas describing the sets of subgradients are known and used in practice.

Some Terminology and Notation. Let M = (P, F, δ) be a clustering space on n = |P| data points. Let b ∈ ℝ^P_{≥0} be an n-dimensional vector. We interpret b as assigning each point p ∈ P a non-negative value denoted b(p). That is, b = (b(p))_{p∈P}. For example, given a subset X ⊆ F of centers, we define the distance vector δ(P, X) = (δ(p, X))_{p∈P}. If B ⊆ P is a subset of points, then 1_B ∈ {0, 1}^P denotes the characteristic vector of B, that is, it assigns value 1 to any p ∈ B and 0 to any p ∈ P \ B. If p ∈ P and α ≥ 0, then we denote by 1_{p,α} the binary vector 1_{ball(p,α)∩P}.

ε-Scatter Dimension
In this section, we formally introduce the concept of ε-scatter dimension, which plays a central role in our algorithmic framework. The following definition is a formalization of the "center-point game" presented in the introduction.

Definition 4.1 (ε-Scatter Dimension). We are given a class M of finite metric spaces, a space M = (P, F, δ) in M, and some ε ∈ (0, 1). An ε-scattering in M is a sequence (x_1, p_1), ..., (x_ℓ, p_ℓ) of center-point pairs such that δ(x_{i+1}, p_j) ≤ 1 for every 1 ≤ j ≤ i < ℓ, and δ(x_i, p_i) > 1 + ε for every i ∈ [ℓ]. The ε-scatter dimension of M is the maximum length of an ε-scattering in it. The ε-scatter dimension of the class M is the supremum of the ε-scatter dimension over all M ∈ M.
Algorithmic ε-Scatter Dimension. The definition of algorithmic ε-scatter dimension is based on the notion of (C_M, ε)-scattering, which is a variant of ε-scattering: centers are chosen via an (approximate) Ball Intersection algorithm C_M rather than by an adversarial center player. Intuitively, we maintain a dynamic instance of Ball Intersection that is augmented by adding distance constraints (p, r) one by one. In the context of a (C_M, ε)-scattering, we call the distance constraints (p, r) requests, which are satisfied by the Ball Intersection algorithm sequentially.
Definition 4.2 (Algorithmic ε-Scatter Dimension). Let M be a class of metric spaces with Ball Intersection algorithm C_M, let M = (P, F, δ) be a metric space in M, and let ε ∈ (0, 1). Moreover, let p_i ∈ P, x_i ∈ F, and r_i ∈ ℝ_+ for each i ∈ [ℓ], where ℓ is a positive integer. The sequence (x_1, p_1, r_1), ..., (x_ℓ, p_ℓ, r_ℓ) is called a (C_M, ε)-scattering if the following two conditions hold.
(i) For each i ∈ [ℓ − 1], the center x_{i+1} is the output of C_M on the requests (p_1, r_1), ..., (p_i, r_i), and (ii) δ(x_i, p_i) > (1 + ε)·r_i for each i ∈ [ℓ]. (There is no requirement regarding the first center x_1 in the sequence.) We say that M has algorithmic (ε, C_M)-scatter dimension λ_M(ε) if any (C_M, ε)-scattering contains at most λ_M(ε) many triples with the same radius value. The algorithmic ε-scatter dimension of M is the minimum algorithmic (ε, C_M)-scatter dimension over any Ball Intersection algorithm C_M for M.
When the family M is clear from the context, we drop the subscript M from λ_M(ε) and C_M. Note that, in contrast to the ε-scatter dimension, for the algorithmic ε-scatter dimension we demand that the number of triples per radius value be bounded rather than the total length of the sequence. In fact, the stronger requirement would not hold for high-dimensional Euclidean spaces, whereas the weaker (algorithmic) requirement turns out to be sufficient for our results. Another noteworthy difference is that a subsequence of a (C_M, ε)-scattering is not necessarily a (C_M, ε)-scattering itself, because it may not be consistent with the behavior of the algorithm C_M.
Relation Between Algorithmic and Non-Algorithmic ε-Scatter Dimension. The following lemma shows that the algorithmic ε-scatter dimension indeed generalizes the ε-scatter dimension for finite metric spaces.
Lemma 4.3. Let M be a class of finite metric spaces of ε-scatter dimension λ(ε). Then the algorithmic ε-scatter dimension of M is at most λ(ε).

Proof. Let M = (P, F, δ) be a metric space in the given class along with a set Q of distance constraints. Our Ball Intersection algorithm exhaustively searches F to find a center x satisfying all distance constraints exactly. If no such point exists, the algorithm fails. Let C_M denote this algorithm. Consider any (C_M, ε)-scattering. Notice that any subsequence of triples with the same radius value forms an ε-scattering (after rescaling distances so that this radius becomes 1). Hence the sequence contains at most λ(ε) many triples for any radius value.
Aspect-Ratio Lemma for Algorithmic ε-Scatter Dimension. The following is a handy consequence of bounded algorithmic ε-scatter dimension that we use in proving our results. It strengthens the properties of a (C_M, ε)-scattering by bounding the number of triples whose radii lie in an interval of bounded aspect ratio (rather than bounding the number of triples with the same radius value).
Let A_M be a Ball Intersection algorithm such that the algorithmic (ε, A_M)-scatter dimension is λ(ε). Let η ∈ (0, 1) be the input error parameter. Consider the Ball Intersection algorithm C_M that works as follows. For any of the input requests (p, r), we round r up to r′, the smallest power of 1 + ηε/50 that is at least r. We then invoke A_M on the rounded requests with error parameter η/2 and output the center returned by A_M. Clearly, this algorithm is a (1 + η)-approximate Ball Intersection algorithm (for the original requests).
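The rounding step can be sketched as follows (our own helper, parameterized by the rounding base; floating-point edge cases near exact powers are ignored):

```python
import math

def round_up_radius(r, base):
    """Round r up to the smallest power of `base` that is >= r
    (assuming base > 1 and r > 0).  Rounding requests this way leaves
    only O(log(aspect ratio) / log(base)) distinct radius values."""
    exponent = math.ceil(math.log(r) / math.log(base))
    return base ** exponent
```

Each rounded radius exceeds the original by at most a factor of `base`, which is why the rounded algorithm remains approximate for the original requests.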

Framework for Efficient Parameterized Approximation Schemes
Main Result. We are now ready to state our main result. In the remainder of this section, we prove the following theorem, restated from the introduction. In Section 5.1, we describe the EPAS and give some intuition. In Section 5.2, we give a full, technical analysis.
Theorem 1.8. Let M be a class of metric spaces that is closed under scaling distances by a positive constant. There is a randomized algorithm that computes, for any Norm k-Clustering instance I = (M, f, k) with metric M = (P, F, δ) ∈ M and any ε ∈ (0, 1), with high probability a (1 + ε)-approximate solution if the following two conditions are met.
(i) There is an efficient algorithm evaluating, for any distance vector x ∈ ℝ^P_{≥0}, the objective f(x) in time T(f).
(ii) There exists a function λ : ℝ_+ → ℝ_+ such that for all ε > 0, the algorithmic ε-scatter dimension of M is at most λ(ε).

Algorithm
Our algorithm is stated formally in Algorithm 1. We informally summarize its key steps, which we also outlined partially in the technical overview, and give some intuition for the analysis. Using standard enumeration techniques, we assume that we know (a sufficiently exact approximation of) the optimum objective function value OPT. Our goal is to satisfy the convex constraint f(x) ≤ (1 + ε)OPT imposed on the distance vector x ∈ ℝ^P_{≥0} (which represents the distance vector δ(P, X) induced by the feasible solution X ⊆ F). By Observation 3.4, this constraint is equivalent to (infinitely many) linear constraints w·x ≤ (1 + ε)OPT, where w ∈ ∂f is any subgradient of f.
To illustrate the main idea, we first describe a highly simplified, but failed, attempt. We consider in each iteration of the while loop (lines 8–15) a candidate solution X. If f(x) ≤ (1 + ε)OPT, then we are done. Otherwise, we compute an (ε/10-approximate) subgradient w of f at x in line 9. Since w·x = f(x) > (1 + ε)OPT, this constitutes a violated linear constraint. Consider sampling a point p ∈ P with probability proportional to its contribution w(p)·δ(p, X) to the objective f(x) = w·x (line 11). An averaging argument shows that with probability Ω(ε), the sampled point p satisfies δ(p, X) > (1 + ε/3)·δ(p, O) for some fixed hypothetical optimum solution O. In this event, we have identified a violated distance constraint, and we call p an ε/3-witness for X. We assign p to a cluster κ ∈ [k] picked uniformly at random, which equals the correct cluster of p in O with probability 1/k. Assuming that both events occur, this allows us to add the request (p, r) with radius value r = δ(p, X)/(1 + ε/3) to the cluster constraint Q_κ imposed on the cluster with index κ (see lines 13 and 14). Here, we refer to the set Q_κ of requests for cluster κ as the cluster constraint of κ.
Let (p_κ^(1), r_κ^(1)), ..., (p_κ^(ℓ), r_κ^(ℓ)) be the sequence of requests added to the cluster constraint associated with cluster κ, and let x_κ^(i), i ∈ [ℓ], be the center of cluster κ just before the request (p_κ^(i), r_κ^(i)) is added. The key observation is that the sequence of triples (x_κ^(i), p_κ^(i), r_κ^(i)) forms an algorithmic ε-scattering. We would like to argue that the length of this sequence is bounded because the algorithmic ε-scatter dimension is bounded. Unfortunately, the scatter dimension bounds only the number of triples per radius value but not the overall length of the sequence.
To address this issue, we compute in line 1 an initial upper bound u(p) on the radius of any point p ∈ P. We (approximately) satisfy these initial distance constraints for all points in a greedy pre-processing step (see lines 2–7). We maintain the distance constraints during the main phase by adding them as initial requests (see line 5). The upper bound u(p) is a rough estimate of the smallest radius r that may be imposed on p as part of any request (p, r). We modify the sampling process in the main phase (see line 11) to sample only from the subset of points whose distance to X is not much smaller than their initial upper bound u(p). We show via a careful argument that every request (p, r) we make is consistent with O with probability Ω(ε/k). We argue, moreover, that all radii of requests made for a particular cluster are within a factor O(k/ε²) of each other. The initial upper bounds are computed by detecting "dense" balls (line 1) in the input instance, in the sense that they would receive high weight under some subgradient of the objective norm and would therefore require that any near-optimal solution place a center in the vicinity of the dense ball.

Bounding the Number of Iterations
In this subsection, we prove Lemma 5.2. The proof consists of three steps. First, we argue that the initial upper distance bounds u(p) that we compute for each point p ∈ P are (i) consistent with any optimum solution (Lemma 5.4), and (ii) approximately satisfied throughout the algorithm (Lemma 5.5). Second, we establish that the radii in the requests made for any particular cluster are within a bounded factor (aspect ratio) of each other (Lemma 5.6). The third step consists in proving that, for any particular cluster, the sequence of requests along with the corresponding centers constitutes an algorithmic (C_M, ε)-scattering of bounded aspect ratio. Hence we can use Lemma 4.4 to bound the length of the sequence, and thus the number of iterations, by a function of k and ε, thereby completing the proof of Lemma 5.2.

Initial Upper Bounds. We first show that the initial upper bounds we calculate in the algorithm are conservative in the sense that they are also respected by an optimal solution.

Lemma 5.4. If O is an optimal solution then δ(p, O) ≤ u(p) for any p ∈ P, where u(p) is the initial upper bound computed in line 1 of Algorithm 1.
Proof. Let α = u(p). For the sake of contradiction, assume that δ(p, O) > α. By the triangle inequality, any point p′ ∈ ball(p, α/3) has distance at least 2α/3 to O. Hence we have δ(P, O) ≥ (2α/3) · 1_{p,α/3} coordinate-wise, so by monotonicity OPT = f(δ(P, O)) ≥ (2α/3) · f(1_{p,α/3}), which contradicts the choice of α = u(p) in line 1.

The following lemma says that throughout the algorithm we approximately satisfy all upper bounds. We remark that the initialization (lines 2–7) as well as its analysis is a variant of Plesník's algorithm [Ple87] for Priority k-Center when applied to the point set P with radii u(p), p ∈ P.
Lemma 5.5. The number k′ of points marked in line 3 of Algorithm 1 is at most k. Moreover, at any time during the execution of the while loop (lines 8–15), we have δ(p, X) ≤ 4u(p) for every p ∈ P. Finally, for any request (p, r) added to some cluster constraint, we have r ≤ 4u(p).
Proof. By Lemma 5.4, each of the balls ball(p^(κ), u(p^(κ))) with marked p^(κ), κ ∈ [k′], contains at least one point from some hypothetical optimum solution O. On the other hand, these balls are pairwise disjoint by construction. Hence k′ ≤ |O| ≤ k. This also implies that the algorithm can initialize X = (x_1, ..., x_k) in line 7 with centers satisfying all initial cluster constraints. For example, it may pick the k′ centers in F closest to p^(κ), κ ∈ [k′], and k − k′ additional arbitrary centers.
Because these initial requests are never removed, they are passed to the Ball Intersection algorithm (with error parameter ε/10; see line 14) whenever we make a change in the respective cluster. Hence, we have δ(p, X) ≤ (1 + ε/10)·u(p) ≤ 3u(p)/2 for any marked point p throughout the execution of the while loop. For any point p′ that is not marked, ball(p′, u(p′)) intersects ball(p, u(p)) for some marked p. Because the points are processed in line 3 in nondecreasing order of u(·), we must have u(p) ≤ u(p′). As argued before, ball(p, 3u(p)/2) is guaranteed to contain a center in X at any time during the while loop. By the triangle inequality, this center has distance at most u(p′) + u(p) + 3u(p)/2 ≤ 4u(p′) from p′. For the last claim, notice that r < δ(p, X) ≤ 4u(p) at the time the request is processed in line 14 for the first time.
Bounding the Aspect Ratio of Requests. The following lemma establishes that the radii of any two requests made for the same cluster are within a factor O(k/ε²) of each other. The intuition is as follows. We ensure in the algorithm (see line 10) that we only sample points whose radii are within a factor O(k/ε) of u(p). Assume that the radii, and thus the initial bounds u(p), u(p′), in two requests (p, r), (p′, r′) to the same cluster were very far from each other, say r′ ≪ r and u(p′) ≪ u(p). This would then imply that p was already (essentially) within radius r of some center before requesting (p, r), since there must be a center within radius 4u(p′) ≪ r/3 of p′ by Lemma 5.5. This contradicts the assumption that we requested (p, r) in the first place.
Lemma 5.6. Let (p, r) and (p′, r′) be requests added (in either order) to the same cluster constraint Q_κ, κ ∈ [k], in line 13 of Algorithm 1. If r′ ≤ ε² · r/(10⁴ k), then the algorithm fails in line 14 upon making the second of the two requests.
Leveraging Bounded Algorithmic ε-Scatter Dimension. To complete the proof of Lemma 5.2, we fix some cluster and consider the sequence of triples (x, p, r) where (p, r) is a request made for this cluster and x is the center of the cluster just before the request was made. We establish that this sequence constitutes an algorithmic (C_M, ε)-scattering and use Lemma 5.6 to bound the aspect ratio of the radii in this sequence by O(k/ε²). We complete the proof via the aspect-ratio lemma (Lemma 4.4).
Let (p_κ^(1), r_κ^(1)), ..., (p_κ^(ℓ), r_κ^(ℓ)) be the sequence of requests in the order in which they are added to Q_κ in line 13. For any i ∈ [ℓ], let x_κ^(i) be the center of cluster κ at the time just before requesting (p_κ^(i), r_κ^(i)). Since the Ball Intersection algorithm C_M is invoked with the points {p_κ^(1), ..., p_κ^(i)} and error parameter ε/10, the sequence (x_κ^(i), p_κ^(i), r_κ^(i)), i ∈ [ℓ], is an algorithmic (C_M, ε/10)-scattering. By Lemma 5.6, the radii of all requests for cluster κ lie in an interval R_κ = [r_min, O(k/ε²) · r_min], where r_min denotes the smallest radius in any request for cluster κ. Applying Lemma 4.4 to the interval R_κ, the length ℓ of the sequence is O((log(k/ε)) λ(ε/10)/ε). Since our algorithm adds one request to some cluster constraint in each iteration, the overall number of iterations is O(k (log(k/ε)) λ(ε/10)/ε).

Bounding the Success Probability
The proof of Lemma 5.3 consists of two key steps: First, we argue that the algorithm terminates with success (that is, without failure) if the random choices made by the algorithm are "consistent" (to be defined more precisely below) with some hypothetical optimum solution. Second, we argue that consistency is maintained with sufficiently high probability in each iteration. Together with our upper bound on the number of iterations from Lemma 5.2, this completes the proof of the main result, Theorem 1.8.

Consistency.
Informally speaking, by consistency we mean that (i) the points in the requests are assigned to the correct (optimal) cluster and (ii) the radius r in any request (p, r) is justified, that is, not smaller than the distance of p to its optimal center.

Definition 5.7. Consider a fixed hypothetical optimum solution O = (o_1, ..., o_k). We say that the current state of execution (specified by the candidate solution X and the cluster constraints Q_1, ..., Q_k) is consistent with O if δ(p, o_κ) ≤ r for every request (p, r) ∈ Q_κ, κ ∈ [k].

If the current state is consistent with the optimum solution O, then O certifies the existence of a solution to the cluster constraints (Q_1, ..., Q_k) currently imposed. Therefore, the following observation is straightforward.
Observation 5.8. If the state of the algorithm is consistent with O before executing line 14 in any iteration, then the algorithm does not fail during this iteration.
Probability of Maintaining Consistency. If the state of execution is consistent with O at the beginning of some iteration, then it remains consistent under the following two conditions. First, the point p sampled in this iteration is (randomly) assigned to the correct cluster. Second, the distance of p to the current candidate solution is sufficiently large compared to its distance to O, thereby justifying the request made in line 13. This second condition motivates the following definition.

Definition 5.9. Given a candidate solution X with f(δ(P, X)) > (1 + ε)OPT, a point p ∈ P is called an ε-witness if δ(p, X) > (1 + ε)δ(p, O).
The following lemma implies that the request made in any iteration for the sampled point is justified with probability Ω(ε). It is a key part of our analysis, as it links the specific way of (i) computing the initial upper bounds and (ii) sampling a witness based on these upper bounds. It is ultimately this interplay that allows us to bound the aspect ratio of the radii in the requests for a particular cluster, and therefore the overall number of requests per cluster, in terms of k and ε.
Lemma 5.10. Consider a fixed iteration of the while loop of Algorithm 1 and let X be the candidate solution at the beginning of this iteration. The point sampled in line 11 is then an ε/3-witness for X with probability Ω(ε). In particular, the set A computed in line 10 is not empty.
Proof. For any subset S ⊆ P of points, let C_S = Σ_{p∈S} w(p) δ(p, X) denote the contribution of S towards w · δ(P, X) = C_P.
Let W ⊆ P be the subset of ε/3-witnesses of X. We claim that the contribution C_W is at least ε C_P/10. Suppose for the sake of a contradiction that their contribution is less. Then, using the definition of witnesses, one can bound f(δ(P, X)) ≤ (1 + ε)OPT, which contradicts f(δ(P, X)) > (1 + ε)OPT.
Let W 1 , . . ., W k denote the subsets of the witnesses closest to centers x 1 , . . ., x k in X, respectively.
Let H ⊆ [k] be the subset of clusters κ ∈ [k] such that C_{W_κ} ≥ ε C_P/(100k). Fix any cluster κ ∈ H. Let {z_1, ..., z_ℓ} be the witnesses in W_κ in non-decreasing order of the distance δ(z_i, x_κ), i ∈ [ℓ], to their closest cluster center x_κ. Let j ∈ [ℓ] be the minimum index such that the contribution of the set W_κ⁻ = {z_1, ..., z_j} is at least C_{W_κ}/2. This implies that also C_{W_κ⁺} ≥ C_{W_κ}/2 where W_κ⁺ = {z_j, ..., z_ℓ}. Hence C_{W_κ⁻} and C_{W_κ⁺} are both at least ε C_P/(200k) because κ ∈ H. We claim that W_κ⁺ ⊆ A where A is defined as in line 10 of Algorithm 1. Towards this, let p ∈ W_κ⁺ be arbitrary. We prove that u(p) ≤ 1000k δ(p, x_κ)/ε and hence p ∈ A. To see this, notice that ball(p, 2δ(p, x_κ)) ⊇ ball(x_κ, δ(p, x_κ)) ⊇ W_κ⁻. On the other hand, setting α = 6δ(p, x_κ) in Equation (1), this implies that u(p) ≤ 1000k δ(p, x_κ)/ε, as claimed. Since we sample a point p from A with probability proportional to its contribution C_{p}, we sample a witness in each iteration with probability at least ε/40. Notice that C_P ≥ f(δ(P, X))/2 > 0. The left-hand side of Equation (1) must therefore be positive. This implies that A is not empty.
Overall Success Probability. We are now ready to prove Lemma 5.3, thereby completing the proof of the main theorem (Theorem 1.8). We establish that the state of execution is consistent before entering the while loop in Algorithm 1. The proof is completed by combining the upper bound on the number of iterations (Lemma 5.2) with the lower bound on the probability of maintaining consistency (Lemma 5.10).
Proof of Lemma 5.3. Let p^(1), ..., p^(k′) be the points marked in line 3 of Algorithm 1. By Lemma 5.4, each ball(p^(κ), u(p^(κ))), κ ∈ [k′], contains a point from O. By construction, these balls are moreover pairwise disjoint. Hence, by relabeling the optimum centers O = (o_1, ..., o_k), we can assume that δ(p^(κ), o_κ) ≤ u(p^(κ)) for each marked point p^(κ), where κ ∈ [k′] is the index of the cluster. Therefore, the state of execution of the algorithm is consistent with O just before the first execution of the while loop (lines 8–15). Assume now that the state is consistent with O at the beginning of an iteration of the while loop. By Lemma 5.10, we sample an ε/3-witness p in this iteration with probability Ω(ε). In this event, the request (p, r) added has radius r = δ(p, X)/(1 + ε/3) ≥ δ(p, O). If additionally the cluster index κ ∈ [k] picked at random is the same as the one in O (which happens with probability Ω(1/k)), then the state remains consistent with O. In this event, the recomputation of the center in line 14 does not fail. By Lemma 5.2, the algorithm terminates after at most O(k (log(k/ε)) λ(ε/10)/ε) many iterations. Since in any iteration it does not fail with probability Ω(ε/k), it succeeds overall with probability exp(−Õ(k λ(ε/10)/ε)).
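Making the final probability calculation explicit (a sketch of the arithmetic; c denotes the constant hidden in the per-iteration success bound Ω(ε/k), and T the iteration bound from Lemma 5.2):

```latex
\Pr[\text{success}]
\;\ge\; \Big(\frac{c\,\varepsilon}{k}\Big)^{T}
\;=\; \exp\!\Big(-T \ln\frac{k}{c\,\varepsilon}\Big)
\;=\; \exp\!\Big(-\widetilde{O}\big(k\,\lambda(\varepsilon/10)/\varepsilon\big)\Big),
\qquad
T = O\!\big(k\,(\log\tfrac{k}{\varepsilon})\,\lambda(\varepsilon/10)/\varepsilon\big),
```

where the Õ(·) absorbs the logarithmic factors in k and 1/ε.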

ε-Scatter Dimension Bounds
This section is devoted to bounding the ε-scatter dimension in various classes of metrics, proving Theorems 1.3, 1.4, and 1.6 from the Introduction.

Bounded Doubling Dimension
In this section, we show the upper bound on the ε-scatter dimension of any metric space of doubling dimension d, proving Theorem 1.3.
Scatter Dimension and Packing. Given a metric (X, δ), an ε-packing of this metric is a subset of points X′ ⊆ X such that δ(i, j) ≥ ε for all distinct i, j ∈ X′. This is a standard notion in the theory of metric spaces. We first observe the following connection between our ε-scattering and ε-packing.

Observation 6.1. Let (x_1, p_1), ..., (x_ℓ, p_ℓ) be an ε-scattering in a metric space (P, F, δ). Then the set of centers X = {x_2, ..., x_ℓ} is an ε-packing in the metric (P ∪ F, δ), and X is contained in a unit ball.
It is a well-known fact that any ε-packing of a metric of doubling dimension d that is contained in a unit ball has size at most O((1/ε)^d). Combining this with Observation 6.1 yields Theorem 1.3.

Remark: We note that the converse of Corollary 6.2 is false even in a very simple graph metric such as a star. In an n-node star rooted at r (with unit edge lengths, say), the unit ball ball(r, 1) includes the whole graph, and there exists an ε-packing of size n − 1 consisting of the non-root nodes. However, any ε-scattering has length at most 2.
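To illustrate the notion (a sketch that is not part of the paper; the Euclidean setting and all names are our own), the standard greedy procedure extracts a maximal ε-packing, and in low doubling dimension its size inside a unit ball obeys the bound above:

```python
import math

def greedy_eps_packing(points, eps):
    """Greedily extract a maximal eps-packing: a subset of points
    with pairwise distances >= eps."""
    packing = []
    for p in points:
        # keep p only if it is eps-far from every point chosen so far
        if all(math.dist(p, q) >= eps for q in packing):
            packing.append(p)
    return packing

# Points on the unit interval (doubling dimension 1): the packing has
# size O(1/eps), matching the O((1/eps)^d) bound for d = 1.
pts = [(i / 10,) for i in range(11)]  # (0.0,), (0.1,), ..., (1.0,)
pack = greedy_eps_packing(pts, 0.25)  # [(0.0,), (0.3,), (0.6,), (0.9,)]
```

The greedy order is immaterial for the size bound; any maximal ε-separated subset works.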

Bounded Treewidth Graphs
In this section we show that any graph of treewidth tw has ε-scatter dimension at most tw^{(1/ε)^{O(tw)}}. That is, we prove Theorem 1.4 for bounded treewidth graph metrics. We later show that the bound for planar graphs can be derived via an embedding result of [FRS19]. For convenience, we abbreviate ball_{δ_G}(r, γ) by ball_G(r, γ).

Treewidth and Spiders
Our proof relies on the notion of spiders, whose existence can serve as a "witness" to the fact that the treewidth of a graph G is high. Given an edge-weighted graph G, X ⊆ V(G), and γ ∈ (0, 1), a γ-spider on X is a set S = ball_G(r, γ) for some r ∈ V(G) such that there are |X| paths from S to X that are vertex-disjoint except inside S. We say that a set S is a spider on X if it is a γ-spider for some γ. See Figure 4 for an illustration.
Observe that if S is a γ-spider on X, then for any X′ ⊆ X, S is also a γ-spider on X′. The following lemma is key to our result, roughly showing that the existence of a large number of disjoint spiders implies that the treewidth of G is large.

Lemma 6.3. Let G be a graph, k be an integer, and X ⊆ V(G) with |X| > 3k. If there is a family S of k + 1 pairwise disjoint spiders on X, then the treewidth of G is larger than k.
Proof. Assume otherwise that the treewidth is at most k. Then there exists a balanced separator A of size at most k, splitting the rest of G into parts V_1 and V_2 with no edge between them and |V_i ∩ X| ≥ |X|/3 for i ∈ {1, 2}. We claim that each spider S ∈ S must contain a vertex in the separator, i.e., S ∩ A ≠ ∅. For the sake of contradiction, say there exists S ∈ S such that S ⊆ V_i for some i ∈ {1, 2}. Without loss of generality, let S ⊆ V_1. Recall that S is a spider on X. Hence there are |X| many paths from S to distinct vertices of X that are vertex-disjoint outside S. Since |V_2 ∩ X| ≥ |X|/3 > k, at least k + 1 of these paths end in V_2 and must therefore pass through the separator; thus |A| is at least the number of these paths, k + 1, which is a contradiction. We conclude that each spider S ∈ S intersects A.
Since S is a family of k + 1 pairwise vertex-disjoint spiders, each intersecting A, we conclude that |A| ≥ k + 1, a contradiction.

Iteratively Finding Spiders
Our main result in this section is encapsulated in the following theorem.

Theorem 6.4. If there is an ε-scattering of length at least (O(k/ε))^{(4/ε)^{k+1}} in G, then the graph G contains a family of k + 1 disjoint spiders on a vertex set of size greater than 3k.
Combining the above with Lemma 6.3, we can deduce that the length of any ε-scattering is at most tw^{(1/ε)^{O(tw)}}, as desired. We spend the rest of this section proving the theorem. Given an ε-scattering σ, we say that the ε-packing X = X(σ), given by Observation 6.1, is the canonical packing of σ.

Lemma 6.5. Let σ be an ε-scattering of length ℓ in G ⊆ ball_G(r, 1) and X = X(σ) its canonical ε-packing. Then there exist a spider S on a subset of X and a subsequence σ′ of σ of length at least c_0 ℓ^{ε/3} (for a universal constant c_0 > 0) that is an ε-scattering in G′ = G \ S with canonical packing X(σ′) ⊆ X.

Before proving this lemma, we show how it implies Theorem 6.4. Let G_0 = G contain an ε-scattering σ_0 of length at least ℓ_0 = (c_0 k/ε)^{(4/ε)^{k+1}} and let X_0 = X(σ_0). The lemma allows us to find a spider S_1 on X_1 of size c_0 ℓ_0^{ε/3} ≥ (c_0 k/ε)^{(4/ε)^k} = ℓ_1 for sufficiently small ε. Moreover, we have the graph G_1 that is disjoint from S_1 and an ε-scattering σ_1, a subsequence of σ_0, of length ℓ_1. Since (G_1, X_1, σ_1) satisfies the preconditions of Lemma 6.5, we can apply it to obtain (G_2, X_2, σ_2), and so on. More formally, starting from (G_i, σ_i, X_i), we apply Lemma 6.5 to obtain (G_{i+1}, σ_{i+1}, X_{i+1}). We maintain the following invariant: the length of the sequence σ_i satisfies ℓ_i = |X_i| ≥ (c_0 k/ε)^{(4/ε)^{k+1−i}}. This allows us to find disjoint spiders S_1, S_2, ..., S_{k+1} on X_{k+1} with |X_{k+1}| > 3k, as desired.

Proof of Lemma 6.5
Let G be contained in the unit ball ball(r, 1). The proof has two steps. In the first step, we find a spider S on a subset X′ ⊆ X of relatively large size. In the second step, we show that the graph G′ obtained by removing S from G still contains a large subsequence σ″ of σ whose canonical packing is a subset X″ of X′ of the desired cardinality.
First step: Let T be a shortest-path tree from r to X (recall that |X| = ℓ), so vertices of X appear at the leaves of this tree. We construct an "auxiliary" tree T̃ on a subset Ṽ ⊆ V(T) from T inductively as follows. Let B_r = ball_T(r, ε/3). Remove B_r from T to obtain subtrees T_1, ..., T_q with roots r_1, ..., r_q. For each i ∈ [q], let X_i ⊆ X be the descendants of r_i in T_i that are in X. Since the vertices of X are at the leaves, we have X_i ≠ ∅. We inductively perform this process on the instances (T_1, X_1), ..., (T_q, X_q) to obtain the auxiliary subtrees T̃_i for (T_i, X_i). Now create T̃ by connecting r to r_1, ..., r_q (making them direct children of r). See Figure 5. For each v ∈ V(T̃), denote by T_v the subtree of T rooted at v and by B_v the ball ball_{T_v}(v, ε/3) constructed by the recursive procedure. Observe that ⋃_{v∈V(T̃)} B_v ⊇ X and that the depth of T̃ is at most 3/ε (since δ(r, x) ≤ 1 for all x ∈ X and each recursion reduces the root-to-leaf distance by ε/3).

Claim 6.6. There must be a vertex r′ ∈ V(T̃) such that r′ has at least D = (ℓ/2)^{ε/3} children in T̃.
Proof. Assume that the number of children is less than D for every vertex in T̃. Then the total number of vertices in T̃ is less than 2D^{3/ε}. For each such vertex v ∈ V(T̃), we have |B_v ∩ X| ≤ 1 (since X is an ε-packing while the diameter of B_v is at most 2ε/3). Therefore, ℓ = |X| ≤ Σ_v |B_v ∩ X| < 2D^{3/ε} = ℓ, a contradiction.

Let v be the node in Ṽ closest to the root of T̃ that has at least D children (breaking ties arbitrarily). This means that (in the process of creating T̃) removing B_v = ball_{T_v}(v, ε/3) gives us at least D subtrees T_1, ..., T_D, where each such tree contains an (arbitrarily chosen) x_i ∈ X_i as a descendant in T. Notice that S = B_v is a spider on X′ = {x_1, x_2, ..., x_D}.
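The counting behind Claim 6.6 can be written out as follows (a sketch assuming D ≥ 2, so the geometric sum is dominated by its last term):

```latex
\ell \;=\; |X|
\;\le\; \sum_{v \in V(\widetilde{T})} |B_v \cap X|
\;\le\; |V(\widetilde{T})|
\;\le\; \sum_{i=0}^{3/\varepsilon} D^{i}
\;<\; 2\,D^{3/\varepsilon},
```

so if every vertex of the auxiliary tree had fewer than D = (ℓ/2)^{ε/3} children, we would obtain ℓ < 2D^{3/ε} = 2(ℓ/2) = ℓ, which is impossible.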
Second step: Let σ′ be the subsequence of σ whose canonical ε-packing is X′, that is, X(σ′) = X′. Recall that |X′| ≥ (ℓ/2)^{ε/3}. Denote the spider by S = ball_G(s, ε/3). In this second step, we show that G′ = G \ S still contains a long ε-scattering σ″, a subsequence of σ′, such that X″ = X(σ″) ⊆ X′ has the desired length.
Notice that the refutation properties hold for these pairs after removing S, i.e., δ′(x, p), δ′(x′, p′), δ′(x, p′) > 1 + ε (the distances cannot decrease after removing vertices from a graph). It then suffices to show that δ′(p, x′) ≤ 1. To this end, we argue that any shortest path from p to x′ in G cannot intersect the ball S. Assume otherwise that there exists a shortest path Q from p to x′ in G that intersects S at some vertex v ∈ S ∩ Q. Notice that δ(p, x′) = δ(p, v) + δ(v, x′). We will reach a contradiction by showing that δ(p, x) ≤ 1 + ε. Since δ(p, x) ≤ δ(p, v) + δ(v, x), by Claim 6.8 this is at most δ(p, v) + δ(v, x′) + ε = δ(p, x′) + ε, which would imply that δ(p, x) ≤ 1 + ε, contradicting the refutation property.

Bounding -Scatter Dimension via Low-Treewidth Embedding
In this section, we show a (simple) connection between bounding the ε-scatter dimension and an active research area on embeddings with additive distortion [FEKS19, FL22, CAFKL20]. This connection allows us to upper bound the ε-scatter dimension of planar graphs.
In particular, we say that a (weighted) graph class G admits a t-low treewidth-diameter embedding, for a function t : N → N, if there exists a deterministic algorithm that takes G ∈ G and a parameter η and produces a weighted graph H of treewidth at most t(η) and an embedding φ : V(G) → V(H) such that δ_G(u, v) ≤ δ_H(φ(u), φ(v)) ≤ δ_G(u, v) + ηD for all u, v ∈ V(G), where D is the diameter of G.

Theorem 6.9. Let λ_tw(ε) denote the ε-scatter dimension of graphs of treewidth tw (from the previous section, this bound is at most doubly exponential in tw). If a graph class G admits a t-low treewidth-diameter embedding, then every metric in G has ε-scatter dimension at most λ_{t(ε/10)}(ε/3).

High-Dimensional Euclidean Space
Recall, from the Introduction and Section 4, that the ε-scatter dimension of high-dimensional (continuous) Euclidean space is unbounded. In this section, we show, however, that the algorithmic ε-scatter dimension of this metric is bounded.
We dedicate the rest of this section to the proof of Theorem 1.7. In order to upper bound the algorithmic ε-scatter dimension for continuous Euclidean space, it suffices to show that there exists an algorithm C such that the (C, ε)-scatter dimension of Euclidean space is bounded. We use an algorithm by Kumar and Yildirim [KY09] as the Ball Intersection algorithm for high-dimensional Euclidean space. They study the Ball Intersection problem in the language of Weighted Euclidean 1-Center and provide a Ball Intersection algorithm, based on a convex optimization formulation, which efficiently (and approximately) solves the Ball Intersection problem in the continuous Euclidean setting for weights of bounded aspect ratio. Let C_KY denote this algorithm. The following lemma is adapted from Kumar and Yildirim's work into our terminology (see Lemma 4.2 of [KY09]).

Lemma 6.12. Given an instance (P, F, δ) of Ball Intersection in high-dimensional Euclidean space, associated radii r(p) for each p ∈ P, and ε ∈ (0, 1), the length of any (C_KY, ε)-scattering is at most O(τ/ε²), where τ ≥ 1 is the squared ratio of the largest radius in the requests to the smallest.
Note that for a constant τ, Lemma 6.12 yields the proof of the theorem. To complete the proof, we show that by increasing the bound on the length of the ε-scattering by a multiplicative factor of O(log(1/ε)), we can assume that τ is O(1/ε²).
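The reduction to bounded aspect ratio can be illustrated by standard geometric bucketing (a sketch under our assumptions, not the paper's actual algorithm; all names are hypothetical): radii from [ε/12, 1] fall into O(log(1/ε)) scales, and within each scale the squared aspect ratio is at most 4, so Lemma 6.12 applies with constant τ per scale at the cost of an O(log(1/ε)) factor.

```python
import math

def bucket_requests_by_radius(requests, eps):
    """Partition requests (point, radius), radius in [eps/12, 1], into
    geometric scales: bucket i holds radii in (2**-(i+1), 2**-i].
    Each bucket has aspect ratio < 2 (squared ratio tau < 4), and only
    O(log(1/eps)) buckets are non-empty."""
    buckets = {}
    for point, radius in requests:
        i = math.floor(-math.log2(radius))  # scale index of this radius
        buckets.setdefault(i, []).append((point, radius))
    return buckets

reqs = [("a", 1.0), ("b", 0.7), ("c", 0.4), ("d", 0.25), ("e", 0.1)]
grouped = bucket_requests_by_radius(reqs, 0.1)  # scales 0, 1, 2, 3
```

Since each request lands in exactly one scale, a scattering of length L contains some scale with at least L/O(log(1/ε)) requests, which is where the multiplicative loss comes from.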
Aspect-Ratio Condition. The following lemma provides a sufficient condition for bounded algorithmic ε-scatter dimension that facilitates the design of a Ball Intersection algorithm. In particular, this condition is key to bounding the algorithmic ε-scatter dimension of high-dimensional continuous Euclidean spaces. It can be seen as a strengthened converse of the aspect-ratio lemma (Lemma 5.6).

Lemma 6.13 (Aspect-Ratio Condition). Let M be a class of metric spaces with Ball Intersection algorithm C_M and let ε ∈ (0, 1). If any (C_M, ε)-scattering (x_1, p_1, r_1), ..., (x_ℓ, p_ℓ, r_ℓ) with r_i ∈ [ε/12, 1], i ∈ [ℓ], contains at most λ(ε) triples with the same radius, then the algorithmic ε-scatter dimension of M is bounded by O(λ(ε) log(1/ε)).
To prove Lemma 6.13, we assume that we are given a Ball Intersection algorithm C_M as stated. We claim that the following Ball Intersection algorithm, which invokes C_M as a subroutine, yields algorithmic ε-scatter dimension O(λ(ε) log(1/ε)) according to the condition of Definition 4.2.
Let Q′, ρ, x be defined as in Algorithm 2. Consider any (p_i, r_i) ∈ Q. We distinguish two cases. First assume that (p_i, r_i) ∈ Q′, and assume that C_M is a correct Ball Intersection algorithm.

Finally, we note that many well-studied clustering constraints restrict how points may be assigned to open centers in X, e.g., capacities [ABM+19, DL16, Coh20], different notions of fairness [BCFN19, CKLV17, CFLM19], and diversity constraints [LYZ10, TOG21, TGOO22]; in such cases, our framework does not apply. Extending our framework to handle such constraints (or proving that EPASes do not exist when such constraints are enforced) is an interesting direction.

Figure 2: Selected clustering objectives that can be formulated as monotone norm minimization. The line illustrates generalization (bottom is a special case of top).

Figure 3: Overall structure of the main algorithm.
By ball_δ(u, r) = {v : δ(u, v) ≤ r} we denote the ball of radius r centered around u. We drop the subscript δ if the distance function is clear from the context. By |M| we denote the space needed to represent the metric space M in memory. If M is finite, then |M| is polynomial in |F|, |P|, and the space needed for storing a point and a center, respectively. If F is infinite (for example, in the continuous Euclidean setting, F = R^d), then |M| is polynomial in |P| and the space needed for storing a point.

Figure 4: A spider S = ball G (r, γ) on X. Paths connecting X to r are disjoint, except for nodes in S.

Figure 5: A recursive construction of the auxiliary tree T̃.

Figure 6: The partition of X′ into {X_i} based on their distance from the spider S. Rectangular points are the points in X″.