Dimension-reduction of dynamics on real-world networks with symmetry

We derive explicit formulae to quantify the Markov chain state-space compression, or lumping, that can be achieved in a broad range of dynamical processes on real-world networks, including models of epidemics and voting behaviour, by exploiting redundancies due to symmetries. These formulae are applied in a large-scale study of such symmetry-induced lumping in real-world networks, from which we identify specific networks for which lumping enables exact analysis that could not have been done on the full state-space. For most networks, lumping gives a state-space compression ratio of up to $10^7$, but the largest compression ratio identified is nearly $10^{12}$. Many of the highest compression ratios occur in animal social networks. We also present examples of types of symmetry found in real-world networks that have not been previously reported.


Introduction
A wide range of phenomena can be modelled as dynamical processes on networks [1], including epidemic spreading [2,3], opinion dynamics [4][5][6], the diffusion of innovations [7][8][9][10], the evolution of languages [11][12][13] and cultural polarization [14,15]. Mathematical models of such processes can be formulated as Markov chains [16,17], where the future evolution is determined by the current state, but to analyse such models it is often necessary to resort to low-dimensional approximations [18][19][20]. The standard approach is to make use of mean-field approximations, which at the simplest level ignore network topology completely, i.e. the system is assumed to be well mixed [20]. More complicated approximations that incorporate network topology include pair approximations [21], degree-based and heterogeneous mean-field [22], moment closures [2] and approximate master equations [23,24]. Such approximations are typically based on intuitive probabilistic reasoning rather than rigorous mathematics, so it is generally difficult to quantify how well a given approximation will do, given the network or dynamical process [25].
In contrast to approximate mean-field theories, there are exact studies of full Markov chain dynamics on small networks [26][27][28][29][30] where it is possible to store the full state-space in computer memory. Constructing the full Markov chain gives access to the complete time-evolving probability distribution over state-space, which can be used to derive detailed statistical information [17], including quantification of stochastic variation (unlike mean-field theories that only provide averages), to conduct sensitivity analysis [31,32] and to consider more complicated models of specific scenarios [33] that include agent heterogeneity [29,30]. Furthermore, the full probability distribution can be used in Bayesian methods, e.g. for parameter estimation and model selection.
Exact analysis of larger networks can be achieved by lumping states together to reduce the state-space size [34] using network symmetries [2,16,17,35,36]. Network symmetries result in redundancies that can be exploited in many applications, for example to reduce network size via quotients [37], or to perform efficient simulations [38]. Symmetries are intimately connected to the spectral peaks of the adjacency and Laplacian matrices [39][40][41], and hence they impact on a wide range of network 'structural measures' [42] and dynamical phenomena. In particular, network symmetries facilitate cluster synchronization [43,44] and can be used to control group consensus [45]. Symmetry in complex networks has received increasing interest recently, in part because it has been shown that many real-world networks have a significant amount of symmetry [46][47][48]. This is unexpected since large networks chosen at random are typically asymmetric [49]. The most common symmetries found in real-world networks are of particular types associated with 'basic symmetric motifs' [41], which includes subgraphs made up of leaves, cliques and bicliques [46].
In §§2 and 3, we set up the mathematics and derive explicit formulae, (2.3)-(3.3), for the size of the reduced state-space that results from symmetry-induced lumping of Markov chain dynamics on real-world networks, a problem that is extremely difficult for networks and graphs in general [50]. Practitioners need only consider figure 1 and equations (2.3)-(3.3) before moving straight to the summary of the mathematical results in §4. In §5, we use the formulae to analyse over 1500 real-world networks with 100 vertices or less, obtained from the website www.networkrepository.com [51]. We find that over 80% of the networks analysed have non-trivial symmetry, and the symmetry in 94% of these cases is entirely due to leaves, cliques, bicliques and repeated isomorphic components. We highlight other types of symmetry that are more complex than those previously reported in real-world networks and show that regardless of the computer being used, specific networks can always be identified whose full state-space cannot be stored in memory but whose lumped state-space can. Section 6 includes a detailed discussion of potential applications, limitations and open problems.

Network dynamics and symmetry
We focus on dynamical processes on finite networks described by Markov chains in which each vertex can be in one of a finite number of vertex-states and only one vertex can change vertex-state at any instant in time. We refer to such processes as single-vertex transition (SVT) models [16,17]. An example of an SVT model is the SIR model of epidemics [2,19], where the network captures social interactions and the vertex-states are susceptible, infected and recovered. Let V denote the set of vertices and W the set of vertex-states, then the state-space of an SVT model is the set of all functions from V to W, denoted $S = W^V$. Thus the vertex-state of vertex v ∈ V in state s ∈ S is s(v) ∈ W. The number of states in state-space is $M^N$, where M is the number of vertex-states and N is the number of vertices. While the number of states increases exponentially with the number of vertices, it is finite so we can enumerate states in state-space, i.e. $S = \{s_1, s_2, \ldots, s_{M^N}\}$. If $X(t) = (X_1(t), \ldots, X_{M^N}(t))^T$ is a probability distribution over states in S, then the evolution of X(t) is given by the forward Kolmogorov or master equation [52]
$$\dot{X} = Q^T X,$$
where Q is an $M^N \times M^N$ matrix called the infinitesimal generator whose ijth component describes the transition rate from the state $s_i$ to the state $s_j$ for $i \neq j$, and whose diagonal entries ensure the rows sum to zero (i.e. the magnitude of $Q_{ii}$ is the transition rate out of state $s_i$). The transition rates in SVT models only depend on the vertex-states of nearest neighbours [16,17]. In this paper, we will focus on the structure of the network and state-space and we will not need the infinitesimal generator directly.
A finite Markov chain is lumpable if there is a partition $L = \{L_1, L_2, \ldots, L_r\}$ of state-space on which the Markov property is preserved [34] and it has been shown that network symmetries can be used to lump SVT models [16,17,35]. Network symmetries, or graph automorphisms, are permutations of the vertices that leave the edge set unchanged.
More precisely, a symmetry of a network with vertices V and edges E is a bijection g : V → V such that (gu, gv) ∈ E if and only if (u, v) ∈ E. We use the shorthand gv = g(v). The set of symmetries of a network forms a permutation group G called the automorphism group of the network [53], which we refer to as the symmetry group of the network. It has been shown that the symmetries in typical real-world complex networks can be decomposed into a product of subgroups that act independently of one another [47]. Mathematically, this means the symmetry group G of a network can be written as a direct product
$$G = H_1 \times H_2 \times \cdots \times H_m. \tag{2.1}$$
The right-hand side of (2.1) is known as the geometric decomposition of G and each $H_i$ as a geometric factor of G. Each geometric factor $H_i$ is associated with a distinct subset of vertices $V_i$, so that all pairs of subsets $V_i$ and $V_j$, $i \neq j$, are disjoint. We refer to the set of vertices $V_i$ as a geometric component and the induced subgraph on $V_i$ as a symmetric motif (SM).
The symmetry group G of a network can also act on states in state-space in such a way that vertex-states are permuted rather than vertices. More precisely, if u ∈ V, g ∈ G and s ∈ S then
$$(gs)(u) = s(g^{-1}u), \tag{2.2}$$
i.e. the vertex-state of u in gs is the same as the vertex-state of $g^{-1}u$ in s. This action defines an equivalence relation on states that partitions state-space into disjoint sets of states, called orbits [54], and this partition, denoted S/G, is a lumping of S [16,17,36]. In appendix A, we prove that there is a decomposition of state-space that reflects the geometric decomposition of the symmetry group. This means that we can focus on the states of each geometric component rather than the network as a whole. The states of a geometric component are $S_i = W^{V_i}$ and we refer to each $S_i$ as a state-space factor. Thus if we can compute the orbit partition of each state-space factor then we can combine these together to determine the orbit partition of the whole network.
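As a minimal sketch of the action (2.2), the following Python code brute-forces the automorphisms of a small network and groups its states into orbits. The function names and the example graph (a path on three vertices) are illustrative assumptions, not part of the analysis in this paper, and enumerating all vertex permutations is only viable for a handful of vertices.

```python
from itertools import permutations, product

def automorphisms(n, edges):
    """All vertex permutations g (tuples with g[v] = image of v) that preserve the edge set."""
    edge_set = {frozenset(e) for e in edges}
    return [g for g in permutations(range(n))
            if {frozenset((g[u], g[v])) for u, v in edges} == edge_set]

def act(g, s):
    """Apply g to state s via (gs)(u) = s(g^{-1} u), i.e. (gs)(g v) = s(v)."""
    gs = [None] * len(s)
    for v, gv in enumerate(g):
        gs[gv] = s[v]
    return tuple(gs)

def state_orbits(n, edges, num_vertex_states):
    """Partition the state-space W^V into orbits under the symmetry group."""
    group = automorphisms(n, edges)
    seen, orbits = set(), []
    for s in product(range(num_vertex_states), repeat=n):
        if s not in seen:
            orbit = {act(g, s) for g in group}
            seen |= orbit
            orbits.append(orbit)
    return orbits

# Hypothetical example: the path 0 - 1 - 2 with M = 2 vertex-states.
orbits = state_orbits(3, [(0, 1), (1, 2)], 2)
print(len(orbits))  # 6 orbits, compared with 2**3 = 8 states
```

Here the only non-trivial symmetry swaps the two end vertices, so states that differ only by exchanging the end vertex-states are lumped together.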
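It follows from the results in appendix A that if $\rho_i$ is the number of orbits of the ith state-space factor, then the number of orbits of S is
$$\rho = \prod_{i=0}^{m} \rho_i, \tag{2.3}$$
where the factor i = 0 accounts for the vertices fixed by G, every state of which forms its own orbit. Thus we can compute the size of the lumped state-space, ρ, and the compression ratio, $M^N/\rho$, from the lumping of each geometric component.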
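The product rule (2.3) can be sketched in a few lines of Python. The function names and the example numbers below are hypothetical, chosen only to illustrate the arithmetic; the fixed vertices are handled separately since each contributes a factor of M.

```python
from math import prod

def lumped_size(num_vertex_states, num_fixed, factor_orbits):
    """Number of state-space orbits: M**num_fixed times the product of the
    orbit counts of the non-trivial state-space factors."""
    return num_vertex_states ** num_fixed * prod(factor_orbits)

def compression_ratio(num_vertex_states, num_vertices, rho):
    """Ratio of the full state-space size M**N to the lumped size rho."""
    return num_vertex_states ** num_vertices / rho

# Hypothetical network: N = 10 vertices, M = 2, four fixed vertices and two
# geometric components of three vertices each, with four orbits apiece.
rho = lumped_size(2, 4, [4, 4])
print(rho, compression_ratio(2, 10, rho))  # 256 4.0
```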

Orbit representatives of typical symmetric motifs
The symmetries of typical real-world networks are restricted to only a few different types, examples of which are illustrated in figure 1. We refer to the SM illustrated in figure 1a as SM (a) and similarly for the other SMs in the figure. We refer to SMs (a-f) collectively as typical SMs and any other type of SM as atypical. Typical SMs include: (a) leaves, (b) cliques, (c) 'n-stars', (d) mirror symmetries and (e) regular trees. Repeated isomorphic components consisting of a single SM of type (a-e), an example of which is illustrated in figure 1f, are also common in the data that we analyse so we include this as a type of typical SM. For each of these typical SMs, we now describe how to construct a set of orbit representatives, i.e. a subset of states such that each is in a distinct state-space orbit, from which we can determine the number of orbits of the corresponding state-space factor, $\rho_i$, and construct the lumped Markov chain. The construction of orbit representatives can be broken down into three distinct categories of typical SMs, namely basic symmetric motifs (BSMs), height-regular trees and isomorphic components of typical SMs. These are addressed in the corresponding sections below. In the following, we make repeated use of the number of combinations of n items chosen from k possibilities (an n-combination) with repetition, which is given by C(n + k − 1, n) [55], where C(n, k) is the binomial coefficient n choose k. For context, C(n + k − 1, n) is the number of non-negative integer solutions to the linear Diophantine equation $x_1 + x_2 + \cdots + x_k = n$ [56] and hence it is also the number of states in a stochastic compartmental model with k compartments and population size n.
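The count C(n + k − 1, n) can be checked numerically. The sketch below, with hypothetical variable names and an arbitrarily chosen small case, compares the closed form against an explicit enumeration of combinations with repetition and against a direct count of compartmental-model states.

```python
from itertools import combinations_with_replacement, product
from math import comb

def multiset_count(n, k):
    """Number of n-combinations with repetition from k possibilities."""
    return comb(n + k - 1, n)

n, k = 3, 4  # arbitrary small example

# Explicit enumeration of n items chosen with repetition from k possibilities.
enumerated = list(combinations_with_replacement(range(k), n))

# Non-negative integer solutions of x_1 + ... + x_k = n, i.e. the states of a
# compartmental model with k compartments and population size n.
solutions = [x for x in product(range(n + 1), repeat=k) if sum(x) == n]

print(multiset_count(n, k), len(enumerated), len(solutions))  # 20 20 20
```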

(a) Basic symmetric motifs
BSMs have k ≥ 1 orbits each with n vertices that are permuted simultaneously by the symmetry group $S_n$, so that the vertices in pairs of orbits are either connected in a one-to-one or one-to-all-but-one fashion [41,47]. This includes SMs (a-d) in figure 1 and the set of orbit representatives can be constructed in essentially the same way for each of these. Assign a vertex from each of the k orbits to the set U, then the set of possible states for the vertices in U is $W^U$. A set of orbit representatives can be constructed by selecting all possible combinations of n states chosen with repetition from the $M^k$ possibilities in $W^U$. Thus the number of orbits of a BSM is
$$\rho_i = C(n + M^k - 1, n). \tag{3.1}$$
For clarity, we now relate the general construction of orbit representatives to each of the BSMs in turn. The symmetry groups of SMs (a) and (b) are both the symmetric group $S_n$ acting naturally on n vertices (i.e. all possible permutations of the n vertices), so k = 1 and both SMs (a) and (b) correspond to the case n = 3. There is a permutation in $S_n$ that maps a state to any other with the same number of vertices in each vertex-state. Thus the set of all possible combinations of n vertex-states chosen with repetition from the M possibilities in W forms a set of orbit representatives. The symmetry group of SM (c) is the symmetric group $S_n$ having k orbits on vertices, where each orbit has n vertices. SM (c) corresponds to the case k = 2 and n = 3. The symmetry group of SM (d) is the cyclic group of order two, $C_2$, which is isomorphic to $S_2$, hence n = 2 and (3.1) reduces to $M^k(M^k + 1)/2$.
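A short Python sketch of formula (3.1) follows, together with a brute-force check. The check assumes a state of the BSM is written as n 'columns', each holding the vertex-states of one vertex from each of the k orbits, so that the symmetric group simply permutes the columns and sorting them gives a canonical orbit representative. The function names are hypothetical.

```python
from itertools import product
from math import comb

def bsm_orbits(M, n, k=1):
    """Number of state-space orbits of a BSM with k orbits of n vertices."""
    return comb(n + M**k - 1, n)

def bsm_orbits_brute_force(M, n, k=1):
    """Count orbits by enumerating all states and sorting their columns."""
    columns = list(product(range(M), repeat=k))      # the M**k states of U
    return len({tuple(sorted(state)) for state in product(columns, repeat=n)})

print(bsm_orbits(2, 3))       # leaves or clique with n = 3, M = 2: 4
print(bsm_orbits(2, 3, k=2))  # k = 2, n = 3, M = 2: 20
print(bsm_orbits(2, 2))       # n = 2, k = 1, M = 2: M(M + 1)/2 = 3
```

For example, with M = 2 the eight states of three leaves collapse to four orbits, indexed by the number of leaves in the second vertex-state.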

(b) Height-regular trees
The symmetry group of SM (e) is $S_{\eta_1} \wr S_{\eta_2}$, where $\wr$ denotes the wreath product [57]. This SM consists of $\eta_2$ stars, each having $\eta_1$ leaves, and the central vertex of each star is connected to a fixed vertex. We call this a height-regular tree of height two. SM (e) is the case where $\eta_1 = \eta_2 = 2$. The wreath product in the symmetry group $S_{\eta_1} \wr S_{\eta_2}$ captures the fact that the $\eta_1$ leaves of each of the stars can be permuted according to the symmetric group $S_{\eta_1}$, but also that the $\eta_2$ stars can be permuted according to $S_{\eta_2}$. Thus $S_{\eta_1} \wr S_{\eta_2}$ consists of $(\eta_1!)^{\eta_2}\,\eta_2!$ permutations. A set of orbit representatives can be identified via a recursive procedure described in detail in appendix B. In short, this procedure computes the orbit representatives for one of the stars, and then $\eta_2$ of these are chosen with repetition to form the set of orbit representatives for SM (e). Thus the number of orbits of this SM is
$$\rho_i = C\big(\eta_2 + M\,C(\eta_1 + M - 1, \eta_1) - 1,\ \eta_2\big). \tag{3.2}$$
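The orbit count for a height-regular tree of height two can be sketched as follows: each star has M C(η1 + M − 1, η1) orbit representatives, and η2 of these are chosen with repetition. The function names are hypothetical. The sketch also computes the order of the wreath product and compares the orbit count with the lower bound given by the number of states divided by the group order.

```python
from math import comb, factorial

def tree_orbits(M, eta1, eta2):
    """State-space orbits of eta2 stars with eta1 leaves each (fixed root excluded)."""
    star_orbits = M * comb(eta1 + M - 1, eta1)   # parent state times leaf multisets
    return comb(eta2 + star_orbits - 1, eta2)    # choose eta2 stars with repetition

def wreath_order(eta1, eta2):
    """Number of permutations in the wreath product of S_eta1 by S_eta2."""
    return factorial(eta1) ** eta2 * factorial(eta2)

M, eta1, eta2 = 2, 2, 2                # the case drawn as SM (e)
num_vertices = eta2 * (eta1 + 1)       # vertices moved by the symmetry group
print(tree_orbits(M, eta1, eta2))      # 21 orbits out of 2**6 = 64 states
print(wreath_order(eta1, eta2))        # 8 permutations
print(M ** num_vertices / wreath_order(eta1, eta2))  # lower bound 8.0
```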

(c) Isomorphic components of typical symmetric motifs
Suppose that there are n isomorphic components and that H is the symmetry group of any one of the components in isolation (note that H could be the trivial group), then the symmetry group of the n isomorphic components together is $H \wr S_n$. This captures the fact that it is possible to permute any of the vertices in an isomorphic component according to H independently of the other components, and one can also permute the n components according to $S_n$. SM (f) is the case n = 2 and $H \cong C_2$. Let $R_{\cong}$ denote a set of orbit representatives for a single component, then a set of orbit representatives can be determined for the collection of isomorphic components by choosing n states from $R_{\cong}$ with repetition. Thus if $R = |R_{\cong}|$ then the number of orbits of this SM is
$$\rho_i = C(n + R - 1, n). \tag{3.3}$$
We consider symmetry due to repeated isomorphic components as typical if each repeated component only has symmetry due to SMs (a-e), in which case we can determine $R_{\cong}$ using the methods described for BSMs and height-regular trees.
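The count for n isomorphic components, C(n + R − 1, n) with R the number of orbit representatives of one component, can be sketched and checked by brute force. The example below assumes, purely for illustration, that each component is a single edge, whose two vertices can be swapped (a C2 symmetry); the function names are hypothetical.

```python
from itertools import product
from math import comb

def component_orbits(R, n):
    """Orbits of n isomorphic components, each with R orbit representatives."""
    return comb(n + R - 1, n)

def edge_components_brute_force(M, n):
    """Brute-force orbit count for n disjoint edges with M vertex-states.
    Sorting within each edge accounts for swapping its two vertices; sorting
    the list of edges accounts for permuting the components."""
    edge_states = list(product(range(M), repeat=2))
    canonical = {tuple(sorted(tuple(sorted(e)) for e in state))
                 for state in product(edge_states, repeat=n)}
    return len(canonical)

M, n = 2, 2
R = comb(2 + M - 1, 2)          # orbits of a single edge: 3 when M = 2
print(component_orbits(R, n))   # 6 orbits out of 2**4 = 16 states
print(edge_components_brute_force(M, n))  # 6
```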

Summary of mathematical results
The mathematical results that we have presented use group theory and combinatorics; however, a practitioner need only make use of the formulae (2.3)-(3.3) that we have derived. Figure 2 illustrates how these results may be used in practice. Given a network, the first step is to compute the network symmetries and identify typical SM geometric components (see [47] for computational tools). Once these have been identified, the formulae (2.3)-(3.3) can be applied directly to the corresponding SMs to compute the total number of orbit representatives and the corresponding states. Figure 2 illustrates this for a network with two fixed vertices and SMs (a-c) and (f).

Application to real-world network data
We now apply the methods of computing orbit representatives described in the previous section to real-world network data. In table 1 in appendix C, we present information about 15 large real-world networks. The key observation is that the number of vertices fixed by the symmetry group, $N_{\mathrm{fixed}}$, is large in all cases, and the lumped state-space will be larger than $M^{N_{\mathrm{fixed}}}$ when $N_{\mathrm{fixed}} < N$. Thus the size of the lumped state-space of these networks remains far beyond practical computation.
We focus instead on 1524 real-world networks with 100 vertices or less obtained from the website networkrepository.com [51]. We downloaded all networks from the repository having N ≤ 100 vertices on three occasions during 2018 and 2019, resulting in a total of 1524 networks. For each network, we removed all isolated nodes and self-edges, and made the networks undirected and unweighted. The format of the repository has changed since our first download and naturally new graphs get added over time, thus we have made the processed network data available online [58]. We used the program saucy [59] to compute generators of the symmetry groups of the networks and processed the symmetry groups using the computational algebra program GAP [60]. We found that 1227 of the 1524 networks, more than 80%, have non-trivial symmetry. This supports the conclusions from previous studies [47,48] that most real-world networks have symmetry.
Our dataset contains 12 different types of networks according to the classification on networkrepository.com, including 'social', 'brain' and 'protein', but the majority of networks (79%) were either chemical or animal social networks. The animal social networks can also be found on the 'Animal Social Network Repository' [61]. Most of the networks have between 10 and 50 vertices, with an average of 38 vertices. Of the 1227 networks with non-trivial symmetry, 1151 (nearly 94%) have symmetry entirely due to SMs (a)-(f), of which roughly 93% have SMs of types (a) and (b), 25% have type (d) and 27% have type (e); only four networks had type (c). The average fraction of vertices moved in networks with non-trivial symmetry is 0.4. More detailed network statistics can be found in appendix D.
We now turn to the computation of the number of orbit representatives, ρ, for the case where the number of vertex-states is M = 2, which corresponds to most SVT models [16,17] and includes the SIS model of epidemics [2] and the voter model [5] of opinion dynamics. In figure 3, we plot (N, ρ) for each network with non-trivial symmetry, where the colour indicates the type of network. The ρ-axis is scaled logarithmically and the light grey region indicates the area of possible values of (N, ρ), the largest value of ρ being $2^N$, corresponding to no symmetry, and the smallest being N + 1, corresponding to a complete graph. On a particular computer there will be a limit to the size of the state-space that can be used, which we denote by τ, and we call cases where ρ ≤ τ feasible and we call feasible cases where $2^N > \tau$ significant. The dark grey horizontal line in figure 3 corresponds to the feasible threshold $\tau = 10^9$, which is indicative of the size of state-space that can be stored in memory on a typical laptop computer at the time of writing. The vertical dark grey line indicates the number of vertices that corresponds to the threshold $\tau = 10^9$, which is roughly N = 30.
At the feasible threshold level of $\tau = 10^9$, there are a total of 62 networks with significant lumping. To quantify the amount of lumping, we introduce the relative significance, defined as $N \log_{10}(2) - \log_{10}(\rho)$, which measures the logarithmic reduction in the size of state-space. Thus 10 raised to the power of the relative significance is the compression ratio. The dark grey dashed line in figure 3 corresponds to a relative significance of 7, i.e. the full state-space is $10^7$ times larger than the lumped state-space. Under exponentially increasing computer power and memory (e.g. Moore's Law), we would expect the threshold τ to increase at a constant rate on a logarithmic scale, and so the relative significance can be thought of as a proxy for how long a particular network would be classed as having significant lumping. Moreover, any network with non-trivial symmetry will at some point in time (possibly in the past) have significant lumping. Crucially, figure 3 shows that for any value of τ within the limits of our data, we can find specific examples of networks that have significant lumping.
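The definitions of feasible and significant lumping, and of the relative significance N log10(2) − log10(ρ) for M = 2, can be sketched as below. The function names, the label 'infeasible' and the example values of N and ρ are hypothetical and are not taken from the dataset.

```python
from math import log10

def relative_significance(N, rho):
    """Logarithmic reduction in state-space size for M = 2 vertex-states."""
    return N * log10(2) - log10(rho)

def classify(N, rho, tau=1e9):
    """Classify a network by whether its lumped state-space fits within tau."""
    if rho > tau:
        return "infeasible"
    # Feasible; significant only if the full state-space would not fit.
    return "significant" if 2 ** N > tau else "feasible"

# Hypothetical network with N = 40 vertices and 10**6 state-space orbits.
print(round(relative_significance(40, 10**6), 2))  # 6.04
print(classify(40, 10**6))                         # significant
print(classify(20, 10**3))                         # feasible
```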
In figure 4, we plot examples of some of the networks in our dataset and these are labelled (a-i), matching the labels in figure 3. In each network, fixed vertices are coloured light grey and other vertices of the same colour are in the same vertex orbit (i.e. there is a permutation that takes a vertex to any other in the same orbit). The majority of networks with significant lumping for $\tau = 10^9$ are animal social networks [61] and networks (a) and (b) exemplify the sorts of structures that give rise to this significant lumping. Network (a) corresponds to the animal social network 'mammalia-voles-bhp-trapping-63' whose large amount of symmetry comes from repeated isomorphic components, including pairs, triangles and paths of length two. Network (b) is the animal social network 'mammalia-bat-roosting-indiana' whose symmetry is due to cliques. Network (c) corresponds to the animal social network 'mammalia-voles-rob-trapping-51' and has the largest relative significance, 11.93, i.e. the full state-space of this network is nearly $10^{12}$ times as large as the lumped state-space. Many of the other examples of networks with high relative significance are from animal social networks with symmetry due to isomorphic components, but networks (d) and (e) are two other examples with relatively high relative significance. Network (d) is the brain network 'bn-macaque-rhesus_brain_2' [62][63][64], which has several instances of $S_n$ symmetry due to bicliques and has a relative significance of 5.90. Network (e) is the retweet network 'rt-retweet' [65,66] whose symmetry is due to leaves and has a relative significance of 3.50. This is perhaps to be expected in retweet networks, which capture the spread of information, and so are likely to be tree-like with few short cycles.
We now give some examples of networks with significantly more complex symmetric structures than those previously reported [47]. There are 54 real-world networks with atypical SMs, of which 41 have geometric components with a 'mirror' symmetry, i.e. the geometric component can be partitioned into a pair of isomorphic motifs with symmetric connections to one or more fixed vertices, and in some cases with connections between the mirror symmetric motifs. The geometric factor for such components is of the form $H \wr C_2$, where we have found that H is either $C_2$, $S_3$ or $C_2 \times C_2$, examples of which can be seen in networks (f) and (g), the animal social network 'mammalia-voles-bhp-trapping-47' and the chemical network 'ENZYMES_g186', respectively. There are also 15 networks with other types of complex symmetry. Network (h) is the animal social network 'reptilia-tortoise-network-pv-2010', in which the geometric factor with blue vertices has $D_8$ symmetry with three vertex orbits ($D_8$ symmetry on one vertex orbit also arises via a four-cycle in other networks). Network (i) is the network 'cage4' [67] and is derived from a model of DNA electrophoresis. This network has $C_2 \times C_2$ symmetry, in which the blue vertices can be permuted independently of the green vertices, but the yellow vertices are moved by all permutations.

Discussion
In this paper, we have shown how network symmetries can be used to reduce the size of the state-space of dynamical processes on real-world networks described by Markov chains. This approach makes use of the special structure of the symmetries present in real-world networks, which allowed us to obtain explicit formulae for the size of the reduction, something that is extremely difficult to compute for networks in general [50]. We applied this method to more than 1500 real-world networks that have 100 vertices or less, we found 62 networks with significant lumping and illustrated examples of more complex types of real-world network symmetry than previously reported.
We have observed that the most significant lumping arises in animal social networks (although these are also the most prevalent type of network in our data). Dynamical processes on animal social networks have been used to model the spread of diseases [68,69] and parasites [70] using epidemic models, and the spread of social information via diffusion [71]. There are also network dynamics models of evolutionary dynamics [72][73][74], social evolution, co-evolution, population stability, dispersal and invasion [75]. Studies that use observational data typically involve relatively small population sizes and Monte Carlo simulations [69]. Thus such studies could benefit from the lumping techniques discussed in this paper, particularly the use of exact distributions for Bayesian statistics.
We stress that there are limitations to the use of exact lumping, in particular it is not a technique that can be applied directly to typical large networks and when it can be used, our study suggests that it may only allow one to consider an additional 20 nodes at most. However, there is real value in studying small networks exactly, where these gains are significant, particularly for modelling hospital- and healthcare-acquired infections [31,76], and the effects of peripatetic healthcare workers [77]. We also note that often the most significant lumping arises in networks that have multiple isomorphic components. In such cases, using the lumped state-space to directly compute the evolution of the full probability distribution may not be the most efficient approach, particularly if only basic summary statistics are required, like the mean and variance of the number of infected individuals, since the independence of each component can be exploited. However, the lumping approach described facilitates the computation of more detailed statistics, for example the probability of a given number of infected individuals, and can be used to construct efficient algorithms.
This work suggests a number of unsolved mathematical problems. Firstly, what sort of networks or graphs give rise to highly significant lumping? It is easy to show from Pólya enumeration [57] that a lower bound on the number of state-space orbits for a network with N vertices and symmetry group G is $M^N/|G|$. Thus clearly networks with significant lumping must have very large symmetry groups. We conjecture that the next smallest lumping after the complete graph is the star graph, but how does this sequence continue? Note that the hierarchy of graph symmetry [53], namely vertex-transitive, arc-transitive and distance-transitive, does not necessarily correspond to significant lumping. For example, cycle graphs are vertex-transitive and the symmetry group of an N cycle is the dihedral group of order 2N.
For these graphs, it can be shown that the number of state-space orbits is asymptotic to $M^N/(2N)$ for large N, i.e. the lumped state-space is a similar size to the unlumped state-space. We have found that this is also qualitatively true for other vertex-transitive graphs.
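The behaviour for cycle graphs can be illustrated numerically. The sketch below is not the method used in this paper: it counts state-space orbits of an N-cycle (N ≥ 3) with Burnside's lemma, averaging M raised to the number of cycles of each of the 2N dihedral symmetries, and compares the result with M^N/(2N). The function names are hypothetical.

```python
def cycle_count(perm):
    """Number of cycles of a permutation given as a list with perm[i] = image of i."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            v = start
            while v not in seen:
                seen.add(v)
                v = perm[v]
    return cycles

def cycle_graph_orbits(N, M):
    """State-space orbits of an N-cycle (N >= 3) via Burnside's lemma."""
    rotations = [[(i + j) % N for i in range(N)] for j in range(N)]
    reflections = [[(j - i) % N for i in range(N)] for j in range(N)]
    total = sum(M ** cycle_count(g) for g in rotations + reflections)
    return total // (2 * N)

print(cycle_graph_orbits(6, 2))  # 13
for N in (10, 15, 20):
    rho = cycle_graph_orbits(N, 2)
    # Ratio of the orbit count to the estimate M**N / (2N).
    print(N, rho, rho / (2 ** N / (2 * N)))
```

The printed ratios stay close to one, so for cycle graphs the compression is only of order 2N.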
For certain families of graph, it is possible to determine formulae for the number of state-space orbits [35,36]. Moreover, using the results for SM (e), we can construct networks for which the number of state-space orbits can scale like a polynomial in the number of nodes N with any degree m, by having m isomorphic cliques of size N/m. Related asymptotic results might shed light on the structure of the (N, ρ) space, e.g. the sparsity of 'realizable' pairs. In addition to complete graphs, star graphs and multiple isomorphic components, other families of graph that have relatively small numbers of state-space orbits include bipartite graphs, multipartite graphs and lexicographic products of complete graphs [78]. The symmetry groups of these graphs consist of direct and wreath products of symmetric groups, similar to what we have observed in real-world networks. This suggests that the symmetries of real-world networks are exactly the right sorts of symmetry to give rise to significant lumping.
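The polynomial scaling can be made explicit with a short worked calculation, given here as a sketch for M = 2 under the assumption that the m cliques, each of size n = N/m, are permuted as isomorphic copies. Each clique has C(n + M − 1, n) orbits, and m of these are chosen with repetition:

```latex
\begin{align*}
R &= \binom{n + M - 1}{n}\bigg|_{M=2} = n + 1, \qquad n = N/m, \\
\rho &= \binom{m + R - 1}{m} = \binom{m + n}{m}
      = \frac{(n+1)(n+2)\cdots(n+m)}{m!}
      \sim \frac{N^m}{m^m\, m!} \quad (N \to \infty,\ m \text{ fixed}).
\end{align*}
```

So for fixed m the number of orbits grows like a polynomial of degree m in N, whereas the full state-space grows like $2^N$.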
We might also consider whether there are computationally efficient algorithms to compute the number of state-space orbits and a set of orbit representatives for an arbitrary network. We found that a naive application of Pólya enumeration in the computational algebra package GAP [60] was limited by the size of the group, so it was not possible to compute the number of state-space orbits for precisely those networks that have significant lumping. While computing the number of state-space orbits may be a very difficult problem in general, recent advances in algorithms to compute generators for symmetry groups of graphs are encouraging [59]. The development of general algorithms to compute orbit representatives may benefit from consideration of how to do this theoretically, for example by making use of methods from permutation group theory. When analysing a permutation group, one can reduce the scale of the problem through the connections between intransitive, imprimitive and primitive groups, and it may be possible to use this approach to construct a set of orbit representatives.
So far we have also focused on reductions in state-space with no loss of information. However, one can also derive reduced dimensional Markov chains by lumping states together in a non-exact way, but where the transition rate between lumped states is optimal in some sense. Recent attempts to do this have made use of 'local symmetries' [36] and reductions to 'Markov Population Models' [79]. We have also focused on dynamics on networks that are Markovian, but non-Markovian models are important when modelling epidemics on networks, since the distribution of recovery times of real diseases is not necessarily exponential [80]. There is a significant body of work on approximations of non-Markovian epidemic dynamics on networks [81][82][83][84][85] and there are also non-Markovian models of temporal networks [86] and infectious diseases on temporal networks [80]. We expect that network symmetries will also be relevant to non-Markovian models since network symmetry is a consequence of invariance under relabelling of vertices. Alternative notions of symmetry such as 'stochastic invariance' [87] and isospectral 'latent symmetries' [88] may also be useful when modelling dynamics on networks.
In summary, we have derived explicit formulae for the number of state-space orbits of typical symmetric motifs and used these to study a large number of real-world networks, finding many examples with significant state-space compression. This is a remarkable property of real-world networks and is due to the special types of symmetry present.

Appendix A. State-space decomposition
The support of an automorphism g is the set of vertices permuted by g, denoted supp(g) = {v ∈ V | gv ≠ v}, and similarly the support of the automorphism group of a network is the union of the supports of its automorphisms. We say that two automorphisms g and h are support disjoint if their respective supports are disjoint. Similarly, if $H_1$ and $H_2$ are subgroups of G, then we say that $H_1$ and $H_2$ are support disjoint if all pairs of automorphisms $h_1 \in H_1$ and $h_2 \in H_2$ are support disjoint. Thus the geometric decomposition (2.1) is a direct product of support disjoint subgroups.
We can decompose state-space in a way that reflects the geometric decomposition of the automorphism group, allowing us to focus on the state-space associated with each individual geometric component. Let $V_0$ be the set of vertices fixed by G and let $S_i = W^{V_i}$ for $i = 0, 1, \ldots, m$; we refer to each $S_i$ as a state-space factor. Mirroring the geometric decomposition of G into a direct product of geometric factors in (2.1), we can similarly decompose S into a Cartesian product of state-space factors,
$$P = S_0 \times S_1 \times \cdots \times S_m.$$
The automorphism group G then acts on P in the natural way. Specifically, for every g ∈ G, we can write $g = (h_0, h_1, \ldots, h_m)$, where $h_i \in H_i$ and $H_0$ is the trivial group consisting of just the identity permutation (since the vertices in $V_0$ are fixed by every g ∈ G). Then for g ∈ G and $p = (p_0, p_1, \ldots, p_m) \in P$, we define the action of G on P to be
$$gp = (h_0 p_0, h_1 p_1, \ldots, h_m p_m),$$
where the action of $h_i$ on $p_i$ is as in (2.2). We now show that there is an equivalence between the action of G on the full state-space S and on the Cartesian product decomposition P. The notion of equivalence we need comes from permutation group theory [89].
Definition A.1. Let $X$ and $Y$ be finite sets and let $G$ be a permutation group with an action defined on $X$ and an action defined on $Y$. We say that the actions of $G$ on $X$ and $Y$ are equivalent if there is a bijection $f : X \to Y$ that satisfies
$$f(gx) = g f(x)$$
for all $g \in G$ and $x \in X$.
Lemma. The actions of $G$ on $S$ and on $P$ are equivalent.
Proof. Let $f : S \to P$ be such that for $s \in S$,
$$f(s) = (s_0, s_1, \dots, s_m),$$
where $s_i \in S_i$ is the part of $s$ belonging to the $i$th state-space factor, and $f$ is a bijection. For $g \in G$, we can write $g = (h_0, h_1, \dots, h_m)$ and so
$$f(gs) = (h_0 s_0, h_1 s_1, \dots, h_m s_m) = g f(s).$$
It follows that there is a one-to-one correspondence between the orbits of $S$ and the orbits of $P$. Moreover, each orbit in $P$ corresponds to the Cartesian product of an orbit from each of the state-space factors.
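Let $s \in S$ with $f(s) = (s_0, s_1, \dots, s_m)$, let $C = \{gs : g \in G\}$ be the orbit of $s$ in $S$, and let $C_i = \{h s_i : h \in H_i\}$ be the orbit of $s_i$ in $S_i$. Writing $F(C) = \{f(x) : x \in C\}$ for the image of $C$ under $f$, we have $F(C) = C_0 \times C_1 \times \cdots \times C_m$. Thus the orbit of $s \in S$ corresponds to the Cartesian product of an orbit from each of the state-space factors.
Proof. Suppose $x \in C$ and $f(x) = (x_0, x_1, \dots, x_m)$. Then there is $g \in G$ such that $x = gs$ and consequently $f(x) = g f(s)$. From the geometric decomposition of $G$, we can write $g = (h_0, h_1, \dots, h_m)$ and so
$$(x_0, x_1, \dots, x_m) = (h_0 s_0, h_1 s_1, \dots, h_m s_m).$$
Hence $x_i \in C_i$ for each $i$, so $F(C) \subseteq C_0 \times C_1 \times \cdots \times C_m$. Conversely, every element of $C_0 \times C_1 \times \cdots \times C_m$ has the form $(h_0 s_0, h_1 s_1, \dots, h_m s_m)$ for some $h_i \in H_i$, and setting $g = (h_0, h_1, \dots, h_m)$ gives $f(gs) = (h_0 s_0, h_1 s_1, \dots, h_m s_m)$. Since $gs \in C$ we have that $C_0 \times C_1 \times \cdots \times C_m \subseteq F(C)$ and consequently $F(C) = C_0 \times C_1 \times \cdots \times C_m$.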
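The practical consequence of this decomposition is that the number of orbits of the full state-space equals the product of the numbers of orbits of the state-space factors. The brute-force Python sketch below checks this on a hypothetical example that is not drawn from the dataset: seven vertices with two vertex states, where the automorphism group is assumed to be the direct product of $S_2$ acting on vertices {1, 2} and $S_3$ acting on vertices {4, 5, 6}, with vertices 0 and 3 fixed. All function names are illustrative.

```python
from itertools import permutations, product
from math import prod


def block_perms(n, block):
    """All permutations of range(n) that permute `block` and fix the rest."""
    for image in permutations(block):
        g = list(range(n))
        for v, gv in zip(block, image):
            g[v] = gv
        yield tuple(g)


def compose(g, h):
    """Composition of permutations: (g o h)(v) = g(h(v))."""
    return tuple(g[h[v]] for v in range(len(g)))


def act(g, s):
    """Apply g to a state s by moving the state of vertex v to vertex g(v)."""
    t = [None] * len(s)
    for v, gv in enumerate(g):
        t[gv] = s[v]
    return tuple(t)


def count_orbits(group, n, M):
    """Count orbits of M**n states by collecting one canonical state per orbit."""
    reps = {min(act(g, s) for g in group) for s in product(range(M), repeat=n)}
    return len(reps)


def factor_orbits(H, V, M):
    """Count orbits of the state-space factor on vertex set V under H."""
    idx = {v: i for i, v in enumerate(V)}            # relabel V as 0..|V|-1
    local = [tuple(idx[g[v]] for v in V) for g in H]  # restrict each g to V
    return count_orbits(local, len(V), M)


n, M = 7, 2                      # assumed example: 7 vertices, 2 vertex states
V0 = (0, 3)                      # vertices fixed by every automorphism
blocks = [(1, 2), (4, 5, 6)]     # supports of the two geometric factors
factors = [list(block_perms(n, b)) for b in blocks]
G = [compose(g1, g2) for g1 in factors[0] for g2 in factors[1]]

full = count_orbits(G, n, M)
identity = [tuple(range(n))]
per_factor = [factor_orbits(identity, V0, M)] + [
    factor_orbits(H, b, M) for H, b in zip(factors, blocks)
]
print(full, per_factor, prod(per_factor))
```

In this example the 128 states collapse to 48 orbits, matching the product 4 × 3 × 4 of the orbit counts of the three state-space factors. Enumerating every state is only feasible for very small examples; the point of the decomposition is that each factor can be counted separately.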

Appendix B. Orbit representatives of height-regular trees
We now give a detailed description of how a set of orbit representatives can be constructed for height-regular trees via a recursive procedure that makes use of the results for SM (a). This process is illustrated in figure 5 for SM (e). Recall that the leaves of a tree have height zero.
In step (i) of the recursive procedure, we pick one vertex at height one and consider the state-space of its $\eta_1$ children. Since the child vertices are only connected to the parent vertex, we can use the method described for SM (a) to determine the orbit representatives of their state-space, of which there are $C(\eta_1 + M - 1, \eta_1)$. We denote this set of orbit representatives by $R_1$.
In step (ii), we include the parent vertex, resulting in a star graph, and determine the orbit representatives of the corresponding state-space. This is simply each possible vertex-state appended to each of the states in $R_1$, so there are $M\,C(\eta_1 + M - 1, \eta_1)$ orbit representatives. We denote this set of orbit representatives by $R_1^*$. This star can be permuted with any of the $\eta_2 - 1$ other isomorphic stars at the same height.
In step (iii), we determine the orbit representatives of the SM by choosing all possible combinations of $\eta_2$ states chosen from $R_1^*$ with repetition. Consequently, if the $i$th geometric factor is $S_{\eta_1} \wr S_{\eta_2}$ then the number of orbits of the corresponding state-space factor is
$$C\bigl(M\,C(\eta_1 + M - 1, \eta_1) + \eta_2 - 1,\ \eta_2\bigr).$$
This procedure can be easily extended to height-regular trees of arbitrary height, although such subgraphs are not common in real-world networks.
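The recursive count can be sketched in a few lines of Python and checked against brute-force enumeration for small height-two motifs. The function names are illustrative, and the extension of the recursion beyond height two in `tree_orbit_count` is an assumption about how the procedure generalises rather than a formula stated above. The brute-force check treats a state as a tuple of $\eta_2$ stars, each consisting of a parent state and $\eta_1$ leaf states, and canonicalises it by sorting the leaves within each star and then sorting the stars.

```python
from itertools import product
from math import comb


def multiset_count(k, eta):
    """Number of multisets of size eta drawn from k items: C(k + eta - 1, eta)."""
    return comb(k + eta - 1, eta)


def tree_orbit_count(M, etas):
    """Orbit count for a height-regular tree motif with M vertex states.

    etas = [eta_1, eta_2, ...] lists the number of children at each height.
    For len(etas) > 2 this applies the same step recursively (an assumption).
    """
    count = multiset_count(M, etas[0])          # step (i): |R_1|
    for eta in etas[1:]:
        # step (ii): append the parent state, giving M * count representatives;
        # step (iii): choose eta of them with repetition.
        count = multiset_count(M * count, eta)
    return count


def brute_force_height_two(M, eta1, eta2):
    """Enumerate all states of eta2 stars with eta1 leaves each and count orbits."""
    star_states = list(product(range(M), repeat=eta1 + 1))  # (parent, leaves...)
    seen = set()
    for stars in product(star_states, repeat=eta2):
        # Leaves within a star and the stars themselves can be freely permuted,
        # so sorting at both levels gives one canonical state per orbit.
        canon = tuple(sorted((s[0], tuple(sorted(s[1:]))) for s in stars))
        seen.add(canon)
    return len(seen)


print(tree_orbit_count(2, [2, 2]), brute_force_height_two(2, 2, 2))
print(tree_orbit_count(3, [2, 3]), brute_force_height_two(3, 2, 3))
```

For two vertex states and $\eta_1 = \eta_2 = 2$, both approaches give 21 orbits out of 64 states.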

Appendix C. Symmetries in large real-world networks
Information about 15 large real-world networks is presented in table 1. The information for each network includes the number of vertices $N$, the number of edges $|E|$, the order (number of permutations) of the symmetry group $|G|$, the number of vertices moved by at least one permutation in the symmetry group $N_{\mathrm{moved}}$, the number of vertices fixed by all permutations in the symmetry group $N_{\mathrm{fixed}}$, the number of geometric factors $m$, the number of orbit representatives of the moved vertices $\rho_{\mathrm{moved}}$ and the relative significance of the moved vertices, $N_{\mathrm{moved}} \log_{10}(2) - \log_{10}(\rho_{\mathrm{moved}})$, which is the base-10 logarithm of the ratio $2^{N_{\mathrm{moved}}}/\rho_{\mathrm{moved}}$. In all cases, the number of fixed vertices is so large that even the lumped state-space is beyond practical computation. While the relative significance is high in several cases (and hence the compression ratio is very high), the number of orbit representatives of the vertices moved by the symmetry group is also very large in most cases.
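As a hedged illustration of the relative significance, the sketch below evaluates it for a hypothetical motif rather than for any network in table 1: ten leaves attached to a common vertex, with two vertex states, for which the leaves have $C(10 + 2 - 1, 10) = 11$ orbit representatives. The function name and the optional parameter `M` (generalising the factor $\log_{10}(2)$ to $M$ vertex states) are assumptions.

```python
from math import comb, log10


def relative_significance(n_moved, rho_moved, M=2):
    """N_moved * log10(M) - log10(rho_moved); M = 2 matches the formula in the text."""
    return n_moved * log10(M) - log10(rho_moved)


# Hypothetical example: 10 interchangeable leaves, 2 vertex states.
n_moved = 10
rho_moved = comb(n_moved + 2 - 1, n_moved)   # multisets of 10 states from 2
sig = relative_significance(n_moved, rho_moved)
print(rho_moved, sig, 10 ** sig)
```

The value of `10 ** sig` recovers the ratio $2^{10}/11$ of the number of states of the moved vertices to the number of orbit representatives. Python's `math.log10` accepts arbitrarily large integers, so the same function can be used when $\rho_{\mathrm{moved}}$ is very large.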

Appendix D. Network statistics
The following statistics are computed over the networks with non-trivial symmetry. The network repository website includes labels indicating the type of network and in figure 6, we illustrate how the number of networks breaks down according to these types. Each coloured rectangle indicates one type of network; the height of the rectangle is scaled to the number of networks of that type and the width is scaled to the mean number of nodes over the networks of that type. The majority of the networks are chemical networks or animal social networks. While most of the networks collected represent real-world networks, our dataset does include 'synthetic' networks, which can generally be identified via their type or information provided on the website.
Figure 7a is a histogram of the number of nodes in each network with non-trivial symmetry, coloured by type, and shows that the majority of networks have between 10 and 50 nodes. In figure 7b, we plot a histogram of mean degree, $z$, coloured by network type up to $z = 6$ and in figure 7c, we plot the same for $z > 6$. The mean degree of the animal social networks is peaked around $z = 2$, and we saw that many of these networks include multiple small components. The mean degree of the chemical networks is peaked at around 4, while the networks from graph theory tend to have relatively large degrees.
Figure 8 illustrates the fraction of vertices moved by the automorphism group, specifically $\theta_{\mathrm{moved}} = |\mathrm{supp}(G)|/N$, coloured by type. Of the animal social networks, there is a roughly constant number of networks for values of $\theta_{\mathrm{moved}}$ between 0 and 1, but with a larger fraction at $\theta_{\mathrm{moved}} = 1$. By contrast, the number of chemical and protein networks decays with increasing $\theta_{\mathrm{moved}}$, more quickly in the latter case.
Note on table 1: where present, the numbers in brackets in the $m$ column are the number of geometric factors whose number of orbit representatives could not be computed; all other cases were computed in full.
All but one of the graph theoretic networks have $\theta_{\mathrm{moved}} = 1$, i.e. every vertex of these graphs is moved by at least one automorphism, as is the case for vertex-transitive graphs.
In figure 9, we plot a Venn diagram of the number of networks that have at least one geometric factor corresponding to $S_n$ acting naturally [SMs (a) and (b)], $C_2$ [SM (d)] or multiple isomorphic components [SM (f)], and no other types of SMs. Figure 9 shows that most of the symmetry is of type $S_n$, i.e. SMs (a) and (b), and that roughly equal numbers of networks have geometric factors corresponding to $C_2$ and to repeated isomorphic components.
Figure 9. Venn diagram illustrating the main types of symmetry in networks whose symmetry is entirely due to SMs (a), (b), (d) and (f). The blue circle represents the symmetric group $S_n$, where symmetry is due to leaves and cliques, the red circle represents the cyclic group of order two, $C_2$, and the yellow circle represents symmetry due to multiple isomorphic components, indicated by the symbol $\cong$. Example SMs are illustrated next to the corresponding circles. (Online version in colour.)