Automation in Construction 3D mapping from partial observations: An application to utility mapping

Precise mapping of buried utilities is critical to managing massive urban underground infrastructure and pre- venting utility incidents. Most current research only focuses on generating such maps based on complete information of underground utilities. However, in real-world practice, it is rare that a full picture of buried utilities can be obtained for such mapping. Therefore, this paper explores the problem of generating maps from partial observations of a scene where the actual world is not fully observed. In particular, we focus on the problem of generating 2D/3D maps of buried utilities using a probabilistic based approach. This has the advantage that the method is generic and can be applied to various sources of utility detections, e.g. manhole observations, sensors, and existing records. In this paper, we illustrate our novel methods based on detections from manhole observations and sensor measurements. This paper makes the following new contributions. It is the ﬁ rst time that partial observations have been used to generate utility maps using optimization based approaches. It is the ﬁ rst time that such a large variety of utilities' properties have been considered, such as location, directions, type and size. Another novel contribution is that di ﬀ erent kinds of connections are included to re ﬂ ect the complex layout and structure of buried utilities. Finally, for the ﬁ rst time to the best of our knowledge, we have integrated utility detection, probability calculation, model formulation and map generation into a single framework. The proposed framework represents all detections using a common language of probability distributions and then formulates the mapping problem as an Integer Linear Programming (ILP) problem and the ﬁ nal map is generated based on the solution with the highest probability sum. The e ﬀ ectiveness of this system is evaluated on synthetic and real data using appropriate evaluation metrics.


Introduction
When managing massive urban underground infrastructure or to prevent incidents caused by underground working near utilities, it is required to have an accurate mapping of buried utilities such as gas pipes, water mains and telecommunication cables [1,2]. The statutory records obtained from the utility companies can provide such maps, but these records are notoriously inaccurate; for example it is reported [3] that in a survey of 187 incidents where utilities had been struck during excavation, 52% were not recorded in the utility company plans, and of the 89 that were on plans, 84% were inaccurately plotted. In the past ten years, utility incidents during excavation due to lack of record information caused 98 fatalities, 371 injuries and more than $500 million total loss in the United States [4]. Moreover, the inaccuracy in the locations of utilities may also yield a large number of "dry holes" (i.e. excavations where underground utilities are not found), resulting in waste of time, human resources and money. Instead of replying on statutory records, it is desirable to construct a utility map prior to commencing works based on on-site survey information such as subsurface sensor measurements. There has been considerable research into mapping from sensor data (in particular work on SLAM [5,6]) in free space, where in principle further data may be collected to improve mapping. On the other hand, in some applications, it may be very expensive or technically too challenging to obtain sensor data for an entire scene, for instance, in mapping buried utilities. Hence developing mapping methods to cope with partial data are needed. We explore this problem in the context of mapping the underground network of utilities beneath the streets [7].

Related work
The proposed research relates to several previous works on map generation, but none of them specifically addresses on our research focus, i.e. (i) an exploration of the problem in a holistic probabilistic point of view to represent as many as possible real-world scenarios, (ii) a uniform framework for data integration and map generation, and (iii) complete algorithmic approaches for generating probabilities and for formulating an integer linear program to yield the final map. In addition, compared with some literature where only artificial data has been tested, we have both included real-world and synthetic data in our computational experiments. Where possible, we make comparisons between our research and the existing work while doing the survey.
First we consider previous work on SLAM [8][9][10][11]. A very basic difference between this work and SLAM is that in SLAM a map of an unknown environment is updated by keeping the track of an agent's location, so the sensor data should be recorded and used sequentially in time. In contrast, in this work, the sensor data can be captured in any order and the whole map is generated at a single time by combining all the observations. 1 In addition, SLAM focuses on improving the accuracy of the measurements of sensor data and then obtaining a better map of the unknown environment while this work focuses on inferring the connections between utility detections, and their location.
Second, this work is related to existing work in the area of buried utility mapping [12][13][14]. [12,13] focus on sensor data fusion and do not use any information from street furniture surveys in the mapping. The system proposed by [14] is for creating a map of buried utilities based on manhole observations. However in [14], only straight pipes between manholes are taken into account while in the system proposed here, direct-connections and side-connections are included in the created map. To the best of our knowledge, this is the first time that utility properties other than location and direction are taken into account when generating maps of buried utilities.
An information fusion approach that integrates both sensing and non-sensing data based on Dempster-Shafer (D-S) evidence theory is presented in [2]. Their experiments suggest that the integration of sensing and non-sensing data significantly reduces the error of estimated location and enhance the confidence of the results. There is potential to also incorporate non-sensing data [1] such as existing records, utility specifications in our proposed framework, as long as these data can be represented by appropriate probability distributions.
Our research also relates to other works on underground utility mapping [15][16][17][18]. In [15], the authors present the initial results of a framework for 3D mapping of underground utilities especially an overview of methods for primary data capture. A framework is proposed in [16] for utility data governance from the underground utility data survey to data usage, while [17] proposes an approach that integrates multiple available utility location datasets to represent geographical uncertainties. In [18], registration of utility networks in a cadastre is considered in a 3D + time (=4D) context. Unlike the aforementioned works, which were mainly focused on data capture and 3D data registration, our work explores the holistic theoretical problem, gives a comprehensive framework for both data integration and map generation, as well as proposing a complete algorithmic procedure for generating the probabilities and formulating the integer programming model. All this shows the originality of and need for our research from both a practical and a theoretical point of view.

Problem statement and methodology
This work presents a framework to construct a most probable map of buried utilities based on data from street furniture surveys and subsurface surveys with multiple sensors.
In street furniture surveys, the information of manhole locations and the depth and direction of each pipe or cable end going through manholes is collected. In subsurface surveys, multiple sensors such as GPR [7,[19][20][21] or a Vibro-acoustic sensor (VA) [22] are used to capture sensor measurements. For a utility end observed from a manhole, based on an assumption that utilities follow a linear route, the direction of this utility is described by a probability field. If a hypothesized detection (HD) extracted from sensor measurements is a response of a utility, it also indicates the going direction of this utility. The probability field related to this utility end is updated by HDs associated with it. In this work, we employ Integer Linear Programming (ILP) to solve the mapping problem. ILP can take associations of any utility end with any subset of the HDs. This use of ILP guarantees that the most probable map is generated. Algorithms are proposed for computing the probability of connecting any two utility ends by extending them (referred to as direct-connections) and the probability of extending a third utility end so as to connect it to a utility segment formed by direct-connection (referred to as side-connection). Fig. 1 gives two examples of how direct and side connections can be formed. Besides the locations and directions of utilities, other properties of them such as type and size are also used in the probability computation. Hypothesized detection free measurements (HDFs) are also incorporated into this framework to modify the probability of connecting two utility directions.

Structure of the paper
The rest of the paper is organized as follows: The framework of the proposed method to generate a 3D map of buried utilities is discussed in Section 2, which is followed by the probability computations for utility connections in Section 3. The method of generating 3D maps by solving the formulated ILP is described in Section 4. Experimental results are shown and analysed in Section 5 and the paper finishes with conclusions in Section 6.

Overview of proposed framework
In this section, we give an overview of the proposed framework.

Manhole observations
From street furniture surveys, the locations of manholes are recorded and the manhole covers are lifted to check utilities going through them. For an observed utility end from a manhole, its depth is measured and the direction the utility takes as it leaves the manhole is estimated. Multiple utility ends may be observed from a single manhole so a manhole with utility ends observed in it can make up multiple manhole-utility pairs that are used for further processing.
In this work, we assume that the manhole locations and the start points of the observed utility ends are measured accurately (e.g. using a total station theodolite). Given a manhole-utility pair, under the linearity assumption, a utility goes out from the start point along a certain direction in a straight line. Since the directions of the utility ends are observed manually, they are not accurate. The going direction of a utility is described by a probability distribution (assumed to be Gaussian in this work).

Sensor measurements
Sensor measurements captured in subsurface surveys [7,23] can be incorporated into the proposed framework. From a sensor measurement hypothesized detections (HDs), which reflect the presence of certain utilities can be extracted [24][25][26][27][28][29] from the raw sensor data.
Hypothesized detection free measurements (HDFs) are also incorporated in the proposed framework. An HDF is a cross section (see detailed definition in Section 3) where a measurement was performed but no HD was extracted. Note that even there is no HD, that does not mean there is no utility, since sensors do not always detect buried utilities, for example because a GPR signal is attenuated severely in clay.

Connection probability calculation and integer linear programming model
After the data from manhole inspections has been collected and the HDs from sensor measurements have been extracted, utility ends, HDs and HDFs are associated to make up udf-combinations. The term "udf" comes from the three involved entities in such a combination: a utility end, hypothesized detections and hypothesized detection free measurements.
Based on the proposed algorithms (see details in Section 3), the probabilities of connecting the utility ends with direct-connections and side-connections are computed.
Next, we feed the computed probabilities into an Integer Linear Programming model given in Section 4. By maximizing the sum of connecting probabilities, a map of the buried utilities is obtained. The advantage of using integer linear programming is that all theoretically possible scenarios can be considered in a single model subject to certain constraints that make sure the resulting optimal mapping also complies with real-world requirements. Fig. 2 gives a flow chart of the proposed framework. First, the input data is captured from manhole observations and sensor measurements. Second, HDs and HDFs are associated to observed utility ends to make up udf-combinations. After the probabilities of any direct-connections and side-connections are computed, an integer linear programming (ILP) model is employed to find the most probable connections.

Probability computation for utility connections
The direction uncertainty of a buried utility can be reflected by a probability distribution of its going direction. The Gaussian distribution is chosen in this work. One advantage of using Gaussian distributions is that they can be conveniently combined to form a compound distribution by taking account of several data sources, e.g. manhole observations and sensor measurements as in our work. Other types of sources, such as existing maps, CAD drawings and utility specifications [2] can potentially be formulated into the proposed framework, as long as the data sources can be described by an appropriate probability model. In each of the subsection below this section, we present how to compute the probabilities of direct-connections and side-connections in different situations. Alongside the technical exposition of the computation of these probabilities, we add illustrative figures to help better explain how these computations can be applied in practice.

The probability distribution of a utility end
Based on the street furniture survey, the properties recorded for each utility u n , 1 ≤ n ≤ N includes the coordinates of the start point S n of the utility end, the direction of the observed utility end represented by two angles pan and tilt 2 as represented by α n and β n in Fig. 3 and other properties such as its size and type. Since a typical total station theodolite used for geo-measurement can measure distances with an accuracy of about 1.5 mm over a distance of up to 1500 m [30], it is assumed that the start points of the observed utility ends are measured accurately. On the other hand, the directions of the utility ends, which are observed manually, are usually not recorded accurately. As can be seen in Fig. 3, an arrow with a dotted line represents an observed utility direction and other arrows starting from the same point represents other possible going directions of this utility. In this work, it is assumed that the utility direction obeys a Gaussian distribution with respect to the random vector (pan, tilt) with the observed direction as its most likely extending direction. For a utility end with observed direction (α n , β n ), the probability of it going along the direction represented by (α, β) is estimated by the following probability density: where A u = (α, β), A n = (α n , β n ) with T representing vector transpose. Σ n ∈ ℝ 2×2 represents the covariance matrix of the Gaussian distribution of the n-th observed utility end. The covariance matrix of each observed utility end is recorded as a property of it, which reflects how accurately the utility direction is estimated. In practice, the parameters in the covariance matrix can be learned in an area with ground truth and then applied in other areas with similar conditions. Note that the accuracy may depend on the person performing the estimation, and a learned model may thus not be as useful if the person using the model has a different estimation model. This is unlikely to be a problem in practice though.

Associating utility ends with HDs and HDFs
Besides the street furniture surveys, the hypothesized detections (HDs) H = {h m }, 1 ≤ m ≤ M extracted from sensor measurements and hypothesized-detection-free measurements (HDFs) F = {f k }, 1 ≤ k ≤ K are also very important in the utility map generation. For each hypothesized detection h m , based on the geo-measurement of the related sensor measurement and the intrinsic properties of the related sensors, the location of h m is computed. Given a threshold value θ p and the    probability field defined by Eq. (1), the probability field with probability values greater than θ p forms a cone with the start point of the utility as the apex and the observed direction of this utility end as the axis (see Fig. 4). The HDs inside of the cone generated by a specific utility end are associated with the related utility, i.e. they are regarded as potential responses from this utility. Theoretically, ILP can deal with the associations of any utility ends with any HDs. However, if a HD is too far from a utility, it does not make any sense to associate them. To reduce the computation load, we use a utility cone to limit the number of associations between utility ends and HDs. After all h m ∈ H are associated with different observed utility ends, a utility end may have several h m associated with it and at the same time an h m may be associated with multiple utility ends. Among the provided sensor measurements, besides the ones with some hypothesized detections extracted from them, there are usually some measurements without any hypothesized detections extracted. Such measurements are hypothesized-detection-free measurements as aforementioned. In this work, HDFs are also included in the proposed system. A HDF is represented by a cross section, which is a rectangle with one side being the scan line of the sensor and two sides perpendicular to the ground surface with the length equal to the depth the sensor can measure. Two examples of HDFs (f 1 and f 2 ) are presented in Fig. 4. With respect to the same probability threshold θ p , if the cross section of an HDF has intersection with a probability cone, the HDF is associated with the related utility end (e.g. f 1 in Fig. 4). A utility end may have multiple HDFs associated with it and an HDF may be associated with multiple utility ends.
If a utility end u n has multiple HDs associated with it, due to the complexity of the utility configuration, the sensor measurement error and the processing error in the extraction step, an HD associated with a utility may not be a response of this utility. In this work, among all the HDs associated with a utility, we test all possible combinations. For example, if u 1 has two HDs h 1 and h 2 associated with it, there are four different combinations, i.e. none of them is a response of u 1 , only one of them is a response of u 1 and both of them are responses of u 1 . A utility with any subset of the HDs associated with it and all the HDFs associated with it make up a combination called a udf-combination.

Using extracted hypothesized detections
For a specific udf-combination, the utility end u n in this udf-combination generates a probability distribution, which is a Gaussian and can be computed with Eq. (1).
Suppose there are HDs in this udf-combination, then for a specific HD h m in this udf-combination, the vector connecting the start point of u n and h m indicates the direction of the utility under the linear assumption. Representing the vector connecting the start point of u n and h m as v mn , similar to the probability distribution of the direction of a utility generated by an observed utility end, the start point of u n with vector v mn generates a probability distribution with respect to the directions of the related utility end. In this work, a Gaussian is also used to describe this probability distribution. The covariance matrix set for the Gaussian distribution reflects the accuracy of the related sensor. Thus, we have two Gaussians associated with the utility end u n . These two distributions can be merged into an improved estimate by multiplying them together [31]. As pointed out by [31], merging two Gaussian distributions in this way leads to reduced standard deviations. Since multiplying Gaussian distribution results in a Gaussian distribution, the operation is symmetric, associative, and can combine any number of distributions in any order. So for a udf-combination with multiple HDs, the final distribution of this udf-combination is the product of all the probability distributions generated by the utility end and each HD in this udf-combination. With respect to the final distribution, the ray, along which the probability has the largest value, is called the axis of the distribution of a udf-combination.

Probability of connecting two Udf-combinations based on location and direction
For two udf-combinations generated from the same utility end, the probability of connecting these two udf-combinations is obviously 0. For two udf-combinations generated from different utility ends, based on the assumptions that the probability distributions of two udf-combinations generated from different utility ends are independent to each other and the utilities are linear, the probability of connecting two utility ends by direct-connection based on location and direction at a certain point I is the product of the probabilities of these two utilities when both of them go through point I. The probability of connecting two utility ends with a direct-connection based on location and direction can be defined as follows: This means that to compute the probability of connecting two utility ends u i and u j by direct-connection based on location and direction we should find the connecting point I i, j where the product of pld i and pld j  . This figure illustrates how to associate a utility end (u 1 ) with HDs and HDFs. Given a threshold value to the probability field generated by an observed utility end with respect to the going direction, a cone with the start point at the apex, and observed direction as the axis is formed. HDs inside of the cone such as h 1 are associated with the utility cone. If the cross section defined by an HDF has an intersection with the cone such as f 1 , it is also associated with the utility end. A scenario often encountered is that the lines passing through the start points of u 1 and u 2 along the axes of the related udf-combinations are skew lines and at the same time there is a common perpendicular of these two skew lines intersecting with the extensions of u 1 and u 2 along the axes at P 1 and P 2 respectively as represented by Fig. 5. In this situation, the common perpendicular line segment P 1 P 2 is discretized first with respect to the resolution of angles used for describing the direction of the utilities. For each of the disretized points, the probabilities of pld 1 and pld 2 are computed with Eq. (1) and then their product can be obtained. The maximum product is recorded as pld 1, 2 , the probability of connecting two utility ends by direct-connection based on their location and direction, and the related point is recorded as I 1, 2 .
Besides the scenario described above, there are several special scenarios as represented in Fig. 6. The first one is that two utilities intersect each other at a point I 1, 2 after the utility ends are extended along the corresponding axes of udf-combinations as represented with Fig. 6(a). In this scenario the probabilities of pld 1 and pld 2 have the largest value with respect to the corresponding probability distributions so their product has the largest value and the point I 1, 2 is recorded as the intersection point of these two utilities. In the scenarios represented in Fig. 6(b) and (c), the extension of one of the two utility ends along the axis of the related udf-combination goes through the start point of the other utility end or intersects at a point on the reverse extension of the other utility end. In both scenarios, the probabilities of connecting these two utility ends by direct-connection based on location and direction is computed with respect to the line segment between the two start points and the mid-point of this line segment is recorded as the intersection point. Fig. 6(d) describes a scenario where the lines passing through the start points and going along the observed directions are skew lines with one of their common perpendicular intersecting with one on the extension of the utility end along the axis of the udf-combination and the other one on the reverse extension of the utility end with respect to the udf-combination axis. A very special case of the scenario represented by Fig. 6(d) is that the common perpendicular line intersects with one of the utilities at its start point. In this scenario, the probability of connecting these two utility ends is also computed with respect to the line segment between the two start points.

Using hypothesized-detection-free (HDF) measurements
Just as a hypothesized detection indicates the presence of a utility to a certain extent, HDF measurements correspondingly indicate the absence of utilities to some extent. If a utility is supposed to pass through the cross section CS of an HDF with certain probability, since no hypothesized detection is extracted from this HDF, the probability of the related utility should be reduced. The attenuation ratio reflects our confidence on the related sensor. In this situation, the smaller the attenuation ratio is, the more confidence we have in the related sensor.
For each related connection, we try to find the maximum probability value of it to use in the Integer Linear Programming formulation. For a connection with the largest probability, if the connected utility segment passing through a CS of an HDF, the probability is attenuated and then the related connection may not be the one with the maximum probability value any more. The reason is that, besides the connections passing through the related CS, there are some connections not passing through this CS and the related probability values are not attenuated. In a probability distribution generated by a udf-combination, the maximum probability is obtained along the axis of this udf-combination; for the probability values along other rays, the closer the ray is to the axis, the larger the probability value that can be obtained. So for the connections not passing through the related CS, the maximum probability value can only be obtained when the two utilities are connected at a point on the boundary of the CS. So in this work, if a connection passes through a CS of an HDF measurement, the attenuated maximum probability is compared with the probabilities when the utility ends are connected at any point on the boundary of the CS. If there are some probabilities larger than the attenuated probability, the largest one is recorded as the probability of connecting these two utility ends and the related point on the boundary of CS is recorded as the connection point as the point B shown in the Fig. 7.

Using utility properties other than location and direction
From the street furniture survey, besides the start point location and direction of an observed utility, other properties such as its size and type may be also recorded. The recorded properties other than location and direction of the observed utilities are also formulated in the proposed system.
It can be seen that the probability of connecting two udf-combinations by direct-connection given by Eq. (2) is only related to the locations and directions of the axes of the corresponding udf-combinations. Obviously, when connecting two utility ends, the other properties such as their size and type should be taken into account: for example if two utility ends have different types (e.g. one is Water main and the other on is Cable), there is no possibility to connect them. Thus we define the probability of connecting utility ends u i and u j based on their types as follows: Besides the type of a utility, the size of a utility i.e. the diameters of the pipes or cables is another important property. Considering that the Laplace distribution has a sharp peak and heavier tails than a Gaussian distribution, a Laplace distribution is selected to compute the probability of connecting two utilities based on the size of them. First, the sharp peak of Laplace distribution can benefit the connection of two utilities with similar size. Second, the heavier tails of Laplace distribution can tolerate large measurement errors. For two utility ends u i and u j , if the recorded diameters of them are r i and r j respectively, the probability ps i, j can be estimated as follows: where b > 0 is a scalar parameter of a Laplace distribution. The If other properties such as materials of the utilities are recorded, the associated probabilities can be defined and taken into account. In the experiments reported below in Section 5, besides the direction and location of utilities, only the type and size are taken into account, therefore only the probabilities of these properties are defined explicitly in this paper.

Probability of connecting two utility ends with direct-connection
After the probabilities of connecting the utility ends of two udfcombinations based on location and direction and other properties are computed, the probability of connecting two utility ends with directconnection pd i, j can be computed. The type of a utility and the location and direction of this utility are independent to each other and within a certain type of utility, the size of a utility is independent of its location and direction, so the probability of pd i, j can be defined as the product of pld i, j , pt i, j and ps i, j , i.e.

Probability of connecting a utility end to a utility segment by sideconnection
After the utility ends of two udf-combinations are connected by direct-connection to make a utility segment between two manholes, the connection of the utility end of a third udf-combination to this utility segment is called a side-connection as represented in Fig. 8.
Let l denote the direct-connection of two utility ends and let c represent a side-connection of a third utility end to the utility segment made by the direct-connection. The probability of connecting a utility end u k to a utility segment formed by connecting u i and u j with directconnection can be represented as follows where E i, j represents the event of connecting u i and u j with directconnection and E k→i, j represents the event of connecting u k to the segment formed by u i and u j . The probability p(E i, j ) is given by Eq. (5).
When computing the probability of p(E k→i, j | E i, j ), since u i and u j have been connected before connecting u k to this segment, the location of this segment is fixed. Suppose the utility segment formed by u i and u j is connected at point I as depicted by Fig. 8, and then if utility u k is connected to the segment formed by u i and u j by side-connection, it connects to S i I or S j I at some point T. To decide the location of T, the common perpendicular of the ray starting from u k along the axis of the corresponding udf-combination and S i I or S j I is computed. If the  common perpendicular, which intersects with the ray and S i I or S j I, exists as C 1 C 2 depicted in Fig. 8, then the intersection point of this common perpendicular with S i I or S j I is the connection point T. The probability p(E k→i, j | E i, j ) can be computed as where pld k→i|j represents the probability of connecting u k to the segment formed by u i and u j just based on the location and direction and pt k→i|j and ps k→i|j are the probabilities computed based on the type and size properties of u k and u i or u j . If the common perpendicular line segment, which intersects with the ray generated by u k and the segment formed by u i and u j , does not exist, and then the probability pld k→i|j is set to 0.

3D map generation by integer linear programming
Integer Linear Programming (ILP) [32] is a mathematical optimization technique that in the general case can be described using the following canonical form with n variables and m constraints: In the above formulation, ∑ j=1 n c j x j is called the objective function and ∑ j=1 n a ij x j ≤ b i , i = 1, …, m are the constraints, both of which should be linear. The decision variables x j , j = 1, …, n are required to be non-negative integers. A special case of ILP is when the decision variables are binary, i.e. x j ∈ {0, 1} as is the case in this work. The goal of ILP is to minimize (as in the canonical form) the objective function while satisfying all constraints and variable domain requirements. ILP is known to be NP-hard [33]. However, in our research, the problem sizes are rather small: after removing variable redundancy, our synthetic instance has 3538 binary variables and 3235 constraints, and the real-world instance has 50 binary variables and 22 constraints. Therefore it is sufficient to use a state-of-art commercial ILP solver to solve the problem by branch-and-bound.
In a solution of the ILP of this work, for a probability value c j of connecting two utility ends by direct-connection, only if the related decision variable x j is 1 are the two utility ends then connected by direct-connection. For a probability value c j of connecting a utility end to a utility segment with side-connection, only if the related decision variable x j is 1 is this utility end connected to the utility segment. The final utility map is generated by connecting the utility ends indicated by all the decision variables with value 1 with the corresponding connection option, direct-connection or side-connection.
To decide which combinations should be used in the connections and how they are connected to each other, an Integer Linear Programming formulation is designed for solving the aforementioned mapping problem.
For any two combinations c i , c j ∈ C, the probability of connecting them by direct-connection to make a utility segment between two manholes is computed. A special case is that a udf-combination may not be connected to any other combinations, but is extended to the boundary of the working area. For each such udf-combination, it is assumed that the udf-combination is connected to an artificial boundary udf-combination, which is denoted as c b . A utility segment formed by c i and c j by direct-connection is denoted as l(c i , c j ) and the set of all utility segments formed by direct-connection of two combinations is denoted as L = {l(c i , c j )| c i ∈ C, c j ∈ C ∪ {c b }}. Let p l denote the connection probability of a direct-connection l(c i , c j ) ∈ L and C(u n ) denote the set of the combinations generated from utility end u n . Since two combinations from the same utility end cannot make a utility segment, we set p l(c i ,c j ) = 0 if c i , c j ∈ C(u n ) for any u n ∈ U. If one of c i and c j is c b , then the related p l is set to a small constant value to ensure the correctness of the normal cases is not affected. For other situations, the computation of p l is given in Sections 3.4, 3.6 and 3.7.
After the probabilities of p l are computed for any l ∈ L, the probabilities of p c l of connecting a udf-combination c ∈ C to a direct connection l ∈ L is computed. Details of the computation are given in Section 3.8. As for the decision variables, let x l ∈ {0, 1}, ∀ l ∈ L be the binary variables for the direct-connections such that x l = 1 if direct-connection l is used in the solution and x l = 0 otherwise. Binary decision variables y c l ∈ {0, 1}, ∀ c ∈ C, ∀ l ∈ L are used to determine the sideconnections such that y c l = 1 if a side-connection from c to directconnection l is used in the solution, and y c l = 0 otherwise.
We now formally present the ILP formulation for the aforementioned utility mapping problem, thus: Fig. 8. A diagram illustrating connecting a utility end u k by side-connection to a utility segment obtained by direct-connection of u i and u j . In this figure, S i , S j and S k are the start points of the corresponding utilities.
The objective function (11) maximizes the total probability given by the connection, including the ones contributed from direct-connections as well as from side-connections. W is a weight to penalize the influence from side-connections as usually they are less important than directconnections. Constraints (12) make sure that a side-connection to a direct-connection can only be triggered if the direct-connection is used in the first place. Constraints (13) state that for each utility end at most one of its associated udf-combinations can be used in the solution, and when a udf-combination is used, it can only choose one connection type either as a direct-connection (i.e. ∑ c i ∈C∪{c b } x l(c,c i ) = 1) or as a side-connection (i.e. ∑ l∈L y c l = 1). Finally, (14) and (15) give the domain of the decision variables.
To generate a map of the selected area, the observed manholes and the utility ends observed from each manhole are drawn. Once the solution of Objective (11) is obtained by solving the Integer Linear Programming problem, the direct-connections indicated by parameter x l are connected at the corresponding connection point I. Then the parameter y c l is used to pick out the udf-combinations, which are connected to certain l by side-connections at the related connection point T. The final map includes the direct-connections and side-connections. We illustrate how the above ILP model is formulated by a toy example given in Fig. 9. Three utility ends u 1 , u 2 and u 3 associated with four udf-combinations c 1 , …, c 5 are given, where u 1 is linked with c 1 and c 2 , u 2 is linked with c 3 and u 3 is linked with c 4 . The five directconnections are marked by l 1 , …, l 5 with their corresponding probabilities in the parentheses afterwards. The six side-connections are listed in the table with their probabilities. For simplicity, the boundary combination c b is ignored here, which can always be added back if needed. Based on this toy example, the following ILP model can be formulated: Fig. 9. A toy example of how to formulate the ILP model.   The optimal solution is also depicted in Fig. 10.

Experimental results
In this section, we present the experimental results with the proposed system on synthetic and real data. We use the F 1 score (aka Fmeasure) to assess the quality of our experiment results. We regard each connection pair from the ground truth as a positive result, and each pair that is not linked from the ground truth as a negative result. Let TP, FP, TN, FN be the numbers of true positive, false positive, true negative and false negative connections respectively. The precision Pre and recall Rec are defined as, and the F 1 score is calculated as the harmonic mean of both by, The reason for choosing the F 1 score is due to the fact that the distribution of actually connected (TP + FN) and unconnected pairs (TN + FP) is heavily unbalanced, where we have TP + FN ≪ TN + FP.
In our experiments, the covariance matrix Σ n in (1) is set as where the pan and tilt are measured in degrees. The parameter b used in the Laplace distribution (4) is set as b = 0.5, which ensures that when r i = r j , the probability density is equal to 1.

Synthetic data
We first apply the proposed system on a synthetic data set. The manhole configuration of this synthetic data set is extracted from the utility records of a real scene. Since there is no depth information in the utility records, the depths of all pipe ends are set to 1. The directions of pipe ends are computed based on the known start points of them and the given intersection points between pipes. Random Gaussian noise of up to 25 degrees is added to the pan and tilt angles of each of them. The locations of hypothesized detections (HDs) are computed by finding the intersection points of CSs with the related pipes and then adding random noise of up to 0.2 m. Several hypothesized-detection-free measurements (HDFs) were also added manually. The observations with respect to this data set are given in Fig. 11 including 22 manholes, 41 utility ends, 5 HDs and 4 HDFs. Fig. 12 shows two experimental results of using different numbers of HDs and HDFs. In the experiment with respect to the top image in Fig. 12 only 4 HDs and 3 HDFs were used and in the experiment with respect to the bottom image in Fig. 12 all the 5 HDs and 4 HDFs were used. It can be seen that there are more incorrect connections (cyan lines) in the top image since fewer HDs and HDFs were used in the experiment. Compared with the ground truth, the incorrect connections happen at u 4 and u 11 . In the ground truth, they are connected by a direct-connection. Because too much noise was added to the direction of u 4 and u 11 , in the experiment results, they are wrongly connected to other pipe segments by side-connection. If more HDs are added between u 4 and u 11 , this wrong connection can be fixed. In both experiments, the weight parameter W in Objective (11) is set as 0.41. In different applications, this parameter may need to have other values. As we have mentioned above, to apply the proposed framework on a large area, this parameter should be learned in a smaller area with ground truth. Table 1 gives a summary of all the results from the three datasets in our experiments in terms of precision, recall and F 1 score, as computed by (16) and (17). It can be observed from the table that precision, recall and F 1 score have all significantly improved after including more HDs and HDFs.

Real data
We also applied the proposed system to a real data set captured on a test scene shown in the left image in Fig. 13. The locations of the manholes were measured by a total station theodolite. The cover of each manhole was lifted and the depth and direction of each utility end observed in the related manhole were measured and estimated. An example manhole picture after the cover was lifted is given by the top right image in Fig. 13 and the related estimated directions of the observed utilities are shown by the bottom right image in Fig. 13. It can be seen that 13 manholes shown by black rectangle windows are observed. 51 utility ends are observed in total from this scene. The utility type and size are also recorded in the observations. There are three different kinds of utilities: foul water, surface water and cable. Since the type and size of the observed utility ends are employed in generating the utility map of the test scene and at the same time different depth values of different utility ends provide more discriminations between different utility segments, the utility map of this scene is generated as expected without using any sensor measurements. In this experiment, the weight parameter W in Objective (11) is set as 1. It means we do not give any bias to direct-connection. Although we do not have the complete ground truth of this scene a number of pits were dug at carefully chosen locations to verify the correctness of the map; no inaccuracies were discovered. The generated utility map is shown in Fig. 14, and its corresponding precision, recall and F 1 score values are shown in Table 1. In this figure, a top view of this map is shown in the top image and a side view is shown in the bottom image.

ILP solver
After the probability values p l and p c l in Eq. (11) were found, the Integer Linear Programming problems were solved on a 64 bit Xpress-MP suite (version 8.0), a commercial optimization package produced by FICO [34], on a Dell workstation with 32G RAM and an Intel Core i7-4790 CPU in Windows 7 Enterprise. The default Integer Linear Programming solver of the suite was used for solving the ILP instances. All the problem instances were solved within 1 s by the above package. Fig. 11. The configuration of the synthetic data set (top) including manhole observations u i , hypothesized detections h m , hypothesized-detection-free measurements f k , and at the bottom, the ground truth. The spatial unit is in metres.

Conclusions
In this paper we introduce a novel method for accurately mapping buried utilities based on the manhole observations from street furniture surveys and sensor measurements. One particular theoretical contribution is the algorithm proposed in Section 3 for calculating the probabilities of the connections under various scenarios. The mapping problem is formulated as an Integer Linear Programming model, which incorporates locations and directions of utility ends, hypothesized sensor detections, hypothesized detection free sensor measurements and the properties of utility ends other than location and direction in the same framework. The experimental results demonstrate the effectiveness and efficiency of the proposed method. In addition, the experimental results on the synthetic data set indicate that the more sensor data is used the more accurate a utility map can be achieved while the experimental results on the real data set demonstrate that using utility properties other than location and direction makes the proposed method more robust and accurate.
A disadvantage of the proposed framework is that every generated segment has to have at least one utility end observed from a manhole, which may not be always satisfied in practice. We leave for future work the incorporation of utility segments formed only from sensor measurements into this framework [12]. This should not be difficult for GPR sensor detections since the estimated direction of a utility can be detected from the sensor readingthus each GPR detection could be regarded as a utility end, rather than an HD. However, an uncertainty would have to be included to represent the possibility that the HD was not in fact of a utility at all (whereas a manhole observation always represents one or more actual utility lines).
Moreover, unfortunately, obtaining real world survey data (including manhole inspections, sensor readings, and verification pits for ground truth) requires a large amount of money which thus limited the size of the real-world instance we could use. We designed the larger sized synthetic instance to better verify our method for larger sized cases. We will leave further verification by large scaled real-world cases to possible opportunities or projects in the future. A third limitation is the need to convert the information provided by the data source into appropriate distributions, such that they can be combined into a compound one.
A future research direction could be how the proposed probabilistic method presented here could be extended to take into account of existing information (e.g. utility records), such as in the work in [1]. This can be realized by converting the existing records into corresponding probability distributions, which is viable and the future work would be about the details of how this conversion could be achieved. Another future direction is to update and improve the quality of existing records. For instance, it might be possible to create a new integer linear programming model to minimize the discrepancies between the manhole/ sensor data and the existing records.
Finally, it is worth commenting on the relationship between this   research and the UK PAS 128 standard which is a methodology for delivering robust utility surveys (https://www.pas128.co.uk). There are several categories, from D (desktop survey which just uses statutory records, type C (which uses site reconnaissance to match a type D survey with street furniture observations), type B (using multiple geophysical sensors; each 5 m segments should have a quality level for vertical and horizontal detection accuracy) and type A (where position has been verified by trial pits, vacuum extraction etc.). The work presented in this paper would help construct PAS 128 type C surveys in particular, although determining the quality level (QL1-QL4) is not presently supported). The major benefit would be as an automated method for constructing maps from the manhole surveys and sensor measurements. Subsequent manual interpretation would be required to assign quality levels.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.