Risk-Aware Contextual Learning for Edge-Assisted Crowdsourced Live Streaming

This paper proposes an edge-assisted crowdsourced live video transcoding approach where the transcoding capabilities of the edge transcoders are unknown and dynamic. The resilience and trustworthiness of highly unstable transcoders in decision making are characterized with mean-variance-based measures to avoid making highly risky decisions. The risk level of each device’s situation is assessed and two upper confidence bounds of the variance of transcoding performance are presented. Based on the derived bounds and by leveraging the contextual information of devices, two risk-aware contextual learning schemes are developed to efficiently estimate the transcoding capabilities of the edge devices. Combining context awareness and risk sensitivity, a novel transcoding task assignment and viewer association algorithm is proposed. Simulation results demonstrate that the proposed algorithm achieves robust task offloading with superior network utility performance as compared to the linear upper confidence bound and the risk-aware mean-variance upper confidence bound-based algorithms. In particular, an epoch-based task assignment strategy is designed to reduce the task switching costs incurred in assigning the same transcoding task to different transcoders over time. This strategy also reduces the computational time needed. Numerical results confirm that this strategy achieves up to 86.8% switching costs reduction and 92.3% computational time reduction.

the revolution, crowdsourced live streaming platforms (CLSPs) such as Twitch, TikTok, and Periscope have emerged as a new type of video platform that not only serves a tremendous number of viewers all over the world but also receives live videos from various sources in the crowd [1], allowing a growing number of people to broadcast their live videos over the Internet.
However, due to the heterogeneity of broadcasters' devices, live videos are created in different quality versions [2]. As a result, there is a strong need to transcode the original live videos into several industrial standard representations and to serve viewers with a proper set of representations. Providing the adaptive bit rate (ABR) service [3] can bring massive computational demands due to real-time processing requirements. For instance, as of 2022, about 9.2 million broadcasters and 140 million viewers were active on Twitch monthly, with an average of more than 100,000 concurrent live channels at any point in time [4].
Therefore, cloud computing has become a natural solution for transcoding because of its powerful computing ability and its 'pay as you go' feature, which releases CLSPs from building large, expensive private data centres to facilitate ABR. In such a system, the CLSP controller decides the number of representations that need to be transcoded for each broadcaster based on parameters such as viewer capacity, playback delay, and bandwidth consumption. The original live videos are directly transmitted to the cloud data centre for transcoding. Once multiple versions are generated in the cloud, content delivery networks are utilized to deliver proper versions of live videos to the corresponding viewers.
On the downside, in current CLSPs, cloud transcoding is not able to provide the ABR service to most broadcasters. For instance, on Twitch.TV, only the premium broadcasters have access to the ABR service, and for the rest of the broadcasters, only the original versions are available to their viewers [1]. The reason is that a general cloud instance can only deal with at most two transcoding tasks simultaneously. Therefore, an enormous cost will be incurred when a large number of original live videos are scheduled for transcoding. Moreover, in cloud transcoding systems, the cloud data centre can be far from the viewers or the broadcasters, which can cause high latency. This problem is further magnified by the fact that most CLSPs enable an interactive live chat service, which is also latency-sensitive, in contrast to traditional live streaming platforms.
The development of edge computing (also known as fog computing) has brought a potential transcoding solution for CLSPs. Since edge computing [5] is more suitable for real-time processing and low-latency applications, it can be treated as a viable replacement [6], [7] to address the shortcomings of cloud transcoding. Moreover, edge-assisted transcoding can lead to lower latency and keep network traffic from traversing the core network, since different versions of videos can be created at the network edge.
Edge transcoding systems have been proposed to leverage the computational resources at the user end [8], [9], [10], [11], [12]. However, most of these works rely on solving optimization problems under perfect knowledge of the system parameters and the performance of edge devices, and acquiring such knowledge might not be practically feasible in live streaming systems. In particular, existing works do not consider the risk of assigning a transcoding task to a device with highly unstable transcoding performance.
This paper proposes a novel concept to quantify the edge devices' transcoding capabilities by leveraging their contextual information. This concept considers not only the average performance of the transcoders but also the risk of performance variations, both of which are accounted for in the design of a joint transcoding task assignment and viewer association algorithm. To the best of the authors' knowledge, this paper is the first work to consider both risk sensitivity and context awareness in an online task offloading problem. The contributions of this paper are summarised below:
• The joint transcoding task assignment and viewer association problem is formulated to maximize the long-term network utility considering the cost and average latency.
• The transcoding capability is proposed to identify proper transcoders with both high transcoding quality and low transcoding performance variation. The transcoding quality is modelled as a linear function of the transcoder's context information and the performance variation is represented as the variance of transcoding outcomes.
• To estimate and learn the unknown transcoding capability, two upper confidence bounds (UCBs) of the transcoding performance variation are derived. By applying these confidence bounds together with a contextual UCB of the transcoding quality, a risk-aware contextual online learning scheme is designed to learn the transcoding capabilities of edge devices. Considering both context and risk awareness helps to design efficient and robust task offloading schemes and reduces the risk of assigning tasks to devices that cannot ensure stable performance.
• Based on the learnt transcoding capabilities, a novel joint transcoding task assignment and viewer association algorithm is designed. In particular, an epoch-based strategy is designed to reduce the switching costs of task assignments.
• Numerical results based on various settings demonstrate that the proposed algorithm achieves significant network utility improvement while keeping the switching costs of transcoding task assignments competitively low.
The remainder of this paper is organised as follows. Section II discusses the related works. Section III describes the model of the edge transcoding system. In Section IV, an optimization problem is formulated to maximize the overall network utility. The designed risk-aware contextual learning for edge transcoding algorithm is described in Section V, followed by the numerical results in Section VI. The conclusions are drawn in Section VII.

II. RELATED WORK
In this section, we review the works in both cloud and edge-based live video transcoding systems and discuss the challenges in designing the edge-based transcoding schemes.

A. Cloud Transcoding
Due to the powerful computing ability and the 'pay as you go' feature of cloud computing, previous works tended to implement the transcoding system with the help of cloud resources and designed various quality of experience (QoE) metrics for cloud transcoding systems [13], [14], [15], [16], [17], [18], [19], [20]. For instance, [13] designed a cloud-based scheme to transcode crowdsourced video contents, in which the QoE is a function of the bit rate of the received live stream and the broadcaster's popularity. Reference [14] proposed a cloud transcoding scheme considering delay constraints. In this work, the QoE was defined as a non-decreasing concave function of the received bit rate. In [16], a new live streaming framework was designed to minimize the content delivery delay with cloud transcoding. Reference [17] proposed a cloud transcoding scheme for both delay-tolerant and delay-sensitive videos with different priorities. In [19], a scheme based on dynamic programming was designed to limit the peak power consumption while maximizing the total processing capacity in a server with heterogeneous processors. Reference [20] designed recurrent network and convolutional network based approaches to forecast the approximate resources to be reserved for transcoding and to maximize the quality of service (QoS).

B. Edge and Crowdsourced Transcoding
Due to the abundance of concurrent live broadcasters and the heterogeneity of source contents, a substantial number of delay-sensitive and computationally intensive transcoding tasks are generated. As a result, even cloud transcoding cannot meet these requirements at an affordable cost [21]. Therefore, edge computing has been considered a viable replacement because of its fast processing and quick application response time [22]. However, it is highly challenging to achieve optimal transcoding task assignment and viewer association due to the massive heterogeneous video contents and diversified QoE demands [23]. In [24], a collaborative joint caching and transcoding scheme was proposed to reduce the backhaul link usage and the viewer perceived delay. In [9], a reinforcement learning (RL)-based scheme was designed to solve the edge transcoding decision-making problems. To better schedule edge transcoding under large state space, [20], [23], [25], [26], [27]. In addition, [21] combined both the cloud and the edge resources to collaboratively transcode live videos from multiple broadcasters.
In [6], a case study on Twitch demonstrated that, with the advance of personal computing devices, a significant fraction of CLSP viewers' devices potentially have appropriate computing resources for real-time transcoding. In addition, viewers have already expressed their willingness to support the broadcasters and the CLSPs through donations and subscriptions [21]. Thus, the cost of involving them in transcoding can be much lower than that of general edge computing. These studies demonstrate the potential of incentivizing viewer devices to perform transcoding. However, since the viewer devices are not professional and can be highly heterogeneous, their performance may be unknown and unstable. In [7], the transcoder selection relies on prior data collection and analysis, which may become inaccurate over time. Therefore, online decision-making strategies that can learn the devices' performance and select the devices that are more capable of transcoding are highly desirable.
Such edge-assisted crowdsourced transcoding systems are similar to crowdsourcing systems, which exploit the collective intelligence of the crowd and provide an effective paradigm for large-scale data acquisition and distributed computing [28], [29], [30]. In the edge-assisted transcoding system, the viewer devices assigned transcoding tasks can be treated as the crowd workers for computing. Crowdsourcing has been introduced into many areas such as text translation, consumer research, and hiring workers for software development.
There have been extensive works on task assignment problems in crowdsourcing systems [31]. As a classical decision-making model, the multi-armed bandit (MAB) has been used to model task assignment problems. For instance, [32] proposed a UCB-based task assignment algorithm with a limited budget for crowdsensing. Reference [33] modelled the crowdsourcing system as an MAB and proposed a bounded ε-first algorithm to maximize the overall utility of completing a number of tasks. Reference [34] proposed a budget-limited UCB-based greedy approach to learn the worker performance of a crowdsourcing system and to select workers with high performance to maximize the long-term utility. In [35], a hierarchical context-aware learning algorithm was proposed to learn and estimate the workers' context-specific performance in mobile crowdsourcing.
Table I reviews some recent papers on live video transcoding offloading with edge computing. Most works either consider the context information of the system and edge devices when making decisions [31], [36], or directly formulate the feedback of the decision as a function of context information. In [38], the risk of performance fluctuation is considered in the MAB problem. However, that work only studied a standard MAB in which only one arm is played at a time and the arm selection is not aware of the context information of each arm. In addition, only a limited number of works consider the risk of decisions that can lead to high performance fluctuation, and these works only consider known problem-specific risks such as the edge devices' online stability and the probability of failure [7], [11], [39]. None of these works directly models the feedback of a decision as a function of risk to make the task offloading risk-aware.
Overall, although the task assignment problem in edge-assisted computing systems has been studied in recent years, and the resulting schemes can be used in edge-assisted crowdsourced transcoding systems, several technical problems have not yet been addressed. First, the existing task assignment decision-making models are not practical enough, since the tasks were assumed to arrive sequentially and the number of tasks that need to be assigned per time slot was assumed to be fixed. In addition, most existing works only focus on identifying the device with the highest average performance for task offloading. Although some works considered risk in the edge computing framework, the risk is not integrated into the feedback of the decision. Moreover, the switching costs of assigning a task to different edge devices over time have not been considered yet. In particular, to the best of our knowledge, no existing work combines risk and context awareness in the edge-assisted task assignment problem.

III. SYSTEM MODEL
We consider an edge-assisted CLSP as illustrated in Figure 1. The raw live videos from the broadcasters are first uploaded to the regional data centers, which are responsible for video transmission and transcoding task assignment. The pre-processed videos (e.g., segmented video chunks) with the original bit rates are then forwarded to edge transcoders and the transcoded videos are transmitted to the end viewers. Table II summarizes the symbols used in this article.
Define a set of broadcasters, i.e., $\mathcal{I} = \{1, 2, \cdots, I\}$. The bit rate of the original live video of broadcaster $i \in \mathcal{I}$ in time slot $t$ is defined as $B_i^t$. Moreover, consider that there are $J^t$ viewers in time slot $t$ and $V_{i,j}^t$ is a binary parameter indicating whether viewer $j \in \mathcal{J}^t = \{1, 2, \cdots, J^t\}$ chooses to watch broadcaster $i$'s live stream in time slot $t$. In addition, denote $y \in \mathcal{Y} = \{1, 2, \cdots, Y\}$ as a video representation, which is one of $Y$ possible standard quality levels of a transcoded video, and the bit rate of representation $y$ is defined as $b_y$.
When an original live video is uploaded by a broadcaster $i$ to the regional data centers, the scheduler of the CLSP decides which representation should be transcoded by which edge device based on viewer requirements, the performance of the edge devices, and system constraints such as the delay experienced by users and the cost of transcoding. In the transcoding process, each edge device $f \in \mathcal{F} = \{1, 2, \cdots, F\}$ is able to transcode the broadcasters' live videos into standard video versions, where $\mathcal{F}$ represents the set of all available edge devices. After the transcoding process, all of the standard transcoded live videos are transmitted from the edge devices to the associated viewers. In addition, it is assumed that a viewer can only watch one transcoded video from one broadcaster in the same time slot.
To describe the transcoding task assignment and the viewer association, a binary variable $I_{i,y,f,j}^t$ is defined, which takes the value 1 when edge device $f$ is selected to transcode the original video from broadcaster $i$ requested by viewer $j$ into representation $y$ in time slot $t$, and 0 otherwise.

A. Cost Model
To incentivize an edge device to participate in transcoding, we define the cost of transcoding one live video from broadcaster $i$ to representation $y$ at edge device $f$, which is paid by the CLSP to the edge device, as $c_{i,y,f}^t$, which is written as where $\Phi_y$ is a non-decreasing concave function of representation $y$ and $\omega_f^t$ is defined as the transcoding capability of device $f$ in time slot $t$.
A higher value of $\omega_f^t$ means that the edge device is more reliable and can transcode a live stream with higher quality and less delay. To encourage edge devices with higher transcoding capability to join the transcoding candidate pool, $c_{i,y,f}^t$ is assumed to be linearly increasing with $\omega_f^t$. The transcoding capability plays an important role, as it is related not only to the transcoding quality but also to the transcoding performance uncertainty. In a nutshell, an edge device with relatively higher average performance and lower performance fluctuation is more capable of transcoding. The exact definition of transcoding capability will be presented in Section V.
Based on the definition of $c_{i,y,f}^t$, the total cost related to broadcaster $i$ (denoted by $c_i^t$) can be defined as

B. QoE Model
From the perspective of a viewer, the quality of the received video, namely the received bit rate, largely determines the viewer's experience [14]. Therefore, we use the term QoE to denote how good the received video is. The QoE is determined by two factors. First, the acceptable quality levels of the received live videos of different broadcasters vary with their genres (e.g., card game, pixel art game, first-person shooter game, etc.). By categorizing the live videos into a set of genres denoted by $\mathcal{G} = \{1, 2, \ldots, K\}$ and defining $g_i^t \in \mathcal{G}$ as the genre of the video from broadcaster $i$ in time slot $t$, we can define $s_{g_i^t}$ as the suggested basic bit rate, according to the genre of broadcaster $i$, for a viewer to watch the live video in time slot $t$. Second, it is vital to consider the network capacity of each viewer. Let $u_j^t$ be the highest bit rate that viewer $j$ can receive, which varies with the viewer's network condition. The QoE model can then be expressed as where $b_y$ represents the bit rate of representation $y$. In (3), the QoE model is a non-decreasing concave function of two ratios. The first ratio quantifies the effect of the network condition of viewer $j$. The higher this ratio is, the better the QoE that can be achieved. However, this ratio should not exceed one, so a constraint is added to the optimization problem; otherwise, the viewer capacity would be smaller than the bit rate of the transcoded representation and this representation could not be played smoothly at the viewer end. The second ratio quantifies how much better the received video quality is compared with the basic genre rate of broadcaster $i$, considering the fact that the same representation from broadcasters of different genres can lead to different QoE levels.
The QoE of a viewer j watching a transcoded video from broadcaster i can be calculated as

C. Delay Model
Delay is another performance measure that should be taken into account for the optimal transcoding task assignment and viewer association for real-time processing and delay-sensitive applications in edge-assisted CLSPs. The latency experienced by the viewers can be categorized into three types, i.e., transmission delay, transcoding delay, and playout delay.
The transmission delay refers to the round trip time, including the broadcaster-transcoder delay and the transcoder-viewer delay. In the traditional cloud transcoding system, both the broadcasters and the viewers can be far from the cloud data center, which brings non-negligible latency. The transcoding delay is the processing delay of transcoding a live video to a different quality version. Normally, the transcoding delay can be calculated as the time difference between the input of the original video and the output of the transcoded representation. The playout delay is determined by the viewer devices and their decoding time. Thus, it does not affect the transcoding task assignment and the viewer association and is hence not considered in this paper.
Let us define $\xi_j^t$ and $\delta_j^t$ as the transmission delay and the transcoding delay experienced by viewer $j$ in time slot $t$, respectively. The transmission delay can be expressed as where $\tau_{i,f,j}^t$ denotes the network delay from broadcaster $i$ to viewer $j$ via edge device $f$. Next, the transcoding delay can be represented as where $\delta_{i,y,f}$ represents the transcoding delay for edge device $f$ to transcode an original live video with bit rate $B_i^t$ to a representation with bit rate $b_y$. Therefore, the overall latency in the proposed transcoding system in time slot $t$ experienced by viewer $j$ can be represented as

IV. OPTIMIZATION PROBLEM: EDGE TRANSCODING AND VIEWER ASSOCIATION
According to the system model formulated in the previous section, there is a tradeoff between QoE maximization and cost minimization imposed on the CLSP. On one hand, the CLSP prefers to incentivize more edge devices to participate in transcoding and to provide the ABR service to more viewers. The more indicators $I_{i,y,f,j}^t$ are set to one (i.e., the larger the number of edge devices selected for transcoding), the higher the QoE that can be gained. On the other hand, this will lead to a higher cost based on (1). Therefore, the binary indicators must be optimized carefully to balance the tradeoff between the QoE and the cost as two components of the network utility. To formalize such a tradeoff, we define the weighted difference between the QoE and the cost (which is referred to as the network utility) related to a broadcaster as where the parameter $\lambda$ is used to tune the tradeoff between the two components. Aiming to jointly optimize the transcoding task assignment and viewer association by maximizing the total network utility in each time slot over the whole transcoding system, we formulate an optimization problem as where C1 ensures that the received bit rate is lower than both the original video bit rate ($B_i^t$) and the viewer capacity. C2 ensures that a viewer can only play one representation from one broadcaster in each time slot. C3 guarantees that each transcoder can serve at most $M$ viewers due to the limited bandwidth resource. C4 guarantees that the variable $I_{i,y,f,j}^t$ is binary. C5 ensures that the delay experienced by each viewer is lower than a predefined threshold $D_{th}$. The formulated problem is a linear integer programming problem which can be efficiently solved by an optimization toolbox called Mosek [40]. However, solving $\mathbf{P}$ requires the transcoding capabilities of the transcoders, which are unknown in real live streaming systems. Therefore, an online learning scheme is needed.
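The verbal description of constraints C1 to C5 can be turned into a small feasibility check. The following is only a sketch under stated assumptions: the function name `is_feasible`, the dictionary-based inputs, and the treatment of the inequalities as non-strict are illustrative choices and are not taken from the paper, whose exact constraint expressions are not reproduced in this text.

```python
from collections import Counter

def is_feasible(assignment, B, b, u, delay, M, D_th):
    """Check constraints C1-C5 for one time slot (hypothetical helper).

    assignment: set of (i, y, f, j) tuples whose indicator equals 1; using a
                set makes every indicator binary, which covers C4.
    B[i]: original bit rate of broadcaster i.
    b[y]: bit rate of representation y.
    u[j]: highest bit rate that viewer j can receive.
    delay[(i, y, f, j)]: overall latency seen by viewer j for that choice.
    """
    per_viewer = Counter(j for (_, _, _, j) in assignment)
    per_transcoder = Counter(f for (_, _, f, _) in assignment)
    for (i, y, f, j) in assignment:
        # C1: received bit rate must not exceed source rate or viewer capacity.
        if b[y] > min(B[i], u[j]):
            return False
        # C5: experienced delay must stay within the threshold.
        if delay[(i, y, f, j)] > D_th:
            return False
    # C2: each viewer plays at most one representation from one broadcaster.
    if any(count > 1 for count in per_viewer.values()):
        return False
    # C3: each transcoder serves at most M viewers.
    if any(count > M for count in per_transcoder.values()):
        return False
    return True
```

Such a check does not solve the problem; it only verifies that a candidate assignment returned by a solver (or by a heuristic) respects the constraints as described.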

V. RISK-AWARE CONTEXTUAL LEARNING FOR EDGE TRANSCODING
In light of the system model proposed in Section III, in this section, a novel risk-aware learning algorithm is designed to learn the transcoding capabilities (i.e., $\omega_f^t$) online by leveraging contextual information. Then, a novel transcoding task assignment and viewer association algorithm is designed to maximize the network utility of the transcoding system.

A. Risk-Aware Contextual MAB
Since the aim of the edge-assisted transcoding system is to select a number of edge transcoders per time slot to maximize the cumulative network utility, this problem can be modelled using a bandit framework, where the arms are the edge devices and the rewards are the transcoding outcomes of selected edge devices.
We model the transcoding outcome of a task assigned to edge device $f$ in time slot $t$ as a random variable, denoted $r_f^t$, with mean $\gamma_f^t$ (the transcoding quality) and variance $\sigma_f^2$. If the variance ($\sigma_f^2$) is large, the transcoder can still perform poorly even with high transcoding quality. This is unaffordable and risky.
The risk is defined related to the performance fluctuations of the transcoder devices. In particular, the high risk represents the case when the transcoder has a large performance variation (i.e., transcoding outcomes with high variance). For instance, choosing a transcoder with high uncertainty can lead to unacceptable transcoding delay and severely deteriorate the viewer experience. Besides, choosing a more risky transcoder can lead to frequent transcoding task switches, which means the same transcoding task will be assigned to different edge transcoders and results in high communication overhead and playback latency. These problems reflect the importance of considering the performance uncertainty of transcoders.
Such a risk can occur due to the unexpected unavailability of transcoders. This happens in edge-assisted crowdsourced live streaming since the transcoder devices considered in our work are assumed to be edge viewers' devices, which are not specifically employed for live video transcoding. The risk can also originate from the unstable computational and transmission resources of the edge transcoders. In particular, a sudden high transmission error or a low transmission rate can aggravate the riskiness of edge transcoders as well. Therefore, we model the transcoder selection problem as a risk-aware MAB for which the objective is to balance the tradeoff between maximizing the expected value of the returned transcoding outcomes and minimizing their variance. In particular, we define the transcoding capability as the mean-variance measure of each transcoder, which can be written as where $\rho > 0$ is the risk-tolerance factor introduced to balance the tradeoff between a high reward and a low risk. This linear combination of the transcoding quality and the variance of the transcoding outcome defines the transcoding capability of edge device $f$. To learn the transcoding capabilities of each edge transcoder online, in the following, we propose an index-based MAB algorithm by analytically deriving the UCBs of $\gamma_f^t$ and $\sigma_f^2$.
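To illustrate the mean-variance idea, the sketch below scores two devices from observed outcomes. The exact expression of the transcoding capability is not reproduced in this text, so the code assumes the common form "mean minus $\rho$ times variance", which matches the description (higher quality raises the capability, higher variance lowers it) but may differ from the paper's definition in sign convention or scaling. The function name and the sample values are hypothetical.

```python
from statistics import fmean, pvariance

def mean_variance_capability(outcomes, rho):
    """Empirical mean-variance score of one transcoder.

    Assumed form (not confirmed by the source): mean - rho * variance.
    """
    quality = fmean(outcomes)        # empirical stand-in for the transcoding quality
    variance = pvariance(outcomes)   # empirical stand-in for the outcome variance
    return quality - rho * variance

# Two hypothetical devices with the same average outcome but different stability.
stable = [0.70, 0.72, 0.68, 0.71, 0.69]
erratic = [1.00, 0.20, 1.00, 0.30, 1.00]

print(mean_variance_capability(stable, rho=1.0))
print(mean_variance_capability(erratic, rho=1.0))
```

With $\rho = 0$ the two devices are indistinguishable, since only the mean counts; any positive $\rho$ ranks the stable device higher, which is the behaviour the risk-aware formulation is after.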

1) Contextual UCB for Transcoding Quality:
The transcoding quality ($\gamma_f^t$) of an edge device is dependent on various factors such as the device's computational power, network conditions, and online stability. Such factors form the contextual information of a device as a transcoder. For example, since the viewer devices are not specifically deployed for video transcoding and can go offline during transcoding [6], the online stability of a device affects the transcoding quality. Besides, the computational power and network condition of an edge device can also affect the transcoding outcome by incurring varying latency at the viewer end. Therefore, we model the transcoding quality of a transcoder as the linear combination of its contextual information and a vector of unknown coefficients $\theta^*$. Consequently, the transcoding quality can be represented as
$$\gamma_f^t = (x_f^t)^{\top} \theta^*, \tag{11}$$
where $x_f^t$ represents the $z$-dimensional contextual information of edge transcoder $f$, and $\theta^*$ denotes the $z$-dimensional vector of unknown coefficients, which can be treated as the weights of the contextual information types. Here, $z$ is the number of contextual information types.
By collecting samples of the transcoding outcomes and the contextual information through task assignments over time, the unknown coefficients $\theta^*$ can be learned. Learning the coefficients is a linear regression problem, which can be solved by ridge regression [41], i.e., by adding L2 regularization to the loss function. Ridge regression is a suitable technique for linear regression when the number of samples is highly limited, which fits the crowd-transcoding system, since the samples are collected online and only a few samples are available in the early stage.
Define $R^t$ as the set of transcoding outcomes until time slot $t$, with the number of transcoding outcomes denoted by $m^t$. Let $W^t$ be a design matrix of dimension $m^t \times z$ whose rows correspond to the observations of contextual information of the $m^t$ transcoding outcomes until time slot $t$ and whose columns correspond to the $z$ types of contextual information. According to [42], we can acquire the estimated coefficients $\hat{\theta}^t$ by ridge regression as
$$\hat{\theta}^t = \left((W^t)^{\top} W^t + I_z\right)^{-1} (W^t)^{\top} R^t, \tag{12}$$
where $I_z$ is the $z$-dimensional identity matrix. Let us define the estimated transcoding quality as $\hat{\gamma}_f^t$. Based on the learned knowledge of the unknown coefficients, we can update the estimated transcoding quality using the contextual information as
$$\hat{\gamma}_f^t = (x_f^t)^{\top} \hat{\theta}^t. \tag{13}$$
For the UCB of the transcoding quality ($\gamma_f^t$), according to [43], for any $\kappa > 0$, with probability at least $1 - \kappa/T$, the deviation between the estimated transcoding quality and the real transcoding quality can be upper bounded by where $A^t = I_z + (W^t)^{\top} W^t$ and $\phi = \sqrt{\frac{1}{2} \ln \frac{2TF}{\kappa}}$. This UCB, which holds with high probability, helps to estimate the real transcoding quality of each transcoder.
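The ridge-regression estimate and a contextual confidence width can be sketched in a few lines of dependency-free Python. Two points are assumptions rather than statements from the source: the width is taken in the standard LinUCB form $(\phi + 1)\sqrt{x^{\top} A^{-1} x}$, and $\phi$ is taken as the square root of $\frac{1}{2}\ln(2TF/\kappa)$. The helper names `solve`, `ridge_fit`, and `quality_ucb` are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def ridge_fit(W, R):
    """Return (theta_hat, A) with A = I_z + W^T W and theta_hat = A^{-1} W^T R."""
    z = len(W[0])
    A = [[(1.0 if p == q else 0.0) + sum(row[p] * row[q] for row in W)
          for q in range(z)] for p in range(z)]
    rhs = [sum(row[p] * r for row, r in zip(W, R)) for p in range(z)]
    return solve(A, rhs), A

def quality_ucb(x, theta_hat, A, T, F, kappa):
    """Return (estimated quality, upper confidence bound) for context x.

    Assumed width: (phi + 1) * sqrt(x^T A^{-1} x), the usual LinUCB form.
    """
    phi = math.sqrt(0.5 * math.log(2 * T * F / kappa))
    gamma_hat = sum(xi * ti for xi, ti in zip(x, theta_hat))
    A_inv_x = solve(A, x)
    width = (phi + 1) * math.sqrt(sum(xi * ai for xi, ai in zip(x, A_inv_x)))
    return gamma_hat, gamma_hat + width

# Three observed contexts (rows) with their transcoding outcomes.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
R = [1.0, 2.0, 3.0]
theta_hat, A = ridge_fit(W, R)
print(theta_hat)
print(quality_ucb([1.0, 0.0], theta_hat, A, T=100, F=5, kappa=0.1))
```

Because every observation enlarges $A^t$ regardless of which device produced it, the width for a given context shrinks as samples accumulate, so devices with rarely seen contexts keep a larger exploration bonus.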

2) UCB for Variance:
In order to estimate the UCB of the transcoding outcome's variance ($\sigma_f^2$), we first define the empirical variance $(s_f^t)^2$ of the transcoding outcome of edge device $f$ until time slot $t$ as where $\bar{r}_f^t$ represents the empirical mean of the transcoding outcome until $t$, $t_f(d)$ represents the time slot when the $d$-th transcoding outcome of device $f$ is observed, and $\tau_f^t$ denotes the number of times that transcoder $f$ has been chosen until $t$.
Fact 1: Let $X$ be a Gaussian random variable with variance $\sigma^2$. Define the empirical variance over $n$ samples as $s_n^2$. Based on [44], we have where $v_n = \frac{(n-1) s_n^2}{\chi^2_{1-\alpha,n-1}}$ and $\chi^2_{1-\alpha,n-1}$ is the upper $100(1-\alpha)$ percentage point of the chi-square distribution with $(n-1)$ degrees of freedom.
Fact 1 gives a confidence interval for the variance of a random variable when the variable follows the Gaussian distribution. It implies that there is a probability of $100(1-\alpha)\%$ that the confidence interval constructed from the sample variance will contain the true value of $\sigma^2$.
Fact 1 thus provides a UCB of the variance of a random variable when the variable follows the Gaussian distribution. Therefore, based on Fact 1, we can derive the UCB of $\sigma_f^2$ under the assumption that $r_f^t$ is normally distributed.
Lemma 1: Given the UCBs of both the mean and the variance of the transcoding outcomes based on (14) and (16), the contextual Gaussian risk-aware UCB (CGRA-UCB) of the transcoding capability of transcoder $f$ can be written as where The first term on the RHS of (17) represents the UCB of the transcoding quality and the second term reflects the variance of the transcoding outcome.
In (17), $\tau_f^t$ represents the number of transcoding tasks that have been assigned to transcoder $f$ until time slot $t$ and $(s_f^t)^2$ denotes the empirical variance of the transcoding outcome of transcoder $f$ at time slot $t$.
The bound in (17) is designed under the assumption that $r_f^t$ follows an independent Gaussian distribution. However, when the reward distribution is unknown, the confidence interval presented in Fact 1 is not applicable. To overcome this limitation, we utilize the asymptotic distribution of the empirical variance to derive a confidence interval without any prior assumption on the reward distribution.
Fact 2: Let $X$ be a continuous random variable with mean $\mu$, variance $\sigma^2$, and $\mu_4 = E[(X - \mu)^4]$. According to [45] and [38], the asymptotic distribution of the empirical variance is Based on Fact 2, in the following lemma, we develop an asymptotic UCB on the variance.
Lemma 2: Applying Fact 2 and denoting the UCB of the reward variance by $v_n^{\mathrm{upper}}$, an asymptotic confidence interval of the variance can be derived for a sufficiently large $n$, as given in (19).
Proof: See Appendix A.
Since $\mu_4$ is unknown, an estimate of $\mu_4$ is required. We approximate $\mu_4$ as $\hat{\mu}_4^n = \frac{1}{n}\sum_{d=1}^{n}(r_d - \hat{\mu})^4$, where $r_d$ represents the random reward.
When the distribution of $r_f^t$ is unknown, by setting $n = \tau_f^t$, we can estimate the UCB of $\sigma_f^2$ according to Lemma 2, denoted by $v^{\mathrm{upper}}_{\tau_f^t}$.
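To make the distribution-free idea concrete, the sketch below builds a one-sided plug-in bound directly from the asymptotic normality in Fact 2, replacing $\mu_4$ and $\sigma^4$ with their empirical estimates. This is an illustrative assumption: the closed form of Lemma 2 (derived in Appendix A) may differ from this plug-in version, and the function name and data are hypothetical.

```python
from statistics import NormalDist, fmean


def asymptotic_variance_ucb(rewards, alpha=0.05):
    """Return (s2, ucb): empirical variance and a plug-in asymptotic UCB.

    Based on sqrt(n) * (s2 - sigma^2) being approximately N(0, mu4 - sigma^4)
    for large n, with mu4 and sigma^4 replaced by empirical estimates.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two rewards")
    mean = fmean(rewards)
    s2 = sum((r - mean) ** 2 for r in rewards) / (n - 1)
    mu4_hat = sum((r - mean) ** 4 for r in rewards) / n  # fourth central moment
    # Guard against a slightly negative estimate on tiny samples.
    spread = max(mu4_hat - s2 ** 2, 0.0)
    z = NormalDist().inv_cdf(1.0 - alpha)
    return s2, s2 + z * (spread / n) ** 0.5


# Hypothetical, non-Gaussian rewards; the margin shrinks as n grows.
s2_small, ucb_small = asymptotic_variance_ucb([0.2, 0.4, 0.6, 0.8] * 5)
s2_large, ucb_large = asymptotic_variance_ucb([0.2, 0.4, 0.6, 0.8] * 50)
```

The margin between the bound and the empirical variance decays at rate $1/\sqrt{n}$, which is consistent with the later observation that the asymptotic bound needs more transcoding outcomes before it becomes accurate.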

Lemma 3: Given the UCBs of both the mean and the variance of the transcoding outcomes based on (14) and Lemma 2, we can build a new contextual asymptotic risk-aware UCB (CARA-UCB) of the transcoding capability, which is given by (21).
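Lemmas 1 and 3 both combine a UCB on the mean with a UCB on the variance. The sketch below assumes the common mean-variance form, $\rho$ times the mean UCB minus the variance UCB, which matches the description of a quality term and a variance term weighted by the risk-tolerance factor $\rho$ but is not guaranteed to be the exact expression in (17) or (21). All names and numbers are hypothetical.

```python
def risk_aware_index(mean_ucb, var_ucb, rho):
    """Assumed mean-variance index: rho * (UCB of quality) - (UCB of variance)."""
    return rho * mean_ucb - var_ucb


def rank_transcoders(mean_ucbs, var_ucbs, rho):
    """Return transcoder indices sorted from most to least capable."""
    scores = [risk_aware_index(m, v, rho) for m, v in zip(mean_ucbs, var_ucbs)]
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)


# Three hypothetical transcoders: high quality but volatile, balanced, stable.
mean_ucbs = [0.9, 0.7, 0.4]
var_ucbs = [0.8, 0.35, 0.2]

order_risk_neutral = rank_transcoders(mean_ucbs, var_ucbs, rho=3.0)
order_balanced = rank_transcoders(mean_ucbs, var_ucbs, rho=1.0)
order_risk_averse = rank_transcoders(mean_ucbs, var_ucbs, rho=0.1)
```

With a large $\rho$ the high-quality transcoder ranks first, whereas with a small $\rho$ the most stable transcoder is preferred, and an intermediate $\rho$ can produce yet another ordering.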

B. Transcoder Selection Algorithm
With the learnt transcoding capability based on either (17) or (21), we assign the transcoding tasks to the edge devices that are expected to return relatively high rewards and are less risky. To this end, we need to solve an instantaneous version of the optimization problem $\mathbf{P}$ in (9). The instantaneous optimization problem at time slot $t$ is formulated in (22) and is solved whenever new bounds of the transcoding capabilities of edge devices are available. After the task assignment, the UCB estimations can be updated based on the observed transcoding outcomes.
However, in order to learn the transcoding capability, simply assigning one transcoding task to different edge transcoders in different time slots is not efficient, because frequently reassigning a transcoding task to different transcoders can lead to unaffordable task-switching costs and further increase the communication overhead.
To deal with this problem, we note from (12) that whichever transcoder is selected contributes information about the coefficient vector $\theta^*$ and can thus further guide the learning of the transcoding capability. Therefore, instead of determining the task assignment and viewer association in each time slot, we investigate an epoch-based sampling strategy, in which a transcoding task is consistently assigned to the same edge device for a finite number of time slots (referred to as an epoch) and task reassignments are performed only once at the beginning of each epoch. With this strategy, we can greatly reduce the switching costs while continuing to learn the transcoding capability.
The effectiveness of the epoch-based sampling depends on a well-designed epoch length [46], which is supposed to increase as time continues. Given $\tau_f^t$ as the task assignment counter of an edge device $f$, define $\mathcal{F}_t$ as the set of edge devices whose numbers of assigned tasks exceed a certain threshold up to time slot $t$, as written in (23), where $\zeta > 0$ is the threshold factor. With the smallest counter defined in (24), the length of an epoch up to time slot $t$ is calculated as in (25), in which the design parameter is positive.
The detailed risk-aware contextual transcoding task assignment and viewer association algorithm is described in Algorithm 1. Since the time slots have been divided into epochs, we only need to solve the optimization problem per epoch. Thus, the computational cost can be greatly reduced.
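The effect of a growing epoch length on the number of optimization solves can be sketched as follows. The threshold rule ($\zeta$ times the average counter), the scaling parameter `eps`, and the placeholder selection rule are hypothetical stand-ins for (23)-(25) and for the optimizer in Algorithm 1; they are assumptions made only to obtain a runnable illustration.

```python
import math


def epoch_length(counters, zeta, eps=1.0):
    """Epoch length from the smallest counter among sufficiently explored devices.

    counters: dict mapping device id -> number of tasks assigned so far.
    The threshold zeta * average counter is a hypothetical stand-in for (23).
    """
    average = sum(counters.values()) / len(counters)
    threshold = zeta * average
    eligible = [c for c in counters.values() if c > threshold]
    smallest = min(eligible) if eligible else min(counters.values())
    return max(1, math.ceil(eps * smallest))


def count_solves(horizon, num_devices, zeta, eps=1.0):
    """Count how often the assignment problem is solved over `horizon` slots."""
    counters = {f: 0 for f in range(num_devices)}
    t, solves = 0, 0
    while t < horizon:
        solves += 1  # the problem is solved once, at the start of each epoch
        length = min(epoch_length(counters, zeta, eps), horizon - t)
        # Placeholder for the optimizer's decision: pick the least-used device.
        chosen = min(counters, key=counters.get)
        counters[chosen] += length  # the task stays on this device all epoch
        t += length
    return solves


solves = count_solves(horizon=200, num_devices=3, zeta=0.5)
```

Under these assumptions, a smaller $\zeta$ admits devices with smaller counters into the eligible set, so the epoch length grows more slowly and the problem is solved more often.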
In addition, according to the optimization problem $\mathbf{P}$, multiple transcoding tasks can be assigned to the same transcoder. However, the computational resources of the transcoders are limited, and performing excessive transcoding tasks on one transcoder concurrently can lead to soaring transcoding delay and exhaust the bandwidth resources. Therefore, to avoid overwhelming the transcoders, every time the optimization variables are calculated, a self-inspection process is executed at every selected transcoder, and any transcoder assigned excessive tasks offloads these tasks to the cloud data center for transcoding.
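The self-inspection step can be sketched as a simple capacity check. The per-transcoder limit `max_tasks` and the rule that the tasks beyond the limit are the ones offloaded are assumptions of this sketch; the text does not specify how the excess is determined.

```python
def self_inspect(assignment, max_tasks):
    """Split an assignment into tasks kept at the edge and tasks sent to the cloud.

    assignment: dict mapping transcoder id -> list of assigned task ids.
    max_tasks: hypothetical limit on concurrent tasks per transcoder.
    """
    kept, offloaded = {}, []
    for transcoder, tasks in assignment.items():
        kept[transcoder] = tasks[:max_tasks]
        offloaded.extend(tasks[max_tasks:])  # excess tasks go to the cloud
    return kept, offloaded


assignment = {"edge-1": ["a", "b", "c"], "edge-2": ["d"]}
kept, offloaded = self_inspect(assignment, max_tasks=2)
```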

C. Computational Complexity Analysis
The computational complexity of the proposed algorithm in each epoch consists of two parts. The first part originates from the ridge regression, where matrix inversion and multiplication are introduced. The computational complexity of this method scales as $O(z^2 m_t)$, where $z$ is the dimension of the context space and $m_t$ is the number of transcoding outcomes up to time slot $t$. Since the dimensionality of the context information is assumed to be fixed, $m_t$ dominates the computational complexity, and the complexity grows only linearly in the number of transcoding outcomes.
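The linear growth in $m_t$ can be seen in a minimal ridge-regression sketch, assuming the transcoding outcome is modelled as linear in the context with coefficient vector $\theta^*$. The regularization weight `lam`, the function names, and the data are hypothetical; the solver is a plain Gauss-Jordan elimination used only to keep the example self-contained.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    z = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(z):
        pivot = max(range(col, z), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(z):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, z + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][z] / M[i][i] for i in range(z)]


def ridge_fit(contexts, rewards, lam=1.0):
    """Estimate theta from (context, reward) pairs with ridge regression."""
    z = len(contexts[0])
    A = [[lam if i == j else 0.0 for j in range(z)] for i in range(z)]
    b = [0.0] * z
    for x, r in zip(contexts, rewards):  # m_t outcomes
        for i in range(z):               # z^2 work per outcome
            b[i] += r * x[i]
            for j in range(z):
                A[i][j] += x[i] * x[j]
    return solve_linear(A, b)            # cost depends on z only


# Hypothetical noiseless data generated from theta = [0.3, 0.2].
contexts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
rewards = [0.3, 0.2, 0.5, 0.8]
theta_hat = ridge_fit(contexts, rewards, lam=1e-9)
```

Accumulating the statistics costs $z^2$ operations per outcome, while the final solve depends only on $z$, which is why the total scales linearly with the number of outcomes when $z$ is fixed.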
The second part of the complexity comes from solving the instantaneous optimization problem (22), which is a 0-1 linear integer programming problem. According to [47], the computational complexity of such a problem is $O(2^L kL)$, where $L$ is the number of optimization variables and $k$ is the number of constraints. In our problem, we have $L_t = IFYJ_t$ and $k = J_t + F + 2IJ_t$, where $J_t$, $Y$, $F$, and $I$ represent the numbers of viewers, representations, edge devices, and broadcasters, respectively. Combining both parts, the computational complexity at time slot $t$ is $O(z^2 m_t + 2^{L_t} k L_t)$.
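For a sense of scale, the short sketch below evaluates $L_t$ and $k$ for the system sizes used in the simulation setup (4 broadcasters, 15 edge devices, 4 representations, and 50 viewers). The function name is hypothetical.

```python
def problem_size(I, F, Y, J):
    """Number of 0-1 variables L = I*F*Y*J and constraints k = J + F + 2*I*J."""
    L = I * F * Y * J
    k = J + F + 2 * I * J
    return L, k


L, k = problem_size(I=4, F=15, Y=4, J=50)
print(L, k)  # 12000 465
```

With 12000 binary variables, the worst-case term $2^{L_t}$ is far beyond exhaustive enumeration, which illustrates why solving the problem once per epoch rather than once per time slot matters.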

A. Simulation Setup
We test the proposed algorithm with a synthetic data set based on real-world settings. We assume a live stream transcoding system with 4 broadcasters, 50 viewers, 15 edge devices, and 4 representations. The viewer count of each broadcaster's live stream is decided by its popularity, which is modelled by a Zipf distribution, as commonly used for video content popularity modelling (e.g., [48]). We set the original live video rates and the representation rates according to the Twitch broadcaster settings [49]. The original rates for the four broadcasters are set as 4000 kbps, 2500 kbps, 1500 kbps, and 500 kbps. The bit rates of the four representations are set to 400 kbps (240P), 1200 kbps (480P), 2000 kbps (720P), and 3500 kbps (1080P). Moreover, we randomly set the viewer capacity in the range of [500, 4000] kbps.
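One way to turn a Zipf popularity profile into per-broadcaster viewer counts is sketched below. The Zipf exponent of 1 and the largest-remainder rounding are assumptions of this sketch; the setup above does not state either choice.

```python
def zipf_viewer_counts(num_broadcasters, num_viewers, exponent=1.0):
    """Split viewers across broadcasters in proportion to Zipf weights."""
    weights = [1.0 / rank ** exponent for rank in range(1, num_broadcasters + 1)]
    total = sum(weights)
    shares = [num_viewers * w / total for w in weights]
    counts = [int(s) for s in shares]  # round down first
    # Hand out the remaining viewers to the largest fractional remainders.
    leftover = num_viewers - sum(counts)
    by_remainder = sorted(
        range(num_broadcasters), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


counts = zipf_viewer_counts(num_broadcasters=4, num_viewers=50)
print(counts)  # [24, 12, 8, 6]
```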
Based on the system model, a transcoding task can be assigned to multiple edge devices to serve different viewers. Since the transcoders are at the network edge, which is close to the viewers, the edge devices and the viewers are assumed to be distributed in a 1000 meters × 1000 meters region, and their locations are randomly determined following a uniform distribution.
According to [6], the transcoding capability can be affected by the transcoder's computational power. Besides, since the candidate viewers can also undertake transcoding tasks and viewers with low stability can go offline during transcoding, the online stability of the transcoders should be considered as a factor of the transcoding capability. Based on [7], the online stability is generated by sampling the Pareto distribution. As a result, we choose the CPU mark, the average CPU usage, the average RAM usage, and the online stability as the contextual information of the edge transcoders.
As discussed in [9], the transcoding delay is calculated based on the required computational resources of a task and the available CPU cycles (determined by the CPU mark and usage) of the transcoder. The transmission delay can be divided into two parts, as discussed in Section III-C. The broadcaster-transcoder delay is randomly set in the range of [200, 300] ms according to [14], and the transcoder-viewer delay is set in the range of [0, 100] ms depending on the distance between a viewer and an edge device. To be more specific, this delay is calculated as the distance between the transcoder and the viewer times 100 ms. In other words, this setting implicitly considers the average channel gain and transmission rate, which are functions of the distance between a viewer and an edge device.
To demonstrate the risk-awareness of the designed algorithm, the variance of each edge transcoder follows a uniform distribution within the range of [0, 1]. Finally, to test the performance of the designed algorithm, both Gaussian and Gamma distributions are simulated to generate the transcoding outcomes. The parameters are uniformly selected.
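The delay model above can be sketched as follows. The text does not state the unit in which the distance is measured, so this sketch assumes the distance is normalized by the diagonal of the 1000 m × 1000 m region, which keeps the transcoder-viewer delay within [0, 100] ms. The function names and the transcoding delay value are hypothetical.

```python
import math
import random

REGION_M = 1000.0  # side length of the square region, in meters


def transcoder_viewer_delay_ms(transcoder_xy, viewer_xy, region_m=REGION_M):
    """Distance-proportional delay in [0, 100] ms (normalization is assumed)."""
    distance = math.dist(transcoder_xy, viewer_xy)
    normalized = distance / (math.sqrt(2.0) * region_m)
    return 100.0 * normalized


def end_to_end_delay_ms(transcoding_ms, transcoder_xy, viewer_xy, rng):
    """Broadcaster-transcoder leg + transcoding + transcoder-viewer leg."""
    broadcaster_leg = rng.uniform(200.0, 300.0)
    viewer_leg = transcoder_viewer_delay_ms(transcoder_xy, viewer_xy)
    return broadcaster_leg + transcoding_ms + viewer_leg


rng = random.Random(0)
transcoder = (rng.uniform(0.0, REGION_M), rng.uniform(0.0, REGION_M))
viewer = (rng.uniform(0.0, REGION_M), rng.uniform(0.0, REGION_M))
delay = end_to_end_delay_ms(500.0, transcoder, viewer, rng)
```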
The LinUCB algorithm [50], which utilizes the contextual information to estimate the reward, is simulated as the benchmark. In addition, the MV-UCB algorithm [51], which is a risk-aware MAB algorithm, is also simulated for comparison. Table III describes the features of the proposed algorithm and the benchmarks.
Figure 2: Transcoding quality and variation given varying risk-tolerance factor ρ.
Figure 2a presents the initial transcoding quality and the variance of the 15 transcoders. By sorting the transcoders in descending order of transcoding capability with a varying risk-tolerance factor ρ, Figures 2b-2d are generated. The risk-tolerance factor is set to decrease from 3 to 0.1. In Figure 2b, a larger risk-tolerance factor makes the transcoding quality dominate the transcoding capability, which represents a risk-neutral setting of the transcoders. In Figure 2d, the risk-tolerance factor is set close to 0, which leads to a pure-risk setting. We can observe that the transcoding capability of each transcoder can be quite different. In Figure 2c, we set ρ = 1, which leads to another order of the transcoders. In this figure, both the transcoding quality and the variance contribute to the capability, and the transcoders that are relatively more capable differ considerably from those in both Figure 2b and Figure 2d.

B. Numerical Results
We first evaluate the proposed algorithm under the Gaussian distribution scenario as compared to the benchmarks. In Figure 3, the network utilities per time slot achieved by all three algorithms are presented. It is shown that the proposed algorithm using CGRA-UCB outperforms LinUCB and MV-UCB because it not only utilizes the contextual information to learn the transcoding quality but also considers the uncertainty of the transcoder's transcoding outcome. In addition, the cumulative network utilities are depicted in Figure 4. This also confirms the superiority of the proposed algorithm in comparison with the benchmarks. Moreover, in Figure 5, the cumulative switching costs are presented, and the proposed algorithm achieves up to an 85.1% reduction in cumulative switching costs as compared to the benchmarks. This is because the proposed algorithm does not solve the optimization problem $\mathbf{P}$ in every time slot, so the task assignment and viewer association do not change frequently.
In Figures 6, 7, and 8, the Gamma distribution is used to generate the transcoding outcomes. In this case, we simulate for 200 time slots, since the bound used (derived in Lemma 3) is an asymptotic bound whose accuracy increases as more transcoding outcomes are collected, and it therefore takes longer to converge. From the results, we can find that after 150 time slots, the proposed algorithm with the CARA-UCB tends to converge and shows good performance. The results demonstrate that the proposed algorithm using CARA-UCB achieves a higher network utility while reducing the switching costs by up to 86.8% as compared to the benchmarks.
The experienced average latency per viewer of each algorithm is presented in Table V. According to the results, the average delays of the proposed algorithm are slightly higher in both scenarios but remain within the delay threshold, which is set to 1.3 seconds in both simulations. This demonstrates that the proposed algorithm can improve the network utility of the transcoding system while satisfying the delay constraint.
The running time of the proposed algorithm with and without the designed epoch-based strategy is also compared, and the epoch-based strategy reduces the computational time by up to 92.3%.
In order to study the impact of the epoch-based sampling strategy, ζ is changed to generate different thresholds based on (23), which help to determine the length of an epoch according to (24) and (25). In Figure 9, both the cumulative switching costs and the total network utilities versus ζ are presented. We can observe that by decreasing ζ, the network utility can be increased at the expense of higher switching costs, since when ζ is decreased, the epoch length increases more slowly and the optimization problem $\mathbf{P}$ in (9) is solved more frequently. This figure demonstrates the importance of selecting a proper ζ to balance the tradeoff between the switching costs and the network utility.
In Figures 10, 11, and 12, three different epoch length determination strategies are evaluated in the Gaussian reward setting. The proposed transcoder selection algorithm calculates the epoch length based on the smallest task assignment counter of the edge transcoders via (23)-(25). As benchmarks, two more cases are simulated using the average of the counters and the largest counter to calculate the epoch length, respectively. The results reveal that all three strategies perform well and show fast convergence thanks to the proposed refined UCBs of the transcoding capability. In particular, the proposed strategy based on the minimum counter achieves the highest network utility. This confirms that the designed strategy can efficiently identify suitable transcoders to maximize the network utility, since it offers more chances to explore the transcoding capability of each transcoder and refines the bounds more frequently. In addition, using the maximum counter to calculate the epoch length achieves competitively low switching costs, since this strategy tends to increase the epoch length faster than the others, which reduces the number of task reassignments.
To further demonstrate the performance of the proposed algorithm, we present the transcoding performances with and without the knowledge of the transcoding capabilities of edge transcoders under the Gaussian distribution scenario. In Figure 13, the cumulative network utilities and switching costs are presented. This result shows that the proposed scheme can learn the transcoding capability quickly and achieve a highly competitive network utility as compared to the case when the transcoding capability is known. In addition, with known transcoding capabilities, lower switching costs can be achieved, since the suitable transcoders can be quickly identified and the transcoding task assignment does not change frequently.
Finally, we compare the proposed edge-assisted transcoding algorithm with the Top-N scheme, which is a currently running cloud transcoding scheme in Twitch.TV. Top-N offers N premium broadcasters the ABR service, whereas only the basic representation rate is available for the rest of the broadcasters. Normally, N is determined based on the broadcasters' popularity. The network utility and cumulative [...]
Since Top-N is based on cloud transcoding, we assume that it can ensure highly stable viewer QoE, and we set its transcoding capability to 1, which is higher than that of the most capable edge transcoder (whose transcoding capability is 0.496). In addition, we set the unit cost of cloud transcoding to be 10 times the cost of edge transcoding. Based on the results, we can find that as N increases, the utility of cloud transcoding increases. In particular, the proposed edge-assisted transcoding algorithm can utilize edge computing resources efficiently and achieve a highly competitive network utility.

VII. CONCLUSION
In this paper, we proposed an edge-assisted transcoding task offloading algorithm for CLSP, considering the contextual information and the risk of performance variations of the edge devices. First, an optimization problem was formulated to solve the transcoding task assignment and viewer association problem under the assumption of known transcoding capabilities of edge devices. Then, two risk-sensitive bandit algorithms were developed to deal with the exploration-exploitation dilemma and to learn the transcoding capabilities. An epoch-based assignment strategy was introduced to reduce the switching costs of transcoding task assignment. Numerical results based on various settings confirmed that the proposed risk-aware contextual algorithms can achieve superior performance as compared to different benchmark schemes that are either contextual or risk-sensitive. In a nutshell, leveraging both contextual awareness and risk sensitivity can improve the resilience and robustness of an online task offloading scheme. In the future, we will extend the proposed algorithm by considering a larger-scale problem with extremely high live video quality (such as 4K and 8K) and with different behaviors of edge devices.