Selection of Robust and Relevant Features for 3-D Steganalysis

While 3-D steganography and digital watermarking represent methods for embedding information into 3-D objects, 3-D steganalysis aims to detect the hidden information. Previous research studies have shown that by estimating the parameters that model the statistics of 3-D features and feeding them into a classifier, we can identify whether a 3-D object carries secret information. For training the steganalyzer, such features are extracted from cover and stego pairs, representing the original 3-D objects and those carrying hidden information, respectively. However, in practical applications, the steganalyzer would have to distinguish stego-objects from cover-objects that most likely have not been used during the training. This scenario, known as the cover source mismatch (CSM) problem, represents a significant challenge for existing steganalyzers because of their limited generalization ability. This paper proposes a novel feature selection algorithm that takes into account both feature robustness and relevance in order to mitigate the CSM problem in 3-D steganalysis. In the proposed methodology, new shapes are generated by distorting those used in the training. Then a subset of features is selected from a larger given set by assessing their effectiveness in separating cover-objects from stego-objects among the generated sets of objects. Two different measures are used for selecting the appropriate features: 1) the Pearson correlation coefficient and 2) the mutual information criterion.


I. INTRODUCTION
Data hiding has many applications, including intellectual property protection, marketing, storing information for contextual use, and so on. Data domains used for steganography or data hiding include audio, video, and images [1]-[4]. Steganalysis aims to identify the information which was hidden in a specific medium. It relies on knowledge resulting from extracting and analyzing features, as well as on algorithms used for information forensics and computational intelligence. Various approaches have been proposed for audio and image steganalysis [5]-[9]. Steganography and information hiding in 3-D graphics have undergone a rapid expansion during the last decade [10]-[14], and the interest in this area is expected to grow stronger, given the development of 3-D printing and its implications for the manufacturing industry and medicine, among others. When information is embedded by a generic steganographic or information hiding algorithm, the stego-objects and cover-objects are visually indistinguishable from each other. 3-D steganalysis aims to detect the changes embedded in shapes and graphical models and can be seen as a classification problem which aims to distinguish the stego-objects, which carry hidden information, from the cover-objects, representing the original objects. However, this requires classifying very subtle changes in 3-D shapes, which raises new challenges. Existing 3-D steganalytic algorithms extract certain features from a large number of cover-stego pairs, representing 3-D objects before and after hiding the information in their surface [15]-[17]. The parameters characterizing the statistics of these features are then used as inputs to a machine learning algorithm that aims to discriminate the stego-objects from the cover-objects. In this paper, we assess the robustness of 3-D steganalyzers in the context of the cover source mismatch (CSM) problem.
The CSM problem corresponds to the realistic scenario in which the objects used for training a steganalyzer originate from a cover source that is different from the one used by the steganographer for hiding the information [18]. CSM in the area of image steganalysis was addressed during the Break Our Steganographic System (BOSS) contest [19]. The mismatch between the training and testing sets caused many difficulties for the participants in this contest [19]-[21]. In general, the CSM problem in the image domain has been addressed by considering the following aspects: the training sets used, the relevant feature set to be extracted from the images, and the machine learning methods used for steganalysis.
In the case of digital images, the generalization ability of steganalyzers has been tested for images characterized by various ISO noise levels and for different JPEG compression quality factors [22], [23]. In the context of the BOSS contest, Gul and Kurugollu [21] proposed to use the correlation between a feature and the embedding rate as the criterion for feature selection (FS). Meanwhile, Pasquet et al. [24] proposed to use the ensemble classifier enabled with an FS mechanism. This FS relies on evaluating the importance of each feature in the learning process [25]. A feature condensing method, called calibrated least squares, was proposed in [26]. A method to mitigate the CSM due to changes in the features of the cover images was described in [27]. This approach normalizes the cover features for all steganographers by subtracting the centroid of their joint distribution.
Other research studies addressing the CSM problem in images aim to find a classifier that would be robust to the variations between the training and testing data. In [28], it was shown that simple classifiers, such as the Fisher linear discriminant (FLD) ensemble and the online ensemble average perceptron, perform better than more complex classifiers when faced with the CSM problem. To mitigate the mismatch due to various changes in stego features, Xu et al. [23] and Ker and Pevný [27] used ensembles of classifiers which increase the weight of those steganalytic features that are robust to changes. In the machine learning community, the methods of domain adaptation [29]-[32] and transfer learning [33], [34] have been studied for the case of differing data distributions, aiming to enforce the consistency of the model at the target domain, given a source domain. This scenario is very similar to the CSM scenario in steganalysis. Pan et al. [29] proposed a feature extraction method for domain adaptation, called transfer component analysis (TCA), which aims to minimize the maximum mean discrepancy (MMD) between samples of the source and target data in a reproducing kernel Hilbert space. Long et al. [34] proposed transfer joint matching (TJM), which improves TCA by jointly considering both feature matching and instance reweighting. Zhang et al. [32] proposed a unified framework, referred to as joint geometrical and statistical alignment (JGSA), that reduces the shift between domains both statistically and geometrically.
2168-2267 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
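To make the maximum mean discrepancy (MMD) minimized by TCA more concrete, the sketch below computes a biased empirical estimate of the squared MMD between a source and a target sample using a Gaussian (RBF) kernel. It only illustrates the discrepancy measure; it is not an implementation of TCA, TJM or JGSA, and the function names and the kernel width `gamma` are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    # Pairwise Gaussian kernel k(u, v) = exp(-gamma * ||u - v||^2)
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(source, target, gamma=1.0):
    """Biased empirical estimate of the squared MMD between two samples (rows = samples)."""
    k_ss = rbf_kernel(source, source, gamma)
    k_tt = rbf_kernel(target, target, gamma)
    k_st = rbf_kernel(source, target, gamma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()
```

A larger value indicates a larger gap between the two feature distributions, which is what domain adaptation methods such as TCA try to reduce in a learned feature space.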
In this paper, we propose the robustness and relevance-based FS (RRFS) algorithm for addressing the CSM problem in 3-D steganalysis. By locally distorting the surfaces of the objects used for training, we extend the set of shapes. A subset of features is then selected from a larger set, based on how well the features generalize across the enlarged database of shapes when used as inputs to the steganalyzer. Besides using the Pearson correlation coefficient (PCC) [35], we also propose using the mutual information criterion (MIC) in order to evaluate the relevance of each feature to the class label. Moreover, we introduce a parameter to control the tradeoff between the features' relevance and robustness during the FS. The proposed methodology is tested on the Princeton mesh segmentation project database [36] and compared to six FS algorithms and three domain adaptation approaches, when considering the 3-D information hiding algorithms proposed in [10]-[12]. 3-D steganalysis is briefly described in Section II, while the proposed method addressing the CSM problem in the context of 3-D steganalysis is explained in Section III. The experimental results are provided in Section IV, and the conclusions of this paper are drawn in Section V.

II. 3-D STEGANALYSIS
The processing stages for 3-D steganalysis are shown in the diagram from Fig. 1, and they consist of training and classifying objects into cover-objects, representing original objects, and stego-objects, representing objects in which information was hidden. After a preprocessing stage, which uses surface smoothing and 3-D object normalization and aims to enhance the features characteristic of the embedded information, a set of local features characterizing the differences between 3-D shapes is extracted from sets of 3-D objects representing both cover-objects and stego-objects. The first four statistical moments of such features are then used as feature vectors for training a machine learning algorithm. The 3-D steganalysis approach proposed in [15] uses the feature set YANG208, which includes, among other features, the norms in the Cartesian and Laplacian coordinate systems [37], the dihedral angles of the triangular surface faces, and the face normals. Yang et al. [38] proposed a new steganalytic algorithm, specifically designed for the mean-based watermarking algorithm from [11]. Li and Bors proposed the feature set LFS52 in [16], which includes the local curvature and vertex normals as steganalytic features, while dropping some of the other features used in [15]. Li and Bors then extended this feature set to LFS76 in [17] by adding new features such as the vertex position and the edge length in the spherical coordinate system of 3-D objects.
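As a minimal sketch of how per-feature statistics can be turned into a steganalytic feature vector, the code below computes the first four statistical moments of each local feature, assumed here to be the mean, variance, skewness and (non-excess) kurtosis, and concatenates them. The input dictionary of local feature values (for example, per-vertex differences between an object and its smoothed version) and the function names are hypothetical; the exact moment definitions used in [15]-[17] may differ.

```python
import numpy as np


def four_moments(values):
    """Mean, variance, skewness and kurtosis of a 1-D array of local feature values."""
    v = np.asarray(values, dtype=float)
    mean = v.mean()
    centred = v - mean
    var = np.mean(centred ** 2)
    if var == 0.0:
        # Constant feature: skewness and kurtosis are undefined, report zeros
        return np.array([mean, 0.0, 0.0, 0.0])
    std = np.sqrt(var)
    skew = np.mean(centred ** 3) / std ** 3
    kurt = np.mean(centred ** 4) / var ** 2  # non-excess kurtosis
    return np.array([mean, var, skew, kurt])


def steganalytic_vector(local_features):
    """Concatenate the four moments of every local feature (hypothetical dict name -> values)."""
    return np.concatenate([four_moments(local_features[k]) for k in sorted(local_features)])
```

With K local features, each object is then described by a 4K-dimensional vector, which is the kind of input the classifier receives.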
A very important issue, essential for all pattern recognition approaches, is the ability of the steganalyzer to generalize from the training set to completely different objects. The CSM problem in 3-D steganalysis concerns the robustness of steganalyzers that are trained using a set of cover and stego 3-D objects characterized by certain properties and then have to identify the stego-objects in a set of test objects with different surface properties. The generalization ability of existing steganalyzers is rather poor because the steganalytic features are sensitive to changes in the local geometrical and topological properties of objects. Consequently, the boundary separating cover-objects from stego-objects, calculated from a specific cover source, may not be optimal for stego-objects originating from other cover sources, resulting in poor steganalyzer accuracy. It should also be noted that the degradation of the steganalysis results under CSM is not caused by a mismatch between the objects' global shapes. The steganalytic features are designed to be sensitive to the embedding changes, which are kept small in order to be invisible, and at the same time insensitive to the global shapes of the objects.
In this paper, we propose an FS stage for selecting robust and relevant features for training the steganalyzer, in order to address the CSM problem for 3-D steganalyzers. The proposed methodology, whilst removing some of the redundant features, retains only those features that enable an appropriate generalization from the training set to the wider space of stego- and cover-objects. During this stage, we increase the diversity of the objects by applying local perturbations to their surfaces. Such perturbations consist of mesh simplifications and noise additions, which change the geometrical characteristics of the cover sources, generating objects that are quite different from the original ones in terms of their local surface properties. These are then considered as cover-objects and used for hiding information through steganography, resulting in sets of stego-objects. The changes produced to the surface of the objects result in statistical changes of the features used for steganalysis in both cover-objects and stego-objects. We then use the RRFS algorithm in order to select the best set of features for addressing the CSM problem, whilst removing the redundant features.
During the testing stage, after using the same preprocessing steps, the selected sets of features are extracted from the testing objects. Finally, the steganalyzer decides whether any information is embedded in the given object based on the statistics of the selected features. The quadratic discriminant [37] and the FLD ensemble [16] have been used as machine learning methods for discriminating the cover-objects from stego-objects.

III. ROBUSTNESS AND RELEVANCE-BASED FEATURE SELECTION ALGORITHM
In the following, we consider a set of 3-D objects O used for training a steganalyzer. We consider a data hiding algorithm that embeds information into the surface of these 3-D objects, representing cover-objects, resulting in stego-objects. A set of features is then extracted from both cover- and stego-objects, and the parameters characterizing their statistics are used as inputs to a machine learning classifier that distinguishes between the two classes of objects. The research studies from [15]-[17] have found several 3-D features useful for 3-D steganalysis. However, the sensitivity of these features to variations in the shape of the analyzed objects differs from feature to feature. The steganalytic features that are more sensitive to the embedding changes contribute more to the performance of the steganalyzer. Nevertheless, when the shapes in the cover source are diversified, the values of these features vary significantly, beyond their characteristic estimated distributions. This ultimately leads to the degradation of the steganalyzer's performance under the CSM scenario. The solution to this dilemma in CSM scenarios is to find a tradeoff between the features' sensitivity to the embedding changes and their robustness to the variation of the cover source. This is the motivation for the following FS method addressing the CSM problem in 3-D steganalysis.
The proposed FS algorithm, called RRFS, provides a mechanism for choosing the features that preserve the steganalysis performance in CSM scenarios. The key idea of the proposed algorithm is to find the features that are more robust to the variation of the cover source while preserving a relatively high sensitivity to the embedding changes, which is evaluated by their relevance to the class label. Thus, two criteria are considered during the selection: the relevance of the features to the class label and the robustness of the selected feature set to the variation of the cover source. This FS algorithm belongs to the category of filter methods [39], which have been shown to be efficient for selecting input features in various machine learning algorithms. Filter methods are suitable for CSM situations because they avoid overfitting to the training data and are characterized by better generalization during the testing stage [40].
In the proposed algorithm, the relevance of the features to the class label is estimated by the PCC, calculated between the values of each feature and the corresponding objects' classes:

$$\rho(x_i, y) = \frac{\mathrm{cov}(x_i, y)}{\sigma_{x_i}\,\sigma_y} \qquad (1)$$

where x_i is the ith feature of a given feature set X = {x_i | i = 1, 2, ..., N}, N is the dimensionality of the input feature set, y is the class label indicating either a cover-object or a stego-object, cov represents the covariance, and σ_{x_i} and σ_y are the standard deviations of x_i and y, respectively. The PCC captures the linear dependency between the features and the label, with |ρ(x_i, y)| = 1 indicating a perfect linear dependency, while ρ(x_i, y) = 0 indicates the absence of any linear dependency [41]. All features are ranked in descending order of their relevance to the class label, calculated using (1), as

$$\{x_{k_1}, x_{k_2}, \ldots, x_{k_N}\} \qquad (2)$$

where |ρ(x_{k_1}, y)| ≥ |ρ(x_{k_2}, y)| ≥ ... ≥ |ρ(x_{k_N}, y)|. In the following, we also consider the MIC as a statistical measure of the relevance between each feature and the class label. The MIC measures the statistical dependency between two variables. The mutual information between the ith feature, x_i, and the class label, y, is given by

$$I(x_i; y) = \sum_{x_i} \sum_{y} p(x_i, y) \log \frac{p(x_i, y)}{p(x_i)\, p(y)} \qquad (3)$$

where p(x_i, y) is the joint probability distribution function of x_i and y, and p(x_i) and p(y) are the marginal probability distribution functions of x_i and y, respectively. Compared to the correlation coefficient, the mutual information is better at measuring nonlinear dependencies between variables [42]. MIC was used in other FS methods, such as [43]-[46]. The robustness of features to the variation of the cover source is related to solving the CSM problem. Ideally, robust features should model the statistical characteristics that distinguish cover-objects from stego-objects even when these objects are different from those used during the training. In this paper, we assume that the dataset used for testing differs from that used in the training through some transformations, which are controlled in the experimental setting of this paper.
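The two relevance measures and the descending ranking by relevance can be sketched as follows. The PCC is computed column-wise on a matrix of moment features (rows = objects, columns = features). The mutual information is a simple plug-in estimate obtained by discretizing each feature into equal-width histogram bins; this estimator, the `bins` parameter and the function names are assumptions of the sketch, since the estimator used for the MIC is not specified here.

```python
import numpy as np


def pcc_relevance(X, y):
    """|rho(x_i, y)| for every column x_i of X; constant features get relevance 0."""
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    cov = Xc.T @ yc / len(y)
    denom = X.std(axis=0) * y.std()
    return np.abs(np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0))


def mi_relevance(X, y, bins=10):
    """Histogram-based estimate of I(x_i; y) in nats for a discrete class label y."""
    y = np.asarray(y)
    labels = np.unique(y)
    scores = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, i], bins=bins)
        xb = np.digitize(X[:, i], edges[1:-1])  # bin index in 0..bins-1
        joint = np.zeros((bins, len(labels)))
        for c, lab in enumerate(labels):
            joint[:, c] = np.bincount(xb[y == lab], minlength=bins)
        joint /= joint.sum()
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        nz = joint > 0  # skip empty cells, where p log p -> 0
        scores[i] = np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz]))
    return scores


def rank_by_relevance(scores):
    """Feature indices sorted by decreasing relevance (k_1, k_2, ..., k_N)."""
    return np.argsort(-np.asarray(scores), kind="stable")
```

Either score vector can be passed to the same ranking function, which is what distinguishes the PCC and MIC variants of the selection.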
If the features of the objects do not change much after applying various transformations to the cover-objects, they would be expected to provide steganalysis results similar to those achieved for the original cover-objects and stego-objects. Such features would have strong robustness in the context of steganalyzers. In the following, we consider changes to the surface of the objects, namely mesh simplification and noise addition, and compare the features extracted before and after such changes. We do not consider remeshing or surface fairing, because such operations would result in excessive smoothing, and the subsequent embedding modifications would be more easily detected. Then, the robustness of the ith feature, r_i, i = 1, 2, ..., N, is calculated in (4) from the PCCs ρ(x_i, x_{i,j}) between the values of the ith feature extracted before and after applying the jth change, j = 1, 2, ..., M, to the 3-D objects. The RRFS algorithm starts by considering a preset number of N features as input, namely those which have been proposed for 3-D steganalysis in previous studies [15]-[17]. The RRFS algorithm aims to find the N′ most relevant features that have relatively strong robustness, to be used by a steganalyzer addressing the CSM problem. The N′ features are selected after multiple iterations through the given set of features, which are ranked according to their relevance, calculated using either (1) for PCC or (3) for MIC. During each iteration, a subset of features F with the highest relevance is selected subject to the following conditions:

$$F = \{\, x_{k_l} \mid r_{k_l} \ge \theta_q \,\}, \qquad \#(F) \le N'$$

where x_{k_l} denotes the lth feature in the descending relevance ranking, θ_q represents the threshold for the correlation corresponding to the qth percentile of the set {r_i | i = 1, 2, ..., N}, evaluating the robustness of the features, and #(·) represents the cardinality of a given set.
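The robustness scores and the percentile threshold θ_q can be sketched as below. Each ρ(x_i, x_{i,j}) is a column-wise PCC between the feature values of the original objects and those of their jth transformed versions, so rows must correspond to the same objects. How the M correlations are combined into r_i is not recoverable from this text, so the sketch simply averages them; this aggregation, like the function names, is an assumption.

```python
import numpy as np


def robustness_scores(X, X_transformed):
    """r_i from the PCCs rho(x_i, x_{i,j}) between original and transformed features.

    X: (n_objects, N) features of the original cover/stego objects.
    X_transformed: list of M arrays of the same shape; row k holds the features
    of the jth transformed version of object k.
    ASSUMPTION: the M correlations are averaged into a single score per feature.
    """
    a = X - X.mean(axis=0)
    per_transform = []
    for Xj in X_transformed:
        b = Xj - Xj.mean(axis=0)
        num = np.sum(a * b, axis=0)
        den = np.sqrt(np.sum(a ** 2, axis=0) * np.sum(b ** 2, axis=0))
        per_transform.append(np.divide(num, den, out=np.zeros_like(num), where=den > 0))
    return np.mean(per_transform, axis=0)


def robustness_threshold(r, q):
    """theta_q: the qth percentile of the robustness scores {r_i}."""
    return np.percentile(r, q)
```

A feature whose values are nearly unchanged by the transformations obtains r_i close to 1, whereas a feature that changes unpredictably obtains r_i close to 0.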
Those features that are not robust enough are removed, and the RRFS algorithm then reiterates, considering the subset F containing the features that fulfil the required conditions. The tradeoff between the robustness and the relevance of the features is controlled by a parameter δ. Initially, q is set to 100 − δ. After each iteration, if the cardinality of the selected feature set #(F) < N′, we reduce the threshold to the value corresponding to the (q − δ)th percentile and repeat the FS using the new threshold θ_{q−δ} instead of θ_q. When δ is close to 0, the FS algorithm tends to select features that are more robust. As δ increases, features with higher relevance become more likely to be selected by the RRFS algorithm. In this way, each iteration adds features to the selected set while preserving the generalization capability of the steganalyzer. Since the features are ranked in descending order of relevance, the features with higher relevance are selected first, provided that their robustness is above the threshold θ_q.

Algorithm 1: RRFS-PCC Algorithm
Input: Features extracted from the cover-objects and stego-objects used for training, X = {x_i | i = 1, 2, ..., N}; features extracted from other cover sources, obtained by a shape generation procedure, and their corresponding stego-objects, X_j = {x_{i,j} | i = 1, 2, ..., N, j = 1, 2, ..., M}; class label y; step size parameter δ; dimensionality of the selected feature set N′.
Output: Index of the selected feature subset F.
Compute the relevance of the features to the class label, |ρ(x_i, y)|;
Compute the robustness of the features to the variation of the cover source, r_i;
Sort the features by relevance |ρ(x_i, y)| in descending order and get the index …
…
Return F;
After each iteration, the threshold θ_q is gradually reduced, considering the lower percentile q − δ instead of q, until the dimensionality of the selected feature set becomes equal to N′. These N′ selected features are robust enough to the variation of the cover source whilst also having a relatively high relevance to the class label. The setting of the parameter δ is investigated in Section IV-B. The RRFS algorithm that uses PCC as the measure of the features' relevance to the class label is named RRFS-PCC, and its pseudocode is provided in Algorithm 1. Instead of using PCC, the RRFS-MIC algorithm uses MIC to calculate the features' relevance, as defined in (3). The description of RRFS-MIC is otherwise similar to that of Algorithm 1.
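The selection loop of RRFS, as described in this section, can be sketched as follows, given precomputed relevance scores (|ρ(x_i, y)| for RRFS-PCC, or mutual information for RRFS-MIC) and robustness scores r_i. This is a minimal reconstruction from the textual description, not the authors' code; the function name and the stopping rule once q reaches 0 (at which point every feature passes the threshold) are assumptions.

```python
import numpy as np


def rrfs_select(relevance, robustness, n_selected, delta):
    """Return (indices of the selected features, final threshold theta_q)."""
    relevance = np.asarray(relevance, dtype=float)
    robustness = np.asarray(robustness, dtype=float)
    if not 0 < n_selected <= relevance.size:
        raise ValueError("n_selected must lie between 1 and N")
    if delta <= 0:
        raise ValueError("delta must be positive")
    order = np.argsort(-relevance, kind="stable")  # most relevant feature first
    q = 100.0 - delta
    while True:
        theta = np.percentile(robustness, max(q, 0.0))
        # Scan features by decreasing relevance, keep the robust ones, at most N' of them
        selected = [int(k) for k in order if robustness[k] >= theta][:n_selected]
        if len(selected) == n_selected or q <= 0.0:
            return np.array(selected), float(theta)
        q -= delta  # lower the percentile: theta_{q - delta}
```

For example, with relevance [0.9, 0.8, 0.7, 0.6, 0.5], robustness [0.1, 0.9, 0.8, 0.95, 0.2], N′ = 2 and δ = 20, the 80th-percentile threshold admits only feature 3, so the loop lowers q to 60 and returns features 1 and 3: the most relevant feature, 0, is skipped because it is not robust.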

IV. EXPERIMENTAL RESULTS
In the following, we assess the proposed methodology for addressing the CSM in 3-D steganalysis. We consider 354 3-D objects represented as meshes, which are part of the Princeton mesh segmentation project database [36]. This database combines the databases used by the European research projects AIM@SHAPE and FOCUS K3D with the shapes from the Watertight Models Track of the Shape Retrieval Contest 2007 [47]. It contains a large variety of shapes, representing human bodies in various postures, statues, animals, toys, tools, and so on.
The stego-objects are generated by applying three information hiding algorithms: 1) the 3-D multilayer steganography (MLS) proposed in [10]; 2) the blind robust watermarking algorithm based on modifying the mean of the distribution of the vertices' radial distance coordinates in the spherical coordinate system, denoted as MRS, from [11]; and 3) the steganalysis-resistant watermarking (SRW) method proposed in [12]. In the case of MLS [10], the number of embedding layers is set to 10 and the number of intervals to 10 000. The relative payload ratio of each layer is nearly 1, and the three vertices used for extracting the code are not modified at all. The payload embedded by MRS [11] is 64 bits, and the watermarking strength is 0.04. We set the parameter K = 128 in SRW [12] and the upper bound of the embedding capacity as (K−2)/2. Similar to the approach from [16], we use FLD ensembles [48], [49] as the machine learning-based steganalyzer. The parameters of the FLD ensembles, such as the number of base learners and the subspace dimensionality, are chosen as in [49]. The classifiers' performance is measured by the detection errors, which are the sums of false negatives (missed detections) and false positives (false alarms).
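For readers unfamiliar with FLD ensembles, the following is a minimal sketch of a random-subspace ensemble of Fisher linear discriminants combined by majority voting, in the spirit of [48], [49]. It omits bootstrap sampling and does not reproduce the parameter choices of [49]; `n_learners`, `subspace_dim` and the ridge term `reg` are illustrative values. The detection error is reported here as the fraction of misclassified test objects, (FN + FP)/n, which for balanced cover/stego sets equals the average of the missed-detection and false-alarm rates.

```python
import numpy as np


class FLDEnsemble:
    """Random-subspace ensemble of Fisher linear discriminants (0 = cover, 1 = stego)."""

    def __init__(self, n_learners=51, subspace_dim=10, reg=1e-6, seed=0):
        self.n_learners = n_learners
        self.subspace_dim = subspace_dim
        self.reg = reg
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        y = np.asarray(y)
        d = min(self.subspace_dim, X.shape[1])
        self.learners = []
        for _ in range(self.n_learners):
            idx = self.rng.choice(X.shape[1], size=d, replace=False)
            Xs = X[:, idx]
            m0, m1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
            # Within-class scatter, regularized so that it stays invertible
            sw = np.cov(Xs[y == 0], rowvar=False) + np.cov(Xs[y == 1], rowvar=False)
            sw = np.atleast_2d(sw) + self.reg * np.eye(d)
            w = np.linalg.solve(sw, m1 - m0)
            b = -w @ (m0 + m1) / 2.0  # threshold halfway between the projected class means
            self.learners.append((idx, w, b))
        return self

    def predict(self, X):
        votes = np.zeros(X.shape[0])
        for idx, w, b in self.learners:
            votes += (X[:, idx] @ w + b > 0)  # one vote for "stego" per base learner
        return (votes > self.n_learners / 2).astype(int)


def detection_error(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    false_neg = np.sum((y_true == 1) & (y_pred == 0))  # missed detections
    false_pos = np.sum((y_true == 0) & (y_pred == 1))  # false alarms
    return (false_neg + false_pos) / y_true.size
```

Each base learner sees only a random subset of the features, which keeps the individual discriminants simple; the majority vote then combines them.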
The initial feature set is a 276-D feature set, called LAY276, generated by combining two feature sets used for 3-D steganalysis, LFS76 [17] and YANG208 [15], counting only once the eight features present in both sets (76 + 208 − 8 = 276). We consider the initial objects of the database as cover-objects, and we obtain the stego-objects by applying the information hiding algorithms. In the experiments, the LAY276 feature set is initially extracted from the cover-objects and stego-objects. Then, for testing the proposed method under the CSM scenario, we apply certain transformations, namely noise addition or mesh simplification, to the original objects from the database, and we consider the transformed objects as cover-objects for information hiding. Feature sets are extracted from these transformed cover-objects and their corresponding stego-objects. We consider four levels for each transformation, going from superficial changes to more dramatic modifications of the objects' surfaces, by either increasing the level of noise, through a parameter β representing the weight applied to the noise amplitude, or by changing the mesh simplification factor λ, which represents the proportion of polygonal faces preserved after mesh simplification. Thus, during the calculation of the robustness, we have a set of M = 8 transformations applied to the original objects. In order to test the performance of the selected features under the CSM scenario, we randomly select 260 cover-objects from the original cover source, together with their corresponding stego-objects, for training the steganalyzer. The steganalyzers are trained on the feature subsets selected by the RRFS algorithm. We then test the steganalyzer on the remaining 94 pairs of cover-objects and stego-objects originating from the transformed cover sources, which have not been used during the training. The experiments are repeated ten times with independent splits of the training and testing sets.
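The evaluation protocol above (260 training pairs from the original source, 94 test pairs from a transformed source, ten random splits) can be sketched as follows. The function and array names are hypothetical: each feature matrix is assumed to have one row per database object, in the same object order, so that the transformed versions of the held-out objects can be selected by index.

```python
import numpy as np


def csm_splits(n_objects=354, n_train=260, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) object indices for independent random splits."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        perm = rng.permutation(n_objects)
        yield perm[:n_train], perm[n_train:]


def assemble_csm_sets(cover_orig, stego_orig, cover_trans, stego_trans, train_idx, test_idx):
    """Train on original-source pairs, test on transformed-source pairs of unseen objects."""
    X_train = np.vstack([cover_orig[train_idx], stego_orig[train_idx]])
    y_train = np.concatenate([np.zeros(len(train_idx), int), np.ones(len(train_idx), int)])
    X_test = np.vstack([cover_trans[test_idx], stego_trans[test_idx]])
    y_test = np.concatenate([np.zeros(len(test_idx), int), np.ones(len(test_idx), int)])
    return X_train, y_train, X_test, y_test
```

Splitting by object rather than by feature vector ensures that no object, in any of its transformed versions, appears in both the training and the testing sets.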

A. CSM Scenario
In the following, we analyze the steganalysis capability when information is hidden by means of three different information hiding algorithms. We consider a steganalyzer both with and without the CSM scenario. When testing the steganalytic algorithm without considering the object transformations for the CSM scenario, we use the whole LAY276 feature set. For diversifying the cover source space, in order to test the steganalyzer under the CSM scenario, we consider two different transformations: 1) mesh simplification and 2) noise addition. While the first transformation changes the local topology of the mesh, the latter alters the roughness of the surface. For example, these transformations can simulate the distortions of meshes caused by using different 3-D scanners to scan the same object, because 3-D scanners may have different accuracies and precisions, and different algorithms may then be used to create the 3-D meshes. Watermarked objects produced by 3-D printing will also contain surface variations similar to those created by additive noise. When creating new shapes by adding noise to the mesh surface of the original objects, we actually create a challenging problem for a 3-D steganalyzer, because such distortions resemble those produced in the mesh when hiding information. Thus, we increase the uncertainty in separating the cover-objects from the stego-objects.
The mesh simplification is performed using the MATLAB function reducepatch, which reduces the number of faces while aiming to preserve the overall shape of the 3-D object. The level of simplification is controlled by the parameter λ ∈ {0.98, 0.95, 0.9, 0.8}, which is interpreted as a fraction of the original number of faces. For example, if λ = 0.8, then the number of faces is reduced to 80% of their count in the original mesh. The close-up detail of one of the original 3-D objects used in the experiments is shown in Fig. 2(a), while its corresponding stego-object, obtained by using the MLS embedding algorithm after mesh simplification with λ = 0.8, is shown in Fig. 2(b). Had we chosen smaller λ values, the resulting meshes would have changed dramatically, whereas addressing the CSM problem in 3-D steganalysis concerns localized changes in the mesh surface. Besides, the mesh simplification algorithm used by reducepatch may produce particular artifacts; for example, the sizes of the triangles on the flat parts of the simplified mesh may vary significantly. When considering uniform noise addition, the noise amplitude is modulated by βD, with β ∈ {1·10^−5, 2·10^−5, 3·10^−5, 5·10^−5}, where D is the maximum distance between the projections of any two vertices onto the first principal axis, obtained by applying principal component analysis to the original 3-D object. Applying the noise relative to the size of the objects ensures a consistent effect for all the generated shapes, independent of their initial size. By applying various levels of mesh simplification and noise addition, we can observe the performance of the steganalytic approaches under different levels of CSM.
Fig. 3 depicts the box plots of the detection errors, indicating their spread, for the three information hiding algorithms without CSM (label 0) and with CSM (labels 1-8), where the diversity of objects for testing the CSM problem is produced by shape transformations through noise addition or mesh simplification, each considering four levels of induced distortion to the original shapes. We remark that in the case without CSM (label 0), the training set did not contain the noisy or the simplified meshes. From Fig. 3(a), it can be observed that the CSM scenario poses more challenges to steganalysis when the changes are embedded by the MLS steganographic algorithm proposed in [10] than in the case of the MRS [11] and SRW [12] algorithms, whose results are provided in Fig. 3(b) and (c), respectively. For MRS and SRW, the CSM challenge due to the diversification of shapes through mesh simplification leads to a drop in the detection of the hidden information. However, these results also show that the CSM challenge due to the diversification of shapes through additive noise does not have much influence on the detection results. This happens because the noise added to the cover-object surface is smaller than the changes produced to the surface of 3-D objects by these two watermarking algorithms.
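The size-relative noise addition described in this section can be sketched as follows. D is computed as the range of the vertex projections onto the first principal axis, obtained here from an SVD of the centred vertex coordinates. The text only states that uniform noise with amplitude βD is used, so drawing each coordinate independently from U(−βD, βD) is an assumption of this sketch, as is leaving the face connectivity unchanged; the function name is hypothetical. Mesh simplification via reducepatch is MATLAB-specific and is not reproduced.

```python
import numpy as np


def add_relative_uniform_noise(vertices, beta, seed=None):
    """Return (noisy vertices, D) for an (n_vertices, 3) array of coordinates.

    ASSUMPTION: every coordinate of every vertex is perturbed independently by
    noise drawn from U(-beta * D, beta * D); faces are left untouched.
    """
    v = np.asarray(vertices, dtype=float)
    centred = v - v.mean(axis=0)
    # First principal axis = right singular vector with the largest singular value
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    proj = centred @ vt[0]
    D = proj.max() - proj.min()  # maximum distance between any two projections
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-beta * D, beta * D, size=v.shape)
    return v + noise, D
```

Because the amplitude scales with D, a large statue and a small toy receive perturbations of the same relative magnitude for a given β.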

B. Analysis for Selecting the Parameter δ in the RRFS Algorithm
The parameter δ controls the tradeoff between the robustness and the relevance of the features during selection, as explained in Algorithm 1 from Section III. In the following experiment, we consider the steganalysis of stego-objects carrying information embedded by the MLS algorithm proposed in [10], whose steganalysis results were the poorest under the CSM scenario in the previous section. We set δ ∈ {2, 10, 20, 30, 40, 50}, following the same rules as above and using the RRFS-PCC algorithm for the FS. When δ is small, the algorithm gives more consideration to the robustness of the features to the variation of the cover source, while when δ is larger, it gives more consideration to the features' relevance to the class label. Because we consider the robustness of the features to be very important for addressing the CSM problem, we tend to set a small value for δ. We increase the number of selected features N′ from 10 to 270 in steps of 10. From Fig. 4, we can observe that the threshold θ_q decreases as δ, which controls the tradeoff between robustness and relevance, increases. A larger area under the plot of the robustness threshold θ_q means that more consideration is given to the features' robustness. For example, when δ = 2, the selection of the features is mostly based on their robustness to the variation of the cover source.
For testing the steganalyzers under the CSM scenario, we consider four shape alterations, produced by additive noise with amplitude β ∈ {1·10^−5, 3·10^−5} and by mesh simplification with the level of simplification λ ∈ {0.98, 0.9}. From the plots in Fig. 5, it can be observed that in the case of CSM due to noise addition, smaller δ values, such as δ ∈ {2, 10, 20}, lead to a better performance of the steganalyzer. However, the steganalyzers with δ ∈ {30, 40} show a better generalization ability when the CSM is due to mesh simplification. Since we have to consider the CSM scenarios due to both noise addition and mesh simplification, we set δ = 20 as a tradeoff in the following experiments.

C. Comparison With Other Feature Selection Approaches
In the following, we compare the proposed FS algorithms for steganalysis, RRFS-PCC and RRFS-MIC, with filter FS algorithms used in pattern recognition, namely min-redundancy max-relevance [46], double input symmetrical relevance (DISR) [50], conditional mutual information maximization [44], infinite FS (Inf-FS) [51], and infinite latent FS (ILFS) [52], which have shown very good generalization ability in a wide range of applications [53]. We also compare with a simplified version of our algorithm, relevance-based FS (RFS), which selects the features with the highest relevance to the class label, measured by PCC, without considering their robustness to variations of the cover source. We repeat the steganalysis experiments using FLD ensembles for ten different splits of the data sets and take the median of the resulting errors as the final test result. Figs. 6-8 show the test results when using the features selected by the proposed RRFS-PCC and RRFS-MIC algorithms, compared with those of the other six FS algorithms. These results are obtained with LAY276 as the initial feature set, for steganalysis under the CSM assumption, considering the distortions caused by mesh simplification and uniform additive noise as in the previous section. Fig. 6 shows the detection errors for the 3-D steganographic method MLS, proposed in [10], under the CSM scenario. As can be observed from this figure, the RRFS-PCC and RRFS-MIC algorithms achieve rather similar results, which indicates that the dependency between the 3-D features and the class label is relatively linear. Meanwhile, compared to the other FS algorithms, RRFS-PCC and RRFS-MIC perform better when the number of selected features is between 10 and 60.
As the number of selected features increases, the advantage of RRFS-PCC and RRFS-MIC diminishes, and they are eventually surpassed by Inf-FS and ILFS in the CSM due to additive noise. However, Inf-FS and ILFS do not provide good results in the CSM due to mesh simplification. The optimal feature dimensionalities found for CSM cases caused by different distortions are not always consistent with one another, whereas those found for the same type of distortion at various intensities are rather consistent. For example, in the CSM due to additive noise, the minimum errors are often obtained when the number of selected features is between 20 and 40. In contrast, in the CSM due to mesh simplification, the optimal value of N is usually between 60 and 160. This is because mesh simplification produces more significant shape changes than additive noise, so the resulting shapes require more features for steganalysis. Figs. 7 and 8 illustrate the steganalysis results for the watermarking methods MRS and SRW, proposed in [11] and [12], respectively, under the CSM scenario. In the CSM due to additive noise, most of the FS algorithms show similar performance: as the dimensionality of the selected feature space increases, the detection error decreases until it eventually stabilizes. This happens because the steganalyzers are not significantly affected by the CSM due to additive noise when identifying stego-objects produced by MRS and SRW, as confirmed by the results shown in Fig. 3(b) and (c). The RRFS-PCC and RRFS-MIC algorithms perform better than the other algorithms under the CSM scenario due to mesh simplification. In Fig. 7(e)-(h), the detection errors obtained with RRFS-PCC and RRFS-MIC remain relatively constant when the dimensionality N of the feature subset is between 20 and 160. However, in Fig. 8(e) and (f), the detection errors obtained with RRFS-PCC and RRFS-MIC first reach a minimum when N is around 50, and a second minimum is likely to be obtained at N = 130. The first minimum is obtained with the most robust features, while the second arises because the newly added features have higher relevance than the first batch of selected features. The fluctuation of the steganalysis error rate as N increases is probably due to linear dependencies and redundancy among the selected features. According to Figs. 6-8, the RRFS algorithms achieve better results than RFS, indicating that the robustness of the features to variations of the cover source is essential for the generalization of the steganalyzer under the CSM scenario.
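The experiments report the median error of FLD ensembles over ten data splits. The sketch below is a simplified stand-in, assuming a random-subspace ensemble of Fisher linear discriminants with majority voting, a fixed subspace dimension, and random 50/50 splits of objects; the tuned ensemble and pair-aware splitting used in practice are not reproduced, and all function names are hypothetical. Passing `X_test` with features extracted from distorted versions of the same objects (same row order) mimics a CSM evaluation.

```python
import numpy as np

def fld_fit(X, y, reg=1e-6):
    """Fisher linear discriminant: w = Sw^{-1}(m1 - m0), threshold halfway between class means."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Sw = np.atleast_2d(np.cov(X[y == 0], rowvar=False) + np.cov(X[y == 1], rowvar=False))
    w = np.linalg.solve(Sw + reg * np.eye(X.shape[1]), m1 - m0)
    return w, -0.5 * w @ (m0 + m1)

def ensemble_fit(X, y, rng, n_learners=31, d_sub=10):
    """Train FLDs on random feature subspaces (random-subspace ensemble)."""
    learners = []
    for _ in range(n_learners):
        idx = rng.choice(X.shape[1], size=min(d_sub, X.shape[1]), replace=False)
        w, b = fld_fit(X[:, idx], y)
        learners.append((idx, w, b))
    return learners

def ensemble_predict(learners, X):
    """Majority vote of the base FLDs (1 = stego, 0 = cover)."""
    votes = sum((X[:, idx] @ w + b > 0).astype(int) for idx, w, b in learners)
    return (votes > len(learners) / 2).astype(int)

def median_error(X, y, X_test=None, n_splits=10, seed=0):
    """Median test error over random 50/50 splits; X_test rows must match X rows."""
    X_test = X if X_test is None else X_test
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(len(y))
        tr, te = perm[: len(y) // 2], perm[len(y) // 2 :]
        learners = ensemble_fit(X[tr], y[tr], rng)
        errors.append(np.mean(ensemble_predict(learners, X_test[te]) != y[te]))
    return float(np.median(errors))
```

Taking the median rather than the mean keeps a single unlucky split from dominating the reported error.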
In Figs. 9-11, we provide the receiver operating characteristic (ROC) curves for the steganalysis results in the CSM scenarios, after applying the FS algorithms or without FS. In these experiments, we identify the stego-objects produced by MLS under the CSM scenario in which new cover-objects are generated through additive noise with amplitude β ∈ {1 · 10^−5, 3 · 10^−5}. Since the steganalysis results for MRS and SRW tend to be rather poor under the CSM due to mesh simplification, we consider CSM scenarios due to mesh simplification at levels λ ∈ {0.98, 0.9} when detecting the stego-objects produced by MRS and SRW. When the steganalysis is carried out without FS, the whole feature set, LAY276, is used for training. For comparison, we also consider FS algorithms such as DISR, Inf-FS, and ILFS, which have shown relatively good performance on other data sets. For all FS algorithms, we set the dimensionality of the selected feature subset to N = 40, 50, and 90 when detecting the stego-objects produced by MLS, MRS, and SRW, respectively. These values of N are chosen from the overall performance of the proposed FS algorithms in all CSM scenarios, according to the results provided in Figs. 6-8. Figs. 9-11 show that the proposed RRFS-PCC and RRFS-MIC algorithms perform better than the other FS algorithms under the CSM scenario. Moreover, the proposed algorithms improve the 3-D steganalysis results under the CSM scenario when compared to using the whole feature set. This indicates that the generalization of the 3-D steganalyzer is improved when a suitable feature set is selected by the proposed RRFS-PCC and RRFS-MIC algorithms.
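ROC curves such as those in Figs. 9-11 are obtained by sweeping a decision threshold over the steganalyzer's output scores. A minimal NumPy sketch, assuming one real-valued score per object (for example, the summed FLD projections of an ensemble) where larger values mean "more likely stego"; the function names are illustrative, and the area under the curve is computed with the trapezoidal rule.

```python
import numpy as np

def roc_points(scores, labels):
    """False/true positive rates for every distinct threshold (labels: 1 = stego, 0 = cover)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    order = np.argsort(-scores, kind="mergesort")
    s, l = scores[order], labels[order]
    tps = np.cumsum(l == 1)  # stego-objects flagged at each threshold
    fps = np.cumsum(l == 0)  # cover-objects wrongly flagged
    last_of_tie = np.r_[np.nonzero(np.diff(s))[0], len(s) - 1]  # one point per distinct score
    tpr = np.r_[0.0, tps[last_of_tie] / tps[-1]]
    fpr = np.r_[0.0, fps[last_of_tie] / fps[-1]]
    return fpr, tpr

def roc_auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

Grouping tied scores into a single ROC point keeps the curve independent of the order in which tied objects happen to be listed.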

D. Comparison With Domain Adaptation Approaches
In this section, we compare the proposed RRFS-PCC and RRFS-MIC with several domain adaptation approaches: TCA [29], TJM [34], and JGSA [32]. These domain adaptation methods were proposed to address the mismatch between training and testing data in various applications, such as text classification, face recognition, object recognition, and indoor WiFi localization. For the proposed FS algorithms, we set N = 40, 50, and 90, as in the previous experiments, when detecting the stego-objects produced by MLS, MRS, and SRW, respectively, and the same values are used for the domain adaptation methods. We keep the step size parameter of RRFS-PCC and RRFS-MIC at 20, following the conclusions of the study from Section IV-B. We use TCA and JGSA with a linear kernel and TJM with a radial basis function kernel, and their specific parameters are chosen according to the studies in [29], [32], and [34]. We provide the ROC curves for the steganalysis results in the CSM scenarios after applying the proposed FS algorithms, the domain adaptation algorithms, or no FS. The RRFS-PCC and RRFS-MIC algorithms provide better performance than the domain adaptation algorithms TCA, TJM, and JGSA. Moreover, the domain adaptation approaches perform worse than using no FS in the CSM scenarios, as shown in Figs. 13 and 14. A likely reason for the ineffectiveness of typical domain adaptation approaches is that their feature transformations, which are designed to reduce the distance between training and testing samples, may also reduce the distance between the features of cover- and stego-objects. Consequently, typical domain adaptation approaches are not suitable for addressing the CSM problem in 3-D steganalysis.

Fig. 15. Accumulated selection ratios of the features selected by RRFS-PCC as discriminative between the stego-objects created by the embedding methods from [10]-[12] and their corresponding cover-objects, under specific CSM scenarios. (a)-(c) Statistical categories of the selected features (the first four moments: mean, variance, skewness, and kurtosis) when the information was embedded by MLS [10], MRS [11], and SRW [12], respectively. (d)-(f) Geometrical categories of the selected features for MLS [10], MRS [11], and SRW [12], respectively, with category labels: 1) vertex position in the Cartesian coordinate system; 2) vertex norm in the Cartesian coordinate system; 3) vertex position in the Laplacian coordinate system; 4) vertex norm in the Laplacian coordinate system; 5) face normal; 6) dihedral angle; 7) vertex normal; 8) curvature; 9) vertex position in the spherical coordinate system; and 10) edge length in the spherical coordinate system.
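To make the argument about domain adaptation more concrete, the sketch below implements the core of TCA [29] with a linear kernel, as used in the comparison. TCA finds a projection that shrinks the maximum mean discrepancy (MMD) between training (source) and testing (target) features while preserving their variance, by solving the generalized eigenproblem K H K w = λ (K L K + μI) w. Nothing in this objective refers to the cover/stego labels, which is consistent with the observation that such a projection can also shrink the cover-stego separation. The regularizer `mu`, the number of components, and the function name are illustrative assumptions, not the settings used in the experiments.

```python
import numpy as np

def tca_linear(Xs, Xt, n_components=2, mu=1.0):
    """Linear-kernel Transfer Component Analysis; returns embedded source and target samples."""
    X = np.vstack([Xs, Xt])
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    K = X @ X.T                                       # linear kernel
    e = np.r_[np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)]
    L = np.outer(e, e)                                # MMD coefficient matrix
    H = np.eye(n) - np.full((n, n), 1.0 / n)          # centering matrix
    A = K @ H @ K                                     # variance to preserve
    B = K @ L @ K + mu * np.eye(n)                    # MMD term + regularizer (positive definite)
    # Solve A w = lambda B w by whitening with the Cholesky factor B = C C^T.
    Cinv = np.linalg.inv(np.linalg.cholesky(B))
    M = Cinv @ A @ Cinv.T
    vals, vecs = np.linalg.eigh((M + M.T) / 2)        # symmetrize against round-off
    top = np.argsort(vals)[::-1][:n_components]       # leading eigenvectors
    W = Cinv.T @ vecs[:, top]
    Z = K @ W
    return Z[:ns], Z[ns:]
```

A classifier trained on the embedded source samples would then be applied to the embedded target samples; because the cover/stego labels never enter `A` or `B`, there is no term protecting the class separation.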

E. Analyzing the Selection of Various Categories of 3-D Features
In the following, we analyze the contribution of the various categories of features selected by the proposed RRFS-PCC algorithm in the CSM scenarios of 3-D steganalysis. First, we categorize the steganalytic features as either statistical or geometrical in nature. For the former, we group the features according to the statistical moment they define: mean, variance, skewness, or kurtosis. For the latter, we categorize the features by the kind of local geometric characteristic they reveal: the vertex position in the Cartesian coordinate system, the vertex norm in the Cartesian coordinate system, the vertex position in the Laplacian coordinate system, the vertex norm in the Laplacian coordinate system, the face normal, the dihedral angle, the vertex normal, the curvature, the vertex position in the spherical coordinate system, and the edge length in the spherical coordinate system. For each of these feature categories, we calculate the percentage of the features selected by RRFS-PCC from the given pool of features when training the steganalyzers to find the information hidden in 3-D objects by MLS [10], MRS [11], and SRW [12], under the CSM scenario in which the 3-D shape domain has been extended through additive noise and mesh simplification. The final selection ratio of each feature category is the average over ten independent splits of the training/testing data. Fig. 15 depicts the selection ratios of all feature categories as the number of features selected by RRFS-PCC varies from 10 to 270 in steps of 10, in the context of mitigating the CSM problem. As can be observed from Fig. 15(a)-(c), when N is small, the first-order moments (means) of the features are more likely to be selected than their second-order moments (variances) or higher-order moments, such as skewness and kurtosis. The differences between the selection ratios of the categories decline as N increases. This result indicates that, in the context of the CSM scenario, the means are the most robust statistical features, followed by the variances. The higher-order moments of the features considered for 3-D steganalysis change more dramatically than the lower-order ones under the transformations used to test the CSM problem.
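The per-category selection ratios are averages over ten data splits. A small sketch of that bookkeeping, assuming each feature carries one integer category label and that the ratio is the percentage of a category's features that appear in the selected subset (the normalization is an assumption); `selection_ratios` is a hypothetical helper.

```python
import numpy as np

def selection_ratios(selected_per_split, categories, n_categories):
    """Percentage of each category's features that were selected, averaged over splits."""
    categories = np.asarray(categories)
    totals = np.bincount(categories, minlength=n_categories)  # features available per category
    ratios = []
    for selected in selected_per_split:
        picked = categories[np.asarray(selected, dtype=int)]
        counts = np.bincount(picked, minlength=n_categories)
        ratios.append(100.0 * counts / totals)
    return np.mean(ratios, axis=0)
```

Calling it once per value of N (10, 20, ..., 270) with the subsets selected on each split yields one accumulated curve per category.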
The selection ratios of the features in the ten geometrical categories are shown in Fig. 15(d)-(f). It can be observed that the RRFS-PCC algorithm primarily selects the curvature features (label 8), which implies that these features have the strongest robustness and relatively high relevance. This is because the curvature features best model the shape of the 3-D surface within the 1-ring neighborhood of a given vertex: if one or more adjacent faces of the vertex are distorted, the other adjacent faces may average out the effect and limit the change in curvature. Features originating from the vertex position (label 3) and the vertex norm (label 4) in the Laplacian coordinate system, as well as from the vertex norm (label 2) in the Cartesian coordinate system, are selected during the early stage of FS, but their selection ratios then remain stable for quite a while, until N increases to 100, which indicates that the features with strong robustness are probably scattered among the various categories of geometrical features. According to Fig. 15(d), the features originating from the face normal (label 5), the dihedral angle (label 6), and the vertex normal (label 7) are not selected until N reaches 60; similar results are shown in Fig. 15(e). We infer that the face normals are seriously distorted by additive noise and mesh simplification, because modifying even one vertex of a face changes the face normal. Since the dihedral angle and the vertex normal are calculated from the face normals, they are unavoidably influenced by the transformations applied to extend the 3-D shape domain. Interestingly, the dihedral angle features (label 6) have higher selection ratios in Fig. 15(e) and (f) than in Fig. 15(d) during the early selection stage, when N ranges from 20 to 60.
This indicates that the robustness of the features may vary significantly depending on the information hiding algorithm used to embed the information in the 3-D objects. The features characterizing the spherical coordinate system (labels 9 and 10) are also likely to be selected during the early stage of the FS algorithm, and their selection ratios increase significantly when N exceeds 80.
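The argument about curvature can be illustrated with one common discrete estimate, the angle deficit 2π − Σθ at each vertex, which depends only on the angles of the faces in that vertex's 1-ring. Whether the curvature features in LAY276 use this particular estimator is not stated here, so it is an assumption, as are the function names. The sketch also computes the four moments (mean, variance, skewness, kurtosis) used to summarize a per-vertex quantity, and it assumes a closed triangle mesh, since boundary vertices would need π instead of 2π.

```python
import numpy as np

def angle_deficit(vertices, faces):
    """Discrete Gaussian curvature per vertex: 2*pi minus the sum of incident face angles."""
    v = np.asarray(vertices, float)
    f = np.asarray(faces)
    angle_sum = np.zeros(len(v))
    for k in range(3):
        # Corner angle at vertex a of each face, between the edges a->b and a->c.
        a, b, c = f[:, k], f[:, (k + 1) % 3], f[:, (k + 2) % 3]
        e1, e2 = v[b] - v[a], v[c] - v[a]
        cos = np.einsum("ij,ij->i", e1, e2) / (
            np.linalg.norm(e1, axis=1) * np.linalg.norm(e2, axis=1))
        np.add.at(angle_sum, a, np.arccos(np.clip(cos, -1.0, 1.0)))
    return 2 * np.pi - angle_sum

def four_moments(x):
    """Mean, variance, skewness, and kurtosis of a per-vertex feature."""
    x = np.asarray(x, float)
    m, s = x.mean(), x.std()
    z = (x - m) / s if s > 0 else np.zeros_like(x)
    return m, x.var(), float((z ** 3).mean()), float((z ** 4).mean())
```

Because each deficit sums several corner angles, a perturbation of a single neighbouring face changes only one term of the sum, whereas a face normal depends entirely on its three vertices.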

V. CONCLUSION
This paper proposes a solution to the CSM problem in 3-D steganalysis. In the CSM scenario, the objects investigated during the testing stage differ from those used for training the steganalyzer. A new FS algorithm, called RRFS, is proposed in this paper. Two versions of the algorithm are discussed: the first employs the PCC, while the second uses the MIC to define the relevance of each feature to the class label. The RRFS algorithm also evaluates the robustness of the features to variations of the cover source, resulting in the selection of a feature subset that is both relevant and robust to shape variation. To diversify the 3-D shape variation, we consider mesh simplification and additive noise for generating new cover- and stego-objects when testing the steganalyzer under the CSM scenario. In the experimental analysis, we consider three different information hiding methods, including a high-capacity embedding method and a more recent method that embeds watermarks which cannot be detected by other steganalytic methods. The proposed methodology is shown to choose a better feature set than other FS algorithms from the machine learning literature when they are used for 3-D steganalysis in CSM scenarios. It also achieves better performance than several typical domain adaptation approaches in the CSM scenarios. A limitation of this study is that, for selecting the features, we consider a rather limited set of transformations to simulate the CSM problem. A more general study should compare the set of cover-objects with a set of transformed objects originating from cover sources completely different from those used in the training stage.
Moreover, it remains challenging to find a feature set size that works well for identifying the stego-objects produced by any given information hiding algorithm under the CSM scenario.