Market Development, Information Diffusion, and the Global Anomaly Puzzle

Abstract
Previous literature finds that anomalies are at least as prevalent in developed markets as in emerging markets, a finding known as the global anomaly puzzle. We show that while market development and information diffusion are linearly related, information diffusion has a nonlinear impact on anomalies. This is consistent with theoretical developments concerning the process of information diffusion. In extremely low-efficiency regimes, without newswatchers sowing the seeds of price discovery and ensuring the long-run convergence of price to fundamentals, initial mispricing and subsequent correction will not occur. The concentration of emerging countries in low-efficiency regimes provides an explanation for the puzzle.


I. Introduction
It is generally believed that emerging markets should have more return anomalies than developed markets. This is based on the premise that developed markets are more efficient and efficiency should lead to fewer return anomalies (Butler and Malaikah (1992), Bekaert and Harvey (2002), Van der Hart, Slagter, and Van Dijk (2003), and Bris, Goetzmann, and Zhu (2007)). In contrast to this widely held belief, Griffin, Kelly, and Nardari (2010), among others, document that anomalies appear to be at least as prevalent in developed markets as in emerging markets. 1 More recently, Jacobs (2016) revisits this topic using mispricing scores to test more anomalies and provides further support for the finding of Griffin et al. (2010). Though Jacobs (2016) tests a variety of conjectures, he can find no clear explanation for the cross-sectional variation in the presence of anomalies (we refer to this as the global anomaly puzzle).
The purpose of this study is to address the global anomaly puzzle by reconciling the relationship between market development and market efficiency with the observed presence of anomalies. Building on Hong and Stein's (1999) theoretical model, the core of our explanation is that market development affects the production and use of "price" relevant information that is a necessary condition to start the pricing process based on the information. 2 In general, information efficiency is measured by the speed information is incorporated into price (Griffin et al. (2010)). In a cross-sectional setting the differences in the amount of information made known to the market, the quality of information and the users' ability to analyze and interpret information will all contribute to this speed of information incorporation. We, therefore, use the concept of "newswatcher efficiency" to capture the multiple dimensions of an information environment that may ultimately affect this speed of price discovery. A higher newswatcher efficiency suggests more information production, higher quality of information, and better analysis skills. It is generally believed the higher the newswatcher efficiency, the less will be the return anomalies (see, e.g., Hong and Stein (1999)). What has been less explored in the literature, however, is what happens if newswatcher efficiency is extremely low. Without newswatchers sowing the seeds of price discovery and ensuring the long-run convergence of price to its fundamental value with respect to that information, the process of initial mispricing and subsequent correction will not occur. Therefore, anomalies will not be observed.
As a market develops, however, newswatcher efficiency is improved by the introduction of more sophisticated investors and more high-quality value-relevant information. Nonetheless, information diffusion is still relatively slow among newswatchers in this early stage of market development. This leads to an information-induced price trend and underreaction, and offers an opportunity for momentum traders to trade on the price trend and eventually overextrapolate, creating mispricing. Therefore, we will observe a positive relationship between market development and anomalies. We refer to this as Phase I, the "increasing/emerging" phase of the anomaly-market development relationship.
As market development progresses, newswatcher efficiency is further improved. With a high level of information quality and investor sophistication, price-relevant information is reflected in the price more quickly, which discourages momentum traders and, therefore, reduces return anomalies. There will thus be a negative relationship between market development and anomalies in this relatively higher efficiency regime. We refer to this as Phase II, the "decreasing/developed" phase of the anomaly-market development relationship. This is the phase more often studied in the literature with data from developed markets such as the USA and Europe.
Overall, we expect that while market development and newswatcher efficiency are linearly related, newswatchers will have a nonlinear impact on return anomalies. Therefore, our central hypothesis is: market development (via newswatcher efficiency) has an inverted U-shape relationship with the observed presence of anomalies. We show that our argument of a nonlinear relationship is consistent with the theoretical framework of Hong and Stein (1999) in an extended numerical simulation.
One of the common challenges is to measure the information environment empirically when studying information efficiency in the international context (Griffin et al. (2010)). In Hong and Stein's (1999) theoretical model and Griffin et al.'s (2010) illustrative example, the source of the difference is captured by the speed of information diffusion. However, the outcome of information diffusion is influenced by the general information environment and it is difficult to identify one single measure that can capture the cross-country variation. Conceptually, we use "newswatcher efficiency" to capture the multiple dimensions of the information environment that may ultimately affect the speed of information diffusion. Through searching for proxies for the speed of information diffusion in different countries we learned there are broadly three aspects of the information environment that matter: the amount of information made known to the market, the quality of information, and the users' ability to analyze and interpret information. Empirically, to test our hypothesis we capture these aspects through the eight proxies used in the article.
Specifically, we use the number of news articles (NEWS) as a measure of the availability and coverage of public information (from Griffin, Hirschey, and Kelly (2011)). We use the accounting standard index (ACCT; La Porta, Lopez-de-Silanes, Shleifer, and Vishny (2000)) and the earnings management score (EMS; Leuz, Nanda, and Wysocki (2003)) to measure cross-country differences in accounting quality. An opacity index (OPA) measures the general information opaqueness. As for the ability to interpret information, we use a sophistication score (SOPHI, Global Competitiveness Report) and institutional ownership (INSTOWN). Finally, we have two variables that measure both the quality of information and the ability to interpret it: analyst dispersion (DISP) and differenced volatility (DV, the volatility difference between earnings announcement and nonannouncement days; Griffin et al. (2011)). These variables measure information and earnings quality.
Empirically, we test the relationship between newswatcher efficiency and anomalies in several steps. First, we identify 16 anomalies whose mechanism is close to Hong and Stein's (1999), (2007) framework. 3 To examine the return of these anomalies in aggregate, we construct a mispricing score using these 16 anomalies in 45 countries during the period from 1993 to 2016. 4 The long-short spread (hedge return) of the mispricing portfolio in emerging markets does not outperform that in developed markets, which is consistent with Jacobs (2016). When the variance of the hedge return is considered (a Sharpe ratio type measure), there is even evidence that developed-market anomalies are greater than those of emerging markets, especially in the equal-weighted and accounting-based anomaly portfolios.
Second, we then show there is a nonlinear relation between newswatcher efficiency and the anomaly return. Panel regressions show the quadratic terms of these newswatcher proxies are significant and negative in relation to the hedge return of the mispricing portfolio. This confirms our central hypothesis of nonlinearity (an inverted U shape). Our results are robust to control variables for limits to arbitrage and investment frictions, the potential time-varying newswatcher efficiencies, and the Fama and French global five factors.
Third, further examination of newswatcher efficiency and hedge returns shows emerging (developed) markets are around or on the left (right) of the turning point of a nonlinear curve. Because both low and high newswatcher efficiency drive low anomaly returns, there is no significant difference between developed and emerging markets. This provides an explanation of the global anomaly puzzle.
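The inverted-U test and the turning point described above can be illustrated with a minimal sketch. It assumes a pooled least-squares fit of the hedge return on a single newswatcher efficiency proxy and its square; the function names and the synthetic data are hypothetical, and the sketch omits the control variables, factor adjustments, and panel structure used in the actual regressions.

```python
import numpy as np


def fit_quadratic(efficiency, hedge_return):
    """Least-squares fit of hedge_return = b0 + b1*x + b2*x**2.

    Illustrative only: no controls, fixed effects, or clustered errors.
    """
    x = np.asarray(efficiency, dtype=float)
    y = np.asarray(hedge_return, dtype=float)
    design = np.column_stack([np.ones_like(x), x, x ** 2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # (intercept, linear term, quadratic term)


def turning_point(coef):
    """Efficiency level at which the fitted curve peaks.

    An inverted U requires a negative quadratic coefficient.
    """
    _, b1, b2 = coef
    if b2 >= 0:
        raise ValueError("quadratic coefficient is not negative: no inverted U")
    return -b1 / (2.0 * b2)


# Synthetic data (hypothetical): an exact inverted U in a standardized proxy.
x = np.linspace(-2.0, 2.0, 41)
y = 1.0 + 0.5 * x - 0.8 * x ** 2

coef = fit_quadratic(x, y)
peak = turning_point(coef)
```

Under this reading, markets whose proxy value lies below `peak` sit on the increasing side of the curve and those above it on the decreasing side.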
We conduct a series of further tests and robustness checks. In terms of further analysis, we test our nonlinear prediction of newswatcher efficiency at both the firm cross-sectional and time series levels. We show that the anomaly return of small firms in developed markets is significantly larger than that of small firms in emerging markets, while there is no difference between the two markets for large firms. This finding of the firm size effect interacting with market development provides further support for our main argument about the importance of newswatcher efficiency in the production of anomalies. In general, the information environment is better developed in developed markets than in emerging markets. However, such a difference is greatest between relatively smaller firms in the two types of market. In other words, small firms in emerging markets have relatively low information efficiency. In contrast, large companies in emerging countries have analyst coverage and investor attention similar to those of large companies in developed markets (Griffin et al. (2010)).
Extending our argument about the existence of newswatchers as a necessary condition for the observation of anomalies, we would expect return anomalies to be even less observable in frontier markets. There are only three frontier markets that meet our original data criteria. This suggests anomalies are less likely to be constructed reliably given the lack of publicly available data in most of the frontier markets. Nevertheless, as an "out of sample" test we relax the data selection criteria further for frontier markets, which enables us to add another nine frontier countries to the sample. As expected, this group's anomaly returns are, in general, lower and less significant than those of the other two types of market. When comparing the percentage of countries having significant anomalies, there is a clear difference between the three types of market.
In the time series context, we test the evolution of anomalies with market development in the U.S. market where there is a relatively long history of data covering a wide range of market development. While the disappearance of anomalies in recent samples is known to the literature, what is relevant to this study is we further show that the time variation of anomalies is also linked to time-varying newswatcher efficiency in the USA. 5 In addition, to alleviate the possibility that our finding is driven mainly by the inclusion of the U.S. market, we repeat our main analysis excluding the USA in the Supplementary Material and find the nonlinearity between market development and anomalies remains strong.
Our study extends the literature in the following ways. First, our study reconciles the relationship between market development, return anomalies, and market efficiency that appears to be puzzling in the existing literature (Griffin et al. (2010), Jacobs (2016)). Our study emphasizes that given the mix of investors (newswatchers and momentum traders), relative newswatcher efficiency (information production and usage) plays a key role in affecting price discovery. This finding has a strong implication for the relationship between return anomalies and market efficiency. Market efficiency cannot necessarily be associated with the absence or otherwise of anomalies; a market with fewer anomalies could be a reflection of fast information diffusion, low information asymmetry, fewer market frictions, and less biased investors, or it could simply be that there is insufficient information and not enough sophisticated investors to obtain and process price-related information in the first place.
The idea of a potential nonlinear relationship between the speed of information incorporation and anomalies is not new. Griffin et al. (2010) illustrate a similar intuitive idea about the nonlinear relationship between the speed of information incorporation and absolute autocorrelations in a two-period example. However, their illustration is a reduced form exercise, which explicitly specifies a return equation and a parameter for the speed of information incorporation. While such a simple setting is useful in describing their idea, it is unclear whether and how the return equation can be reached in equilibrium, and what type of investors can generate such a pricing process. In particular, it does not consider the interaction between noise traders and informed traders, which is an important dynamic in the (mis-)pricing process as demonstrated in the early theory of De Long, Shleifer, Summers, and Waldmann (1990). We fill in the gap by extending Hong and Stein's equilibrium model into this context, which creates a richer description of the anomaly-generating process. One of our key arguments that is subtly different from Griffin et al. (2010) is that, building on Hong and Stein (1999), newswatchers not only react to information but also produce information (price-trend) that is consumed by other market participants. The resulting observed pricing process is obtained through the interaction of different players and their use and production of information.
On the empirical front, we take up the challenge highlighted by Griffin et al. (2010) to construct multidimensional measures of newswatcher efficiency and provide empirical evidence of the nonlinear relationship between anomalies and market development measured by newswatcher efficiency. We empirically demonstrate that emerging markets mainly lie on the left-hand side (Phase I) of the curve while developed markets lie on the right (Phase II). And this helps to explain the lack of difference in the observed anomalies level. Our study provides a clear mapping of the complex relationship between market inefficiency measures (anomalies) and market developments. The use of newswatcher efficiency bridges the conceptual gap between the information environment (production and processing) and the observation of anomalies. It provides a uniform framework to understand pricing anomalies and market developments.
Second, our study contributes to a wider and ongoing debate about how, when, and why anomalies exist (Richardson, Tuna, and Wysocki (2010), Subrahmanyam (2010), and Lochstoer and Tetlock (2020)). The search for conditions that lead to anomalies has focused on the conditions that affect the limits to arbitrage such as liquidity, transaction costs, and short selling (Chordia, Subrahmanyam, and Tong (2014), Chu, Hirshleifer, and Ma (2020)), and on the roles of sophisticated investors such as institutional investors or hedge funds (Cao, Liang, Lo, and Petrasek (2018), Calluzzo, Moneta, and Topaloglu (2019)). Utilizing a cross-country setting, we add to this line of literature by showing the importance of investor mix and newswatcher efficiency as important (nonlinear) determinants of anomalies. Limits to arbitrage and the short-sale constraint alone cannot explain why there is no difference in the level of anomalies in emerging and developed markets or the nonlinear relationship between anomalies and market development. This is because the existing limits to arbitrage or behavioral bias explanations of anomalies would suggest the phenomenon should be stronger in emerging markets, as there will be higher limits to arbitrage, less development and use of short-sell instruments and more behavioral bias.
Finally, we provide new insights about the differential impact of market development on different sizes of companies in the market. Particularly, we find the information environment gap between emerging and developed markets is greatest among small companies. Large companies in emerging markets have an information environment closer to that of their developed peers. We further clarify the relationship between firm size and anomalies (Hong, Lim, and Stein (2000)). We highlight that the smallest firms do not have more anomalies than the next small size group of firms in developed markets. This does not contradict market efficiency as small firms have very low newswatcher efficiency that inhibits the existence of anomalies. Consistent with this, emerging market small firms have even lower newswatcher efficiency than those in developed markets and, therefore, have even fewer anomalies.
The remainder of the article is organized as follows: Section II develops our main hypothesis in the context of Hong and Stein (1999) and introduces the key concept of newswatcher efficiency. Section III describes the global data set we have used and provides a summary of the (lack of) difference in anomalies between emerging and developed markets. Section IV presents our core analyses of the relationship between newswatcher efficiency and anomalies. Section V reports further tests and robustness checks. Section VI offers conclusions.

II. Hypothesis Development and Newswatcher Efficiency Measures

A. The Hong and Stein Model Revisited in an International Context
To study cross-country differences, we need a theory to explain the asset pricing process that leads to the observed pricing pattern/"anomalies" and then study the differences in the identified drivers across countries. There are two broad categories of behavioral models. One literature focuses on modeling the impact of specific behavioral biases. For example, the models by Barberis, Shleifer, and Vishny (1998) and Daniel, Hirshleifer, and Subrahmanyam (1998) assume prices are driven by a single representative agent prone to a small number of cognitive biases (conservatism, representativeness, or overconfidence). The other literature focuses on the heterogeneity of information. For example, Hong and Stein (1999) propose a general model that focuses on the interaction between heterogeneous agents and avoids direct reference to any specific behavioral bias. In the context of making a comparison across countries, market development has a close relationship with a country's investor mix. Therefore, the search for an explanation of the global anomaly puzzle leads us to explore potential variations in different countries' investor mix to elucidate cross-country differences. Hong and Stein's (1999) model belongs to the family of "extrapolating" models that can explain a wide range of anomalies (see Barberis, Greenwood, and Shleifer (2018) for a recent survey). Such a general setting of disagreement among investors is one of the key building blocks in developing a general unified behavioral model of both under- and overreaction phenomena (Hong and Stein (2007)). For example, in this study, we identify 16 anomalies that can be interpreted within Hong and Stein's (1999), (2007) theoretical framework.
In this framework, mispricing can be generally described via a four-stage process as depicted in Graph A of Figure 1. In the Hong and Stein (1999) model, there are two types of boundedly rational agents: newswatchers and momentum traders. i) The specialists, known as newswatchers, "dig out" information and reflect such information in their trading. Each newswatcher observes some private information but fails to extract other newswatchers' information from prices. The consequent underreaction means momentum traders can profit by trend chasing. ii) The price trend created by the newswatchers gives the early momentum traders the opportunity to trade based on the price trend. iii) The extrapolation by later momentum traders will eventually overcorrect the initial underreaction and create another type of mispricing (overreaction). iv) In the long run, as new information is released (e.g., future earnings announcements), the mispricing will be resolved and a price correction (reversal) occurs. In general, momentum-type anomalies are described by stages i to iii, whereas reversal-type anomalies are described by stages iii and iv.
In an international context, market development would lead to different levels of newswatcher efficiency. Graph B of Figure 1 shows the difference in the price discovery process as newswatcher efficiency varies. As Hong and Stein (1999) point out, the very existence of underreaction by newswatchers sows the seeds for overreaction by making it profitable for momentum traders to enter the market.
In markets where there is a general lack of newswatchers, there is an insufficient critical mass to create the price trend for momentum traders to follow. This price discovery process is represented by the blue-dotted line. As a market develops, however, newswatcher efficiency is improved by the introduction of more sophisticated investors and more high-quality value-relevant information. Nonetheless, information diffusion is still relatively slow among newswatchers in this early stage of market development. This leads to an information-induced price trend and underreaction, and offers an opportunity for momentum traders to trade on the price trend and eventually overextrapolate, creating mispricing. Therefore, we will observe a positive relationship between market development and anomalies. In other words, the price discovery process will move from the blue-dotted line to the gray line in Graph B of Figure 1. We refer to this as Phase I, the "increasing/emerging" phase of the anomaly-market development relationship.
As market development progresses, newswatcher efficiency is further improved from a medium to a much higher level. Price-relevant information is reflected in the price more quickly, which discourages momentum traders and, therefore, reduces return anomalies. This is demonstrated in Graph B of Figure 1 as the process changes from the gray line to the red-dashed line. There will thus be a negative relationship between market development and anomalies in this relatively higher efficiency regime. We refer to this as Phase II, the "decreasing/developed" phase of the anomaly-market development relationship. This is the phase more often studied in the literature with data from developed markets such as the USA and Europe.
Figure 1. Information Diffusion and Anomalies. Figure 1 illustrates the key building blocks in the Hong and Stein (1999) model and overlays a hypothetical price movement for a piece of good news that is value relevant. Graph A describes the four different stages of the price discovery process with the presence of newswatchers and early and late momentum traders, which leads to a pattern of mispricing covering two types of anomaly: underreaction and overreaction. Graph B demonstrates the nonlinear relationship between three levels of newswatcher efficiency/information diffusion speed and the observed anomalies.
The above intuition can be summarized by an extended numerical analysis of the Hong and Stein model. In this model, the main parameter is z, which captures the inverse of the information diffusion speed. z can be interpreted as the number of periods it takes for a piece of information to be fully diffused across the newswatchers. The smaller is z the faster the information diffusion and hence a better newswatcher efficiency. There are two further parameters: the standard deviation of news shocks e, and the momentum traders' holding period j. Given a set of parameters for z, e, and j, the model can be solved numerically for the momentum traders' prediction coefficient phi, which is similar to a positive feedback coefficient. In this framework, we can define an anomaly as an observation of a price process that exhibits short-term underreaction and subsequent overreaction. The parameter capturing underreaction is z, while the parameter capturing overreaction is phi.
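To make the role of z concrete, the following is a minimal sketch of the underreaction generated by newswatchers alone, before any momentum trading. It assumes that a news shock diffuses linearly, so that a fraction k/z of the shock is in the price k periods after it starts to spread. The function name is hypothetical, and the sketch does not solve for the equilibrium prediction coefficient phi.

```python
def newswatcher_price_response(shock, z, periods):
    """Cumulative price response to one news shock with newswatchers only.

    Assumption (illustrative): linear diffusion, so the fraction min(k / z, 1)
    of the shock is incorporated k periods after it starts to spread.
    z == 0 is treated as immediate, full incorporation.
    """
    if z < 0:
        raise ValueError("z must be non-negative")
    path = []
    for k in range(1, periods + 1):
        fraction = 1.0 if z == 0 else min(k / z, 1.0)
        path.append(shock * fraction)
    return path


# A unit piece of good news under fast (z = 2) and slow (z = 6) diffusion.
fast = newswatcher_price_response(1.0, z=2, periods=6)
slow = newswatcher_price_response(1.0, z=6, periods=6)
```

The slower path rises for longer before reaching the full value of the shock, which is the price trend that momentum traders can condition on.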
We extend Hong and Stein's original numerical analysis by considering a wider range of the information diffusion parameter z, especially when z becomes very large (i.e., information diffuses very slowly). Figure 2 presents plots of comparative statics with respect to the information diffusion parameter z. We use a set of parameters similar to Hong and Stein's (1999) analysis in their Table A3; in particular, the momentum traders' holding period is set at j = 12. Graph A shows the plot of the momentum intensity (phi) and z, while Graph B shows the plot of the standard deviation of the pricing error (Pt − Pt*) and z.
In an ideal (fully efficient) world, z is zero as newswatchers comprehend the news shock immediately and trade to fully reveal the new information in the price. There is no trend for momentum traders to chase and, therefore, no short-term return continuation and long-term reversal in the market (i.e., no anomaly). When z starts to increase from zero, the news will be gradually assimilated by newswatchers and revealed in price. Momentum traders will start to chase the price trend and cause the price to overshoot. As long as j ≥ z − 1 (i.e., the momentum traders' holding period is longer than the period of information diffusion among newswatchers), momentum traders believe they will profit from the trend. The higher the z, the more profit opportunities there are for momentum traders to chase the trend. Thus, there is a monotonic and positive relationship between z and momentum intensity when j ≥ z − 1, as shown in Graph A of Figure 2.
When j < z − 1, an increase in z will have two competing effects on momentum traders' trend-chasing behavior. First, similar to the case when j ≥ z − 1, a larger z brings more profit opportunities for momentum traders to chase the trend. Second, some early momentum traders will have unwound their positions before information is fully revealed in the price. This will reduce momentum trader profit and, therefore, discourage their trend chasing. When z becomes very large relative to j, the second effect will become dominant and, therefore, at some point as z increases the momentum chasing oscillates and converges to zero, as demonstrated in Graph A of Figure 2. Furthermore, Graph B of Figure 2 confirms the variations in momentum intensity lead to a nonlinear relationship between the information diffusion parameter z and the standard deviation of pricing error, which is another way to capture anomaly intensity.
Figure 2. Comparative Statics with Respect to the Information Diffusion Parameter. Figure 2 plots the relationship between the information diffusion parameter (z) and momentum intensity (phi), and between z and the standard deviation of pricing error (Pt − Pt*), in Graphs A and B, respectively. The solution to momentum intensity is based on equation (7) from Hong and Stein (1999). The predetermined parameters are as follows: the momentum traders' horizon is 12, the volatility of news shocks is 0.5, and the momentum traders' risk tolerance is 1/3.
How can this nonlinear relationship help our understanding of anomaly variations across the globe? The empirical prediction is that if cross-country information diffusion speed covers the whole spectrum from very slow (very inefficient markets) to very fast (very efficient markets), we should observe a nonlinear relationship between information diffusion speed and the number of observed anomalies cross-sectionally. We are interested in how a market's newswatcher efficiency, measured as the information diffusion speed (i.e., 1/z), affects anomalies. We can thus restate the relationship shown in Figure 2 in terms of information diffusion speed (1/z) and return anomalies, as shown in Figure 3. We can summarize this relationship in two distinct phases as markets develop. Figure 3 shows that when the speed of information diffusion is low (between 0 and 0.01, corresponding to z between 100 and infinity), the momentum intensity and variation in pricing errors are also low. This demonstrates the effect of newswatcher efficiency at the start of Phase I. When information diffusion is very slow, no short-term underreaction or subsequent overreaction will be observed. As the speed of diffusion increases (up to 0.036, corresponding to z = 28), the improved newswatcher efficiency leads to a general increase in momentum intensity and pricing error, though the changes are not monotonic. Anomalies are most likely to be observed in this later stage of Phase I, since there is significant short-term underreaction and significantly high momentum intensity. When the speed of information diffusion further increases (for 1/z > 0.036, corresponding to z < 28), the momentum intensity starts to decrease as the profit of momentum trading is reduced in this phase (Phase II).
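The mapping from z to diffusion speed and phase described above can be written as a small helper. This is a sketch only: the cutoffs 0.01 and 0.036 are the values reported for this particular simulation (momentum traders' horizon of 12), not general constants, and the function names and phase labels are hypothetical.

```python
def diffusion_speed(z):
    """Information diffusion speed, defined as 1/z for z > 0."""
    if z <= 0:
        raise ValueError("z must be positive")
    return 1.0 / z


def simulated_phase(z, low_speed=0.01, peak_speed=0.036):
    """Locate a diffusion parameter z on the simulated curve.

    The default cutoffs come from the simulation discussed in the text and
    should be treated as specific to its parameter choices.
    """
    speed = diffusion_speed(z)
    if speed <= low_speed:
        return "early Phase I"  # very slow diffusion: anomalies largely absent
    if speed <= peak_speed:
        return "late Phase I"   # anomalies generally rising, not monotonically
    return "Phase II"           # fast diffusion: momentum intensity declining


examples = {z: simulated_phase(z) for z in (200, 40, 12)}
```

For instance, z = 40 implies a speed of 0.025, which falls in the later stage of Phase I under these cutoffs.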
Where exactly each country lies is an empirical question. However, we expect the majority of emerging markets will be concentrated in the earlier part of Phase I, while the majority of developed markets will be in the latter part of Phase I and the early part of Phase II. We expect very few, if any, developed markets to be in the later stage of Phase II, where newswatcher efficiency is very high. Hence the central prediction is that the absence of some anomalies in emerging markets can be attributed to the general absence of newswatchers who would have paid attention to that particular type of news.
Figure 3. Information Diffusion Speed and Momentum Intensity. Figure 3 plots the relationship between information diffusion speed and momentum intensity. The solution to momentum intensity is based on equation (7) from Hong and Stein (1999). The predetermined parameters are as follows: the momentum traders' horizon is 12, the volatility of news shocks is 0.5, and the momentum traders' risk tolerance is 1/3.

B. Newswatcher Efficiency Measurement
The key measure affecting the extent of observed anomalies in the Hong and Stein (1999) model is the speed of information diffusion. It is, however, empirically challenging to quantify this concept, especially in the international context. Empirically, we use the concept of "newswatcher efficiency" to capture multiple dimensions of an information environment that may ultimately affect this speed of information diffusion. Specifically, there are three aspects of the measurable information environment that are relevant in this context: the availability of information, the quality of information, and the users' ability to analyze and interpret information.
First, as an input to decision making, the quantity and quality of information will have a direct impact on the speed at which newswatchers incorporate the relevant information. The number of news articles (a variable we refer to as NEWS) is a proxy for information production and it indicates the scope of the information set (Griffin et al. (2011)). More media coverage would, in general, improve the speed of information transmission.
Second, You and Zhang (2009) find information travels more slowly across the market when information readability is lower. They show the underreaction to 10-K reports is stronger when they are more complex. In this regard, information diffusion speed can be increased by improvements in disclosure practice. Kaniel, Ozoguz, and Starks (2012) argue higher accounting quality can also increase investor confidence, which may in turn lead to quicker reaction and faster information diffusion. Consequently, better information quality should improve information diffusion. To this end, we use the accounting standard index (ACCT) and earnings management score (EMS) to measure accounting information quality. Furthermore, the opacity index (OPA) is used to measure information opaqueness. 6 Higher opaqueness tends to induce higher information uncertainty and disagreement among investors. Information is less likely to be incorporated into price immediately and completely when there is information uncertainty. Zhang (2006a), (2006b) finds information uncertainty exacerbates investors' underreaction to news. As a result, information should travel faster in less informationally opaque environments.
Third, investor sophistication and education will help to measure the cross-sectional differences in investors' ability to process information. Chang, Hsieh, and Wang (2015) show less sophisticated investors tend to misreact to information. Sophisticated investors are less affected by behavioral biases (Feng and Seasholes (2005), Bhattacharya, Kuo, Lin, and Zhao (2018)); they also have an advantage in accessing more information (Bonner, Walther, and Young (2003)) and better abilities to learn from past experience (Bonner and Walker (1994)) and are more capable of processing and analyzing information (Bonner et al. (2003), Collins, Gong, and Hribar (2003), and Callen, Hope, and Segal (2005)). Institutional investors are, generally, more sophisticated investors due to their superior ability to access and process information. Therefore, markets with high levels of sophistication or higher institutional ownership should have greater information efficiency. We use a sophistication score (SOPHI) and institutional ownership (INST) to measure investor sophistication.
Finally, there are variables that capture more than one dimension. Differenced volatility (DV) is the difference between abnormal volatility around earnings announcement dates and abnormal volatility outside of earnings announcement periods, which is proposed and studied by Griffin et al. (2011). A greater difference suggests investors are more likely to react to earnings information and, therefore, there is better newswatcher efficiency. Similarly, dispersion (DISP) measures the disagreement in the analysts' forecasts (Hong and Sraer (2016)). A lower DISP suggests a better information environment (Zhang (2006a), (2006b)). Both of these measures capture the quality of information and the ability to interpret the information.
We note that while the above individual measures are not perfect, they capture different aspects of the information environment that affect the speed of information diffusion. Given the multidimensional nature of the information environment, we also try to capture the commonality of these proxies by creating an aggregate measure of newswatcher efficiency via a principal component analysis in our empirical analyses.

A. Anomalies Data and Sample
While there is still an ongoing debate on the causes of most anomalies, anomalies can be largely classified into the two categories of an underreaction or overreaction to information (Barberis et al. (1998), Daniel et al. (1998), Hong and Stein (1999), and Barberis and Thaler (2003)). Since our argument is based on Hong and Stein's (1999) theoretical framework, we start by identifying a broad set of anomalies that can be examined under this framework: anomalies that can be interpreted by the process of initial underreaction, subsequent overreaction, and eventual reversal.
This general mispricing process, illustrated in Figure 1, helps us to understand anomalies in the following way. Underreaction anomalies, such as momentum and gross profit, can generally be described by stages 1-3 of the process. Reversal anomalies, such as asset growth, investment, short-term reversal, and long-term reversal, can be interpreted as involving stocks that have experienced stages 1-3 at the point of portfolio sorting and will experience a price reversal in stage 4. The sorting variables we use in identifying the anomalies are effective variables that help us to distinguish cross-sectionally which stocks are more likely to be at stages 1-3 for momentum anomalies and at stages 3-4 for reversal anomalies. 7

We build the anomaly list based on the 11 anomalies from Stambaugh et al. (2015). We exclude two anomalies, financial distress and return on assets, due to the limited availability of quarterly data for international markets. To mitigate anomaly selection bias, we also add some extra anomalies based on Hou, Xue, and Zhang (2015). The main difference in anomaly coverage between Stambaugh et al. (2015) and Hou et al. (2015) is the trading friction anomalies. Therefore, we include short-term reversal, long-term reversal, dollar trading volume, and maximum daily return. In addition, we also add one value-growth anomaly (the book-to-market ratio), and two investment-related anomalies (investment growth and working capital accrual). In the end, we have 16 anomalies in our cross-country studies covering those relating to investment, value premium, price momentum, reversal, and profitability. The definitions of these anomalies are given in Table 1 and the detail of their construction in Appendix A. In addition, we also propose an interpretation of each of the 16 anomalies under the Hong and Stein (1999) framework in Table 1, and in Figure 1 we show how the anomalies are related to information diffusion.
These anomalies can also be classified into accounting and market-based anomalies given the key information type used to construct the anomaly portfolios. They can be broadly considered as violating semi-strong and weak-form market efficiency.
For market coverage, we include 23 developed and 22 emerging markets and these numbers have been dictated mainly by data availability. The classification of market development is based on the MSCI classification. For U.S. and Canadian stocks we collect return data from CRSP and accounting information from Compustat North America. We retrieve data from Compustat Global for all other markets. For U.S. stocks we include common stocks with a share code of 10 or 11 from NYSE/NASDAQ/AMEX. For international markets, we include common shares (TPCI = 0). We have additional sample selection procedures for international markets to obtain a reasonable number of observations and to avoid data errors. First, the monthly average number of stocks in a market must be at least 50. Second, the market must have at least 240 months of data. Third, stock returns are set to missing if they are greater than the 99.9th percentile in that market to avoid errors of extremely large returns. These criteria leave 45 international markets in our sample. Our sample period starts in June 1993 and ends in Dec. 2016 in order to have sufficient stocks at the cross-sectional level. It is comparable with other anomaly studies in the international context, for example, Griffin et al. (2010) and Jacobs (2016). A summary of the sample coverage for each market is given in Appendix B.
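The three screening rules for international markets can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the input layout (a list of monthly stock counts, a list of returns pooled within one market) are hypothetical, and the 99.9th percentile is computed with the standard library's default interpolation, which may differ from the convention used in the paper.

```python
from statistics import mean, quantiles

def market_qualifies(monthly_stock_counts, min_avg_stocks=50, min_months=240):
    """Screens 1 and 2: enough months, and enough stocks per month on average."""
    return (len(monthly_stock_counts) >= min_months
            and mean(monthly_stock_counts) >= min_avg_stocks)

def drop_extreme_returns(returns):
    """Screen 3: set returns above the market's 99.9th percentile to missing (None)."""
    # quantiles(..., n=1000) returns 999 cut points; the last one is the 99.9th percentile.
    cutoff = quantiles(returns, n=1000)[-1]
    return [r if r <= cutoff else None for r in returns]
```

In this sketch a market is kept only if `market_qualifies` returns `True`, and `drop_extreme_returns` is applied market by market so that the cutoff reflects each market's own return distribution.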
7 [...] in deciles 1 and 10 are more likely to have a momentum effect than those stocks in the middle deciles. Importantly, a variable that can identify a group of stocks, that is, more likely to experience stages 1-3 (overreaction) will normally be unable to identify the reversal of these stocks. This is because the reversals are not synchronized. Therefore, different variables may be better in sorting the stocks that are likely to experience reversal.

TABLE 1
Table 1 reports a summary of the anomalies including the key papers, a short description and our potential interpretation of an anomaly under the Hong and Stein (1999) framework.

Type: [...]
Anomaly: [...]
Key paper: [...] (2008)
Description: Negative relation between the asset growth rate and subsequent 1-year returns
Interpretation: Overreaction: Cooper et al. (2008) show there is a running up of price during asset growth. This can be a reflection of newswatchers assessing the value of asset growth that creates momentum in the price and leads to mispricing (overreaction) and we observe a reversal subsequently

Type: Accounting
Anomaly: INVESTMENT-TO-ASSETS (IA)
Key paper: Li and Zhang (2010)
Description: Firms with a lower investment-to-asset ratio have higher returns
Interpretation: Overreaction: All of these anomalies can be interpreted in a similar way to the asset growth anomaly (they are components of asset growth: either the investment or financing side - Cooper et al. 2008). These sorting variables help in identifying stocks that have experienced overreaction and are more likely to be at the end of their stage 3 of the process and, therefore, reversal is more likely

[...]

Type: [...]
Anomaly: [...]
Key paper: [...] (1985), Rosenberg, Reid, and Lanstein (1985)
Description: High book-to-market ratio stocks earn higher returns
Interpretation: Overreaction: Newswatcher and momentum traders have interacted to private information of the stock and push the price too low (high) for stocks with low (high) growth opportunity which results in a high (low) BM for those stocks and we observe a reversal in price afterward

Type: Accounting
Anomaly: GROSS_PROFITS (GP)
Key paper: Novy-Marx (2013)
Description: Higher stock returns for profitable than unprofitable firms
Interpretation: Underreaction: The interpretation of a high gross profit is slowly reflected in the price-induced momentum tracing. Therefore, we observe higher gross profit and higher subsequent return

Type: [...]
Anomaly: [...]
Key paper: Ohlson (1980) and Dichev (1998)
Description: Firms with high probability of bankruptcy have lower stock returns
Interpretation: Underreaction: In the process of evaluating distress risk, newswatchers slowly interpret the meaning of the risk and this creates momentum. Therefore, we observe higher distress risk and lower returns

Type: Market based
Anomaly: MOMENTUM (MOM)
Key paper: Jegadeesh and Titman (1993)
Description: Firms with higher returns in the past 6 months continue to have higher returns in the following 6 months
Interpretation: Underreaction: This is a classic example used by HS in their 1999 study. The momentum effect is a combined result of slow information diffusion among newswatchers and momentum traders. Stages 1-3

Type: Market based
Anomaly: SHORT-TERM_REVERSAL (SR)
Key paper: Jegadeesh (1990)
Description: Firms with higher returns in the past month have lower stock returns in the following month
Interpretation: Overreaction: The extreme return observed in the market is driven by a process of slow (private and public) information diffusion and momentum trading as described by stages 1-3. The extreme prices are a result of market overreaction to certain news. The most extreme returns are most likely to be at the end of the momentum trading process and are more likely to reverse. Therefore, we are more likely to observe a correction in these extreme returns. Hence, higher returns lead to lower subsequent returns

Type: [...]
Anomaly: [...]
Key paper: [...] (2011)
Description: Negative relation between firm's extreme daily return in the past month and stock returns in next month
Interpretation: [...]

Type: Market based
Anomaly: TRADING_VOLUME (DVOL)
Key paper: Brennan, Chordia, and Subrahmanyam (1998)
Description: Negative relation between dollar trading volume and returns
Interpretation: Diffusion of private information will induce newswatchers to trade and create momentum trading. Overall, this process will generate a relatively larger volume than otherwise. Particularly, when combining the disagreement of investors with a short-sell constraint, "A central prediction of these dynamic models is that a positive correlation exists between trading volume and the degree of overpricing" (HS 2007, p. 124). Therefore, stocks that experience large volumes are likely to be those at the end of the momentum trading stage (stage 3) and hence a reversal is more likely

We construct accounting anomalies at the end of June each year using the information from the previous year; while we construct market-based anomalies at the end of each month using either monthly or daily data. To ensure the results are not driven by extreme errors, following the literature we trim anomaly variables at the 0.5% and 99.5% percentiles (Watanabe et al. (2013), Lu, Stambaugh, and Yuan (2020)). For each anomaly, all the firms are divided into quintiles in each month and we require at least 30 stocks each month. Following Stambaugh et al. (2015), we construct a mispricing score to aggregate all the anomalies. The mispricing score is the average of ranks across all available anomalies in each month. Three types of mispricing scores are constructed: the accounting mispricing score using only accounting anomalies, the market-based mispricing score using only trading anomalies, and the mispricing score using all anomalies. We require at least five individual anomalies for the accounting and market-based mispricing scores. For the mispricing score using all anomalies, we require at least five accounting and at least five market-based anomalies to ensure the mispricing score is not dominated by a certain type of anomaly.
Then all firms are grouped into quintiles based on the mispricing score. The long-short spread is the return from a portfolio that longs the low mispricing score stocks (undervalued stocks) and shorts the high mispricing score stocks (overvalued stocks). Our first formation month is June 1993 and the last formation month is Nov. 2016. 8 Table 2 reports long-short spreads based on mispricing scores. Panel A reports the market average results. We first compute the time-series average of long-short spreads for each market using stocks sorted by their mispricing score in the market. Then we average the spreads for all developed markets and all emerging markets. Consistent with Jacobs (2016) we find that while anomalies are significant in both emerging and developed markets, there is little difference between the two types of market in Panel A.
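The mispricing score and long-short spread described above can be sketched for a single market-month as follows. This is an illustrative simplification rather than the authors' procedure: the function names are hypothetical, ranks are scaled to (0, 1], every signal is assumed to be oriented so that a higher value means more overpriced, the spread is equal-weighted, and a single minimum-anomaly count stands in for the separate accounting and market-based requirements.

```python
from statistics import mean

def rank_pct(values):
    """Map {stock: signal} to {stock: rank scaled to (0, 1]} in ascending order."""
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    return {s: (i + 1) / n for i, s in enumerate(ordered)}

def mispricing_scores(signals, min_anomalies=5):
    """signals: {anomaly: {stock: value}}; higher value = more overpriced (assumption)."""
    ranks = {a: rank_pct(v) for a, v in signals.items()}
    stocks = {s for v in signals.values() for s in v}
    scores = {}
    for s in stocks:
        available = [r[s] for r in ranks.values() if s in r]
        # Keep only stocks with enough anomaly signals available this month.
        if len(available) >= min_anomalies:
            scores[s] = mean(available)
    return scores

def long_short_spread(scores, next_returns, n_groups=5):
    """Equal-weighted return of the lowest-score group minus the highest-score group."""
    ordered = sorted(scores, key=scores.get)
    size = len(ordered) // n_groups
    low, high = ordered[:size], ordered[-size:]
    return mean(next_returns[s] for s in low) - mean(next_returns[s] for s in high)
```

Repeating `long_short_spread` for every formation month gives the time series whose average is reported per market; a value-weighted version would replace the two simple means with market-capitalization-weighted means.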

B. Long-Short Spreads
Panel B of Table 2 reports a variant of the analysis. The average return reported in Panel A does not take into consideration the statistical significance of the mean return. Given the concerns over p-hacking, it is important to compare anomaly returns when their standard deviations have been taken into consideration. 9 We, therefore, report standardized returns that are the time-series averages of long-short spreads scaled by their standard deviation in each market.
Panel B of Table 2 shows that after considering standard deviations, there are signs of significant differences between the two markets. This is especially the case for equal-weighted accounting-based anomalies where developed markets have a significantly larger standardized return than emerging markets. This is also true when all anomalies are considered for the equal-weighted portfolio. Combining this finding with those in Panel A suggests the mean return obtained from trading in developed markets has a lower standard deviation than in emerging markets.
A similar conclusion is reached when we examine the t-values of the anomalies in the two market types in Panel C of Table 2. We report the number of countries whose returns clear different hurdle rates. We apply two criteria to determine the significance. In the first approach, we consider a return to be significant if its p-value is less than 10%. In the second approach, the spread is considered as significant only if the t-value is greater than 3 (suggested by Harvey et al. (2016)). Panel C shows that, when taking into consideration statistical significance, developed markets consistently have a larger number of countries that have significant anomalies, with the exception of market-based anomalies within a value-weighted framework. For example, for the equal-weighted portfolio and using the t-value greater than 3 hurdle, accounting anomalies are evident in 17 out of the 23 developed countries, while this number is only 8 out of the 22 emerging countries.
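The standardized return of Panel B and the t-value hurdle of Panel C can be written down in a few lines. The sketch below is a simplified illustration with hypothetical function names; it uses the plain sample standard deviation and an unadjusted t-value, whereas the paper may apply corrections (for example, for autocorrelation) that are not specified in this section.

```python
from math import sqrt
from statistics import mean, stdev

def standardized_return(spreads):
    """Time-series mean of a market's long-short spread scaled by its standard deviation."""
    return mean(spreads) / stdev(spreads)

def t_value(spreads):
    """Unadjusted t-value of the mean spread (assumes independent observations)."""
    return mean(spreads) / (stdev(spreads) / sqrt(len(spreads)))

def count_clearing_hurdle(spreads_by_market, hurdle=3.0):
    """Number of markets whose mean spread has a t-value above the hurdle."""
    return sum(1 for s in spreads_by_market.values() if t_value(s) > hurdle)
```

Under these assumptions the two statistics differ only by the factor sqrt(T), so with a common sample length the ranking of markets is the same under either measure.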
Overall, our analysis extends the puzzle documented by Jacobs (2016). We show anomalies are not as likely to be observed in emerging as in developed markets when the standard deviation of the hedge return (or t-value) is taken into consideration. Furthermore, there is evidence suggesting that developed markets have more accounting anomalies when the returns are equal-weighted. The rest of this article dissects this puzzle through a cross-country comparison and a time series analysis of newswatcher efficiency.

TABLE 2
Long-Short Spreads and Mispricing Scores
Table 2 reports long-short spreads based on mispricing scores (following Stambaugh et al. (2015)). Mispricing scores are based on accounting anomalies only, market-based anomalies only, and all anomalies. See the data and sample section for mispricing score construction in detail. In each month, we rank stocks into quintiles based on the mispricing score rank for each market. Panel A reports market average return results. We first compute the time-series average of long-short spreads for each market. Then we average the spreads for all developed markets and all emerging markets. Panel B reports standardized returns. For each market, we compute the time-series average of long-short spreads scaled by their standard deviations. In Panels A and B, we report equal- and value-weighted returns. The t-statistics are from two sample t-tests. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively. Panel C reports the number of markets with significant long-short raw returns. We apply two criteria to determine the significance. We first consider a return as significant if its p-value is less than 10%. Second, a spread is considered as significant only if its t-value is greater than 3 (suggested by Harvey, Liu, and Zhu (2016)).
Columns: Equal-Weighted, Value-Weighted.

IV. Analyses of Newswatcher Efficiency and Anomalies
The data on newswatcher proxies are from various sources. The sophistication score (SOPHI) is from the Global Competitiveness Report (2006-2014) of the World Economic Forum. Data on institutional ownership are from Thomson Reuters 13F and Bloomberg. The accounting standard index (ACCT) is from La Porta et al. (2000). The earnings management score (EMS) is from Leuz et al. (2003). Dispersion (DISP) uses analyst forecast data from IBES. Opaqueness (OPA) is collected from the Opacity Index 2004. The number of news articles (NEWS) is from Griffin et al. (2011). Differenced volatility is constructed by using CRSP and IBES. All definitions and data sources for the variables are given in Appendix C.
In order to control for potential alternative explanations, we run a multivariate regression analysis with additional variables. The average R-squared of a market model (R²) captures the inverse proportion of firm-specific information incorporated in the price. Thus, a lower average R² implies more efficient prices (Morck, Yeung, and Yu (2000), Dang, Moshirian, and Zhang (2015)). Idiosyncratic volatility that captures the limits to arbitrage should be negatively correlated with the anomaly spread (Watanabe et al. (2013)). In addition, Titman et al. (2013) argue different levels of corporate governance may give rise to cross-country differences in the asset growth anomaly, and Griffin et al. (2011) suggest the regulatory environment is a source of cross-country differences in news reaction. To control for corporate governance and regulation, we consider anti-director rights (ANTDIR), the type of legal system (civil or common law; LAW), the ownership of the largest 3 private shareholders (C3PRI), and the efficiency of the judicial system (EFFJUD). Variable definitions are also given in Appendix C.

Table 3 shows the regression results with a nonlinear specification of the newswatcher proxies. The dependent variable is the monthly long-short spread based on mispricing scores (constructed for all anomalies) for each market. We report the pooled OLS regressions with standard errors double-clustered by market and month. 10 Table 3 has the following notable results. First, newswatcher efficiency has a nonlinear relationship with the return of the anomalies. This is especially the case for the equal-weighted analyses. The coefficients of the quadratic term of the newswatcher efficiency measures are negative and significant in 7 out of the 8 equations. This suggests that as newswatcher efficiency increases, the returns of anomalies first increase and then decrease, forming an inverted U-shape as predicted.
TABLE 3
Regression: Long-Short Spreads and Newswatcher Efficiency
Table 3 reports regression results. The dependent variable is monthly long-short spreads based on the mispricing scores (constructed for all anomalies) for each market. The independent variables include the quadratic term of newswatcher efficiency and other control variables. Eight proxies of newswatcher are used: investor sophistication (SOPHI), institutional ownership (INSTOWN), accounting standard index (ACCT), earnings management score (EMS), analyst dispersion (DISP), information opacity (OPA), number of news article (NEWS) and differenced volatility (DV). The control variables include a market development dummy (DEV), idiosyncratic volatility (IVOL), R-squared (R²), a common law dummy (LAW), efficiency of the judgment system (EFFJUD), anti-director (ANTDIR), and the ownership of the three largest private shareholders (C3PRI). We report pooled OLS regressions and standard errors are double-clustered by market and month. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

In order to demonstrate the nonlinear relationship between market development and anomalies, we illustrate the effect of newswatcher efficiency on the predicted return of anomalies using the estimated parameters in Table 3. We calculate the predicted value of the dependent variable (the hedge return of anomalies) by varying the newswatcher efficiency variable (NwE) from its sample minimum to maximum while holding other variables in the equation at their sample mean level. These graphs are reported in Figure 4. They demonstrate the nonlinear impact of newswatcher efficiency on the hedge return. 11 Furthermore, the emerging markets are mainly situated on the left side of the curves, while the developed markets are mainly on the right.
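The prediction exercise behind these graphs can be sketched as below. The coefficients are not taken from Table 3: `b0`, `b1`, `b2`, and `controls_at_mean` are hypothetical placeholders, and the sketch assumes a pure quadratic in the newswatcher efficiency proxy with all controls collapsed into one constant evaluated at their sample means.

```python
def predicted_spread(nwe, b0, b1, b2, controls_at_mean=0.0):
    """Fitted long-short spread for one value of the newswatcher efficiency proxy."""
    return b0 + b1 * nwe + b2 * nwe ** 2 + controls_at_mean

def prediction_curve(nwe_min, nwe_max, b0, b1, b2, controls_at_mean=0.0, steps=50):
    """(proxy value, predicted spread) pairs from the sample minimum to the maximum."""
    step = (nwe_max - nwe_min) / steps
    grid = [nwe_min + i * step for i in range(steps + 1)]
    return [(x, predicted_spread(x, b0, b1, b2, controls_at_mean)) for x in grid]

def turning_point(b1, b2):
    """Proxy value at which the fitted quadratic peaks (a maximum when b2 < 0)."""
    return -b1 / (2 * b2)
```

With a negative quadratic coefficient, markets whose proxy value lies below `turning_point(b1, b2)` sit on the rising part of the curve and those above it sit on the falling part, which is how the left/right positioning of emerging and developed markets can be read.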
A general marginal effect of a variable can be examined by studying the change of the predicted value when the explanatory variable changes its value by one standard deviation away from its median. In our research context, the nonlinear marginal effect that is embedded in the quadratic function can be further demonstrated by examining the deviation from the medians of emerging and developed markets, respectively. For example, a standard deviation of the variable SOPHI is 0.60 in our sample of countries. The medians of the variable SOPHI among emerging and developed markets are 3.74 and 4.70, respectively. In emerging countries, if the SOPHI score is improved by one standard deviation from the median to 4.34 (3.74 + 0.60), the predicted anomaly return increases by 0.2% from 1.3% to 1.5% on a monthly basis. In contrast, for the same change for developed countries from the median to 5.30, the predicted return decreases by 0.4% from 1.45% to 1.05%. This supports our hypothesis that emerging markets are mainly in phase I (the increasing phase), while developed markets are more likely to be in phase II (the decreasing phase). 12

Second, regarding the control variables, Table 3 shows that countries with a higher R², a common law environment and better efficiency in the judicial system, in general, have lower levels of observed anomalies. This is, in general, consistent with the existing literature (La Porta et al. (2000), Djankov, McLiesh, and Shleifer (2007)). Importantly, these results suggest the newswatcher efficiency explanation is robust to the inclusion of these alternative explanations.
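The SOPHI example can be restated algebraically. The derivation below is a sketch under the assumption that the fitted model is a pure quadratic in the proxy x, with the controls z held at their means; the symbols β, γ, m (the group median), σ (one standard deviation), and x* (the turning point) are notation introduced here, not the paper's.

```latex
\begin{align*}
\hat{y}(x) &= \beta_0 + \beta_1 x + \beta_2 x^2 + \gamma^{\top}\bar{z}, \\
\Delta(m) &= \hat{y}(m+\sigma) - \hat{y}(m)
           = \sigma\left[\beta_1 + \beta_2\,(2m + \sigma)\right], \\
\Delta(m) > 0 &\iff m + \tfrac{\sigma}{2} < x^{*} \equiv -\frac{\beta_1}{2\beta_2}
           \qquad (\beta_2 < 0).
\end{align*}
```

With σ = 0.60, the emerging-market median m = 3.74 gives m + σ/2 = 4.04 and the developed-market median m = 4.70 gives 5.00. The reported signs (an increase of 0.2% and a decrease of 0.4%) are therefore consistent with a turning point in SOPHI lying between 4.04 and 5.00 under this simplified reading.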
Comparing the value-weighted analyses with the equal-weighted, the explanatory power of newswatcher efficiency is weaker in general. This suggests the cross-sectional explanatory power of newswatcher efficiency is weaker when the effect of small firms is downplayed. Overall, Table 3 confirms there is a nonlinear relationship between the newswatcher efficiency proxies and the returns of anomalies, and this is especially the case for equal-weighted returns. This finding shows newswatcher efficiency helps to explain the observed differences between emerging and developed markets.

V. Further Tests
In this section, we extend the tests of our central prediction across five dimensions by considering: firm size, frontier markets, time-varying newswatcher efficiency, risk models, and an aggregate newswatcher efficiency proxy. We start by examining the role of firm size in affecting the cross-country difference. We then study the frontier markets as an out-of-sample test. We further extend our analyses to examine the implication of time-varying newswatcher efficiency in the global markets and in the U.S. market where longer time series are available. We follow this by studying the robustness of our findings after taking into consideration a multi-factor pricing model and by using an aggregate proxy for newswatcher efficiency.

11 The exceptions are the graphs for INSTOWN and OPA in the value-weighted return in Panel B of Table 3. This is because their coefficients are insignificant. Therefore, their graphs in Figure 4 should not be relied on as there is no significant relationship between the variables. We only report them here for completeness.
12 When a similar exercise is undertaken for other variables, consistent with our hypothesis, the marginal effect of increases in newswatcher efficiency increases the returns of anomalies more in the emerging than in the developed markets.

FIGURE 4
Predicted Effect of Newswatcher Efficiency
Figure 4 reports the effect of newswatcher efficiency on the predicted anomaly returns, using the coefficients reported in Table 3. The dependent variable is monthly long-short spreads based on the mispricing scores (constructed for all anomalies) for each market. The predicted values of the long-short spreads are calculated by varying the newswatcher efficiency variables (NWE) from their sample minimum to maximum while holding other variables in the equation at their sample mean level. These predictions are plotted against the value of the NWE variables. We also indicate the predicted ranges that cover the variations in NWE for emerging and developed markets separately.

A. Factor Alpha
Asset return anomalies are normally established by passing a series of tests with a traditional asset pricing model, such as the CAPM, acting as the control. The search for the drivers of anomalies has led to the incremental development of multifactor models. For example, in the development of Fama and French (2015) and Hou et al. (2015), profitability and investment have been introduced as new pricing factors. In this section, we study the abnormal return from the global versions of the Fama-French 3- and 5-factor models. To estimate alpha, following Jacobs (2016) we run regressions of long-short spreads (the long-short spread is based on mispricing scores constructed for all anomalies) on the Fama-French 3- and 5-factor models (global version) for each market. 13 Table 4 reports the summary statistics of alpha from the two risk models. The alphas in Table 4 are smaller in magnitude than the hedge returns in Table 2. The key noticeable difference is that accounting anomalies in emerging markets are insignificant after taking into consideration risk factors when equal-weighted portfolios are considered. This is consistent with our conjecture regarding the role of newswatchers. For accounting anomalies, which are a violation of semi-strong form market efficiency, the need for newswatchers to "dig out" and process the information to create the initial trend is greater than for market-based anomalies. The lack of newswatchers in emerging markets, especially in smaller-size firms, leads to overall insignificance for the accounting anomalies in the 3-factor model. Nevertheless, when accounting-based or all anomalies are used to construct the mispricing score, the anomalies exist in both emerging and developed markets and there is not a statistically significant difference between the two markets. This is consistent with the result in Table 2.

TABLE 4
Table 4 presents the alpha of the Fama-French 3- and 5-factor models. We run regressions of long-short spreads (the long-short spread is based on a mispricing score constructed from all anomalies) on Fama-French 3- and 5-factor models (global version) for each market, and then compute the average of alpha for developed and emerging markets, respectively. Panel A reports Fama-French 3-factor alpha and Panel B reports Fama-French 5-factor alpha; both equal-weighted and value-weighted returns are applied.
Columns: Equal-Weighted, Value-Weighted.

We then run a regression analysis that is similar to our main analysis in Table 3. 14 As the results are similar for the two factor models, we report only the results of the 5-factor alpha in Table 5. The results confirm that newswatcher efficiency measures have a nonlinear relationship with the abnormal returns generated from the anomalies across the countries, especially for equal-weighted portfolios. Noticeably, the limits to arbitrage (IVOL) and legal environment (LAW) variables significantly explain the cross-country difference in the 5-factor alpha. Our results remain in four out of the eight specifications, suggesting that our newswatcher explanation is different from the risk factor and the limits to arbitrage explanation of anomalies.
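The alpha estimation used in this subsection amounts to a time-series regression of each market's long-short spread on factor returns, with the intercept taken as alpha. A minimal sketch follows; it is not the authors' code, the function names are hypothetical, the factor series must be supplied by the reader (the global Fama-French factors are not fetched here), and plain OLS via the normal equations stands in for whatever estimation details the paper applies.

```python
def ols(y, X):
    """OLS coefficients [intercept, slopes...] via the normal equations.

    y: list of T observations; X: list of T rows of regressors (no constant).
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def factor_alpha(spreads, factors):
    """Intercept from regressing a market's long-short spreads on factor returns."""
    return ols(spreads, factors)[0]
```

Calling `factor_alpha` once with three factor columns and once with five, market by market, and then averaging within the developed and emerging groups mirrors the layout described for Table 4.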

B. Small Size Effects
Our empirical analysis in Section III reveals a new dimension of the puzzle. It shows the difference between developed and emerging markets is especially pronounced when the anomaly returns are equally weighted as compared to value-weighted. This suggests the well-documented small-size effect on anomalies, where anomalies are found to be concentrated in small stocks, 15 is more pronounced in developed than in emerging markets. This is a further puzzle since existing limits to arbitrage or behavioral bias explanations of this effect would seem to suggest the small size effect should be stronger in emerging markets, as there will be higher limits to arbitrage and more behavioral bias. 16 In addition, we show in Table 3 that the return spread from the equal-weighted return portfolios demonstrates a stronger nonlinear relationship with newswatcher efficiency than do those from the value-weighted return portfolios. This suggests the role of newswatchers might help to explain the reason behind this.
14 Due to the challenge of obtaining reliable time-series data for all newswatcher variables, in Table 3, the country-specific variables are time-invariant. Therefore, we choose to use pooled OLS to capture the average effect. In Table 5, we study the effect controlling for the potential time variant of these newswatcher variables where they can be reliably estimated. Given our theoretical prediction is about cross-sectional differences for a given time, the use of Fama-MacBeth regressions captures the cross-sectional effect while controlling for potential time-series correlations.
15 Gao et al. (2018) find the distress risk-return relation is stronger in small firms and U.S. small firms present higher spreads than non-U.S. small firms. Lipson, Mortal, and Schill (2011) show small firms exhibit a stronger asset growth anomaly.
16 It is noted the size effect we are discussing is a relative effect within a country's stock market. Companies with the same absolute size in two different countries may not have the same information environment.

TABLE 5
Regression: Alpha and Newswatcher Efficiency
Table 5 reports regression results with alpha as the dependent variable. To estimate alpha we run regressions of long-short spreads (the long-short spread is based on a mispricing score constructed from all anomalies) on 5-factor models (global version) for each market. Independent variables include the quadratic term of newswatcher efficiency, and other control variables. See Appendix C for the definition of newswatcher efficiency proxies and control variables. The t-statistics are based on the heteroskedasticity-consistent standard errors of White (1980). ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

In order to understand the connection between firm size and anomalies, we start by examining the relationship between size and newswatcher efficiency. Firm size can be a proxy for newswatcher efficiency that affects the speed of information diffusion. Hong et al. (2000) argue when investors face fixed costs of information acquisition, they devote more effort to learning about those stocks in which they can take large positions. This suggests information about small firms is transmitted more slowly. If size is used as a proxy measure for newswatcher efficiency, we predict there will be a nonlinear relationship between size and the number of observed anomalies. This prediction has been supported by Hong et al.'s (2000) empirical finding that the very smallest firms have fewer momentum anomalies than merely small firms; that is, their inverted U-shaped relationship between size and anomalies reflects our two-phase prediction. 17

Building on the above we propose the difference between emerging and developed markets is stronger in equal- than in value-weighted returns because market development has an asymmetrical impact on newswatcher efficiency in small versus large cap firms in emerging markets. The lack of market development leads to much weaker newswatcher efficiency for small caps in emerging markets than for their counterparts in developed markets, while the difference for large caps in the two markets is much smaller. The very low newswatcher efficiency in emerging market small firms inhibits the existence of anomalies and, therefore, makes the difference of small firms between emerging and developed markets very apparent, while such a difference in larger cap stocks is less obvious as newswatcher efficiency is similar in the two markets.
To test this prediction, we conduct the analyses of size and all anomalies for emerging and developed markets separately; this extends the U.S. size and momentum evidence of Hong et al. (2000). Figure 5 reports the plot of anomaly hedged returns by size decile for developed and emerging markets. Figure 5 shows a nonlinear relationship between size and anomaly returns in both markets. When comparing the two types of market, the anomaly returns in developed markets are generally higher, especially in the smaller size deciles. The t-tests suggest developed markets have significantly larger anomaly returns than emerging markets in the first two small-size groups, and this is especially the case for the accounting anomalies. This confirms the puzzle documented earlier and supports our conjecture that market development has a stronger negative impact on smaller-size firms. Furthermore, the inverted U-shape is more pronounced in accounting than in market-based anomalies. This further confirms that the role of newswatchers is more important in processing accounting than market-based information. Interestingly, for accounting anomalies, emerging markets show higher anomaly returns than developed markets in the largest size group. This is consistent with our explanation that when the information environment is similar for the largest companies in developed and emerging markets, the difference in the limits to arbitrage will dominate and produce higher anomaly returns in the emerging markets, where limits to arbitrage are higher.
Our finding further complements the limits to arbitrage explanation of the size effect. There are two steps of anomaly formation: mispricing and limits to arbitrage. Limits to arbitrage are linearly linked to size, with larger firms having lower limits to arbitrage (Lam and Wei (2011)). However, this explanation alone cannot explain why the smallest size group has fewer anomalies than the next size group in the accounting anomalies. Our explanation of newswatcher efficiency fills this gap. Importantly, the combination of these two further explains why the difference in small-size firms is much greater than in large-size firms.

FIGURE 5
Long-Short Returns Across Size Groups
Figure 5 plots long-short spreads for developed markets and emerging markets across 5 size groups. Following Hong et al. (2000), we apply an independent sort to rank firms into 5 mispricing score groups based on the mispricing score and 5 size groups based on market value. In each market, we compute average long-short spreads between low and high mispricing score portfolios for each size group every month. Then we calculate the average of long-short spreads for each size group across developed markets and emerging markets, respectively. ***, **, and * near the size group indicate the difference between developed and emerging markets is significant at the 1%, 5%, and 10% levels. Graphs A and B are equal-weighted and value-weighted spreads, respectively.
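The independent double sort described in the Figure 5 notes can be illustrated with a minimal sketch for a single market-month. The dictionary keys (`score`, `size`, `ret`), the function names, the equal weighting, and the convention that the spread is the low-score portfolio return minus the high-score portfolio return are assumptions made here for illustration, not details taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def quintile_ranks(values):
    """Assign each value a group from 1 (lowest fifth) to 5 (highest fifth) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for position, idx in enumerate(order):
        groups[idx] = position * 5 // len(values) + 1
    return groups


def long_short_by_size(stocks):
    """Equal-weighted long-short spread within each size group for one market-month.

    `stocks` is a list of dicts with hypothetical keys 'score', 'size', 'ret'.
    The two sorts are independent: each is computed on the full cross-section.
    """
    score_group = quintile_ranks([s["score"] for s in stocks])
    size_group = quintile_ranks([s["size"] for s in stocks])

    cells = defaultdict(list)
    for stock, sg, zg in zip(stocks, score_group, size_group):
        cells[(zg, sg)].append(stock["ret"])

    spreads = {}
    for zg in range(1, 6):
        low, high = cells.get((zg, 1)), cells.get((zg, 5))
        if low and high:  # skip size groups with an empty extreme portfolio
            spreads[zg] = mean(low) - mean(high)
    return spreads
```

Averaging these monthly spreads over time within a market, and then across developed or emerging markets, would give one point per size group of the kind plotted in the figure.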

C. "Out of Sample Test": Frontier Markets
Our empirical study is motivated and advanced by the general observation of the difference between emerging and developed markets that has been documented by various existing studies (Griffin et al. (2010), Jacobs (2016)). Our main predictions of cross-country variations of anomalies and newswatcher efficiency can be further extended to frontier markets as an "out-of-sample" test, since these markets have been substantially less studied in the asset pricing literature. 18 A simple prediction is that the observed anomalies would be even weaker, rather than stronger, in these least efficient markets. We extend our sample to include another nine frontier markets. 19 We report the comparison of the anomalies among the three types of market in Table 6.

TABLE 6
Frontier Markets
Table 6 reports long-short spreads for frontier markets based on mispricing scores and a comparison to developed and emerging markets. For a frontier market to be included, we require that i) the average number of stocks in each month is no less than 50 and ii) the market has at least 180 months (15 years) of data. Nine frontier markets are included after applying these criteria. Mispricing scores are based on accounting anomalies only, market-based anomalies only, and all anomalies. See the data and sample section for mispricing score construction in detail. In each month, we rank stocks into quintiles based on the mispricing score rank for each market. The table reports market average return results. We first compute the time-series average of long-short spreads for each market. Then we average the spreads for all developed markets, all emerging markets, and all frontier markets. Standardized returns are computed as follows: for each market, we compute the time-series average of long-short spreads scaled by their standard deviations. The t-statistics (in brackets) are from two-sample t-tests. The table also reports the percentage of markets with significant long-short raw returns. We apply two criteria to determine significance. First, we consider a return as significant if its p-value is less than 10%. Second, a spread is considered as significant only if its t-value is greater than 3. The t-stats for the difference are based on the Fisher exact test. Panels A and B report equal- and value-weighted results, respectively. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.
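The per-market quantities named in the Table 6 notes (the time-series average spread, the standardized return, and the stricter t-value hurdle of 3) can be sketched as below. The function names are hypothetical, and the plain t-statistic with a sample standard deviation is an assumption made for illustration; the paper may use a different standard error.

```python
from math import sqrt
from statistics import mean, stdev


def spread_summary(spreads):
    """Summarize one market's monthly long-short spreads.

    Returns the time-series mean, the mean scaled by its standard deviation
    (the standardized return), and a simple t-statistic for the mean.
    """
    n = len(spreads)
    avg = mean(spreads)
    sd = stdev(spreads)  # sample standard deviation (assumption)
    return {"mean": avg, "standardized": avg / sd, "t": avg / (sd / sqrt(n))}


def share_significant(markets, t_hurdle=3.0):
    """Fraction of markets whose spread t-statistic exceeds the hurdle.

    `markets` maps a market name to its list of monthly spreads.
    """
    hits = sum(1 for s in markets.values() if spread_summary(s)["t"] > t_hurdle)
    return hits / len(markets)
```

Applying `share_significant` separately to developed, emerging, and frontier groups would give percentages comparable in spirit to those the table reports under the second criterion.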

Table 6 provides support to our main results in two ways. First, anomaly returns in frontier markets are generally lower than those of the emerging and developed markets. Consistent with the main finding, this relationship is stronger in the equal-weighted return. In particular, the difference between developed and frontier markets is highly significant and much higher than the difference between developed and emerging markets in our main results. Second, examining the number of significant anomalies, there is strong evidence that the strength of anomalies is positively correlated with market development, with the developed markets having the highest percentage of countries with significant return anomalies and the frontier markets having the lowest. Indeed, with a higher hurdle rate of 3 for the t-value, none of the frontier markets has a significant anomaly return. Overall, these findings provide further support to our conjecture that too weak an information environment inhibits the existence of return anomalies.

18 We thank the referee for this suggestion.
19 With our original sampling criteria (the monthly average of the number of stocks is at least 50 stocks and there are at least 240 months in that market), three of the MSCI frontier markets have been included in our samples and are part of the emerging subgroup. In order to include more frontier markets in our sample, we have had to relax our sample criteria; particularly, the minimum number of months is reduced from 240 to 180 months (15 years).

D. Time-Varying Newswatcher Efficiency
While the main focus of the article is the cross-country variation of return anomalies, the theoretical foundation can also be applied to the time series evolution of return anomalies. In this section and the next, we explore the time series dimension. We recognize that newswatcher efficiency in each given market will change over time. Ideally, we would like to capture the effect of this time dimension as well as the cross-sectional difference on return anomalies. Unfortunately, time series data on country-level measures are limited. Most of the proxies are observed infrequently or only exist for the last decade, and they are not easy to update or backdate with the limited data available. Among the 8 proxies, we are able to construct a reasonable time series of observations for two of them: analysts' dispersion (DISP) and differenced volatility (DV). Figure 6 plots the evolution of these two newswatcher efficiency measures. There are two observations to take from this figure. First, these newswatcher efficiency measures are consistent with market development, with developed markets having, in general, a higher efficiency (note we plot the inverse of dispersion). Second, there is a general increasing trend in these measures, suggesting that, on average, both emerging and developed markets are improving their information environments. For analysts' dispersion, the gap between emerging and developed markets is narrowing. The difference in DV between the two types of markets is relatively constant, except for the period around the dotcom bubble, when the developed markets' earnings become less informative about prices and emerging markets' earnings become more important. 20 We then examine our main findings with these finer measures. Table 7 reports the summary of the Fama-MacBeth regressions with the available time-variant control variables (IVOL and R²). Our main finding of a nonlinear effect is confirmed in Table 7.
Three out of the four coefficients for the squared terms are negative and significant. Also consistent with the main result, the findings are stronger in the equal-weighted returns.
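The Fama-MacBeth procedure with a squared newswatcher efficiency term can be illustrated with a minimal sketch: a cross-sectional OLS across markets in each period, followed by time-series averages of the coefficients. The control variables (IVOL and R²) are omitted for brevity, and the data layout, function names, and simple t-statistics are assumptions made for illustration rather than the paper's implementation.

```python
from math import sqrt
from statistics import mean, stdev


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def fama_macbeth_quadratic(panel):
    """Two-pass estimate for ret = b0 + b1*NE + b2*NE^2.

    `panel` maps each period to a list of (ne, ret) pairs, one pair per market.
    Returns the time-series means of [b0, b1, b2] and their simple t-statistics.
    """
    betas = []
    for period in sorted(panel):
        obs = panel[period]
        rows = [[1.0, ne, ne * ne] for ne, _ in obs]
        betas.append(ols(rows, [ret for _, ret in obs]))
    periods = len(betas)
    means = [mean(b[j] for b in betas) for j in range(3)]
    tstats = [means[j] / (stdev([b[j] for b in betas]) / sqrt(periods)) for j in range(3)]
    return means, tstats
```

In this sketch, an inverted U-shape corresponds to a negative average coefficient on the squared term (index 2) with a t-statistic that is large in absolute value.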

E. Evolution of Anomalies: U.S. Evidence
Though the prediction of the theory is drawn from a cross-sectional perspective, the impact of newswatcher efficiency on anomalies has a general implication in a time series context. If a country's information environment evolves over time and investor sophistication improves, an inverted-U-shape pattern of anomaly intensity should be observed over time. In order to observe a meaningful evolution over time a long period is required and we, therefore, turn to the U.S. market with two tests.
First, we study all the anomalies over the 60-year period from 1957 to 2016. Table 8 reports the long-short spread for six 10-year subperiods and Figure 7 displays the changes in the spread over time in column charts. Four main results emerge: i) Figure 7 shows that both the return and t-value from the mispricing strategy exhibit an inverted U-shape over time (in general, confirming our conjecture). The pattern is especially clear for both accounting and market-based anomalies in equal-weighted portfolios and for accounting anomalies in value-weighted portfolios, which is consistent with our previous finding in a time series context. ii) The turning points for the two types of anomalies are different. As expected, the weak-form efficiency anomaly returns (market-based) first peak in the 1970s and then start to gradually decline over time, while the semi-strong form efficiency anomalies (accounting) only start to weaken in very recent years. 21 iii) For value-weighted portfolios, the t-values for the market-based anomalies decrease monotonically over time, while the inverted U-shape is maintained in the accounting anomalies. These patterns echo those identified in Figure 5 when we study the effect of size. They further strengthen our argument that the nonlinear effect of newswatchers on anomalies is more pronounced in accounting anomalies, as newswatchers play a more important role in sowing the seeds for momentum in these anomalies than in the market-based anomalies.
Our finding is consistent with the publication effect but cannot be fully explained by it. McLean and Pontiff (2015) study 97 return predictive variables in the U.S. stock market and find that long/short returns shrink significantly postpublication. This is consistent with our argument concerning newswatcher efficiency. With an awareness of anomalies, investors should react to such information even faster and, therefore, improve the information diffusion, which leads to weaker anomalies. However, the publication effect alone cannot explain the increasing pattern observed above in the time series context. Importantly, in the international anomalies context, Jacobs and Müller (2020) show that the USA is the only market where the "publication effect" is present and, therefore, our overall results should not be driven by such an effect. 22 Green, Hand, and Zheng (2017) observe that anomalies in the USA seem to decline significantly after 2003, which is consistent with our subperiod analysis in Table 8 and Figure 7, where the most recent period's return is lower in general. Importantly, our study offers an explanation for their observations. We find that the time-varying newswatcher variables are indeed increasing over time in the USA; when the means before and after 2003 are compared, they are found to be significantly different. 23 In order to connect the observed time series pattern with the time variation of newswatcher efficiency, we run time series regressions between the anomaly returns in the USA and three of the newswatcher efficiency measures for which we have sufficient data for the analyses. Table 9 reports those regression results. The results provide support to the nonlinear relationship between the time variation of newswatcher efficiency and anomaly returns in the USA.

21 There is an exception in the period between 1987 and 1997 for value-weighted accounting anomalies. This is not due to calculation error. We confirm a similar exception in the BM factor return during the same period in Professor French's online data.

FIGURE 6
Time-Series Newswatcher Efficiency
Figure 6 plots analysts' dispersion and differenced volatility over time. For analysts' dispersion (DISP), we compute the standard deviation of the 1-year earnings forecast divided by the absolute value of the mean forecast, and then scaled by the square root of the number of analysts. We use 1 divided by the dispersion so that the larger the value the greater the newswatcher efficiency. We then compute the average for each market in each year and calculate the average across developed and emerging markets, respectively, in each year. Differenced volatility (DV) is the difference between average abnormal earnings announcement event volatility and the average abnormal volatility during 55 days before and 55 days after the event. We calculate the average across developed and emerging markets, respectively, in each year. The abnormal volatility is the absolute value of the stock return in excess of the value-weighted market return. D and E indicate developed and emerging markets, respectively.

TABLE 7
Table 7 reports the Fama-MacBeth regression of monthly anomaly returns on squared newswatcher efficiency and newswatcher efficiency with the controls of idiosyncratic volatility and R². Two newswatcher efficiency proxies are constructed: analyst dispersion and differenced volatility. For analyst dispersion (DISP), we compute the standard deviation of the 1-year earnings forecast divided by the absolute value of the mean forecast, and then scaled by the square root of the number of analysts. We use 1 divided by the dispersion so that the larger the value the greater the newswatcher efficiency. We then compute the average for each market in each year. Differenced volatility (DV) is the difference between average abnormal earnings announcement event volatility and the average abnormal volatility during 55 days before and 55 days after the event. The abnormal volatility is the absolute value of the stock return in excess of the value-weighted market return. Idiosyncratic volatility is the standard deviation of residuals from regressions of daily stock returns on market returns in each month, and we require at least 15 days in that month. We then calculate the average for each market in each year. R² is from the regression of weekly stock returns on weekly market returns with two lead and two lagged market returns, thus correcting for nonsynchronous trading. We then compute the average in each year for each market. The time period is from 1994 to 2016 and from 1997 to 2016 for DISP and DV, respectively. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

TABLE 8
Long-Short Spreads in the U.S. Market
Table 8 presents long-short spreads based on mispricing scores in the U.S. market from July 1957 to Dec. 2016. We rank stocks into quintiles based on mispricing scores and compute the average of the long-short spread in each month (the same procedure as in Table 3). Then we report the time-series average of spreads in each 10-year time period. Equal-weighted and value-weighted spreads are reported in Panels A and B, respectively. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

FIGURE 7
Long-Short Spreads in the U.S. Market
Figure 7 reports the long-short spreads and t-values based on mispricing scores in the U.S. market from July 1957 to Dec. 2016 that are reported in Table 8. Equal-weighted and value-weighted spreads are reported in Graphs A and B, respectively.
[Graph A. Equal-Weighted; Graph B. Value-Weighted. Column charts by subperiod: 195707-196706, 196707-197706, 197707-198706, 198707-199706, 199707-200706, 200707-201612.]
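The analyst dispersion proxy (DISP) defined in the Figure 6 and Table 7 notes can be illustrated with a short sketch. Reading "scaled by the square root of the number of analysts" as division, using the sample standard deviation, and dropping stocks with fewer than two forecasts or a zero mean forecast are all assumptions made here; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def inverse_dispersion(forecasts):
    """Inverse analyst dispersion for one stock; larger means higher efficiency.

    `forecasts` is a list of 1-year earnings forecasts from different analysts.
    Returns None when the measure is undefined.
    """
    n = len(forecasts)
    if n < 2:
        return None
    avg = mean(forecasts)
    if avg == 0:
        return None
    dispersion = stdev(forecasts) / abs(avg) / sqrt(n)
    return 1.0 / dispersion if dispersion > 0 else None


def market_year_average(stock_forecasts):
    """Average the inverse dispersion over the stocks of one market-year.

    `stock_forecasts` is a list of forecast lists, one per stock.
    """
    values = [inverse_dispersion(f) for f in stock_forecasts]
    values = [v for v in values if v is not None]
    return mean(values) if values else None
```

Averaging `market_year_average` across developed or emerging markets in each year would give a series comparable in construction to the one described for Figure 6.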

F. An Aggregate Newswatcher Efficiency Proxy
We use eight proxies for capturing newswatcher efficiency. Each of them is imperfect and potentially captures other effects. To confirm our results are not driven by other effects, we construct aggregate measures of newswatcher efficiency through factor analysis. Table 10 reports the results of this analysis. Panel A of Table 10 reports the final loading pattern of the two factors identified using six out of the eight proxies. 24 The first factor is dominated by the loading of sophistication (SOPHI), while the accounting standard (ACCT) and differenced volatility (DV) variables also have a modest level of correlation with this factor. All of the loadings are negative and, for ease of interpretation, we multiply this factor by -1 in the regression analysis. The second factor is dominated by a positive correlation with differenced volatility (DV) (an information measure used by Griffin et al. (2010)) and it is also highly correlated with institutional ownership (INSTOWN). Panel B of Table 10 reports the main regression analysis with the aggregated newswatcher proxy. It shows that the quadratic terms are highly significant and negative, providing further support to our main findings.

22 To further demonstrate the evolution of anomalies over time, we also study one of the long-standing anomalies, the momentum anomaly, from 1926 in our Supplementary Material. We show an overall pattern that is consistent with an inverted U-shape, with momentum profits peaking in 1956 and steadily declining thereafter.
23 Results can be found in the Supplementary Material.
24 Inclusion of the two analyst-related variables when combining all of the data together reduces the sample size significantly. We, therefore, keep only the 6 proxies that produce sufficient nonmissing country observations.

VI. Conclusions
This article contributes to the evolution of asset pricing theory by studying asset return anomalies in a global context with the specific aim of exploring the differences between emerging and developed markets. We confirm the puzzle, via a wide array of anomalies in 45 countries, that return anomalies are equally likely (if not more so) to be observed in developed as in emerging markets. Given the still unresolved nature of the puzzle, we turn to behavioral theories for potential explanations. We also find that a straightforward extension to emerging markets of behavioral explanations that originated within developed markets fails to provide an answer to the puzzle. For example, the literature suggests that investor behavioral bias, in combination with limits to arbitrage, induces mispricing in the market. However, if this were the case, then emerging markets should have more anomalies than developed markets, as both the above characteristics are more acute in the former. When digging deeper into behavioral theories, we find that Hong and Stein's (1999) theoretical framework provides a good starting point for examining international variation. Their model focuses on investor heterogeneity and avoids making assumptions concerning specific behavioral biases of a single representative agent. Building on Hong and Stein (1999), we show that newswatcher efficiency in a market has a nonlinear impact on the observations of return anomalies. Newswatchers gradually reveal private fundamental price information to sow the seeds of momentum. Thus, the presence of newswatchers is a necessary condition for observing short-term momentum and long-term reversal in markets. The absence of some anomalies in emerging markets can be attributed to the absence of newswatchers who pay attention to that particular type of news. This is consistent with Bhattacharya et al. (2000), who argue that for information-driven (fundamental) anomalies, investors have to be able to monitor and process the relevant data. Griffin et al. (2010) highlight that "… caution must be exercised when comparing efficiency in settings where large informational differences and widely varying speeds of information incorporation exist, such as when making comparisons across markets internationally, as these differences make efficiency comparisons rather complex" (p. 3266). Our study provides an initial mapping of the complex relationship between a market inefficiency measure (anomalies) and market development. The use of newswatcher efficiency bridges the conceptual gap between the information environment (production and processing) and the observation of anomalies. It provides a uniform framework to understand return anomalies and market development.

TABLE 9
U.S. Time-Series Regression
Table 9 reports the time-series regression of anomaly returns on squared newswatcher efficiency and newswatcher efficiency with controls of idiosyncratic volatility and R². Three newswatcher efficiency proxies are constructed: institutional ownership, analyst dispersion, and differenced volatility. For institutional ownership (INSTOWN), we compute the average ownership every quarter, given that the number is reported quarterly; the regression uses quarterly data from 1980 to 2016. For analyst dispersion (DISP), we compute, in each month, the standard deviation of the 1-year earnings forecast divided by the absolute value of the mean forecast, and then scaled by the square root of the number of analysts. We use 1 divided by the dispersion so that the larger the value the greater the newswatcher efficiency. We then compute the average in each month and, therefore, use monthly data from 1981 to 2016. Differenced volatility (DV) is the difference between average abnormal earnings announcement event volatility and the average abnormal volatility during the 55 days before and the 55 days after the event. The abnormal volatility is the absolute value of the stock return in excess of the value-weighted market return. We then compute the average in each year and, therefore, use yearly data from 1972 to 2016. Idiosyncratic volatility is the standard deviation of residuals from regressions of daily stock returns on market returns in each month, and we require at least 15 days in that month. We then calculate the average at the frequency of the corresponding newswatcher efficiency proxy. R² is from the regression of weekly stock returns on weekly market returns with two lead and two lagged market returns to correct for nonsynchronous trading. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

TABLE 10
Table 10 reports the factor analysis on six newswatcher proxies and the factors' explanatory power for the anomalies around the globe. Panel A reports the factor loading patterns. Panel B reports the regression results using the long-short spread as the dependent variable. All the independent variables are defined in Table 3, except that the newswatcher measures are factors from the factor analysis. We report pooled OLS regressions and standard errors double-clustered by market and month. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.
Our prediction also sheds new light on the role of firm size in return anomalies. We show that the smallest firms often have fewer return anomalies than other small firms, as newswatcher efficiency is at its lowest in these small firms. The very low newswatcher efficiency for small firms in emerging markets helps to explain why there are fewer anomalies in these firms than in small firms in developed markets.
Overall, our study introduces the concept of newswatcher efficiency and demonstrates its importance in understanding cross-country asset return anomalies in the context of market development. We provide theoretical and empirical explanations for the global anomaly puzzle documented by Griffin et al. (2010) and Jacobs (2016). We bridge the gap between market development and the measurement of market efficiency by anomalies. We show that observed anomalies are not sufficient to measure a market's efficiency, as extreme inefficiency inhibits the observation of return anomalies.

stocks with a valid mispricing score and the proportion with respect to the total number of stocks. In addition, the MSCI classification is used to indicate a developed or emerging market.
Appendix C. Information Environment, Investor Sophistication, and Control Variables

SOPHISTICATION (SOPHI):
Sophistication is the average of the sophistication score from 2006 to 2014 for each market. The sophistication score is from the World Economic Forum and it ranges from 1 to 7 (7 is the best).

OWNERSHIP_OF_THE_3_LARGEST_PRIVATE_SHAREHOLDERS (C3PRI):
This is the average percentage of common shares owned by the three largest shareholders in the 10 largest nonfinancial, privately owned domestic firms in a market. Data are from La Porta et al. (2000). Higher values mean poorer investor protection.