The Google trends effect on the behavior of the exchange rate Mexican peso-US dollar

We show the advantage of using Google search engine trends to forecast the short-term (weekly) volatility of the exchange rate between the Mexican peso and the United States dollar. We compare models from the literature that have used Google Trends as an explanatory variable. Some of the models are based on time series, whereas others are based on the similarity function, which captures the cognitive form of human reasoning. For example, an investor who needs to know the value that a variable will take in the future will take into account relevant, known, and available information, and weigh it to calculate the forecast. We conclude that taking the Google Trends variable into account partially helps to explain the behaviour of volatility, and that it is necessary to incorporate more aggregation levels. Moreover, to the best of our knowledge, literature on the subject of using Google Trends to explain relevant economic variables is relatively scarce. JEL code: E71, C22, F31.
Corresponding author. E-mail address: mario.d.bustamante@gmail.com (M. Durán Bustamante). Peer review under the responsibility of Universidad Nacional Autónoma de México. http://dx.doi.org/10.22201/fca.24488410e.2018.1710 0186-1042/© 2019 Universidad Nacional Autónoma de México, Facultad de Contaduría y Administración. This is an open access article under the CC BY-NC-SA license (https://creativecommons.org/licenses/by-nc-sa/4.0/).
To cite this document: Durán Bustamante M.; Hernández Del Valle A.; Ortiz Ramírez A. (2019) The Google trends effect on the behavior of the exchange rate Mexican peso US dollar. Contaduría y Administración. 64(2), e103. http://dx.doi.org/10.22201/fca.24488410e.2018.1710


Introduction
Predicting short-term exchange rate volatility is undoubtedly a challenge for heavily traded emerging-market currencies, which are subject to natural speculation against less volatile currencies in the short term. The Mexican peso (MXN) is the tenth most traded currency in the world. Its daily turnover oscillates around USD 112 000 million, and 97% of these transactions are made against the US dollar. The US dollar, in turn, accounted for 87.6% of the total trading volume across all currencies in 2016. In the Mexican current account, imports reached USD 89 133 million and exports USD 85 148 million in the first quarter of 2016. The amount of Mexican pesos that moves over two days in the foreign exchange market is therefore much greater than exports and imports combined in a single quarter; this is a key factor in the exchange rate being subject to speculation within the currency market.
Moreover, purchasing power parity (PPP) considers the exchange rate as a price; Taylor and Taylor (2004) demonstrate that PPP shows deviations from the expected value in the short term. The price or exchange ratio can be considered a demand variable, which is influenced by the external market of goods, the financial market (derivatives), and its own market: the exchange rate itself (the variable generates its own dynamics) and the speculation within this same market. It is necessary to bear in mind that, in the short term, not all variables play an important role in the supply-demand relationship for foreign exchange.
The financial system has become very complex, and this has led different fields, such as psychology and neuroscience, to ask how financial decisions are generated. In contrast to economic theories, they consider that financial decisions happen at different levels of aggregation. Frydman and Camerer (2016) mention the example of the efficient market hypothesis, which only takes into account data from market levels, leaving aside information from individuals. They find four levels of aggregation: (i) household, (ii) individual trading patterns, (iii) decisions that determine the price of assets in the market, and (iv) corporate investment funds.
Data from search engines may belong to more than one level of economic aggregation, although it is difficult to distinguish to what level they belong; nonetheless, part of the results obtained herein provide us with an idea, as mentioned below. Google Trends should belong to (i) or (ii) above, because (iii) and (iv) may well use other specialized sources of information, such as Bloomberg.
Empirical studies have proposed a number of simple models to explain economic variables using Google Trends. For example, Choi and Varian (2012) use autoregressive (AR) models to forecast auto parts sales, tourism, and unemployment. They find that Google Trends data are often correlated with economic indicators and are useful for improving short-term prediction, with the most successful outcomes at breakpoints or inflection points. In addition, Carrière-Swallow and Labbé (2013) find that the AR(1) model fits very well and produces more successful out-of-sample results than higher-order specifications. They also analyze consumer behavior, i.e., interest in buying a car, by building an indicator that involves Google Trends.
Other models are based on the similarity function, which most closely resembles the natural form of human reasoning. According to Lieberman (2010), this function can be used to predict the variable in question, taking into account similar information and the history of the same variable, as proposed by Gilboa, Lieberman, and Schmeidler (2011). One example is an economic analyst who needs to forecast next year's inflation. The similarity function has been applied to explain stock market volatility (Golosnoy, Hamid, & Okhrin, 2014) and, of even greater relevance to this work, to explain volatility using Google's search engine trends (Hamid & Heiden, 2015). The latter authors find that in short periods of high market volatility, prediction improves when investors' attention to the market, as measured by Google Trends, is taken into account.
The Google search engine is the most widely used worldwide, with 67.78% of the desktop market and 94.4% of the mobile and tablet market (Market share, 2017).
The objective of this paper is to develop different methodologies to examine the MXN-USD exchange rate variable. On the one hand, there are AR-based models, which use time series that directly incorporate the trend; on the other hand, there are models based on the similarity function, which performs a transformation, as shown in the section Empirical volatility models.

Theory and literature review
The use of big data indicators in economics is growing. Guzmán (2011) uses Google Trends to forecast inflation. McLaren and Shanbhogue (2011) propose that the data generated by web searches can be used by central banks for economic nowcasting. Carrière-Swallow and Labbé (2013) use this approach for nowcasting auto sales in emerging markets. Smith (2012) investigates whether search engine activity is related to the volatility of the foreign exchange market. Specifically, he considers the search terms economic crisis, financial crisis, inflation, and recession. With these terms in Google Trends, he constructs two estimators, one over the short term and one over the long term, for which he considers a four-week moving average. He employs these short-term and long-term estimators in three different ordinary least squares models of volatility for seven currencies. However, the $R^2$ statistics are poor in all cases, considering that three variables and an intercept are used to explain volatility. This may be because the search terms chosen are somewhat technical, and it is not known whether these terms are empirically correlated with the volatility variable to be explained. Moreover, it is questionable whether these particular Google Trends series capture the real behavior of investors. Nonetheless, the author concludes that these keywords can predict volatility in the foreign exchange market, although they capture only part of the volatility arising from the full range of financial decisions that revolve around the currency market. Hamid and Heiden (2015) incorporate Google Trends to forecast the variance of the Dow Jones index based on five-minute returns, focusing on weekly forecasts.
On the other hand, GARCH family models are widely used to analyze volatility. Kumari and Mahakud (2015) build a sentiment index to analyze asset returns in the Indian market, incorporating the index in the mean and variance equations of the GARCH model. They find that their sentiment index helps to anticipate negative and positive return volatility. Afkhami, Cormack, and Ghoddusi (2017) look for the Google search terms that best represent market investors in order to forecast energy price volatility. They use ninety search terms related to the energy market and contrast them with the usual GARCH. Morimoto and Kawasaki (2017) model market volatility taking into account online intraday news, using HAR models, big data techniques, and text mining techniques.
Other authors suggest that investor attention is a driver of volatility over short periods of time (Vlastakis & Markellos, 2012), and that there is a close relationship in phases of high volatility (Andrei & Hasler, 2015).
In this paper, we use Google Trends data as volatility variables in relation to the exchange rate, because people's interest when looking for relevant information on a financial or economic variable can be read as a precursor, or an indicator, of volatility in that variable; including it as a direct variable in the model would imply a lower final impact of this variable. Alvarez, Atkeson, and Kehoe (2007) consider that, in the short run, the exchange rate is not affected in the first statistical moment; they propose a theoretical model in which they analyze the effects of the second statistical moment on the currency.

Empirical volatility models
According to Gilboa, Lieberman, and Schmeidler (2006), some problems to be solved are unique, such as calculating the sale price of a work of art, knowing whether a person's body will react successfully to a certain operation, or determining the rate of inflation for the following year. Thus, the available information is sometimes not sufficiently relevant to explain a particular case, and it becomes necessary to weigh the information, assigning each piece a weight according to its importance for the case under study. This seems a more accurate approach than simply taking all available information directly and expecting it to suffice to answer the question the analyst wishes to address; in this context, considering "similar" variables adopted in other studies can improve the prediction process. Economic agents prefer to obtain good results based on similar cases that have previously been well adjusted.
Mathematical models that incorporate cognitive processes, observed behaviors, and novel data sources generated by individuals or the economy are considered models of economic behavior. These types of predictive models are formulated using the similarity function (see Gilboa et al., 2011). Lieberman (2010) argues that the similarity function can be considered a natural model of human reasoning, and its statistical validity has been shown through the axioms of Gilboa et al. (2006).
We model the process of exhibiting interest, $y_{t+1}$, in terms of variables $x_{i,t-1}$, which Golosnoy et al. (2014) interpret as previous forecasts similar to $y_t$. The variable $x_{i,t-1}$ must naturally be similar to $y_t$ in the past; for this reason, the distance between the two must be small. The similarity function is represented by a Euclidean distance, whose main role is to measure the distance between the problem under study and similar situations, so that these weighted measurements are passed to the similarity function and, in turn, to the model of "empirical similarity."
The concept of "similarity" can be extended to analyze volatilities. This implies that $y_i$ must be a known volatility variable to be forecast, and that the similarity function must be constructed with variables that can anticipate future volatility behavior based on past behavior. In line with this, the economic agent takes the information and weighs it in accordance with the similarity function to construct the estimate of the volatility that will arise in the future.
The similarity function is specified in terms of $G_{i,t-1}$, the trend behavior variable generated by the individuals.
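As a rough illustration of the idea, the following Python sketch computes a similarity-weighted forecast. It assumes the weighted-average form associated with Gilboa et al. (2006), with an exponential similarity in the squared distance; the function names, the weight `omega`, and the toy numbers are illustrative assumptions and not the specification estimated in this paper.

```python
import math

def similarity(x, x_ref, omega):
    """Exponential similarity: equals 1 when x == x_ref and decays with squared distance."""
    return math.exp(-omega * (x - x_ref) ** 2)

def es_forecast(past_x, past_y, x_now, omega):
    """Similarity-weighted average of past outcomes.

    past_x[i] is the signal observed together with the outcome past_y[i]
    (for instance a trend value and a volatility); x_now is the current signal.
    This is an assumed textbook form, not the model fitted in the paper.
    """
    weights = [similarity(x, x_now, omega) for x in past_x]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, past_y)) / total

# Hypothetical toy data: trend values and the volatility observed with each.
trend = [10.0, 12.0, 50.0, 11.0]
vol = [0.010, 0.012, 0.040, 0.011]

# The current signal (48) is closest to the third case, so that case dominates.
forecast = es_forecast(trend, vol, x_now=48.0, omega=0.05)

# With omega = 0 every case has the same weight and the forecast is the plain mean.
flat_forecast = es_forecast(trend, vol, x_now=48.0, omega=0.0)
```

The parameter `omega` controls how quickly dissimilar cases are discounted: larger values concentrate the forecast on the closest past cases.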

Temporal models
The random walk can be represented by an AR(1) process. Nelson and Plosser (1982) analyze different macroeconomic series, concluding that the best forecast for these is very close to the random walk. Different works have established that, for the main currencies, a random walk can be a good approximation of the nominal exchange rate.
The simpler model proposed by Choi and Varian (2012), based on time series, is autoregressive: $y_t$ is the process to be explained, $g_t$ is the variable corresponding to Google Trends, and $b_{12} y_{t-12}$ is the seasonal component, if this exists.
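A minimal sketch of how such a regression could be estimated is given below. It assumes a specification of the form y_t = b0 + b1·y_{t-1} + bg·g_t + e_t, omits the seasonal term for brevity, and estimates the coefficients by ordinary least squares through the normal equations. The coefficient names, the helper functions, and the noise-free toy series are assumptions made for illustration only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ar1_with_trend(y, g):
    """OLS estimates [b0, b1, bg] for y_t = b0 + b1*y_{t-1} + bg*g_t + e_t."""
    rows = [[1.0, y[t - 1], g[t]] for t in range(1, len(y))]
    target = [y[t] for t in range(1, len(y))]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical trend series and a noise-free y built from known coefficients,
# so that the fit should recover 0.5, 0.6 and 0.02.
g = [30.0, 45.0, 20.0, 60.0, 35.0, 50.0, 25.0, 70.0]
y = [1.0]
for t in range(1, len(g)):
    y.append(0.5 + 0.6 * y[t - 1] + 0.02 * g[t])

b0, b1, bg = fit_ar1_with_trend(y, g)
```

With real data the residuals are not zero, and standard errors would be needed to judge whether the trend coefficient is significant.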

Data
Google trends data
Google Trends data are collected through the company's search engine, which collates the relevant searches for any topic; the data can be downloaded from the Google (2017b) web page. The data are standardized by the highest number of searches that the keywords generate on a specific date. This value represents 100%, and all other values must be lower than this amount.
The Google Trends website allows one to view and download up to five joint variables, but one must be very careful because when they are downloaded together, they are standardized based on the most searched variable. If the query is downloaded in isolation, it is standardized based on the most searched data. It is possible to select the data by region of the world and choose the temporal frequency. It is also possible to select where one looks for the topic of interest: in the web browser, in the image finder, or in the news. Moreover, it also allows one to visualize geographically the countries represented in the search area, shading the most important regions in navy blue and the less important in light blue in terms of the total number of searches.
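The effect of joint versus isolated downloads can be illustrated with a small Python sketch. Google Trends does not expose raw search counts, so the raw volumes below are hypothetical, and the scaling rule (divide by the peak and multiply by 100) is a simplified assumption about how the index is built.

```python
def scale_to_100(series, peak):
    """Express each value as a percentage of the given peak."""
    return [100 * v / peak for v in series]

# Hypothetical raw search volumes for two terms over three weeks.
raw_a = [200, 400, 300]  # heavily searched term
raw_b = [20, 50, 40]     # lightly searched term

# Downloaded in isolation: term B is scaled by its own maximum.
b_alone = scale_to_100(raw_b, max(raw_b))

# Downloaded jointly: both terms are scaled by the largest value across terms.
joint_peak = max(max(raw_a), max(raw_b))
a_joint = scale_to_100(raw_a, joint_peak)
b_joint = scale_to_100(raw_b, joint_peak)
```

In this toy case the same weeks of term B read 40, 100, 80 in isolation but 5, 12.5, 10 in the joint download, which is why series obtained in the two ways should not be mixed.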
We use historical data for the trend of the "precio dolar," counted weekly, beginning in the week of January 4 to January 10, 2004 and ending in the week of March 20 to March 26, 2016 (639 observations). Each weekly observation begins on Sunday and ends on Saturday. The process of choosing the "precio dolar" variable is described in Annex A. In the selection of trend variables, the geographical area was not delimited. The intention is to visualize which countries are interested in acquiring USD and what might start speculation in demand for USD, i.e., what would motivate more foreign investors to purchase this reserve currency, or simply to reflect the behavior of an entire area that might acquire USD, directly affecting the MXN-USD exchange rate.

Exchange rate data
The historical data for the exchange rate were downloaded from the Banco de México (BANXICO, 2017) portal; the information is daily from January 1, 2001 to March 29, 2016. The data for Fridays are taken as the closing values of the exchange rate price in each week, so that the data are weekly and are homogeneous with the weekly trend data.
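The weekly sampling described above can be sketched as follows in Python. The daily rates are hypothetical placeholders, and the sketch simply keeps observations dated on a Friday; how weeks whose Friday is a holiday were treated is not stated in the text, so such weeks would have no entry here.

```python
from datetime import date, timedelta

def friday_closes(daily_rates):
    """Return (date, rate) pairs for Fridays only, in chronological order."""
    return [(d, daily_rates[d]) for d in sorted(daily_rates) if d.weekday() == 4]

# Hypothetical daily MXN-USD rates for two business weeks starting Monday, March 7, 2016.
start = date(2016, 3, 7)
daily = {}
for offset in range(12):
    day = start + timedelta(days=offset)
    if day.weekday() < 5:  # keep Monday to Friday only
        daily[day] = 17.70 + 0.01 * offset

weekly = friday_closes(daily)
weekly_dates = [d for d, _ in weekly]
```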

Realized volatility of the exchange rate
Realized volatility refers to volatility that occurred in the past and usually concerns derivatives. For example, monthly volatility can be calculated by taking the standard deviation of the daily returns in the desired month (Nasdaq, 2017). For the second time (the first being in the selection of trend variables), we use the Granger causality test, here on the realized volatility and the Google Trends variable; we find that, at its first and sixth lags, the Google Trends variable Granger-causes realized volatility at a significance level of 5%.
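The calculation can be sketched in a few lines of Python. The sketch assumes log returns and the sample standard deviation, and applies them to one week of hypothetical daily prices; the text does not state which return definition was used, so both choices are assumptions.

```python
import math
import statistics

def log_returns(prices):
    """Daily log returns from a list of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def realized_volatility(prices):
    """Sample standard deviation of the daily log returns within the period."""
    return statistics.stdev(log_returns(prices))

# Hypothetical daily MXN-USD prices covering one week (previous close plus five days).
week_prices = [17.50, 17.62, 17.55, 17.80, 17.71, 17.95]
rv_week = realized_volatility(week_prices)

# A series that grows at a constant rate has identical returns and near-zero volatility.
steady_prices = [17.0 * 1.01 ** k for k in range(6)]
rv_steady = realized_volatility(steady_prices)
```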
It should be noted that the data, even for the past, are updated according to the maximum value recorded in the specified period of time. For example, when a future period registers a higher query volume for a search term than any value in the previous data, this new maximum modifies the whole series, adjusting it to the new value. Likewise, in a specific range in the past that excludes the present period, the maximum search value will clearly be different. Thus, when working with a fixed data series, it is incorrect to divide the series within the sample. Doing so would only be valid if the maximum value of the whole series fell within the same range of the sample, or if another series were downloaded for the selected period; the latter might change the specification of the model that takes the present into consideration. We consider that the best option is to perform contrasts outside and not within the sample.
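A short Python sketch shows why a sub-period cannot simply be cut out of the full series. It assumes that the index is rebased so that the maximum of the downloaded window equals 100; the index values are hypothetical.

```python
def rebase(index_values):
    """Rescale a window so that its own maximum equals 100."""
    peak = max(index_values)
    return [100 * v / peak for v in index_values]

# Hypothetical index for five weeks; the fourth week holds the overall maximum.
full_sample = [20.0, 40.0, 50.0, 100.0, 60.0]

# The first three weeks as they appear inside the full download...
cut_from_full = full_sample[:3]

# ...and as they would appear if only those three weeks were downloaded.
downloaded_alone = rebase(full_sample[:3])
```

Under this assumption the same three weeks read 20, 40, 50 in one case and 40, 80, 100 in the other, so coefficients estimated on one scale do not transfer to the other.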

Methodology and results
According to Choi and Varian (2012), trend variables can be incorporated directly into the model. We apply the same idea to explain realized volatility. In this specification, all coefficients are found to be significant, and the standard errors are consistent in magnitude with the coefficients (see Table 1).
On the other hand, the empirical similarity (ES) method does not analyze the variables directly over time. Instead, the methodology weighs similar cases to improve the forecast. This feature is an advantage when the historical information does not exhibit regular and consistent periodicity over the entire sample.
Here, we adopt a variation of the ES methodology, as described by Hamid and Heiden (2015). $RV_t$ and $RW_t = \exp(-\omega_1 (RV_t - G_t))\,RV_t$ are weakly stationary in both models. Table 2 reports the results of our unit root tests (Dickey & Fuller, 1979, 1981; Phillips & Perron, 1988).
Table 3. Parameters of the empirical volatility model. Adj R-squared: 0.3421, AIC: -10.4203. Source: Own elaboration.
Table 3 shows the values of the parameters obtained for the ES model; the Akaike information criterion (AIC) is quite similar in both models. Interestingly, the second model is preferable in terms of significance, based on the values of the t-statistic. Figure 2 shows the out-of-sample one-step-ahead predictions with fixed coefficients. The period visualized is from April 4, 2016 to October 29, 2016; as we can see, both methodologies fail to capture the full magnitude of the volatility, leading us to believe that the financial decisions reflected represent only a part of the total behavioral decisions within the foreign exchange market.
The ES approach follows the real volatility more closely, but it does not capture the full breadth of the volatility or its shape.
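The out-of-sample exercise can be sketched as follows in Python. The sketch assumes a linear one-step-ahead rule of the form b0 + b1·RV_{t-1} + bg·G_t with coefficients held fixed at hypothetical in-sample estimates, and summarizes the forecast errors with the root mean squared error; the coefficient values and the data are invented for illustration and are not those reported in the tables.

```python
import math

def one_step_forecasts(coefs, rv, g):
    """Forecast rv[t] from rv[t-1] and g[t] with fixed coefficients (b0, b1, bg)."""
    b0, b1, bg = coefs
    return [b0 + b1 * rv[t - 1] + bg * g[t] for t in range(1, len(rv))]

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical fixed coefficients and out-of-sample data.
coefs = (0.002, 0.5, 0.0001)
rv_out = [0.010, 0.012, 0.009, 0.015]
g_out = [40.0, 55.0, 38.0, 70.0]

preds = one_step_forecasts(coefs, rv_out, g_out)
error = rmse(rv_out[1:], preds)
```

Because the coefficients are not re-estimated as new observations arrive, any gap between `preds` and the realized values reflects both model misspecification and the rescaling issue of the trend data discussed earlier.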
In Figure 3, the behavior shown by investors before the historical maximum of the MXN-USD exchange rate shows no considerable variation. With a visualization window of more years, it is clear that there is no change in tendency in the series over the period. Table 4 allows a comparison with the real values of the exchange rate. Over time, search interest increases sharply only from September 19, 2016 to September 20, 2016 (see Figure 3), and from September 21, 2016 to September 26, 2016 it declines. In this last period, higher values of the MXN-USD exchange rate are reached.

Conclusions
The way in which the volatility of the MXN-USD rate is modeled in this work is quite novel because it incorporates the behavior of individuals in different regions of the world whose financial decisions are focused on acquiring USD, possibly to avoid a devaluation in capital, or simply to try to generate profits by anticipating the depreciation of an emerging country's currency. All this directly affects the Mexican peso in the short term. The similarity model represents a clear way of capturing such behavior. Even so, a lot of work remains to be done; in particular, it is necessary to include more relevant variables for short-term impact.
For example, in the results obtained, it is not possible to capture the behavior of all investors, even in the most volatile periods. The different Google Trends variables examined do not reflect drastic upward movements in September 2016 (see Fig. 3), which suggests that there are different levels of economic aggregation at which different financial decisions are made. Moreover, the Google Trends variable only represents a part of the financial decisions that are made in the currency market.
An extension of this study would be to determine what sources of information allow us to anticipate the behavior of big capital, which surely comes from large corporate investment funds and cannot be captured by the Google Trends volatility proxy chosen here.

Annex A. Selection of trend variables
In the selection of the main variable, we have taken into account that the geographical area reflecting the search term should not repeat countries. We consider whether the region can represent the behavior of an entire economic zone that is able to acquire large amounts of the currency, or anticipate a previous demand for USD as a refuge in the face of the possibility of a devaluation of the exchange rate, motivating the investors in other parts of the world to generate profits on the currencies of different emerging markets that are affected by the demand for USD.
We have also ensured that the correlations calculated by the Google Correlate tool (see Google, 2017a) are consistent with keywords related to the exchange rate and not with other Internet search terms.
Graphically, we have corroborated that the behavioral patterns of Internet searches hold a similar trend to the historical data for the currency in question.
The following table shows the searches made for different terms related to the MXN-USD exchange rate.
Taking into account that the keyword must maintain a similar trend to the exchange rate (i.e., one can enter the keywords in the portal and see a similar trend), Google Correlate was consistent in terms of keywords related to the exchange rate. The geographical area may be interested in acquiring USD, or it may start speculation in demand for USD. The 30 search terms are reduced to only three: "precio dolar," "USD exchange rate," and "today dollar." From the calculation of the correlation between these variables, which is very high (close to 1), we obtain the values of the $R^2$ statistic for the MXN-USD exchange rate vs. the three Google Trends series, as shown in Table A2.
Based on Table A2, the question is which of the three variables should be chosen as the proxy variable; adopting all three would not be appropriate because of the high correlation between them. In addition, simply using the $R^2$ statistic as a selection method would not be appropriate.
We have used the Granger causality test to determine whether one variable could be a causal precursor of another, and to consider the variable showing the largest source of previous speculation in the analysis. The results show bidirectional causation for the USD and the today dollar rate with a level of significance of 1%. In terms of the second trend variable, the USD exchange rate causes the USD price with a level of significance of 1%, and in the opposite direction, the USD price causes the USD exchange rate with a level of significance of 10%.
Moreover, when the number of lags increases, we obtain another consistent result: the USD exchange rate causes the today dollar rate at a significance level of 1% for every lag from one to ten, which suggests that the relationship between the two variables remains very strong even with many lags; thus, we discard the today dollar variable.