
Alternative Investments

Jan Becker

Big Data Investments: Effects of Internet Search Queries on German Stocks

ISBN: 978-3-95934-597-2

Delivery takes 5 to 8 working days.

EUR 29.50, free shipping within Germany

Product type: Book
Publisher: Diplomica Verlag
Publication date: 08/2015
Edition: 1
Pages: 92
Figures: 43
Language: English
Binding: Paperback


In recent years, the internet has developed very quickly and has become a major source of information worldwide. Many researchers have used search engine query data to forecast economic time series such as consumer confidence indicators, unemployment rates, retail sales, house price indices, stock prices, stock volatility and even commodity prices. Following this prior research, this study analyzes the impact of internet search engine data on capital markets. Many authors have already worked with index-level data, most of them on the US market. This study adds to the existing literature with evidence from the German stock market. Two research questions are answered: first, whether an increase in search queries drives individual stock returns, and second, whether queries affect the implied volatility of stock options. After controlling for seasonality, autocorrelation and general market risk, the further analysis also examines the price-to-book valuation, one-year performance and historical volatility in interaction with internet search queries.


Excerpt:

Chapter 1.7, Data Scope of Analysis: The data scope of this study extends over the German stock indices DAX® (Deutscher Aktien IndeX), MDAX® (Mid-Cap-DAX) and SDAX® (Small-Cap-DAX). These three are the German Prime Standard market indices for large, medium-sized and small exchange-listed companies. The blue-chip index DAX covers 80% of Germany’s free-float market and consists of the 30 largest companies in terms of market capitalization and exchange turnover. MDAX and SDAX each have 50 constituents and follow directly after the DAX constituents. The full Prime Standard would be completed by adding TECDAX® to the sample. The TECDAX consists of the 30 largest technology shares. The composition of all indices is regularly reviewed and rebalanced on a quarterly basis, except for new listings, deletions or mergers, which are taken into account immediately (Deutsche Börse 2013, p. 19). The TECDAX was initially not included in the sample for two reasons: first, to keep the sample size manageable, and second, because the online business model of some companies (e.g. Xing or Freenet) means that the correlation between search queries and the success of the companies was assumed to be high ex ante. For the theory to generalize, the hypothesis should hold for standard companies too.

1.7.1, Timeframe: The overall timeframe of nine years and four months, from 10 January 2004 to 4 May 2013, runs from the first publicly available observation downloadable from Google to the time of this study. All regressions are based on this timeframe. It should be noted that two major macroeconomic crises fall into this period: the global financial crisis of 2008/09 and the European sovereign debt crisis, which has been going on since 2010. Both crises affected the global economy and led to a slowdown of production. The financial crisis of 2008 is sometimes also referred to as the “Great Recession”.
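The length of this sample window can be checked with a few lines of Python. The sketch below is only an illustration: it assumes one observation per week, with the first observation on the start date, which is an assumption made here and not a detail stated for the regressions.

```python
from datetime import date, timedelta

# Sample window as stated in the text.
START = date(2004, 1, 10)
END = date(2013, 5, 4)

# Length of the window in days, split into full weeks and leftover days.
span_days = (END - START).days
full_weeks, leftover_days = divmod(span_days, 7)

# Assumption: one observation per week, the first one on START.
weekly_dates = [START + timedelta(weeks=i) for i in range(full_weeks + 1)]

print(span_days, full_weeks, leftover_days)
print(len(weekly_dates), weekly_dates[0], weekly_dates[-1])
```

Under that assumption the window holds 487 weekly dates, because both endpoints fall on the same weekday and are 486 full weeks apart.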
1.7.2, Necessary Adjustments in Sample Selection: Not all stocks could be included in the analysis over this period, which introduces a small selection bias. The final list of constituents as of 6 May 2013 is adjusted with respect to the initial setup of January 2004 as follows. First, all stocks included in an index in 2013 must also have been in one of the three indices at the starting point of the analysis in 2004, to ensure that all control variables are available and that the stocks were already exchange listed and tradable. This implies a survivorship bias, because companies which defaulted or merged in the meantime are excluded. Second, companies that were taken private are also excluded, because trading prices are no longer quoted. Third, companies newly listed after 2004 are not included in the sample. Several event studies on IPOs cover this topic (cf. Da, Engelberg and Gao, 2011). The main argument for adjusting the sample is to focus on a continuous and comparable data set.

1.8, Search Engines – Gateways to Information: The internet search query data is downloaded from Google. The American company Google Inc. had a German market share of 80.4% (Webhits, 2013), and a somewhat higher share of 83.18% was reported in a global ranking (Netmarketshare, 2013). This leads to the assumption that Google data is, to a certain extent, appropriate for testing the hypotheses and has the necessary data scope to draw statistically significant conclusions about overall search activity.

1.8.1, The Google Tool: Historically, there were “Google Trends” and “Google Insights for Search”, which were merged into “Google Trends” in September 2012 (Google, 2012b). Since then the combined interface, under Google Trends, has been the only remaining platform. The service is provided by Google Inc.
(“Google”), located at 1600 Amphitheatre Parkway, Mountain View, CA 94043, United States, and can be accessed online. Access to the data is free of charge, and all time series can be downloaded after registration and login with a free Google Account at the website. The tool allows users to look up an index for a specific search query from 2004 until today and is available for worldwide data. The user interface offers four options to specify the query: web search, area-specific search, time frame and category. For this study “Web Search” is relevant, which could also be changed to “Product” or “News Search”. “Youtube” and “Image Search” are not relevant. The area-specific search can be narrowed to a specific country, state or, in some cases, city. For example, Germany can be broken down into the state of “Hessen”, but the city of Frankfurt is not yet available as a time series, although Google already displays the current search activity by city. Depending on the query’s frequency, some time series are still on a monthly basis, whereas the most common downloadable format is weekly. Daily time series can be downloaded for the last 90 days; this can be extended further into the past by manually downloading one-month windows via the “Select dates” functionality and then chaining the parts together. Under the category filter there are 26 options with 241 subcategories available. Using the example of Choi and Varian (cf. 2009a, p. 4): “…query [car tire] would be assigned to category Vehicle Tires which is a subcategory of Auto Parts which is a subcategory of Automotive”. Of major interest for stocks are the categories “Business & Industrial” and “Finance”. These categories do not always deliver time series for all queries.
Therefore, in the later analysis the most general form across all categories is applied, instead of the “Finance” filter, to maximize the likelihood of actually obtaining a time series to analyze. Other research has focused on these specific categories (e.g. Fink and Johann, 2013). The query can be compared to its category. In this case the time series is scaled as a percentage of the initial starting value and is thus a growth rate (Google, 2012c). The category can add important information with respect to seasonality.

1.8.2, Grouping: It is possible to group up to 25 search terms via a “+” sign. The items are then displayed as separate graphs. In order to specify the query for combinations of terms, quotation marks (“x+y”) have to be set at the beginning and end of each request.

1.8.3, Multiple Counting is automatically avoided by Google: In order to avoid multiple counting, the requests are filtered by their IP address. The IP address (Internet Protocol address) is a numerical label assigned to each computer which uses the Internet Protocol for communication. The IP address can be used for host or network interface identification and location addressing. By identifying each user via an IP address, only the sum of their daily queries becomes part of the search volume index. If one user searches from more than one computer (IP address), the queries are counted multiple times. It cannot be determined from publicly available data how many cross-sectional queries the same user is responsible for. It is possible that one user is responsible for generating all the signals over time.

1.8.4, Synthetic Index rather than actual Numbers: Google does not publish the overall sum of search queries, but calculates an index.
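A bounded index of this kind can be illustrated with a small Python sketch. It assumes the simplest possible scheme, in which every raw count is divided by the series maximum and multiplied by 100. The raw counts and the function name `to_search_index` are invented for illustration, and Google's actual procedure is not public.

```python
from typing import List


def to_search_index(raw_counts: List[float]) -> List[int]:
    """Scale raw query counts so that the largest value becomes 100."""
    peak = max(raw_counts)
    if peak <= 0:
        # No search activity at all: every index value is 0.
        return [0 for _ in raw_counts]
    return [round(100 * count / peak) for count in raw_counts]


# Invented weekly query counts for two hypothetical stocks.
large_stock = [4000, 8000, 6000]
small_stock = [40, 80, 60]

large_index = to_search_index(large_stock)
small_index = to_search_index(small_stock)

print(large_index)  # [50, 100, 75]
print(small_index)  # [50, 100, 75]
```

Both series map to the same index values although their raw volumes differ by a factor of 100, so index levels from different queries say nothing about relative search volume.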
This index is bounded between 0 and 100 and is recalculated in specific situations: whenever there is a new maximum of search queries, this quantity is set to 100, and thereafter subsequent quantities are scaled by this maximum (divided by it and multiplied by 100) until a new high is reached. The old values are not recalculated and remain scaled by their old maxima. The index can be interpreted as a percentage index (Google, 2012a). For this reason it is difficult to compare different stocks by their search intensity. The actual quantities are not available, so an increase in queries for one stock cannot be attributed to a decrease for another. This may be of interest in the case of actual sales and shipped units of competitors. Moreover, the rescaling does not allow conclusions about the original query quantity for a company, because the basis is constantly shifting.

1.8.5, Empty Values: Another drawback is Google’s practice of publishing an index value of “0” instead of a very small number whenever the search queries are below a certain threshold level. Google has not yet transparently explained how the threshold level is determined (Google, 2010).

1.8.6, Limited by German Language: As the later data will show, most of the relevant data originates from the German-speaking area and from queries within Germany. This is also true for most queries which refer to international exporting companies like car manufacturers (e.g. BMW) and is in contrast to the previous studies on individual stocks from the US market. At the DAX level there are comparatively more international queries than for the smaller-company indices MDAX and SDAX. This may hint at home bias and the local degree of familiarity with smaller stocks. When analyzing queries that combine the stock’s name with a second word, the language barriers become more obvious. When searching for terms like “Aktie” (English: stock) or “Dividende” (English:
dividend), even small changes in the wording can shift the origin of the data from German-speaking to English-speaking countries. A study by Mondria and Wu (2011) showed that home bias delivers higher returns through the advantage of higher information density. Therefore the study uses the German terms in the regression models. A comparable study by Bank, Larch and Peter (2010) on the German stock market, covering all Xetra-listed stocks, used the names of the companies without the suffix “AG”. Their queries are restricted to German queries only. Fink and Johann (2013) apply the category filter “Finance” when downloading the data, in addition to the name of the German companies. This procedure takes advantage of a particular Google feature, which classifies queries by the website finally accessed after the query is submitted. This difference from other studies adds information to the query, and the authors show that it improves the query quality. As it is not transparent how Google classifies “Finance” queries, this study nevertheless uses the standard query method and sets the focus via the additional terms “AG” and “Aktie”.

1.8.7, Exact Wording of Search Terms and Search Term Combinations: When searching for data on a search engine, one question arises: what do people type into the search engine? Most users start by typing in just one search term (cf. Spink et al., 2001). This seems to be common practice and is also supported by the data set later. To set up a list of words, the most common reference name for each company is used (e.g. BMW for Bayerische Motoren Werke Aktiengesellschaft). This approach had some minor flaws, because the common German names of some stocks coincide with ordinary words in the English language. For example, “MAN” is a German producer in the automobile industry and “Metro” a big retailer of consumer products.
In these cases a more stock-related perspective was introduced by searching for the combination of the stock’s name with “AG” (Aktiengesellschaft), the German counterpart of the abbreviation PLC (public limited company). Altogether, four main search combinations evolved: the common search name, declared as “Name”, the name plus “AG”, the name plus “Aktie” and the name plus “News”. It should be noted that the available data frequency decreases dramatically when terms are combined. In the initial setup many more terms were included, but not enough data sets could be extracted. These terms were: name plus “Report”, name plus “Return”, name plus “Rendite”, name plus “HV” (English: shareholders' meeting), name plus “IR”, name plus “Investor Relations”, name plus “Bilanz” (English: balance sheet), name plus “P&L” and name plus “GuV” (English: P&L). The fact that only top-level search terms are available may support the initial assumption that single-word search queries are preferred over full sentences, or it may be due to Google’s policy not to publish time series which fall below a certain threshold level.
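The four main search combinations can be generated mechanically from a list of company names. The following Python sketch is an illustration only: the function name `build_queries`, the label keys and the choice of three example companies are made here and are not taken from the study.

```python
from typing import Dict, List

# The four combinations named in the text: label -> term appended to the name.
SUFFIXES: Dict[str, str] = {
    "Name": "",        # the common search name on its own
    "AG": "AG",        # name plus "AG"
    "Aktie": "Aktie",  # name plus "Aktie"
    "News": "News",    # name plus "News"
}


def build_queries(name: str) -> Dict[str, str]:
    """Return the four search strings for one company name."""
    return {label: f"{name} {suffix}".strip() for label, suffix in SUFFIXES.items()}


# Example company names mentioned in the text.
companies: List[str] = ["BMW", "MAN", "Metro"]
queries = {name: build_queries(name) for name in companies}

for name, combos in queries.items():
    print(name, list(combos.values()))
```

Keeping the suffixes in one mapping makes it easy to try further terms such as “Rendite” or “Bilanz”, which the text reports as returning too little data.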

About the Author

Jan Becker was born in Hessen in the mid-west of Germany in 1986. After acquiring his Bachelor of Science in Economics and Business Administration from Goethe University in Frankfurt, he graduated as a Master of Science in Capital Markets from Frankfurt School of Finance & Management. The author is a capital market professional in the field of asset management and quantitative finance. He gained practical experience in tactical asset allocation at a renowned German asset management firm and thereafter in capital market derivative strategies at a consultancy firm in Frankfurt. His core area of expertise is multi-asset and absolute return investments. Today he works for an international asset management company, supplying institutional clients with investment solutions.
