Google Trends x Yahoo Finance Analysis in Python
Python has a multitude of libraries dedicated to pulling data from the internet. Google Trends, for example, is a Google product that analyzes search history and publishes the popularity of search terms over time. The pytrends package is an unofficial Python interface for pulling that trend data, and the yfinance package does the same for stock information from Yahoo Finance. Both libraries are used in this tutorial to plot and compare finance and trend data over time. The methods outlined here apply to finance, data analytics, and data visualization in general.
Because pytrends and yfinance supply the trend and stock data from Google Trends and Yahoo Finance, respectively, both packages must be installed first. Luckily, both are available from PyPI and can be installed with pip (pip install pytrends yfinance).
If the packages were installed correctly, we can head to Python 3.x and import them to verify that they work. Below I have included a test script that exercises the import and basic function of both packages:
The example above should print out the following, with adjusted times based on the run date:
The code above grabs the available Google Trends data from the last 30 days (likely two months of trend data), then grabs 30 days' worth of finance data from Yahoo Finance, and prints the last value from each data set. In the case above, 'INTC' is the ticker passed to Yahoo Finance, and Google Trends is asked to return the trend data for 'Intel Corporation.' The resulting stock value is the closing price on January 17, 2020: $59.60. The resulting trend value is 0.6, indicating medium-level search popularity on Google for the Intel Corporation. The printed value is scaled to range from 0 to 1, where 1 marks the most popular period and 0 the least popular.
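As a rough sketch of the kind of calls described above, the following assumes the `TrendReq`, `build_payload`, and `interest_over_time` calls from pytrends and the `Ticker.history` call from yfinance. The function names `scale_interest`, `latest_trend`, and `latest_close`, the `today 1-m` timeframe, and the divide-by-100 scaling are illustrative assumptions rather than the original script. Google Trends reports interest on a 0 to 100 scale, so dividing by 100 is one way to arrive at a 0 to 1 value like the one discussed above. Both fetch functions need network access and can fail if either service rate-limits or changes its responses.

```python
def scale_interest(values, full_scale=100.0):
    """Map Google Trends interest values (0-100) onto the 0-1 range."""
    return [v / full_scale for v in values]


def latest_trend(keyword, timeframe="today 1-m"):
    """Return (timestamp, scaled interest) for the most recent trend sample.

    Assumes Google Trends returns data for the keyword; an empty
    response would raise a KeyError on the column lookup below.
    """
    # Imported lazily so scale_interest works without pytrends installed.
    from pytrends.request import TrendReq

    client = TrendReq(hl="en-US", tz=360)
    client.build_payload(kw_list=[keyword], timeframe=timeframe)
    frame = client.interest_over_time()  # DataFrame, one column per keyword
    raw = frame[keyword]
    return raw.index[-1], scale_interest([float(raw.iloc[-1])])[0]


def latest_close(ticker, period="1mo"):
    """Return (timestamp, closing price) for the most recent trading day."""
    # Imported lazily for the same reason as above.
    import yfinance as yf

    history = yf.Ticker(ticker).history(period=period)
    close = history["Close"]
    return close.index[-1], float(close.iloc[-1])
```

Under these assumptions, `latest_trend("Intel Corporation")` and `latest_close("INTC")` would return the two values to print.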
Now that the stock and trend packages are downloaded and have been tested, we can begin to look at how the two behave over time by plotting them together. The code below grabs data from the AAPL (Apple) stock and compares it with the “Apple, Inc.” Google Trends search for the months of September - December 2019:
The resulting plot is shown below:
The plot shows how search trends and stock price move together and how specific events appear in both series. For example, on September 9, 2019, Apple was trending at its peak for the three-month period, which coincides with the announcement of the iPhone 11. Two other peaks over the period can be attributed to Black Friday and Christmas sales. The subtler changes in stock price can be correlated to the performance of the Apple brand.
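A comparison plot like the one discussed above could be produced along the following lines. This is a minimal sketch, not the original listing: the helper names `align_on_dates` and `plot_price_and_trend`, the output file name, and the sample values are all hypothetical. The alignment step is included because trend data typically covers every calendar day while stock data only covers trading days, so the two series need a shared date axis before they are drawn on twin y-axes with matplotlib.

```python
from datetime import date


def align_on_dates(prices, interest):
    """Keep only the dates present in both series.

    prices and interest are dicts mapping datetime.date -> float.
    Returns (dates, price_values, interest_values), sorted by date.
    """
    shared = sorted(set(prices) & set(interest))
    return shared, [prices[d] for d in shared], [interest[d] for d in shared]


def plot_price_and_trend(dates, price_values, interest_values,
                         path="comparison.png"):
    """Draw price (left axis) and search interest (right axis) over time."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    fig, ax_price = plt.subplots(figsize=(10, 5))
    ax_trend = ax_price.twinx()  # second y-axis sharing the same x-axis
    ax_price.plot(dates, price_values, color="tab:blue")
    ax_trend.plot(dates, interest_values, color="tab:orange")
    ax_price.set_ylabel("Stock price [USD]")
    ax_trend.set_ylabel("Search interest (scaled)")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    fig.savefig(path)
    plt.close(fig)


# Made-up placeholder values purely for illustration (not real market data).
# September 7 and 8, 2019 fall on a weekend, so they have no price entry.
sample_prices = {date(2019, 9, 6): 1.0, date(2019, 9, 9): 2.0}
sample_interest = {
    date(2019, 9, 6): 0.2,
    date(2019, 9, 7): 0.3,
    date(2019, 9, 8): 0.4,
    date(2019, 9, 9): 0.9,
}
shared_dates, shared_prices, shared_interest = align_on_dates(
    sample_prices, sample_interest
)
```

Passing the three aligned lists to `plot_price_and_trend` would then write the figure to disk.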
One statistical measure for comparing two data sets is correlation. The correlation coefficient is the covariance of two variables divided by the product of their standard deviations, and can be written as:

$$ r_{x,y} = \frac{\sum_{k=1}^{N} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{N} (x_k - \bar{x})^2} \, \sqrt{\sum_{k=1}^{N} (y_k - \bar{y})^2}} $$

where $r_{x,y}$ is the correlation coefficient, $x$ is the first variable, $y$ is the second variable, $k$ is the index of each sample, and $N$ is the total number of samples in a given set of data. An overline, as in $\bar{x}$, denotes the mean of the given variable.
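The coefficient described above can be computed directly from its definition: the covariance of the two variables divided by the product of their standard deviations. The sketch below uses only the standard library; the function name `pearson_r` is a hypothetical choice, and the normalization by N cancels between numerator and denominator, so it is omitted.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient of two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    n = len(x)
    if n < 2:
        raise ValueError("at least two samples are required")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of products of deviations from the mean (unnormalized
    # covariance and variances; the 1/N factors cancel in the ratio).
    cov_xy = sum((xk - mean_x) * (yk - mean_y) for xk, yk in zip(x, y))
    var_x = sum((xk - mean_x) ** 2 for xk in x)
    var_y = sum((yk - mean_y) ** 2 for yk in y)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov_xy / math.sqrt(var_x * var_y)
```

A result near 1 means the two series rise and fall together, a result near -1 means one rises as the other falls, and a result near 0 means there is little linear relationship.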
In the code and plot below, I introduce the correlation between trend and stock opening price for AAPL (Apple Inc.) for all of 2019:
We can see that the correlation function follows much of the momentum of the positive and negative trends in stock price. For roughly three quarters of the year, the correlation between stock price and search trend is negative, meaning that lower search interest in Apple coincided with an increase in stock price. In the last quarter of 2019 the correlation turns positive, meaning that search popularity and stock price rose together. This change in the last quarter may be attributable to some of the sales related to the iPhone 11 as well as other products.
This short introduction to Google Trends and Yahoo Finance in Python is meant as a starting point for those interested in analyzing data related to stocks and peripheral company information. Other sources of information related to company stocks can also be correlated with stock price, such as news, weather, and even company earnings and expenditures. Python is a powerful tool for mining and analyzing publicly available data, and its libraries and modules help maintain it as one of the most important programming languages of the 21st century. Traditional statistical tools are also helpful for understanding relationships between stocks and peripheral data, and in this tutorial I covered only one: correlation. The time series correlation can give us some insight into how search trends relate to company stock price, and can perhaps be used as a tool for predicting the behavior of company stock prices.