Yahoo Finance Data API … Gone? Some thoughts
So I came back from vacation to find that Yahoo Finance has changed their API without warning or explanation. The old API is not coming back, apparently. This is what you expect from a service that is free … you get what you pay for. I do not rely heavily on it, but I have a few Google spreadsheets that used to make total return calculations using the “Adj Close” column of the Yahoo data. I think I am now in a spot; here is an example of using different free data sources using Python.
Yahoo Finance
To continue to use python’s pandas-datareader with Yahoo Finance, you have to patch pandas-datareader to 0.4.1. See this discussion. I patched yesterday (5/22/2017) but I expect this patch will be put into PyPi production fairly quickly.
Once you have patched datareader, the data coming back from Yahoo is no longer consistent with what I expect. Check out EEM:
import pandas_datareader as pdr ts = pdr.DataReader('EEM',data_source='yahoo',start='1/1/2000') ts = ts.drop('Volume',axis=1) ts.plot()
In the past, Yahoo’s “Adj Close” column has been dividend-adjusted, but not split-adjusted. Yahoo’s Open/High/Low/Close columns were not adjusted for splits or dividends — they were purely the raw prices. We can see that EEM had two splits, one on 6/9/2005 (3-for-1), and one on 7/24/2008 (3-for-1). Now, it appears that Yahoo’s Open/High/Low prices are adjusted for splits, as is the “Adj Close” price. What gives? Why adjust O/H/L and not Close? Why make them agree with “Adj Close”? Weird.
Google Finance
Compare this to data from Google Finance for EEM. Google’s data IS split-adjusted but not dividend adjusted. See this for background information.
import pandas_datareader as pdr, matplotlib.pyplot as plt gf = pdr.DataReader('EEM',data_source='google',start='1/1/2000') gf = gf.drop('Volume',axis=1) gf.plot() plt.title('EEM from Google Finance')
As you can see, everything is split-adjusted from Google. Now let’s compare Google’s “Close” data with Yahoo’s “Adj Close”. As above, Google’s close price does not include dividend data, but Yahoo’s “Adj Close” price used to but does not include dividends any more:
fig, ax = plt.subplots(2,1) ts['Adj Close'].plot(ax=ax[0]) gf['Close'].plot(ax=ax[0]) (ts['Adj Close']-gf['Close']).plot(ax=ax[1]) ax[0].set_title('EEM - Yahoo and Google data plotted side-by-side')
It appears there is some minor differences in data quality between the two data services, but no dividends included.
What to do?
AlphaVantage.co
I really would like to move off of Yahoo and Google altogether. I looked at AlphaVantage.co – they promise to be “100% free, forever”. The API is straightforward in python. Call using the alpha_vantage 0.20 package from PyPi. (NB: You have to replace the XXX below using your own API key, which is very quick to get).
from alpha_vantage.timeseries import TimesSeries ts = TimesSeries(key='XXXX',output_format='pandas') data, meta_data = ts.get_daily('EEM',outputsize='full') data['Close'].plot()
Despite the slick interface and a lot of data history, a quick and dirty test raised my eyebrows. A simple plot of EEM closing prices shows zeros for several dates:
I’m sure that AlphaVantage.co would fix the data issues if I emailed them. There are quite a few good reviews of AlphaVantage from around the web. However, their data still doesn’t include dividend-adjusted prices, which doesn’t help me out with my need for total return data.
Quandl
Of course there is Quandl, but Quandl is no longer free, and it seems to be as expensive as CSI or Norgate premium data for ETFs like EEM.
Breaking Away
If I want to quickly create total return data, I guess I will need to use pandas_datareader with either Yahoo (using 0.4.1) or Google, and adjust prices for dividends with additional code. I will have to fix my google spreadsheets similarly using Google spreadsheet scripting. I am not looking forward to that!