Yahoo Finance Data … Further Thoughts

A few weeks ago, I wrote up a post on the problems with the Yahoo Finance data API, and some possible alternatives.  Here is a follow up to that post with my latest findings.
 
 

Updates to Last Post

(as of 6/8/2017)

  • Yahoo Finance
    • Accessed from Python using pandas-datareader package version 0.4.1 (see last post)
    • PRO…the data still seems to be there
    • PRO…it appears that they (Yahoo Finance software engineers) are trying to fix the data problems they created when they went to the recently introduced API
    • CON…each new day brings another WTF moment as I try to understand and adjust to the changes that the Yahoo team is making
    • CON…the data used to extend back to 1950 for ^GSPC, but now that Yahoo’s date index is Unix-based, the data I get back starts at the epoch (1/1/1970).  I have not taken the trouble to determine how to have Python pandas-datareader bring in pre-epoch data. Please leave a comment if you know the solution or if I am imagining the issue.
  • Google Finance
    • Accessed from Python using pandas-datareader package version 0.4.1
    • PRO…recent OHLC data is more accurate than recent Yahoo OHLC data
    • CON…you have to adjust for dividends yourself
    • CON…data before 2002 is extremely unreliable
    • CON…Google Finance has a long-running problem with TLT missing data. It is still not fixed. I am sure there are other symbols that need to be fixed as well.
  • AlphaVantage
    • Accessed from Python using alpha-vantage package version 1.1.2
    • PRO…AV fixed its missing data issues with EEM and other symbols that I identified in my last post.
    • PRO…within the last few days, AV has added functionality to issue dividend-adjusted prices
    • PRO…there is a recently released Python package (alpha-vantage 1.1.2) that has been updated to take advantage of the recent AlphaVantage changes
    • CON…there is no way to specify a start date or end date to the daily timeseries
    • CON…the data only extends back to 2000.
  • Quandl
    • Accessed from Python using Quandl package 2.8.9
    • PRO…Stock data is maintained and is free.
    • CON…ETF data is not free.
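
On the pre-epoch question in the Yahoo bullets above: a Unix timestamp can be negative, so dates before 1/1/1970 are representable in principle. The sketch below only shows that arithmetic using the Python standard library. The helper names are mine, and it does not establish whether Yahoo's endpoint or pandas-datareader will accept a negative value.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch, as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_seconds(year, month, day):
    """Seconds since the epoch for midnight UTC; negative for dates before 1970."""
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    return int((moment - EPOCH).total_seconds())


def from_unix_seconds(seconds):
    """Inverse of to_unix_seconds. Uses timedelta so negative values work everywhere."""
    return EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    start = to_unix_seconds(1950, 1, 1)
    print(start)                            # a negative number of seconds
    print(from_unix_seconds(start).date())  # round-trips back to the same date
```

If a data source builds its request from Unix seconds, a negative start value like this one is what a 1950 start date would have to look like.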

Tiingo

I found this data source after reading the Logical-Invest blog. The data extends well before the year 2000, it has back-adjusted data, it has ETF data, and the Python interface is straightforward. Check out the documentation here and here. In the documentation, Tiingo discloses that they source data “from Quotemedia and Quandl among others”.

Below is a chart comparison of the difference between Quandl (Wiki – free data), AlphaVantage, and Google data for the stock of AT&T (symbol “T”). I compare the dividend-and-split adjusted data, as it is the most prone to missing data and compounding errors. The Google dataset is Google close price data, adjusted with dividend data sourced from Yahoo Finance. I use Quandl as the baseline, but do not take that to mean that Quandl is reliable.
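
For readers who have not adjusted close prices for dividends by hand, here is a minimal sketch of one common approach: proportional back-adjustment, where every price before an ex-dividend date is multiplied by (1 - dividend / previous close). This is an illustration under that assumption, not necessarily the exact procedure behind the chart. The function name and the input layout are hypothetical, and splits are not handled.

```python
def back_adjust(closes, dividends):
    """Back-adjust close prices for cash dividends.

    closes:    list of (date, close) tuples in ascending date order
    dividends: dict mapping ex-dividend date -> cash dividend per share
    Returns a list of (date, adjusted_close) in the same order.
    """
    adjusted = []
    factor = 1.0
    # Walk from the most recent day backwards, accumulating the factor.
    for i in range(len(closes) - 1, -1, -1):
        date, close = closes[i]
        # The ex-date close already reflects the payout, so it uses the
        # factor accumulated from later dividends only.
        adjusted.append((date, close * factor))
        dividend = dividends.get(date)
        if dividend and i > 0:
            prev_close = closes[i - 1][1]
            # Every earlier price is scaled down by this dividend's yield.
            factor *= 1.0 - dividend / prev_close
    adjusted.reverse()
    return adjusted


if __name__ == "__main__":
    closes = [("2017-01-02", 100.0), ("2017-01-03", 99.0), ("2017-01-04", 100.0)]
    dividends = {"2017-01-03": 1.0}
    for date, price in back_adjust(closes, dividends):
        print(date, round(price, 4))
```

Because the factors multiply, one missing dividend or one bad close on the day before an ex-date distorts every earlier adjusted price, which is why the adjusted series is the one worth comparing across sources.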

Comparison of Google Finance, Alpha Vantage and Tiingo

My only beef with Tiingo (and it is a small one) as of this writing is that the dividend adjustment occurs with respect to the most recent business day, even if you specify an end date sometime in the past. It would be preferable if the dividend-adjusted data were adjusted based on the last date in the user-requested timeseries, specified by the endDate argument.
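
A possible workaround, sketched below, is to rescale the adjusted series yourself so that the adjusted close equals the raw close on the last requested date. It assumes the adjustment is purely multiplicative and that each row carries both a raw and an adjusted close. The function name and the tuple layout are hypothetical, not Tiingo's field names.

```python
def rebase_adjusted(rows):
    """Rescale adjusted closes so the last row's adjusted close equals its raw close.

    rows: list of (date, close, adj_close) tuples in ascending date order,
          already trimmed to the requested end date.
    Returns a list of (date, rebased_adj_close).
    """
    if not rows:
        return []
    _, last_close, last_adj = rows[-1]
    # One constant scale keeps every ratio between adjusted prices intact.
    scale = last_close / last_adj
    return [(date, adj * scale) for date, _, adj in rows]


if __name__ == "__main__":
    rows = [("2016-12-29", 100.0, 88.0), ("2016-12-30", 110.0, 99.0)]
    for date, price in rebase_adjusted(rows):
        print(date, round(price, 4))
```

Since only a constant factor is applied, returns computed from the rebased series match those from the original adjusted series; only the price level changes.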

 

Breaking Away

I have neither the time nor the resources to verify free data sources any further. If you want clean data and a clean interface, my recommendation is that you pay for it from a dedicated provider. However, for a free data source, with ETF data, the longest data history, and easy-to-integrate technology, Tiingo seems to be winning the horse race.
