Custom S&P 500 daily total return index in Python

I looked all over the web for an S&P 500  daily total return dataset this weekend. Yahoo has one that starts in 1988, but I wanted something that starts earlier, so I can backtest through the ’87 crash. Apparently there are datasets floating around, but they are not easy to come by. The farther back the dataset, the more likely it is that a total return dataset is monthly. I am really surprised these are not on Quandl!

So instead, I used Python to code up a daily total return index. This snippet uses Yahoo’s daily price-only S&P 500 index, ^GSPC, which goes back to 1950. It will allow you to create a custom index starting on any date. It uses Shiller’s S&P 500 dividend data, and assumes daily compounding of the dividends.  Depending on your point of view, these may be incorrect assumptions, so caveat emptor!  It is an approximation.


import pandas_datareader as web
import pandas as pd

def sp500totalreturn(startdate=None):
   # Get S&P 500 Return & Dividend data
   spindx = web.DataReader('^GSPC','yahoo','1/1/1950')['Close'] # Price-only
   spdvds = pd.read_excel('http://www.econ.yale.edu/~shiller/data/ie_data.xls','Data',header=7)

   # Spruce up the dividend datatable
   spdvds = spdvds.ix[spdvds[spdvds['Date']==1950.01].index[0]:,:]
   spdvds = spdvds.ix[:spdvds[spdvds['Date']==spdvds.Date.dropna().iloc[-1]].index[0],:]
   yr = np.floor(spdvds.Date)
   mn = (spdvds.Date - yr)*100
   spdvds['Date'] = pd.to_datetime(dict(year=yr,month=mn,day=1))
   spdvds = spdvds.set_index('Date')

   # Calculate total return multiplier
   dvd = (spdvds['D']/spdvds['P']).dropna()/252
   ret = spindx/spindx.shift(1)-1
   tot = pd.concat([dvd,ret],axis=1)
   tot = tot.ffill().rename(columns={0:'D'})
   tot = tot.ix[ret.index,:]
   tot['D'] = tot['D'].shift(1)
   tot = tot.sum(axis=1)
   tot = (1+tot).cumprod()

   # Start index on specified date
   if startdate:
      startdate = pd.to_datetime(startdate)
   else:
      startdate = tot.index[0]
   startvalue = spindx[startdate]
   tot = tot.truncate(startdate)
   tot = tot/tot.ix[0]*startvalue
   return tot

I checked this against Yahoo’s 1988-starting daily total return index, and the results were really close. There is only a 1.5% difference between my ending value on April 21st and Yahoo’s on April 21st after nearly 30 years of compounding.  You can check it yourself!


# Confirm it approaches Yahoo's 1988-starting daily total return index
sp88tr = web.DataReader('^sp500TR','yahoo','1/1/1988')['Close'] # Total return
start = sp88tr.index[0] # Get first date of Yahoo's '88 based index
test = sp500totalreturn(start)
test.plot()
sp88tr.plot()
plt.title('Comparison between custom total return index\nand Yahoo ^SP500TR')
plt.legend(['Custom Index','^SP500TR']) 

Breaking Away

This will be useful in the future for backtesting tactical asset allocation methodologies on S&P data going back farther than 1988.  Expect more on this topic.

I am sure there are going to be quibbles with this code. If you have one, please be sure to provide a better solution. I will use it to revise this post. Again, this is an approximation, so base no investment decisions on this post!

You may also like...