Custom S&P 500 daily total return index in Python
I looked all over the web for an S&P 500 daily total return dataset this weekend. Yahoo has one that starts in 1988, but I wanted something that starts earlier, so I can backtest through the ’87 crash. Apparently there are datasets floating around, but they are not easy to come by. The farther back the dataset, the more likely it is that a total return dataset is monthly. I am really surprised these are not on Quandl!
So instead, I used Python to code up a daily total return index. This snippet uses Yahoo’s daily price-only S&P 500 index, ^GSPC, which goes back to 1950. It will allow you to create a custom index starting on any date. It uses Shiller’s S&P 500 dividend data, and assumes daily compounding of the dividends. Depending on your point of view, these may be incorrect assumptions, so caveat emptor! It is an approximation.
import pandas_datareader as web import pandas as pd def sp500totalreturn(startdate=None): # Get S&P 500 Return & Dividend data spindx = web.DataReader('^GSPC','yahoo','1/1/1950')['Close'] # Price-only spdvds = pd.read_excel('http://www.econ.yale.edu/~shiller/data/ie_data.xls','Data',header=7) # Spruce up the dividend datatable spdvds = spdvds.ix[spdvds[spdvds['Date']==1950.01].index[0]:,:] spdvds = spdvds.ix[:spdvds[spdvds['Date']==spdvds.Date.dropna().iloc[-1]].index[0],:] yr = np.floor(spdvds.Date) mn = (spdvds.Date - yr)*100 spdvds['Date'] = pd.to_datetime(dict(year=yr,month=mn,day=1)) spdvds = spdvds.set_index('Date') # Calculate total return multiplier dvd = (spdvds['D']/spdvds['P']).dropna()/252 ret = spindx/spindx.shift(1)-1 tot = pd.concat([dvd,ret],axis=1) tot = tot.ffill().rename(columns={0:'D'}) tot = tot.ix[ret.index,:] tot['D'] = tot['D'].shift(1) tot = tot.sum(axis=1) tot = (1+tot).cumprod() # Start index on specified date if startdate: startdate = pd.to_datetime(startdate) else: startdate = tot.index[0] startvalue = spindx[startdate] tot = tot.truncate(startdate) tot = tot/tot.ix[0]*startvalue return tot
I checked this against Yahoo’s 1988-starting daily total return index, and the results were really close. There is only a 1.5% difference between my ending value on April 21st and Yahoo’s on April 21st after nearly 30 years of compounding. You can check it yourself!
# Confirm it approaches Yahoo's 1988-starting daily total return index sp88tr = web.DataReader('^sp500TR','yahoo','1/1/1988')['Close'] # Total return start = sp88tr.index[0] # Get first date of Yahoo's '88 based index test = sp500totalreturn(start) test.plot() sp88tr.plot() plt.title('Comparison between custom total return index\nand Yahoo ^SP500TR') plt.legend(['Custom Index','^SP500TR'])
Breaking Away
This will be useful in the future for backtesting tactical asset allocation methodologies on S&P data going back farther than 1988. Expect more on this topic.
I am sure there are going to be quibbles with this code. If you have one, please be sure to provide a better solution. I will use it to revise this post. Again, this is an approximation, so base no investment decisions on this post!