The purpose of this article is short and simple. Is it possible to find out the average historical stock market return by month? Can we explore the seasonality factor? Is there a best or worst month to invest in the stock market? Let’s explore this using Python.

It’s been a long time since I touched Python. My last Python post was 4 months ago, in February. Just like formal education, I think I have forgotten almost everything. Luckily, I still have some archives in my blog to refer to. Come to think of it, the Python posts I wrote in the past are really references to teach my future self today. Otherwise, I don’t know how I would do this all over again.

Anyway, let’s jump right into it. You can also do this on your own if you have Anaconda and Jupyter installed. It is really simple, as it’s just a few lines of code. Then you can use it to see which months have historically been good or bad for your favourite stock.

## 1. Pulling Historical Stock Data from Yahoo Finance

```
import pandas as pd
import numpy as np
import seaborn as sns
import calendar
import matplotlib
import matplotlib.pyplot as plt
from pandas_datareader import data as wb
from matplotlib.ticker import FuncFormatter
plt.style.use('fivethirtyeight')
matplotlib.rcParams['figure.figsize'] = (20,10)
matplotlib.rcParams.update({'font.size': 24})
```

```
df = wb.DataReader('^GSPC', data_source='yahoo', start='1975-1-1')['Close']
df = df.to_frame()
```

The first step is always to import the libraries: pandas, numpy and so on. Then you can pull the Yahoo data in one line, stored in the variable df. I am taking the closing index price of the S&P 500 from 1975 until today. I chose 1975 as that is as far back as the data is available. So we are talking about 45 years of data, from 1975 to 2020. Let’s see what interesting insights we can find.

If we run df, it shows the S&P 500 closing price for every day. However, since we want the average monthly return, we only need the closing price on the last trading day of each month.

`df = df.resample('M').last()`

The code for this simply uses the resample function with a monthly frequency ('M') and takes the last data point in each month.

Now you only get the closing index price at the end of every month, which is what we want.

The next step is to find the month-to-month % change. We also need the month number, so that we can later group the rows by month and find the average return of each month.
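To see what "take the last data point of each month" does, here is a minimal sketch on a few made-up prices (the dates and values are illustrative, not real index data). It groups by calendar month with `to_period('M')` instead of `resample`, which I assume picks the same month-end values when every month has data; as far as I know, recent pandas releases also prefer the alias `'ME'` over `'M'` in `resample`.

```python
import pandas as pd

# Made-up daily closing prices spanning two months (illustrative only).
dates = pd.to_datetime(["2020-01-30", "2020-01-31", "2020-02-27", "2020-02-28"])
daily = pd.DataFrame({"Close": [100.0, 102.0, 101.0, 99.0]}, index=dates)

# Group the rows by calendar month and keep the last available close in each.
month_end = daily.groupby(daily.index.to_period("M")).last()
print(month_end)
```

Only one row per month survives, holding the close from the last day that has data in that month.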

```
df['Monthly_Returns'] = df['Close'].pct_change(1)
df['month']=pd.to_datetime(df.index).month
```

To do that, I have created two new columns: one to calculate the monthly returns and one to store the month number. Now each month-end row in df has the closing price, the monthly return and the month number.

Finally, I also want to find the average gain and the average loss for each month. Of all the monthly losses throughout the 45 years, which month loses the most on average? And of all the monthly gains, which month gains the most?
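As a quick sanity check of what `pct_change(1)` returns, here is a small sketch on three made-up month-end prices (the numbers are illustrative only). Each return is the current close divided by the previous close, minus 1, and the first row has no previous month to compare with.

```python
import pandas as pd

# Three made-up month-end closes (illustrative only).
month_end = pd.DataFrame(
    {"Close": [100.0, 110.0, 99.0]},
    index=pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"]),
)

# Return of each month relative to the month before: close_t / close_(t-1) - 1.
month_end["Monthly_Returns"] = month_end["Close"].pct_change(1)
# Calendar month number (1 = January, ..., 12 = December).
month_end["month"] = month_end.index.month
print(month_end)
```

The first return is NaN, February is roughly +10% and March is roughly -10%.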

```
df['losses'] = np.where(df['Monthly_Returns']<0, df['Monthly_Returns'],0)
df['gains'] = np.where(df['Monthly_Returns']>0, df['Monthly_Returns'],0)
```

To do this, I have created another two new columns called “losses” and “gains”. np.where works like an if-else statement. If the monthly return is negative, the “losses” column takes that value, else 0. Likewise, if the monthly return is positive, the “gains” column takes that value, else 0.
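Here is a tiny sketch of how the two `np.where` calls split a return series, using made-up returns (illustrative only). One detail worth noticing: a NaN return, like the very first month, fails both comparisons, so it ends up as 0 in both columns.

```python
import numpy as np
import pandas as pd

# Made-up monthly returns; the first is NaN, as after pct_change (illustrative only).
returns = pd.Series([np.nan, 0.10, -0.10, 0.02])

# Keep the return where the condition holds, otherwise use 0.
losses = np.where(returns < 0, returns, 0)
gains = np.where(returns > 0, returns, 0)

print(losses)
print(gains)
```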

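```
# The closing price is no longer needed once the returns are computed
df.drop(columns='Close', inplace=True)
# Turn the placeholder 0s into NaN so that mean() skips them
df = df.replace(0, np.nan)
```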

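Almost done. The last step is to clean up the data a little. We don’t need the Close column any more, and we need to replace the 0s with NaN. This is because we want the average to EXCLUDE the 0s: pandas skips NaN values when it calculates a mean, so the average gain is taken only over the months that actually gained, and likewise for the losses.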

`df = df.groupby(df['month']).mean()`

After cleaning up, let’s calculate the average monthly returns, gains and losses for each month with the “group by” function above. This groups all the rows of the same calendar month together and calculates the mean of each column. So you will have 12 rows (12 months) and 3 columns (monthly returns, losses, gains) of averages.
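To show why the 0s have to go before averaging, here is a small sketch with made-up gains for two months (the values are illustrative only). With the 0s left in, every non-gaining month drags the average down; with NaN instead, `mean()` only averages the months that actually gained.

```python
import numpy as np
import pandas as pd

# Made-up "gains" column: 0 marks a month that did not gain (illustrative only).
df_demo = pd.DataFrame({
    "month": [1, 1, 1, 2, 2],
    "gains": [0.04, 0.0, 0.02, 0.0, 0.06],
})

# Averages with the 0s counted as data points.
with_zeros = df_demo.groupby("month")["gains"].mean()
# Averages after the 0s are replaced by NaN, which mean() skips.
without_zeros = df_demo.replace(0, np.nan).groupby("month")["gains"].mean()

print(with_zeros)
print(without_zeros)
```

For month 1 the average gain moves from about 2% to about 3%, and for month 2 from about 3% to about 6%.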

## 2. Historical Monthly Stock Market Returns

Here is the bar chart of the S&P 500 historical monthly returns over the past 45 years.

I don’t know why people say sell in May and go away. There is still some meat on the table until July. It seems to me that September is the worst month rather than May. And the talk about a Santa rally and a new year rally from Nov to Jan seems quite accurate.

Based on the past 45 years of historical data, we can see that September is when the S&P 500 performs the worst, while April and November are the best months for the stock market. Meanwhile, Feb and August are flat.

How about the average % gains and losses? It seems that August to October is also when the stock market falls the most. If you missed the recent rally, there might be opportunities soon? Hopefully?

I am not sure how accurate this will be. I am also interested to find out how the stock market performs around Sep to Oct 2020. But statistically speaking, those are the bad months, or rather the “best time” to invest.

I went back to look at the S&P 500 index around the Sep-Oct period for 2019 and 2018. Coincidentally, there were some big red candles during that period.
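The plotting code is not shown in this post, so here is a rough sketch of how such a bar chart could be drawn with matplotlib. The series `avg_returns` is filled with placeholder values only so that the snippet runs on its own; in practice it would be the `Monthly_Returns` column of the grouped df. The month labels and the percent formatting are my own choices.

```python
import calendar

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Placeholder averages, NOT real results: swap in df['Monthly_Returns'].
avg_returns = pd.Series([0.01, -0.005] * 6, index=range(1, 13))

# Turn month numbers 1..12 into short names such as Jan, Feb, ...
labels = [calendar.month_abbr[m] for m in avg_returns.index]

fig, ax = plt.subplots(figsize=(20, 10))
bars = ax.bar(labels, avg_returns.values)
# Show the y-axis as percentages rather than raw fractions.
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f"{y:.1%}"))
ax.set_title("Average monthly return (placeholder data)")
```

In Jupyter the figure should render at the end of the cell; in a plain script you would add `plt.show()`.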

I did the same analysis with the Dow Jones. You just need to change the ticker symbol from ^GSPC to ^DJI. That’s why I say you can follow along with the code and replace it with whatever stock symbol you are interested in. For example, I did one with TSLA and the magic month is July. Hopefully that comes true.
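If you plan to repeat this for several tickers, the steps can be wrapped into one function. This is only a sketch: `monthly_seasonality` is a hypothetical helper name, it takes a daily close series you have already downloaded, and it uses `Series.where` to leave NaN (rather than 0) in the gains and losses columns, which is a small variation on the replace-zeros step above. The prices in the usage example are made up.

```python
import pandas as pd


def monthly_seasonality(close: pd.Series) -> pd.DataFrame:
    """Average return, loss and gain per calendar month (hypothetical helper)."""
    # Last available close in each calendar month.
    month_end = close.groupby(close.index.to_period("M")).last()
    out = pd.DataFrame({"Monthly_Returns": month_end.pct_change(1)})
    out["month"] = out.index.month
    # Keep only negative (or positive) returns; everything else becomes NaN.
    out["losses"] = out["Monthly_Returns"].where(out["Monthly_Returns"] < 0)
    out["gains"] = out["Monthly_Returns"].where(out["Monthly_Returns"] > 0)
    # mean() skips NaN, so gains and losses are averaged over matching months only.
    return out.groupby("month").mean()


# Usage with made-up prices (illustrative only).
close = pd.Series(
    [100.0, 110.0, 99.0, 99.0, 108.9],
    index=pd.to_datetime(
        ["2020-01-31", "2020-02-28", "2020-03-31", "2021-01-29", "2021-02-26"]
    ),
)
result = monthly_seasonality(close)
print(result)
```

With real data you would pass in the Close column for ^GSPC, ^DJI or any other symbol.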

Anyway, here are the results for the Dow Jones.

For the Dow, August and September are also the worst months. The best-performing months are year-end and April. That is quite similar to what we see with the S&P 500 index. After all, the two are pretty much correlated.

It looks like there are indeed some seasonality factors involved. The historical stock market returns by month do show some recurring patterns. Certain months are bad and certain months are good.

The conclusion seems to be that September is the worst month for the stock market, and Nov to Jan plus April are the best months. Maybe that’s why people say “Sell in May and go away”. It is probably to lock in profits from the Santa + New Year + April rally.

While I can understand that the latter is due to pay bonuses and a festive mood, I am not sure what the thing with September is. Some of the explanations are portfolio rebalancing, tax-loss harvesting, people coming back from vacation and so on. We will see again in 3 months’ time. Maybe it will be the “2nd leg” market crash that everyone is waiting for?
