Why is long-term investment good? You have probably heard phrases like “The stock market always goes up”, “Invest for the long term” and “Just buy and hold”. Is it really true that long-term investment is more profitable? To test this theory, I have decided to use the S&P 500 as a case study.

“Our favorite holding period is forever. We are just the opposite of those who hurry to sell and book profits when companies perform well but who tenaciously hang on to businesses that disappoint. Peter Lynch aptly likens such behavior to cutting the flowers and watering the weeds.”

– Berkshire Hathaway Letter to Shareholders, 1988

The end goal here is to create a simple heatmap that shows the 1-year return, the 2-year return, and so on all the way up to the 30-year return. The timeframe of this study is 30 years, so there will be 30 columns in total. In essence, we want to find out how much you would have made if you had invested in the S&P500 for x number of years.

This is the third post in the Python for Finance series and the last one for January 2020. If you are new here, I have written one about Singapore Banks and another about Bitcoin and Gold. Do check them out if you are keen. All of these are for my own practice in Python, which is my 2020 New Year’s resolution.

At the same time, it is also a good chance to explore what insights can be gathered in the area of investing. So hopefully this heatmap gives you a clearer view of why long term investment is good.

## Pulling S&P500 Data from Yahoo

To start off, the libraries we will be using are Pandas, NumPy, Seaborn, Matplotlib and Pandas Data Reader. We will pull the closing price of the S&P500 from 1990 all the way until today. When we run the SP500 variable, the output is a series, because it holds only one column of data.
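The original code for this step is not reproduced here, so the following is only a rough sketch of how such a pull might look. It assumes the `pandas_datareader` package, the Yahoo ticker `^GSPC` for the S&P500, and that the Yahoo source is still reachable, which has not always been the case. The function name `load_sp500_close` is hypothetical.

```python
import datetime as dt


def load_sp500_close(start=dt.datetime(1990, 1, 1), end=None):
    """Return the daily S&P500 closing prices as a pandas Series.

    Assumption: pandas_datareader is installed and its Yahoo source works.
    """
    # Imported lazily so that defining the function needs no network access.
    from pandas_datareader import data as pdr

    end = end or dt.datetime.today()
    prices = pdr.DataReader("^GSPC", "yahoo", start, end)
    return prices["Close"]  # a single column, so the result is a Series


# Example call (needs an internet connection):
# SP500 = load_sp500_close()
```

If the Yahoo source is unavailable, any other source of daily closing prices indexed by date would work for the rest of the walkthrough.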

## Converting the Data Type from Series into a Data Frame

To see what type our data is, we can simply use the `type()` function. You can see that it is indeed a series. Our next step is to convert it into a data frame, because data is easier to work with in a data frame than in a series.

To change it, simply use the `.to_frame()` method. That is how a series is converted into a data frame object. Then we use the `type()` function again to check our new data type.

It has now changed from the series to the data frame type. Now we are ready.
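As a minimal sketch of the conversion described above, here is the same sequence on a tiny series. The three prices are made-up placeholders, not real S&P500 closes.

```python
import pandas as pd

# Hypothetical closing prices standing in for the real S&P500 data.
SP500 = pd.Series(
    [3200.0, 3250.0, 3225.0],
    index=pd.to_datetime(["2019-12-31", "2020-01-02", "2020-01-03"]),
    name="Close",
)
print(type(SP500))  # a pandas Series

SP500 = SP500.to_frame()  # the Series name becomes the column header
print(type(SP500))  # a pandas DataFrame
```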

## Calculating the Annual Log Returns of the S&P500

Currently, the S&P500 records are daily. The next thing to do is to convert them from daily to yearly, which makes it easier to calculate the annual returns. We can do this by using the `.resample()` method with an annual frequency. Let’s see what the last 10 rows look like.

Done! The date column has been converted from daily into yearly.

Next, we will create a new column and call it ‘**Return**’. The log return is the natural log of the current year’s close divided by the previous year’s close. We can get the previous year’s value by shifting all the records down one row using the `.shift()` method. Let’s see what the last 10 rows look like again.

Looks good so far. We now have a new Return column that shows us the yearly log returns. Note that these returns are exclusive of dividends. It is just the S&P 500 closing price change.
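The two steps above, resampling to yearly closes and computing log returns with `.shift()`, could look roughly like this. The daily prices are made up (one flat level per year) so the result is easy to check by hand, and the names `daily`, `annual` and `level_by_year` are my own. The sketch passes `pd.offsets.YearEnd()` instead of a string alias because the spelling of the annual alias differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closes: one flat price level per calendar year.
days = pd.date_range("2016-01-01", "2019-12-31", freq="D")
level_by_year = {2016: 100.0, 2017: 110.0, 2018: 99.0, 2019: 121.0}
daily = pd.DataFrame({"Close": [level_by_year[d.year] for d in days]}, index=days)

# Keep the last close of each calendar year.
annual = daily.resample(pd.offsets.YearEnd()).last()

# Log return = ln(this year's close / previous year's close).
annual["Return"] = np.log(annual["Close"] / annual["Close"].shift(1))
print(annual)
```

The first year has no previous close to compare against, so its return is NaN.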

Okay. Now that we have the annual log returns, we can easily calculate the 2-year return, 5-year return, 10-year return and so on. For example, to calculate the 5-year return, simply take the average of the past 5 years’ annual returns. To calculate the n-year return, just take the average of the past n years.

## Creating the S&P500 Long-Term Investment Table

The last step is to piece this information together across the entire 30-year time frame. We want to find every return from the 1-year return all the way up to the 30-year return, so a loop is ideal. The first step is to create a list of numbers from 1 to 30 and store it in the variable `years`.

In each loop iteration, we want the machine to do two things for us. The first is to create a new column for that holding period. The second is to calculate the rolling average return over that number of years.

For example, at the iteration for 10 years, the loop creates a new column named 10Y. This column stores the rolling average return of the past 10 years (e.g. 2010-2020, 2009-2019, 2008-2018 and so on).

By the end of the loop, the data frame should have 30 rows and 30 columns. The 30 columns represent the rolling returns for each holding period, and the 30 rows represent the 30 years from 1990 to 2020. Let’s run the variable `summary` and see what we have got.

Here is a snapshot of what the last 10 rows look like. You will notice some NaN values. This is because there are not enough years to calculate the rolling average: our data only runs from 1990 to 2020.

Let’s say the loop is at its first iteration. It creates a “30Y” column and calculates the average return of the past 30 years. 1990 to 2019 is 30 years, so it takes the average of the 30 years of log returns, which turns out to be 0.076156.

However, when the machine tries to calculate the next rolling 30 years, 1989 to 2018, it fails. That is because we don’t have any data for 1989. The same logic applies to the subsequent years. That is why there are NaNs in the columns.
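Here is a small sketch of the loop described in this section, shrunk to six years and three holding periods so the NaN pattern is easy to see. The returns are made-up numbers, and this version builds the columns from 1Y upwards, which may not match the column order of the original table.

```python
import numpy as np
import pandas as pd

# Hypothetical annual log returns for six years (made-up numbers).
annual = pd.DataFrame(
    {"Return": [0.10, -0.05, 0.08, 0.02, -0.04, 0.12]},
    index=pd.Index(range(2014, 2020), name="Year"),
)

years = list(range(1, 4))  # holding periods of 1, 2 and 3 years
summary = pd.DataFrame(index=annual.index)
for n in years:
    # Rolling mean of the past n annual log returns; NaN until n years exist.
    summary[f"{n}Y"] = annual["Return"].rolling(n).mean()

print(summary)
```

The first two rows of the 3Y column are NaN for the same reason as in the full table: there are not yet three years of returns to average.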

## Interpreting the Long-Term Investment Heatmap

Using the exact same summary table created above, we can convert it into a heatmap using Python’s Seaborn. A heatmap allows us to quickly visualize the long term investment gains and losses over the years. Let’s see how this looks.
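The conversion could be sketched roughly as below, assuming `summary` is a data frame of rolling returns like the one built earlier. The function name `plot_return_heatmap`, the figure size and the red-to-green palette are my own assumptions, not necessarily the settings used for the picture in this post.

```python
def plot_return_heatmap(summary, title="S&P500 rolling average log returns"):
    """Draw a DataFrame of returns as an annotated heatmap and return the Axes."""
    # Imported lazily so the function can be defined without the plotting stack.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(20, 10))
    sns.heatmap(
        summary,
        annot=True,     # print the value in every cell
        fmt=".1%",      # show values as percentages
        cmap="RdYlGn",  # assumed palette: red for losses, green for gains
        center=0,       # put the colour midpoint at a 0% return
        cbar=False,
        ax=ax,
    )
    ax.set_title(title)
    return ax


# Example call, given a DataFrame named summary:
# plot_return_heatmap(summary)
```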

BAM! Done! That is as simple as it gets. We have calculated the long term investment returns over the past 30 years in the S&P500 index. The picture might be a little small, so you have to zoom in to see the % returns.

Just from one quick glance, you will notice that a large portion of the heatmap looks green, especially towards the left. So how do we interpret this heatmap?

On the far right is the 1Y column. This shows the gain or loss of the S&P500 for that particular year. For example, the S&P500 gained 25.4% in 2019, lost 6.4% in 2018, gained 17.7% in 2017 and so on.

The 2Y column would be the average of the rolling two years. So 2Y 2020 returns would be the annual log return for 2020 + 2019 divided by two. 2Y 2019 returns would be the annual log return of 2019 + 2018 divided by two and so on. Subsequently, the 3Y column would be the average of the rolling three years. 30Y would be the rolling 30 years return.

So what is the maximum number of years that you would have lost money if you invested in the S&P500? The answer is 13 years.

If you had invested your money at the beginning of 2000, you would still have been down 0.2% in 2012, after a long 13 years. Anything after 14 years is a positive gain. This means that if you had invested in the S&P500 in ANY year from 1990, you would be in the green if you held on for more than 14 years.

## Probability of a Market Crash

This is another interesting question to explore. Now that we have all the yearly log returns of the S&P500, we can find out how frequently it sinks into negative territory. To do this, I have used a simple bar chart to plot the returns over a period of 30 years.

It happened only seven times. The S&P500 recorded a negative return only seven times in 30 years. Furthermore, two out of these seven years were negligible: 1994 was -1.5% and 2015 was -0.7%.

To count only the years that were really bad, say -5% or worse, there were just five. This means that the probability of seeing the S&P500 fall more than 5% in a particular year is only about 17% (5 out of 30). Wondering when the next one will come? *Do note that the statistics are based on a 30-year timeframe.*
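The counting behind these figures is simple enough to sketch in a few lines. The returns below are made-up numbers rather than the real S&P500 figures, so only the method carries over.

```python
# Hypothetical yearly log returns (made-up numbers, not the real S&P500 figures).
yearly_returns = [0.12, -0.02, 0.07, -0.15, 0.20, 0.03, -0.08, 0.10, 0.05, 0.01]

negative_years = sum(1 for r in yearly_returns if r < 0)
bad_years = sum(1 for r in yearly_returns if r <= -0.05)  # -5% or worse

share_negative = negative_years / len(yearly_returns)
share_bad = bad_years / len(yearly_returns)

print(f"negative years: {negative_years} ({share_negative:.0%})")
print(f"years at -5% or worse: {bad_years} ({share_bad:.0%})")
```

Note that this is a historical frequency over the sample, which is only a rough stand-in for a true probability.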

## S&P 500 Returns in Dollar Value

What if we want to see the values in dollar terms rather than returns?

One of the advantages of using log-returns is that it allows us to calculate today’s stock prices. This is not possible if we use the simple returns method. Here is a simple illustration of the differences.

Let’s say the stock price starts off at $100. It fluctuates up and down, and the closing price as of 2020 is $95. If we use the simple return method, the average return would be 9.4%. However, that is not really accurate: we lost $5, yet it is telling us that the average return is 9.4%.

Hence, we can’t use this information and calculate future prices as there is no guarantee that it would be 9.4% every year. In fact, if we try doing so, the output turns out to be 131, totally off from the actual price of 95. That is one of the pitfalls of using simple returns.

On the other hand, using log returns eliminates this problem. The average log return tells us that we have lost money, and if we use it to recalculate the price, we get exactly the actual closing price.
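To make the difference concrete, here is a small sketch with a made-up price path that also starts at $100 and ends at $95. The numbers are not the ones behind the 9.4% figure above, but they show the same effect.

```python
import math

# Hypothetical year-end prices: start at $100, end at $95 (made-up path).
prices = [100.0, 150.0, 75.0, 95.0]
n = len(prices) - 1  # number of yearly returns

simple = [prices[i + 1] / prices[i] - 1 for i in range(n)]
logs = [math.log(prices[i + 1] / prices[i]) for i in range(n)]

mean_simple = sum(simple) / n
mean_log = sum(logs) / n

# Rebuild the final price from each average.
from_simple = prices[0] * (1 + mean_simple) ** n
from_log = prices[0] * math.exp(n * mean_log)

print(f"average simple return: {mean_simple:.2%}, implied price: {from_simple:.2f}")
print(f"average log return: {mean_log:.2%}, implied price: {from_log:.2f}")
```

With this path the average simple return is positive and implies a price of roughly $129, while the average log return is negative and recovers the actual $95. This works because log returns add up: their sum is the log of the final price divided by the starting price.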

Keeping this in mind, we can use the log-returns from the heatmap to generate another version that is in dollar terms. This is because the log-returns allow us to calculate today’s price if we start off from an initial capital of $100.

We just have to change the calculation inside each loop iteration. Instead of calculating the rolling average return, we want to calculate how much our $100 is worth. The formula is $100 * e to the power of (no. of years * average log return). Let’s see how this looks.

Done! The log-returns heatmap has been translated to dollar values. It is showing how much your $100 is worth if you hold on for x number of years.
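A minimal sketch of that change, using the same kind of loop on made-up annual log returns. The names `returns`, `initial` and `dollars` are my own.

```python
import numpy as np
import pandas as pd

# Hypothetical annual log returns (made-up numbers).
returns = pd.Series(
    [0.10, -0.05, 0.08, 0.02, -0.04, 0.12],
    index=pd.Index(range(2014, 2020), name="Year"),
)

initial = 100.0  # starting capital in dollars
dollars = pd.DataFrame(index=returns.index)
for n in (1, 2, 3):
    avg_log_return = returns.rolling(n).mean()
    # $100 * e^(number of years * average log return)
    dollars[f"{n}Y"] = initial * np.exp(n * avg_log_return)

print(dollars.round(2))
```

Since n times the average of n log returns is just their sum, each cell is the value of $100 after compounding through those n years.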

## Long Term Investment or Short Term Investment?

If we look at the 1Y column on the right, you can see that the stock market fluctuates up and down and is more volatile. Some years are red, some are orange, and others are different shades of green. The same goes for 2Y, 3Y or 5Y.

But as the time horizon increases to 10Y and above, everything starts turning into solid green. It becomes almost certain that your losses would turn to gains. Time is your advantage when it comes to long term investment.

Even if you had invested at the peak before the crisis, you would still have made your money back eventually. It is a slow way to make money, and an extremely inefficient one, but at least you can see that you would not have lost money. That is, if you held long enough without buying and selling in between.

Moral of the story? Start investing as early as possible. Time is your friend. Long-term investment increases your probability of making money. Just ask yourself: do you want your portfolio to be on the right or the left side of the chart? Then look at the x-axis, which shows the number of years you hold your investments. You will realize that the longer you hold, the greener it gets.

*Do take note again of the assumptions used in this study. It is the S&P500 index and the timeframe is 30 years. Other indexes or individual stocks might behave differently from the results shown above. Please do your own due diligence when it comes to investing.*

Hi.

I NVR really like looking at this kind of way to show tat invest 30 yrs will earn a lot.

Fees?

Etf annual fees?

Number of units reduce mth annually due to cost payments

Dca?

Too many variable.

There is no profit at 10 year 5 year or 20 year… Until u sell.

Or receive distributions.

Personally for me. I can’t invest only to see and touch returns 30 yr later.

Even Babylonian story doesn’t reap his benefit after 30 year but ongoing.

I think tat turns ppl off from investing.

Fair point there. This excludes all the fees, DCA assumptions and variables that you mentioned. It is just a simple illustration to show the returns over x period of years. Definitely not meant to be a strategy of holding 30-40 years and enjoying the wealth at a later stage of life.

@Rokawa What is your strategy then, for beating the S&P?
