EE & CS: Exponential Moving Averages on Google Sheets

Tuesday, September 23, 2014

Exponential Moving Averages on Google Sheets

Abstract

Exponential moving averages (EMA) is a way to remove noise from data series. Unfortunately, useful straightforward spreadsheet support for EMA is absent. This article examines the problems involved and proposes a one line formula solution to add EMA to a spreadsheet.

Keywords

Exponential Moving Average, EMA, Noise Removal, Filtering, Google Sheets, Spreadsheets.

Introduction

Traditionally, implementing an exponential moving average has been done in spreadsheets using a recursive formula, i.e., an auto-regressive (AR) process or an infinite impulse response (IIR) filter. The formula is the following:
\[y(n) = \alpha \cdot x(n) + (1 - \alpha) \cdot y(n-1)\]
There are some problems with this approach:

Strictly speaking, this is not a moving average. Moving average assumes a finite length sliding window under which data is being weighted. There is a growing window, not a sliding window.
The last item implies that the average takes into account every single data sample. The previous formula actually implements an IIR filter. This kind of averaging never forgets a value, although old values certainly get irrelevant over time. It would be nice to have control over the window where the exponential average takes place, especially when one intends to use a small window.
It does not deal properly with missing values. In a spreadsheet, it is common to have missing data values and the previous approach does not allow them. Converting missing values to zero would unacceptably distort the average value. If one insists in applying this formula to the series by packing the original series, the weights applied to the values in the averaging process will not reflect the actual distance in time that these samples might have.

The function AVERAGE(range) is able to deal with this problem quite simply because it uses the same average on every sample, so it is just a matter of dividing the sum by the number of non-blank entries. In a non-uniform averaging like EMA, we need to keep track of which weights were really applied or not due to missing values, and fix the normalizing factor accordingly.

Proposed solution

The function SERIESSUM(a, n, m, x) is defined as
\[SERIESSUM(a, n, m, x) = \sum_{i=0}^n x_i a^{n+m i}.\]
The proposed solution is to use this function twice, first to calculate the weighted sum, and then a second time to calculate the sum of the weights where data is not missing.
Assume that:

Cell F1 contains the geometric progression ratio $(\alpha)$;
Cell F2 contains the window size $(n)$;
Column B contains the raw data;
The current cell is B22.

Then the claim is that the following formula will calculate the correct value of EMA:
\[\begin{array}{l}
=\\
SERIESSUM($F$1, $F$2, -1, \\
\qquad ARRAYFORMULA(N(OFFSET(B22, -$F$2 + 1, 0, $F$2, 1))))\\
/ \\
SERIESSUM($F$1, $F$2, -1, \\
\qquad ARRAYFORMULA(\\
\qquad \qquad N(ISNUMBER(OFFSET(B23, -$F$2 + 1, 0, $F$2, 1)))))
\end{array}\]
The details of this expression are as follows:

$OFFSET(current\_cell, -n+1, 0, n, 1)$ is used to produce a range of $n$ cells, of which the current cell is the last one.
$ARRAYFORMULA(N(OFFSET(\dotsc)))$ will apply the $N()$ function to each element of the argument range to generate a new range with zero values in the missing data cells. Without this trick, $SERIESSUM()$ would use non-missing values as if they were contiguous.
$ARRAYFORMULA(N(ISNUMBER(OFFSET(\dotsc))))$ will generate a range composed of ones where data is not missing and zeros where data is missing.

Results

The formula has been tested against some weighting data. The resulting spreadsheet has a plot of the original data, along with the $AVERAGE()$ data and EMA data for comparison.

Conclusion

A spreadsheet formula for the correct calculation of an exponential moving average has been derived and successfully tested on Google Sheets. The proposed formula deals with missing values in a way similar to the $AVERAGE()$ function, avoiding the distortions that would be caused either by using zero in place of the missing values or by packing the original series.

11 comments:

SavioFriday, March 4, 2016 at 7:47:00 PM GMT-3
Thank you! I was about to do the same spreadsheet. I suppose you read John Walker too?
ReplyDelete
Replies
Marcelo Roberto JimenezFriday, March 4, 2016 at 9:32:00 PM GMT-3
This comment has been removed by the author.
ReplyDelete
Replies
UnknownSaturday, June 8, 2019 at 12:47:00 PM GMT-3
This comment has been removed by the author.
ReplyDelete
Replies
UnknownSaturday, June 8, 2019 at 12:49:00 PM GMT-3
I am using this formula to calculate EMA for Financial Instruments. BUT there is a problem.

The geometrical progression ratio is not a constant. It depends on the period length:

alpha = 2 * (n + 1)

n is the period. In my case n is the number of days (9, 20, 50, 200).

When n = 200, the multiplier alpha = 0.009950248756

In this case, the Series Sum have a wrong return.

I am using the formula on Google Spreadsheet

Mathematical Reference: https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average

If I am right considering this a problem, how could it be managed?

Kind regards

G. Aloe
ReplyDelete
Replies
UnknownSaturday, June 8, 2019 at 1:06:00 PM GMT-3
One more thing. You have done a great job with the arrayformula to calculate EMA by Sum of Series.

I am in struggle to find a way to calculate the DOUBLE Exponential Moving Average (DEMA) on Google Spreadsheet.

The reason is that I am still not skilled enough on how to store the Series of n EMAs to use to calculate the EMAn(EMAn)

Here the formula:

DEMAn = 2∗EMAn - EMAn(EMAn)

where:
n=look-back period

Here the Mathematical Reference: https://www.investopedia.com/terms/d/double-exponential-moving-average.asp

Maybe you can do much better than me in the finding of a solution that can be used on Google Sheets, making the full calculation in only 1 cell OR developed by partial calculations that are all in the same row.

Thank you very much for your support and for this post that helped me.

Kind regards

G. Aloe
ReplyDelete
Replies
RodrigoacMonday, June 15, 2020 at 4:00:00 PM GMT-3
Nice tutorial and explanation.
ReplyDelete
Replies
EswarThursday, July 2, 2020 at 5:22:00 PM GMT-3
Can you explain what value stored in F1 and F2 and how the values are derived , please.
ReplyDelete
Replies
OniSaturday, May 22, 2021 at 2:30:00 PM GMT-3
Hi, sorry for the question, I'm a total novice. I'm trying to create a spreadsheet that pulls historical price data and calculate the EMA values based on different time periods (10day EMA, 15day EMA, 50day EMA). I see that your example uses a list of values but is there a simpler way to use "array_constrain" to pull the data via "GoogleFinance" so the formula can be used in a single line without building a dataset?

An SMA formula uses the following for a 50day (originally meant for a 200day): =average(index(array_constrain(sort(GoogleFinance(B1,"price",workday(today(),-220),today()),1,false),51,2),"",2))
ReplyDelete
Replies

Add comment

EE & CS

mathjax + gtag

Pages