mathjax + gtag

Tuesday, September 23, 2014

Exponential Moving Averages on Google Sheets

Abstract

Exponential moving averages (EMA) is a way to remove noise from data series. Unfortunately, useful straightforward spreadsheet support for EMA is absent. This article examines the problems involved and proposes a one line formula solution to add EMA to a spreadsheet.

Keywords

Exponential Moving Average, EMA, Noise Removal, Filtering, Google Sheets, Spreadsheets.

Introduction

Traditionally, implementing an exponential moving average has been done in spreadsheets using a recursive formula, i.e., an auto-regressive (AR) process or an infinite impulse response (IIR) filter. The formula is the following:
\[y(n) = \alpha \cdot x(n) + (1 - \alpha) \cdot y(n-1)\]
There are some problems with this approach:
  1. Strictly speaking, this is not a moving average. Moving average assumes a finite length sliding window under which data is being weighted. There is a growing window, not a sliding window.
  2. The last item implies that the average takes into account every single data sample. The previous formula actually implements an IIR filter. This kind of averaging never forgets a value, although old values certainly get irrelevant over time. It would be nice to have control over the window where the exponential average takes place, especially when one intends to use a small window.
  3. It does not deal properly with missing values. In a spreadsheet, it is common to have missing data values and the previous approach does not allow them. Converting missing values to zero would unacceptably distort the average value. If one insists in applying this formula to the series by packing the original series, the weights applied to the values in the averaging process will not reflect the actual distance in time that these samples might have.
The function AVERAGE(range) is able to deal with this problem quite simply because it uses the same average on every sample, so it is just a matter of dividing the sum by the number of non-blank entries. In a non-uniform averaging like EMA, we need to keep track of which weights were really applied or not due to missing values, and fix the normalizing factor accordingly.

Proposed solution

The function SERIESSUM(a, n, m, x) is defined as
\[SERIESSUM(a, n, m, x) = \sum_{i=0}^n x_i a^{n+m i}.\]
The proposed solution is to use this function twice, first to calculate the weighted sum, and then a second time to calculate the sum of the weights where data is not missing.
Assume that:
  • Cell F1 contains the geometric progression ratio \((\alpha)\);
  • Cell F2 contains the window size \((n)\);
  • Column B contains the raw data;
  • The current cell is B22.
Then the claim is that the following formula will calculate the correct value of EMA:
\[\begin{array}{l}
=\\
SERIESSUM($F$1, $F$2, -1, \\
\qquad ARRAYFORMULA(N(OFFSET(B22, -$F$2 + 1, 0, $F$2, 1))))\\
/ \\
SERIESSUM($F$1,  $F$2, -1, \\
\qquad ARRAYFORMULA(\\
\qquad \qquad N(ISNUMBER(OFFSET(B23, -$F$2 + 1, 0, $F$2, 1)))))
\end{array}\]
The details of this expression are as follows:
  • \(OFFSET(current\_cell, -n+1, 0, n, 1)\) is used to produce a range of \(n\) cells, of which the current cell is the last one.
  • \(ARRAYFORMULA(N(OFFSET(\dotsc)))\) will apply the \(N()\) function to each element of the argument range to generate a new range with zero values in the missing data cells. Without this trick, \(SERIESSUM()\) would use non-missing values as if they were contiguous.
  • \(ARRAYFORMULA(N(ISNUMBER(OFFSET(\dotsc))))\) will generate a range composed of ones where data is not missing and zeros where data is missing.

Results

The formula has been tested against some weighting data. The resulting spreadsheet has a plot of the original data, along with the \(AVERAGE()\) data and EMA data for comparison.

Conclusion

A spreadsheet formula for the correct calculation of an exponential moving average has been derived and successfully tested on Google Sheets. The proposed formula deals with missing values in a way similar to the \(AVERAGE()\) function, avoiding the distortions that would be caused either by using zero in place of the missing values or by packing the original series.

13 comments:

  1. Thank you! I was about to do the same spreadsheet. I suppose you read John Walker too?

    ReplyDelete
    Replies
    1. Hi Savio,

      I did read it, indeed, but the diet I used to get those results is not what he advocates. I did not count calories, I just cut out all carbs and went ketogenic!

      But the exponential moving average is a great tool, it captures details that the pure average does not.

      Regards!

      Delete
  2. I have recently started a blog, the info you provide on this site has helped me greatly. Thanks for all of your time & work. moving companies pasadena

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. I am using this formula to calculate EMA for Financial Instruments. BUT there is a problem.

    The geometrical progression ratio is not a constant. It depends on the period length:

    alpha = 2 * (n + 1)

    n is the period. In my case n is the number of days (9, 20, 50, 200).

    When n = 200, the multiplier alpha = 0.009950248756

    In this case, the Series Sum have a wrong return.

    I am using the formula on Google Spreadsheet

    Mathematical Reference: https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average

    If I am right considering this a problem, how could it be managed?

    Kind regards

    G. Aloe

    ReplyDelete
    Replies
    1. I have forgotten 1 detail. This is how I am using your formula by the use of GoogleFinance() in Arrayformula and not Offset on cells.

      =SERIESSUM($G$1, $G$2-1, -1, ARRAYFORMULA(N(QUERY(GoogleFinance( L22, "price", WORKDAY( TODAY(), -$G$2), $G$2), "select Col2 label Col2 ''", 1)))) / SERIESSUM($G$1, $G$2-1, -1, ARRAYFORMULA(N(ISNUMBER(QUERY(GoogleFinance( L22, "price", WORKDAY( TODAY(), -$G$2), $G$2), "select Col2 label Col2 ''", 1)))))

      Delete
    2. This comment has been removed by the author.

      Delete
  5. One more thing. You have done a great job with the arrayformula to calculate EMA by Sum of Series.

    I am in struggle to find a way to calculate the DOUBLE Exponential Moving Average (DEMA) on Google Spreadsheet.

    The reason is that I am still not skilled enough on how to store the Series of n EMAs to use to calculate the EMAn(EMAn)

    Here the formula:

    DEMAn = 2∗EMAn - EMAn(EMAn)

    where:
    n=look-back period

    Here the Mathematical Reference: https://www.investopedia.com/terms/d/double-exponential-moving-average.asp

    Maybe you can do much better than me in the finding of a solution that can be used on Google Sheets, making the full calculation in only 1 cell OR developed by partial calculations that are all in the same row.

    Thank you very much for your support and for this post that helped me.

    Kind regards

    G. Aloe

    ReplyDelete
  6. Can you explain what value stored in F1 and F2 and how the values are derived , please.

    ReplyDelete
  7. Hi, sorry for the question, I'm a total novice. I'm trying to create a spreadsheet that pulls historical price data and calculate the EMA values based on different time periods (10day EMA, 15day EMA, 50day EMA). I see that your example uses a list of values but is there a simpler way to use "array_constrain" to pull the data via "GoogleFinance" so the formula can be used in a single line without building a dataset?

    An SMA formula uses the following for a 50day (originally meant for a 200day): =average(index(array_constrain(sort(GoogleFinance(B1,"price",workday(today(),-220),today()),1,false),51,2),"",2))

    ReplyDelete
  8. Good website! I truly love how it is easy on my eyes it is. I am wondering how I might be notified whenever a new post has been made. I have subscribed to your RSS which may do the trick? Have a great day! fischervanlines.com

    ReplyDelete