Tuesday, September 23, 2014

Exponential Moving Averages on Google Sheets

Abstract

An exponential moving average (EMA) is a way to remove noise from a data series. Unfortunately, spreadsheets offer no straightforward built-in support for EMA. This article examines the problems involved and proposes a one-line formula that adds EMA to a spreadsheet.

Keywords

Exponential Moving Average, EMA, Noise Removal, Filtering, Google Sheets, Spreadsheets.

Introduction

Traditionally, an exponential moving average is implemented in spreadsheets with a recursive formula, i.e., an auto-regressive (AR) process or an infinite impulse response (IIR) filter. The formula is the following:
\[y(n) = \alpha \cdot x(n) + (1 - \alpha) \cdot y(n-1)\]
There are some problems with this approach:
  1. Strictly speaking, this is not a moving average. A moving average assumes a finite-length sliding window within which the data are weighted. Here the window grows with every sample instead of sliding.
  2. As a consequence, the average takes every single data sample into account: the formula above implements an IIR filter. This kind of averaging never forgets a value, although old values become less relevant over time. It would be nice to control the window over which the exponential average is taken, especially when a small window is intended.
  3. It does not deal properly with missing values. Missing data values are common in a spreadsheet, and the recursive approach does not allow for them. Converting missing values to zero would unacceptably distort the average. If one insists on applying the formula after packing the original series (that is, removing the gaps), the weights applied in the averaging will no longer reflect the actual distance in time between the samples.
The function AVERAGE(range) deals with missing values quite simply because it applies the same weight to every sample, so it is just a matter of dividing the sum by the number of non-blank entries. In a non-uniform averaging like EMA, we need to keep track of which weights were actually applied and which were skipped due to missing values, and fix the normalizing factor accordingly.
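To make the third problem concrete, here is a minimal Python sketch of the recursive formula applied to a constant series in which one missing value has been replaced by zero. The function name `recursive_ema` and the choice to seed the output with the first sample are assumptions made for illustration; they are not part of the original spreadsheet.

```python
def recursive_ema(xs, alpha):
    """Apply y(n) = alpha*x(n) + (1 - alpha)*y(n-1).

    Assumption: the output is seeded with the first sample, y(0) = x(0).
    """
    y = xs[0]
    out = [y]
    for x in xs[1:]:
        y = alpha * x + (1 - alpha) * y
        out.append(y)
    return out


# A constant series, and the same series with one gap converted to zero.
clean = [10.0, 10.0, 10.0, 10.0, 10.0]
zero_filled = [10.0, 10.0, 0.0, 10.0, 10.0]

print(recursive_ema(clean, 0.5))        # [10.0, 10.0, 10.0, 10.0, 10.0]
print(recursive_ema(zero_filled, 0.5))  # [10.0, 10.0, 5.0, 7.5, 8.75]
```

The single zero drags the average down to 5.0, and the output has still not recovered two samples later, even though every real measurement equals 10.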

Proposed solution

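The function SERIESSUM(a, n, m, x) is defined as
\[SERIESSUM(a, n, m, x) = \sum_{i=0}^{k-1} x_i \, a^{n+m i},\]
where \(a\) is the base, \(n\) the initial power, \(m\) the power step, and \(x_0, \dotsc, x_{k-1}\) the \(k\) coefficients in the range \(x\).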
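As a minimal Python sketch of this power series (the helper name `series_sum` is hypothetical, and the sum is assumed to run over every coefficient supplied), the call below uses base 0.5, initial power 3, and step -1, so the last coefficient receives the largest weight:

```python
def series_sum(a, n, m, coeffs):
    """Return the sum of coeffs[i] * a**(n + m*i) over all coefficients."""
    return sum(c * a ** (n + m * i) for i, c in enumerate(coeffs))


# 1*0.5**3 + 2*0.5**2 + 3*0.5**1 = 0.125 + 0.5 + 1.5
print(series_sum(0.5, 3, -1, [1, 2, 3]))  # 2.125
```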
The proposed solution is to use this function twice, first to calculate the weighted sum, and then a second time to calculate the sum of the weights where data is not missing.
Assume that:
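  • Cell F1 contains the geometric progression ratio \((\alpha)\), i.e., the factor by which the weight shrinks for each step back in time (this ratio plays the role of \(1 - \alpha\) in the recursive formula of the Introduction);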
  • Cell F2 contains the window size \((n)\);
  • Column B contains the raw data;
  • The current cell is B22.
Then the claim is that the following formula will calculate the correct value of EMA:
\[\begin{array}{l}
=\\
SERIESSUM($F$1, $F$2, -1, \\
\qquad ARRAYFORMULA(N(OFFSET(B22, -$F$2 + 1, 0, $F$2, 1))))\\
/ \\
SERIESSUM($F$1,  $F$2, -1, \\
\qquad ARRAYFORMULA(\\
\qquad \qquad N(ISNUMBER(OFFSET(B22, -$F$2 + 1, 0, $F$2, 1)))))
\end{array}\]
The details of this expression are as follows:
  • \(OFFSET(current\_cell, -n+1, 0, n, 1)\) is used to produce a range of \(n\) cells, of which the current cell is the last one.
  • \(ARRAYFORMULA(N(OFFSET(\dotsc)))\) will apply the \(N()\) function to each element of the argument range to generate a new range with zero values in the missing data cells. Without this trick, \(SERIESSUM()\) would use non-missing values as if they were contiguous.
  • \(ARRAYFORMULA(N(ISNUMBER(OFFSET(\dotsc))))\) will generate a range composed of ones where data is not missing and zeros where data is missing.
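The same computation can be sketched in Python to check the idea outside the spreadsheet. This is an illustration under stated assumptions, not the spreadsheet formula itself: the function name `windowed_ema` is hypothetical, `None` stands for a blank cell, and a window that is entirely blank is assumed to yield `None`. The numerator accumulates weight times value and the denominator accumulates only the weights of the cells that actually hold data.

```python
def windowed_ema(values, ratio, window):
    """EMA over the last `window` entries of `values`.

    The newest entry gets weight ratio**1, the one before it ratio**2,
    and so on. Entries equal to None (missing) contribute to neither
    the weighted sum nor the sum of weights.
    """
    cells = values[-window:]
    k = len(cells)
    weighted_sum = 0.0
    weight_total = 0.0
    for i, v in enumerate(cells):
        w = ratio ** (k - i)  # the oldest cell gets the highest power
        if v is not None:
            weighted_sum += w * v
            weight_total += w
    if weight_total == 0:
        return None  # assumption: no data in the window means no average
    return weighted_sum / weight_total


data = [10.0, 10.0, None, 10.0, 10.0]
print(windowed_ema(data, 0.5, 4))  # 10.0
```

Because the missing cell is dropped from both sums, the constant series still averages to 10.0, while the remaining samples keep the weights that correspond to their true position in time.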

Results

The formula has been tested against some body weight data. The resulting spreadsheet has a plot of the original data, along with the \(AVERAGE()\) data and EMA data for comparison.

Conclusion

A spreadsheet formula for the correct calculation of an exponential moving average has been derived and successfully tested on Google Sheets. The proposed formula deals with missing values in a way similar to the \(AVERAGE()\) function, avoiding the distortions that would be caused either by using zero in place of the missing values or by packing the original series.

3 comments:

  1. Thank you! I was about to do the same spreadsheet. I suppose you read John Walker too?

    1. Hi Savio,

      I did read it, indeed, but the diet I used to get those results is not what he advocates. I did not count calories, I just cut out all carbs and went ketogenic!

      But the exponential moving average is a great tool, it captures details that the pure average does not.

      Regards!
