# Preserving Memory in Stationary Time Series

Many predictive models require a certain consistency of the time series, called stationarity. The usual transformation, integer-order differencing (in finance, e.g., modelling returns instead of absolute prices), eliminates memory in the data and hence reduces the predictive power of the model. This article outlines how fractional calculus makes it possible to retain more information and to better balance stationarity against meaningful memory.

In general, we understand a given time series as a sample generated by a stochastic process whose distribution and statistics we try to infer for a predictive model.

Building predictive models of stochastic processes is about finding a balance between specificity and generalisability of the samples: the model interprets a given series against the backdrop of general patterns.

Unlike the samples in a general predictive regression, a time series comes with an inherent ordering due to its temporal structure: any given instance reflects a history of values traversed in the past, the specific memory of its past track record.

### Stationarity

In order to identify a generic pattern of the generating process and map the given constellation, this series-specific memory is often eliminated as part of a preprocessing step before the actual modelling.

In the Machine Learning parlance of supervised learning, this serves to discover generic structure and match the given instance to more samples in the labelled training set.

In mathematical terms, the statistical properties of the process and hence of the ensemble of series, such as mean, variance and covariance, should be invariant under shifts in time; in particular, the series should not exhibit a trend over time. This notion is called stationarity (see e.g. [1] for a thorough explanation).

There are various ways to check a series for stationarity:

1. visually inspect the line plots over time for a pronounced trend, see e.g. [6] for some illustrative examples.
2. compare basic summary statistics (mean, variance, covariance) for various (random) splits of the series.
3. inspect the autocorrelation plot: the faster the curve drops with increasing lag, the lower the order of non-stationarity in the series.
4. run a statistical test: the most common one for stationarity is the Augmented Dickey-Fuller (ADF) test for unit roots.
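Checks 2 and 3 from the list above can be sketched in a few lines of NumPy. This is only an illustration on synthetic data: the helper names `split_stats` and `autocorr`, the number of chunks and the seed are assumptions made for this example.

```python
import numpy as np

def split_stats(x, n_splits=4):
    """Mean and variance of consecutive chunks of the series."""
    chunks = np.array_split(np.asarray(x, dtype=float), n_splits)
    return [(c.mean(), c.var()) for c in chunks]

def autocorr(x, max_lag=20):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
noise = rng.normal(size=2000)   # stationary white noise
walk = np.cumsum(noise)         # random walk, i.e. a series with a unit root

for name, series in [("noise", noise), ("walk", walk)]:
    means = [m for m, _ in split_stats(series)]
    acf = autocorr(series)
    print(name, "chunk means:", np.round(means, 2), "| acf at lag 20:", round(acf[20], 2))
```

For the white noise, the chunk means stay close to each other and the autocorrelation drops to roughly zero right after lag 0; for the random walk, the chunk means drift apart and the autocorrelation stays high even at lag 20.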

Intuitively, the implication of a unit root, formally a solution of the characteristic equation of the process lying on the unit circle (see [2]), is that initial conditions or external shocks do not dissipate over time, but propagate through the series and inform all subsequent values.

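For a given confidence level, ADF tests the null hypothesis ‘existence of a unit root of some order’ (implying non-stationarity of the time series) against the alternative of stationarity (or, strictly speaking, trend-stationarity).

It is straightforward to convince oneself that the existence of a unit root indeed implies non-stationarity of the series. In the simplest case, a random walk $X_t = X_{t-1} + \varepsilon_t$ with independent shocks of variance $\sigma^2$ and a fixed starting value $X_0$, we have

$$X_t = X_0 + \sum_{i=1}^{t} \varepsilon_i \quad\Rightarrow\quad \operatorname{Var}(X_t) = t\,\sigma^2,$$

so the variance turns out to be time dependent.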
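To make the unit-root test more concrete, here is a minimal sketch of the plain (non-augmented) Dickey-Fuller statistic with a constant, computed by ordinary least squares in NumPy. It is a simplification: the augmented test additionally includes lagged differences in the regression, and in practice one would rather use a library implementation such as `adfuller` from statsmodels. The function name `df_statistic` and the synthetic data are assumptions for this example.

```python
import numpy as np

def df_statistic(x):
    """t-statistic of gamma in the regression  dx_t = alpha + gamma * x_{t-1} + e_t."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    X = np.column_stack([np.ones(len(dx)), x[:-1]])   # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    sigma2 = resid @ resid / (len(dx) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # covariance of the OLS estimates
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
noise = rng.normal(size=1000)   # stationary
walk = np.cumsum(noise)         # unit root

print("white noise:", df_statistic(noise))
print("random walk:", df_statistic(walk))
```

The statistic is compared against the critical values of the Dickey-Fuller distribution (about -2.86 at the 5% level for the variant with a constant). For the white noise it is far below that value, so the unit-root null is rejected; for a random walk it usually is not.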

Stationarity of the time series is a necessary assumption for many classical modelling approaches: when you have a clear trend or seasonality in your data, you remove it and model the remainder. For a prediction, you then combine the (deterministic) trend and the model output.

The commonly used transformation to make a series stationary is differencing up to some order: first order differencing simply subtracts from each value the preceding one (extracting the rate of change). Second order differencing repeats this process for the resulting series, and so on for higher orders. E.g. in financial time series, you would consider (log) returns instead of absolute prices to make the model agnostic to the specific price level (in fact, for most financial sequences, first order differencing suffices to ensure stationarity; I still wonder why).
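As a small worked example of integer-order differencing, the following sketch uses `np.diff` on a handful of made-up prices (the numbers are purely illustrative):

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0, 110.0])   # made-up prices

first_diff = np.diff(prices)            # x_t - x_{t-1}: 2, -1, 4, 5
second_diff = np.diff(prices, n=2)      # differences of the differences: -3, 5, 1
log_returns = np.diff(np.log(prices))   # log(x_t / x_{t-1})

print(first_diff)
print(second_diff)
print(np.round(log_returns, 4))
```

Each order of differencing shortens the series by one value, since the first observation has no predecessor.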

In ARIMA(p,d,q)-type models based on autocorrelation, the differencing is actually part of the algorithm: the parameters p, d and q are non-negative integers, with p denoting the order (i.e. number of time lags) of the autoregressive model, d the order of differencing and q the order of the moving-average model.

This procedure surgically eliminates the unit roots in the series.

On the other hand, it wipes out memory, which is the basis for the predictive power of the model: differencing restricts how past information propagates through the series.

E.g. for financial series, one is faced with the dilemma between a stationary series of returns (first order differencing) with no memory and a series of absolute prices (differencing of order zero) with memory but which is not stationary.

But maybe there is no need for this polarity? What if we could interpolate between these two extremes?

“… It will lead to a paradox, from which one day useful consequences will be drawn.” Leibniz, 1695

### Fractional Calculus

In fact, we can: shortly after the invention of calculus by Newton and Leibniz in the 17th century, mathematicians explored the use of fractional derivatives, where the order of differentiation or integration is extended from natural numbers to real numbers. It would, however, take until the work of Hurst and Mandelbrot in the 20th century for fractional calculus to find its first natural applications, and until Hosking and Granger’s ARFIMA models in the 1980s for it to enter finance (see [4] for a comprehensive historical account).

The historically first heuristics for this generalisation seem to have been given by Euler around 1730, who generalised binomial coefficients to real orders via the Gamma function. This was later made more rigorous by the Cauchy formula for repeated differentiation and in the integration theory of Riemann and Liouville (see e.g. [3]).

Here, we just give the formal heuristics for our application to differencing of time series. Let B denote the lag operator, i.e. B X_t = X_{t-1} for t > 1 and some time series X = {X_1, …}. Element-wise differencing of first order can then be expressed with the identity operator I as

$$(I - B)\,X_t = X_t - X_{t-1}.$$

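Polynomials of this operator are understood as repeated application, e.g. B² X_{t} = X_{t-2}. For a real order d, we can formally expand the differencing operator (I - B)^d as a series using binomial coefficients:

$$(I - B)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k = I - dB + \frac{d(d-1)}{2!}B^2 - \frac{d(d-1)(d-2)}{3!}B^3 + \dots$$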

While the case can be made more rigorous for d ∈ ℝ, we will only use a truncated version of this expansion in our applications below, so we do not have to bother about convergence of this formal derivation.

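From this derivation, we can read off the iterative formula for the weights of the lags,

$$\omega_0 = 1, \qquad \omega_k = -\omega_{k-1}\,\frac{d - k + 1}{k} \quad \text{for } k \ge 1,$$

where ω_k is the coefficient of the lag operator B^k.
For example, for returns (first order differencing, d = 1), ω_0 = 1, ω_1 = -1 and ω_k = 0 for k > 1.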

Let’s have a look at those coefficients for various orders of differencing.

We notice two important specifics of fractional differencing:

• the lag weights equal zero for any integer order d ∈ ℕ and lags k > d: this means that we recover the usual differencing for integer orders.
• the lag weights decay only asymptotically towards zero for non-integer orders d and large lags. This phenomenon is referred to as ‘long memory’ (or ‘non-locality’ in the context of calculus) and usually requires boundary conditions. Here we simply choose to cut off the (small) weights beyond a certain window size.
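Both observations can be checked numerically. The sketch below assumes the standard recurrence for the binomial expansion of (I - B)^d, namely ω_0 = 1 and ω_k = -ω_{k-1} (d - k + 1) / k; the function name `frac_diff_weights` is chosen for this example and is not taken from the original code.

```python
import numpy as np

def frac_diff_weights(order, n_weights):
    """First n_weights coefficients w_k of the expansion of (I - B)^order."""
    w = np.empty(n_weights)
    w[0] = 1.0
    for k in range(1, n_weights):
        w[k] = -w[k - 1] * (order - k + 1) / k
    return w

for d in (0.0, 0.4, 1.0, 2.0):
    print(f"d={d}:", np.round(frac_diff_weights(d, 6), 4))
```

For d = 1 and d = 2 the weights are 1, -1 and 1, -2, 1 followed by zeros, i.e. ordinary first and second order differencing. For d = 0.4 the weights start with 1, -0.4, -0.12, -0.064 and keep shrinking in magnitude without ever becoming exactly zero.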

### Applications to Modelling of Financial Time Series

A field where time series play a dominant role is finance. To get a better intuition for the characteristics of fractional differencing, let’s apply it to some typical financial time series.

We obtain the transformed series by applying the above formal series expansion of the differencing operator to a time series for a specified real order d ∈ ℝ and a fixed window size: simply feed a pandas time series into the function ts_differencing with the parameters order and lag_cutoff.
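A possible shape of such a function is sketched below. The name `ts_differencing` and the parameters `order` and `lag_cutoff` are taken from the text; the body is an assumed reconstruction rather than the original code, and the handling of the first rows (dropping them until a full window of lags is available) is a design choice made for this example. The prices are made up.

```python
import numpy as np
import pandas as pd

def ts_differencing(series, order, lag_cutoff):
    """Fractionally difference a pandas Series using the first lag_cutoff weights."""
    weights = [1.0]
    for k in range(1, lag_cutoff):
        weights.append(-weights[-1] * (order - k + 1) / k)
    # weighted sum of the series shifted by 0, 1, ..., lag_cutoff - 1 steps
    result = sum(w * series.shift(k) for k, w in enumerate(weights))
    return result.dropna()   # the first lag_cutoff - 1 rows lack a full history

prices = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0, 108.0],
                   index=pd.date_range("2018-01-01", periods=6))

print(ts_differencing(prices, order=1.0, lag_cutoff=3))   # matches prices.diff() once the window is full
print(ts_differencing(prices, order=0.4, lag_cutoff=3))
```

With order 1 the result coincides with ordinary first differences; with order 0.4 each value is the current price minus 0.4 times the previous one minus 0.12 times the one before that, so part of the price level is retained.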

Bitcoin prices 2016–18 (in red, right axis) along with some fractional derivatives (shades of blue).

As you may have noticed, the price of Bitcoin underwent a pronounced hype in 2017 and 2018 (red curve in the above figure). Indeed, looking at the first order differencing, we see that prices jumped by more than \$2500 on some single days (e.g. early Dec’17). This plot demonstrates the smooth functional interpolation for some fractional orders of differencing. It may be surprising that, even for such strong trends, weak differencing of about order 0.4 is actually enough to make the series stationary: the ADF statistic of -5 for the given sample is already lower than the critical value of -2.86 of the DF t-distribution, so the null hypothesis of a unit root is rejected at the 5% significance level and the series can be assumed stationary. Indeed, as riches come and go, it was mean-reverting soon after.

The finding that low orders of differencing suffice for stationarity holds similarly for many other financial time series.

To illustrate the trade-off between stationarity and memory, we can follow a suitable visualisation from [7] and plot the ADF test statistic along with the (linear) correlation to the original series for a series transformed with various orders of differencing. (Note that a lower ADF test statistic indicates stronger evidence for the alternative of the test, i.e. the more negative the value, the more confidently we can reject the null hypothesis and assume stationarity.)

For various typical financial series such as (rolled) commodity futures, exchange rates and indices, this comparison clearly illustrates that (low) fractional orders of differencing satisfy the stationarity condition for financial modelling while preserving the specific memory structure and hence the statistical dynamics of the original series.

ADF test statistics (left axis, curve in red) and linear correlation (right axis, curve in cornflower blue) with the original series for various fractional orders of differencing, applied to various financial time series. The constant line in slate grey marks the critical value of the ADF test at the 95% confidence level.
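The comparison described above can be sketched on synthetic data. The example below is an illustration under stated assumptions, not the code behind the figure: it uses a simulated random walk instead of market data, a plain Dickey-Fuller statistic with a constant instead of the full ADF test, and the helper names `frac_diff` and `df_statistic` as well as the window of 100 lags are choices made for this example.

```python
import numpy as np

def frac_diff(x, order, lag_cutoff):
    """Apply the truncated expansion of (I - B)^order to a 1-D array."""
    w = [1.0]
    for k in range(1, lag_cutoff):
        w.append(-w[-1] * (order - k + 1) / k)
    # 'valid' keeps only the positions with a full window of lags
    return np.convolve(x, w, mode="valid")

def df_statistic(x):
    """Plain Dickey-Fuller t-statistic with constant (no lagged differences)."""
    dx = np.diff(x)
    X = np.column_stack([np.ones(len(dx)), x[:-1]])
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    sigma2 = resid @ resid / (len(dx) - 2)
    return beta[1] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

rng = np.random.default_rng(42)
walk = np.cumsum(rng.normal(size=3000))   # synthetic stand-in for a price series
lag_cutoff = 100

for d in np.linspace(0.0, 1.0, 6):
    y = frac_diff(walk, d, lag_cutoff)
    # align the original series with the positions kept by the 'valid' window
    corr = np.corrcoef(walk[lag_cutoff - 1:], y)[0, 1]
    print(f"d={d:.1f}  DF statistic={df_statistic(y):8.2f}  correlation={corr:6.3f}")
```

At d = 0 the transformed series is the original one (correlation 1); at d = 1 it is the series of increments, which has a strongly negative test statistic but hardly any correlation with the original levels. The orders in between trace out the trade-off between the two.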

### Conclusions

It is highly surprising that, about a generation after the introduction of ARFIMA models, the concept of fractional differencing has seemingly not gained widespread traction in finance. So much so that what many attribute to the ‘efficiency of markets’ might be nothing but an artefact of the voluntary cancellation of information by unsuitable data preprocessing.

I hope this article makes a compelling case for including this effective technique in your modelling toolkit.

The author: A passionate data scientist, I have worked as the tech lead for startups across the globe and implemented real-life AI solutions for the last four years. Contact me at simon’at’deepprojects.de.

References:

[1] Wikipedia, [Stationary Process](https://en.wikipedia.org/wiki/Stationary_process).

[2] Wikipedia, [Unit Root](https://en.wikipedia.org/wiki/Unit_root).

[3] Wikipedia, [Fractional Calculus](https://en.wikipedia.org/wiki/Fractional_calculus).

[4] Graves, Gramacy, Watkins and Franzke, [A Brief History of Long Memory: Hurst, Mandelbrot and the Road to ARFIMA, 1951–1980](https://www.mdpi.com/1099-4300/19/9/437).

[5] Wikipedia, [Arfima](https://en.wikipedia.org/wiki/Autoregressive_fractionally_integrated_moving_average).

[6] Analytics Vidhya, [Non-Stationary Time Series](https://www.analyticsvidhya.com/blog/2018/09/non-stationary-time-series-python/).