Stock Price Change Forecasting with Time Series: SARIMAX (2024)

Last Updated on January 6, 2023 by Editorial Team

Author(s): Avishek Nag

Machine Learning, Statistics

High-level understanding of Time Series, stationarity, seasonality, forecasting, and modeling with SARIMAX


Time series modeling is the statistical study of sequential data (finite or infinite) that depends on time. Here, "time" may be only a logical identifier: a time series need not contain any physical time information. In this article, we will discuss how to model a stock price change forecasting problem as a time series, along with some of the underlying concepts at a high level.

Problem Statement

We will take the Dow Jones Index dataset from the UCI Machine Learning Repository. It contains weekly stock price information over two quarters. Let’s explore the dataset first:

[Figure: preview of the dataset]

We can see only a few attributes here, but there are others as well. One of them is “percent_change_next_weeks_price”, which is our target. We need to forecast it for subsequent weeks, given the current week's data. The values of the ‘Date’ attribute indicate the presence of time-series information. Before jumping into the solution, we will discuss some time series concepts at a high level.

Definition of Time Series

There are different techniques for modeling a time series. One of them is the Autoregressive (AR) process. In it, a time series problem is expressed as a recursive regression problem where the predictors are values of the target variable itself at earlier time instances. If Y_t is our target variable and Y_1, Y_2, … is its series of values at different time instances, then

Y_t = µ + ф (Y_{t−1} − µ) + ɛ_t

for all time instances t. Parameter µ is the mean of the process. We may interpret the term

ф (Y_{t−1} − µ)

as representing the “memory” or “feedback” of the past into the present value of the process. Parameter ф determines the amount of feedback, and ɛ_t is the new information that arrives at time t. By “process”, we mean an infinite or finite sequence of values of a variable at different time instances. If we expand the above recurrence relation, then we get:

Y_t = µ + ɛ_t + ф ɛ_{t−1} + ф^2 ɛ_{t−2} + … = µ + Σ_{h=0}^{∞} ф^h ɛ_{t−h}

This is called the AR(1) process, and h is known as the lag.

A lag is a logical/abstract time unit. It could be an hour, a day, a week, a year, etc. It makes the definition more generic.
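To make the AR(1) recurrence concrete, here is a minimal simulation sketch. It assumes the form Y_t = µ + ф (Y_{t−1} − µ) + ɛ_t with standard normal noise; the function name `simulate_ar1` and the parameter values are illustrative choices, not taken from the article's notebook.

```python
import numpy as np

def simulate_ar1(mu, phi, n, seed=0):
    """Simulate Y_t = mu + phi * (Y_{t-1} - mu) + eps_t with standard normal eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    y = np.empty(n)
    y[0] = mu + eps[0]  # start the process at its mean plus noise
    for t in range(1, n):
        # feedback from the previous value plus new information eps[t]
        y[t] = mu + phi * (y[t - 1] - mu) + eps[t]
    return y

# Illustrative (assumed) parameters: mean 5, feedback 0.6
series = simulate_ar1(mu=5.0, phi=0.6, n=5000)
print(round(series.mean(), 2))  # the sample mean should be close to mu
```

With |ф| < 1, the effect of any single shock fades geometrically, which is why the sample mean settles near µ.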

If we consider p previous values instead of only a single one, it becomes an AR(p) process, which can be expressed as:

Y_t = µ + ф_1 (Y_{t−1} − µ) + ф_2 (Y_{t−2} − µ) + … + ф_p (Y_{t−p} − µ) + ɛ_t

So, an AR(p) process has several feedback factors ф_1, ф_2, …, ф_p. It is a weighted sum of the p most recent values.

There is another type of modeling known as the MA(q) process, or Moving Average process, which considers only the new information ɛ and can similarly be expressed as a weighted average:

Y_t = µ + ɛ_t + θ_1 ɛ_{t−1} + θ_2 ɛ_{t−2} + … + θ_q ɛ_{t−q}

Stationarity & Differencing

From the equations above, we can see that if |ф| < 1, the influence of past values decays and Y_t keeps reverting to µ, a fixed value. It means that if we take the average of Y over any two intervals, both averages will be close to µ, with no statistically significant difference between them. This type of series is known as a stationary time series. On the other hand, |ф| > 1 gives explosive behavior, and the series becomes non-stationary (the boundary case |ф| = 1, a random walk, is non-stationary as well).

The basic assumption of time series modeling is that the series is stationary. That’s why we have to transform a non-stationary series into a stationary one by differencing, which is defined as:

∆Y_t = Y_t − Y_{t−1}

Then, we can model ∆Y_t as a time series in its own right. Differencing helps remove the explosiveness described above. It can be applied several times, as a single pass is not guaranteed to make the series stationary.
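As a small illustration of differencing, the sketch below builds a random walk (the ф = 1 case, which is non-stationary) and differences it with NumPy. The data is synthetic and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)
steps = rng.standard_normal(1000)

# Random walk: Y_t = Y_{t-1} + eps_t, i.e. phi = 1 (non-stationary)
random_walk = np.cumsum(steps)

# First difference: Y_t - Y_{t-1}, which recovers the stationary noise
diff1 = np.diff(random_walk, n=1)

# Differencing applied twice, for series that need d = 2
diff2 = np.diff(random_walk, n=2)

print(np.allclose(diff1, steps[1:]))
print(len(random_walk), len(diff1), len(diff2))
```

Each round of differencing shortens the series by one observation, which matters for short datasets.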

ARIMA(p,d,q) process

ARIMA is the joint process that combines AR(p), MA(q), and d rounds of differencing. So, here Y_t contains all the terms of AR(p) and MA(q). It says that if an ARIMA(p,d,q) process is differenced d times, then it becomes stationary.

Seasonality & SARIMA

A time series can be affected by seasonal factors such as a week, a few months, quarters in a year, or a few years in a decade. Within those fixed time spans, the target variable behaves differently from the rest of the series, and this behavior needs to be modeled separately. In fact, the seasonal components can be extracted from the original series and modeled on their own. Seasonal differencing is defined as:

∆_m Y_t = Y_t − Y_{t−m}

where m is the length of the season, i.e. degree of seasonality.
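A short sketch of seasonal differencing at lag m, assuming the definition Y_t − Y_{t−m}. The helper `seasonal_difference` and the toy series (a repeating pattern of length 4 plus a linear trend) are made up for illustration.

```python
import numpy as np

def seasonal_difference(y, m):
    """Return Y_t - Y_{t-m} for t = m, ..., len(y) - 1."""
    y = np.asarray(y, dtype=float)
    return y[m:] - y[:-m]

# Toy series: a pattern that repeats every 4 steps, plus a linear trend
pattern = np.array([1.0, 3.0, 2.0, 5.0])
series = np.tile(pattern, 6) + 0.5 * np.arange(24)

out = seasonal_difference(series, m=4)
print(out[:5])
```

The repeating pattern cancels out entirely, and only the trend's contribution over one season (0.5 × 4) remains.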

SARIMA is the process in which seasonality is combined with the ARIMA model.

SARIMA is defined by (p,d,q)(P,D,Q), where P, D, and Q are the orders of the seasonal components.

SARIMAX & ARIMAX

So far, we have discussed modeling the series with the target variable Y only. We haven’t considered the other attributes present in the dataset.

ARIMAX adds other feature variables to the regression model.

Here X stands for exogenous. It is like a vanilla regression model in which lagged values of the target variable appear alongside other features. With reference to our problem statement, we can design an ARIMAX model with the target variable percent_change_next_weeks_price at different lags along with other features like volume, low, close, etc. However, the other features are treated as fixed over time and, unlike the target variable, have no lag-dependent values. The seasonal version of ARIMAX is known as SARIMAX.

Data Analysis

We will start by analyzing the data. We will also learn some other time series concepts along the way.

Let’s first plot the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF) using the statsmodels library:

import statsmodels.graphics.tsaplots as tsa_plots
tsa_plots.plot_pacf(df['percent_change_next_weeks_price'])
[Figure: PACF plot of percent_change_next_weeks_price]

And then the ACF:

tsa_plots.plot_acf(df['percent_change_next_weeks_price'])
[Figure: ACF plot of percent_change_next_weeks_price]

The ACF gives us the correlation between Y values at different lags. Mathematically, the covariance underlying it can be defined as:

γ_k = Cov(Y_t, Y_{t+k}) = E[(Y_t − µ)(Y_{t+k} − µ)]

A cut-off in the ACF plot indicates that there is no meaningful relation between values of Y beyond that lag. It is also an indicator of the order q of the MA(q) process. From the ACF plot, we can see that the ACF cuts off right after lag zero. So, q should be zero.

The PACF is the partial correlation between Y values, i.e., the correlation between Y_t and Y_{t+k} conditional on Y_{t+1}, …, Y_{t+k−1}. Just as the ACF indicates q, a cut-off in the PACF indicates the order p of the AR(p) process. In our use case, we can see that p is zero.
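To show what an ACF that "cuts off right after lag zero" looks like numerically, here is a hand-rolled sample autocorrelation on synthetic white noise. The helper `sample_acf` is an illustrative implementation of the usual sample estimator, not the statsmodels routine the article uses for plotting, and the 1.96/√n band is the common approximate 95% bound for white noise.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    denom = np.sum(d ** 2)
    # lag-k autocovariance divided by the lag-0 autocovariance
    return np.array([np.sum(d[: n - k] * d[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
noise = rng.standard_normal(2000)  # synthetic series with no memory

acf_vals = sample_acf(noise, max_lag=5)
bound = 1.96 / np.sqrt(len(noise))  # approximate 95% band for white noise

print(np.round(acf_vals, 3))
print(round(bound, 3))
```

Lag 0 is always 1 by construction; when every later lag sits inside the band, there is no evidence for MA terms, which corresponds to q = 0.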

Decomposing components

We will now check how many components there are in the time series.

[Figure: decomposition of the series into observed, trend, and seasonal components]

The first graph shows the actual series, and the second one shows the trend. We can see that there is no specific upward or downward trend in the percent_change_next_weeks_price variable. However, the seasonal plot reveals the existence of seasonal components, as it shows waves of ups and downs.

Stationarity check: ADF test

The characteristic equation of the AR(p) process is given by:

1 − ф_1 z − ф_2 z^2 − … − ф_p z^p = 0

From our previous discussion, we can say that an AR(1) process is stationary if |ф| < 1, and for AR(p), a necessary condition is ф_1 + ф_2 + … + ф_p < 1. So, if the solution of the characteristic equation is of the form:

z = 1

i.e., if it has a unit root, then the time series is not stationary.

We can formally test this with the Augmented Dickey-Fuller (ADF) test, as below:
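The root condition can be checked numerically. The sketch below assumes the characteristic polynomial 1 − ф_1 z − … − ф_p z^p and uses the standard criterion that an AR(p) process is stationary when all of its roots lie strictly outside the unit circle. The helper name `ar_is_stationary` and the coefficients are illustrative.

```python
import numpy as np

def ar_is_stationary(phis, tol=1e-8):
    """Check whether all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    # np.roots expects coefficients ordered from the highest power down to the constant
    coeffs = [-phi for phi in reversed(phis)] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0 + tol)), roots

# AR(2) with phi_1 = 0.5, phi_2 = 0.3 (assumed example values)
print(ar_is_stationary([0.5, 0.3])[0])  # True

# Random walk: phi = 1 gives the unit root z = 1
print(ar_is_stationary([1.0])[0])  # False
```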

[Figure: ADF test code and output]

As the p-value is less than 0.05, we reject the null hypothesis of a unit root, so the series is stationary.

Building the model

We will start building the model.

Pre-processing

We will do some pre-processing: converting the categorical variable stock to numerical, removing the ‘$’ prefix from the price attributes, and filling all null values with zero.

We will also separate out the target and feature variables.

We will split the dataset into training and test sets.

TimeSeriesSplit incrementally splits the data in a cross-validation manner. We have to use the last X_train, X_test pair.
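The pre-processing steps described above might look like the following sketch. The rows are made up for illustration (they are not real records from the dataset), only a subset of columns is shown, and dropping the date column from the features is an assumption rather than something the article states.

```python
import io
import pandas as pd

# Made-up sample rows that mimic the dataset's layout (values are illustrative)
raw = io.StringIO(
    "stock,date,open,close,volume,percent_change_next_weeks_price\n"
    "AA,1/7/2011,$15.82,$16.42,239655616,-4.42\n"
    "AA,1/14/2011,$16.71,$15.97,242963398,\n"
    "BA,1/7/2011,$66.15,$69.38,36258120,0.65\n"
)
df = pd.read_csv(raw)

# Convert the categorical stock symbol to integer codes
df["stock"] = df["stock"].astype("category").cat.codes

# Remove the '$' prefix from price columns and convert them to floats
for col in ["open", "close"]:
    df[col] = df[col].str.replace("$", "", regex=False).astype(float)

# Fill all null values with zero
df = df.fillna(0)

# Separate the target from the features (dropping 'date' is an assumption)
target = "percent_change_next_weeks_price"
Y = df[target]
X = df.drop(columns=[target, "date"])

print(X)
print(Y)
```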
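A small sketch of how scikit-learn's `TimeSeriesSplit` produces growing training windows, and how keeping the variables from the final iteration yields the last train/test pair. The toy arrays and `n_splits=3` are assumptions; the article does not state the number of splits it used.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data: 10 time-ordered samples with 2 features each
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

tscv = TimeSeriesSplit(n_splits=3)  # n_splits is an assumed value
sizes = []
for train_idx, test_idx in tscv.split(X):
    # Each split trains on an expanding prefix and tests on the block after it
    X_train, X_test = X[train_idx], X[test_idx]
    Y_train, Y_test = Y[train_idx], Y[test_idx]
    sizes.append((len(train_idx), len(test_idx)))

# After the loop, the variables hold the last (largest) split
print(sizes)
print(Y_test)
```

Unlike a shuffled split, the test block always comes after the training block in time, so the model never sees the future during training.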

Auto-modeling

We will use auto_arima from the pmdarima library.

It tries out different SARIMAX(p,d,q)(P,D,Q) models and chooses the best one. We used X_train as the exogenous variables and set the seasonal period m to 2 (in pmdarima, m is the number of observations per seasonal cycle).

We got the output below:

[Figure: auto_arima model search output]

So, what we found in the Data Analysis section turned out to be true.

auto_arima checks stationarity, seasonality, and trend.

The best model has p=0 and q=0, and since the series is stationary, d=0. But, as we saw, it has seasonal components, and their order (P,D,Q) is (2,0,1).

We will build the model with statsmodels and the training dataset.

Model details (clipped):

[Figure: model summary (clipped)]

It shows the weights of all feature variables.

Forecasting

Before testing the model, we need to discuss the difference between prediction and forecasting. In a normal regression or classification problem, we use the term prediction very often. But time series is a little different: forecasting always takes lags into account. Here, to predict the value of Y_t, we need the value of Y_{t−1}, and Y_{t−1} may itself be a forecasted value. So, it is a sequential and recursive process rather than one in which each point is predicted independently.

Mathematically, for an AR(2) process,

Ŷ_{n+1} = µ + ф_1 (Ŷ_n − µ) + ф_2 (Ŷ_{n−1} − µ)

Ŷ_n and Ŷ_{n−1} are the previously forecasted values. This way, the chain continues. In the case of ARIMAX, feature values are not dependent on time, so when we forecast, we feed in the previous Y values along with the same feature X values.

Now, it's time to test the model:
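The recursive nature of forecasting can be sketched in a few lines of plain Python. The function `forecast_ar2` and its parameter values are illustrative, and the update rule assumes the AR(2) form Ŷ_{n+1} = µ + ф_1 (Ŷ_n − µ) + ф_2 (Ŷ_{n−1} − µ).

```python
def forecast_ar2(history, mu, phi1, phi2, steps):
    """Recursively forecast an AR(2) process, feeding each forecast back in."""
    prev2, prev1 = history[-2], history[-1]
    forecasts = []
    for _ in range(steps):
        nxt = mu + phi1 * (prev1 - mu) + phi2 * (prev2 - mu)
        forecasts.append(nxt)
        # The new forecast becomes the most recent "observation"
        prev2, prev1 = prev1, nxt
    return forecasts

# Last two observed values are 1.0 and 2.0 (assumed example numbers)
forecasts = forecast_ar2([1.0, 2.0], mu=0.0, phi1=0.5, phi2=0.3, steps=3)
print([round(f, 3) for f in forecasts])  # [1.3, 1.25, 1.015]
```

From the second step onward, the inputs are themselves forecasts, so errors can accumulate as the horizon grows.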

from sklearn.metrics import mean_squared_error
# scikit-learn's argument order is (y_true, y_pred); the MSE value is the same either way
mean_squared_error(Y_test, result)
[Figure: mean squared error output]

We will plot the actual vs predicted results.

[Figure: actual vs. predicted values]

We can also see the error distribution.

import matplotlib.pyplot as plt
model.plot_diagnostics()
plt.tight_layout()
plt.show()
[Figure: residual diagnostics plots]

The errors appear normally distributed with zero mean and constant variance, which is a good sign.

The Jupyter notebook can be found here:

avisheknag17/public_ml_models

Recently I authored a book on ML (https://twitter.com/bpbonline/status/1256146448346988546)


Stock Price Change Forecasting with Time Series: SARIMAX was originally published in Towards AI — Multidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
