Tensorflow and LSTM neural networks for stock market forecasting (2024)

The purpose of this tutorial is to show you how to forecast the stock market using Google TensorFlow and LSTM neural networks, a widely used machine learning technique for predicting stock market trends.

Jermaine Matthew

Apr 20, 2023

This article grew out of numerous experiments with machine learning tools and Python, all aimed at forecasting the stock market and finding the “holy grail” that could serve as the foundation for an automated trading bot. The goal was to earn passive income. You’ll find out whether or not those efforts were successful by the end of this article.

Countless people have tried to predict the stock market since May 1792 using various methods. Technical indicators and technical analysis strategies are popular among Wall Street traders. Some shrewder traders utilize fundamental and even behavioral analysis to improve their odds of success in the market. A cutting-edge stock market forecasting technique has emerged as artificial intelligence and machine learning have advanced.

This tutorial doesn’t cover the theory behind LSTM (long short-term memory) networks; you can find a detailed description on Wikipedia. My focus will be on the practical application and how you can use it to predict prices for the next three days.

We’ll use the following tools and libraries for this experiment:

Python
Google Colaboratory (Colab)
Google Tensorflow and Keras
pandas
NumPy
scikit-learn
yahoo_fin
Matplotlib.pyplot

We’ll start by importing the necessary libraries.

import numpy as np
import time as tm
import datetime as dt
import tensorflow as tf
# Data preparation
from yahoo_fin import stock_info as yf
from sklearn.preprocessing import MinMaxScaler
from collections import deque
# AI
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
# Graphics library
import matplotlib.pyplot as plt

‘yahoo_fin’ isn’t included in Google Colab, so we need to install it:

!pip install yahoo_fin

Setting up our neural network

We’ll define a few settings. I’ll explain each briefly:

N_STEPS is the number of days in our window, that is, the length of the price sequence the network sees for each prediction.
The ‘LOOKUP_STEPS’ array contains the days we will forecast. As an example, we have three days [1 is tomorrow, 2 is the day after tomorrow, 3 is two days after tomorrow]. For a four- or five-day prediction, add the corresponding numbers [1, 2, 3, 4, 5].
‘STOCK’ is the market ticker we’ll examine in our research. We use Google (GOOGL on NASDAQ) as an example.
‘date_now’ and ‘date_3_years_back’ mark the end and the start of the price history we download: today, and 1104 calendar days (roughly three years) before today.

# SETTINGS
# Window size or the sequence length, 7 (1 week)
N_STEPS = 7
# Lookup steps: 1 is the next day, 2 is the day after tomorrow, 3 is the day after that
LOOKUP_STEPS = [1, 2, 3]
# Stock ticker, GOOGL
STOCK = 'GOOGL'
# Current date
date_now = tm.strftime('%Y-%m-%d')
date_3_years_back = (dt.date.today() - dt.timedelta(days=1104)).strftime('%Y-%m-%d')

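We have to get the data from Yahoo Finance to work with it. For our example, we request daily bars (1-day interval) covering the last 1104 calendar days. Because markets are closed on weekends and holidays, the result contains fewer than 1104 bars.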

# LOAD DATA
# from yahoo_fin
# for the last 1104 calendar days with interval = 1d (one day)
init_df = yf.get_data(
    STOCK,
    start_date=date_3_years_back,
    end_date=date_now,
    interval='1d')
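
To get a feel for how many daily bars a 1104-day window can hold, the sketch below counts the weekdays in such a window. It is only a rough illustration and not part of the original script: the helper name `count_weekdays` and the fixed start date are my assumptions, and exchange holidays are ignored, so the real bar count will be a little lower.

```python
import datetime as dt

def count_weekdays(start, end):
    """Count Monday-to-Friday dates in the half-open range [start, end)."""
    total = 0
    day = start
    while day < end:
        if day.weekday() < 5:  # 0 = Monday ... 4 = Friday
            total += 1
        day += dt.timedelta(days=1)
    return total

# Hypothetical fixed window so the result is reproducible
window_start = dt.date(2020, 1, 1)
window_end = window_start + dt.timedelta(days=1104)
weekday_total = count_weekdays(window_start, window_end)
print(weekday_total)  # close to 1104 * 5 / 7
```

The point is simply that calendar days and trading bars are different counts, which matters when you pick N_STEPS or slice the last rows of the table.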

Let’s take a look at the data. By default, the web service returns the full set of columns: ‘open’, ‘high’, ‘low’, ‘close’, ‘adjclose’, ‘volume’, and ‘ticker’. The ‘close’ column shows the closing market price, while the ‘open’ column shows the opening market price.

It’s not necessary to include all these columns in the machine learning model. We only need the ‘close’ column, which holds the closing price for each trading day.

# remove columns which our neural network will not use
init_df = init_df.drop(['open', 'high', 'low', 'adjclose', 'ticker', 'volume'], axis=1)
# create the column 'date' based on index column
init_df['date'] = init_df.index

Let’s look at the ‘init_df’ table.

The table now has just two columns, ‘close’ and ‘date’. We’ll use them going forward.

Let’s plot the actual data we got from the service.

# Take a preliminary look at the data on a chart
plt.style.use(style='ggplot')
plt.figure(figsize=(16,10))
plt.plot(init_df['close'][-200:])
plt.xlabel("days")
plt.ylabel("price")
plt.legend([f'Actual price for {STOCK}'])
plt.show()

As far as I’m concerned, the chart looks good. It shows how GOOGL’s price has fluctuated over the last 200 trading days. I based this chart on the ‘close’ column, which is the closing market price.

LSTM neural networks generally train better on scaled data than on raw prices, so we need to scale the ‘close’ column data. Here’s what we need:

# Scale data for ML engine
scaler = MinMaxScaler()
init_df['close'] = scaler.fit_transform(np.expand_dims(init_df['close'].values, axis=1))

What does the new table look like?

That’s right, the price is now scaled between 0 and 1. It’s necessary to do this to improve our model’s efficiency. I’ll tell you why later.
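
Here is a minimal plain-Python sketch of what min-max scaling does, to make the 0-to-1 range less mysterious. It mirrors the arithmetic of MinMaxScaler with its default range, but the helper names `min_max_scale` and `min_max_unscale` and the prices are made up for illustration.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]; also return the min and max that were used."""
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled, lo, hi

def min_max_unscale(scaled, lo, hi):
    """Undo min_max_scale, turning scaled values back into prices."""
    return [s * (hi - lo) + lo for s in scaled]

prices = [100.0, 110.0, 120.0]  # made-up closing prices
scaled, lo, hi = min_max_scale(prices)
restored = min_max_unscale(scaled, lo, hi)
print(scaled)    # the lowest price maps to 0, the highest to 1
print(restored)  # back in price units
```

The second helper is the reason we keep the fitted scaler around: the model predicts scaled values, and we need the same minimum and maximum to turn them back into prices.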

Preparing Data for Processing

Now that we’ve loaded and scaled the data, we need to prepare it for the next step. I’ll explain.

Forecasting stock prices for the next three days is our main goal. The data for the model needs to be arranged based on the number of days we want to predict.

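We’ll shift the ‘close’ column by the number of days we want to predict and store the result in the ‘future’ column, which serves as the target. After the shift, each row pairs a day’s close with the close that many trading days later.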
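
To see what the shift does, here is a tiny plain-Python stand-in for a negative shift. The helper `shift_back` is hypothetical; pandas fills the empty positions at the end with NaN, while this sketch uses None.

```python
def shift_back(values, days):
    """Return a list where position i holds values[i + days], or None past the end."""
    return [values[i + days] if i + days < len(values) else None
            for i in range(len(values))]

closes = [10, 11, 12, 13, 14]  # made-up closing prices
future = shift_back(closes, 2)

# Each row now pairs a close with the close two rows later
for close, target in zip(closes, future):
    print(close, target)
```

The last rows have no future value yet, which is why the data preparation drops them before building the training set.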

def PrepareData(days):
    df = init_df.copy()
    df['future'] = df['close'].shift(-days)
    last_sequence = np.array(df[['close']].tail(days))
    df.dropna(inplace=True)
    sequence_data = []
    sequences = deque(maxlen=N_STEPS)

    for entry, target in zip(df[['close'] + ['date']].values, df['future'].values):
        sequences.append(entry)
        if len(sequences) == N_STEPS:
            sequence_data.append([np.array(sequences), target])

    last_sequence = list([s[:len(['close'])] for s in sequences]) + list(last_sequence)
    last_sequence = np.array(last_sequence).astype(np.float32)

    # construct the X's and Y's
    X, Y = [], []
    for seq, target in sequence_data:
        X.append(seq)
        Y.append(target)

    # convert to numpy arrays
    X = np.array(X)
    Y = np.array(Y)

    return df, last_sequence, X, Y

As part of this script, we also compute the last sequence, which holds the most recent window of prices. Then we’ll predict future prices based on that sequence.

The X and Y arrays are the final outputs. Each entry of X is a window of N_STEPS consecutive prices, and the matching entry of Y is the target price for that window. If you want to fully understand the process, I recommend playing around with all these parts in Google Colab.
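
If the windowing is hard to picture, this stripped-down sketch builds X and Y from a toy list with the same deque trick. The function `make_windows` and its inputs are hypothetical and leave out the date column and the last sequence.

```python
from collections import deque

def make_windows(closes, n_steps, days):
    """Build sliding windows (X) and the price `days` rows after each window (Y)."""
    targets = closes[days:]             # the close `days` rows ahead
    rows = closes[:len(closes) - days]  # only rows that have a target
    window = deque(maxlen=n_steps)      # the oldest entry drops out automatically
    X, Y = [], []
    for close, target in zip(rows, targets):
        window.append(close)
        if len(window) == n_steps:
            X.append(list(window))
            Y.append(target)
    return X, Y

X, Y = make_windows([1, 2, 3, 4, 5, 6], n_steps=3, days=1)
for window, target in zip(X, Y):
    print(window, '->', target)
```

Each window slides forward by one row, so consecutive samples overlap heavily. That is normal for this kind of setup.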

The machine learning model now needs to be set up and configured. We use a Keras Sequential model built from LSTM, Dropout, and Dense layers.

‘x_train’ is the data array to train the model with;

‘y_train’ holds the training targets, taken from the shifted ‘future’ column;

The first LSTM layer has 60 units and the second has 120. Each LSTM layer is followed by a Dropout layer that drops 30% of its outputs during training. The model ends with a Dense layer of 20 neurons followed by a Dense layer with a single neuron, which outputs the predicted price.

def GetTrainedModel(x_train, y_train):
    model = Sequential()
    model.add(LSTM(60, return_sequences=True, input_shape=(N_STEPS, len(['close']))))
    model.add(Dropout(0.3))
    model.add(LSTM(120, return_sequences=False))
    model.add(Dropout(0.3))
    model.add(Dense(20))
    model.add(Dense(1))

    BATCH_SIZE = 8
    EPOCHS = 80

    model.compile(loss='mean_squared_error', optimizer='adam')

    model.fit(x_train, y_train,
              batch_size=BATCH_SIZE,
              epochs=EPOCHS,
              verbose=1)

    model.summary()

    return model

We’ll train our model for 80 epochs (full passes over the training data) with a batch size of 8.
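
To get a sense of how much work that is, the sketch below counts the weight updates for a given number of training samples. The sample count of 750 is a made-up figure, and `training_updates` is a hypothetical helper; it assumes the last, smaller batch of each epoch is still used.

```python
import math

def training_updates(n_samples, batch_size, epochs):
    """Return (updates per epoch, total updates) for mini-batch training."""
    steps_per_epoch = math.ceil(n_samples / batch_size)  # the last batch may be partial
    return steps_per_epoch, steps_per_epoch * epochs

steps, total = training_updates(n_samples=750, batch_size=8, epochs=80)
print(steps, total)
```

A small batch size means many updates per epoch, which is one reason the training loop is slow.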

In this example, we’re using ‘mean_squared_error’ as the loss function and the ‘adam’ optimizer, a popular default choice.

We’ll summarize the training results and return the model to the stock market prediction program.
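
The summary lists the number of trainable parameters per layer. As a sanity check, the sketch below works those counts out by hand for the layer sizes in the code above, using the standard formulas for LSTM and Dense layers with biases. The helper names are mine, and Dropout layers are left out because they have no parameters.

```python
def lstm_params(input_dim, units):
    """Four weight sets (three gates plus the cell candidate), each with
    input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output."""
    return input_dim * units + units

layer_params = [
    lstm_params(1, 60),     # first LSTM: one feature ('close') per time step
    lstm_params(60, 120),   # second LSTM: fed by the 60 outputs of the first
    dense_params(120, 20),  # Dense(20)
    dense_params(20, 1),    # Dense(1): the predicted price
]
total_params = sum(layer_params)
print(layer_params, total_params)
```

Most of the parameters sit in the second LSTM layer, so that is the first place to look if you want a smaller, faster model.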

This is only an overview; I didn’t mean to provide an exhaustive explanation. To fully understand it, I recommend watching a relevant course on Pluralsight for at least 6 hours.

Let’s forecast GOOGL’s prices for the next 3 days now that we’ve got everything we need.

The script below initializes the array of predicted prices. For each step in LOOKUP_STEPS, which we set to [1, 2, 3] in the settings section, we prepare the data and train a separate model. That covers three days: the first, the second, and the third.

# GET PREDICTIONS
predictions = []

for step in LOOKUP_STEPS:
    df, last_sequence, x_train, y_train = PrepareData(step)
    x_train = x_train[:, :, :len(['close'])].astype(np.float32)

    model = GetTrainedModel(x_train, y_train)

    last_sequence = last_sequence[-N_STEPS:]
    last_sequence = np.expand_dims(last_sequence, axis=0)
    prediction = model.predict(last_sequence)
    predicted_price = scaler.inverse_transform(prediction)[0][0]

    predictions.append(round(float(predicted_price), 2))

It may take some time to run this loop — I estimated at least 10 minutes for all three days. It’s kind of like data mining. We get the “desired result” in the end.

Our neural network has given us results, but what can we do with them? Other than trading, there are lots of possibilities. This isn’t enough info to recommend trading. An artificial intelligence tool made this prediction, nothing more, nothing less. There’s more to successful trading than just the closing price.

This simple example might inspire you to incorporate machine learning and recurrent neural networks into your trading strategies, since they’re already being used on Wall Street.

Judging by the predictions above, I’d guess the market is flat. Under these market conditions, I wouldn’t take any action.

By my rough estimate, the model is about 98% accurate. What’s the formula? By accuracy I mean how closely the predicted price tracks the actual price over time, and I read it off the training loss. Here’s the listing of our model training process.

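According to the listing, the training “loss” (the mean squared error on the scaled prices) is around 0.0024. Strictly speaking, that is not a 2% error: 0.0024 is 0.24%, and it is a squared error on the 0-to-1 scale, so its square root, about 0.05, is a better guide to the typical miss as a fraction of the price range.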
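
To put the loss into more familiar terms, the sketch below converts a mean squared error on the scaled data into a root mean squared error, first on the 0-to-1 scale and then in price units. The price range of 80 to 150 is invented for the example, and `rmse_in_price_units` is a hypothetical helper; with your own data you would use the minimum and maximum the scaler was fitted on.

```python
import math

def rmse_in_price_units(mse_scaled, price_min, price_max):
    """Convert an MSE measured on min-max scaled prices into RMSE values."""
    rmse_scaled = math.sqrt(mse_scaled)                # typical miss on the 0-1 scale
    rmse_price = rmse_scaled * (price_max - price_min)  # same miss in price units
    return rmse_scaled, rmse_price

# 0.0024 is the loss from the listing; the price range is made up
rmse_scaled, rmse_price = rmse_in_price_units(0.0024, price_min=80.0, price_max=150.0)
print(round(rmse_scaled, 3), round(rmse_price, 2))
```

Keep in mind that this is still an error on the training data, so it says little about how the model behaves on days it has never seen.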

Results Chart

Let’s run the model for the entire ‘x_train’ range and see how accurate it is on historical data and predicted bars. Here’s the code with comments. You can experiment with this data step-by-step using Google Colab. Check out the link at the end.

# Execute model for the whole history range
copy_df = init_df.copy()
# 'close' was scaled in place earlier; convert it back to prices for plotting
copy_df['close'] = np.squeeze(scaler.inverse_transform(copy_df[['close']].values))
y_predicted = model.predict(x_train)
y_predicted_transformed = np.squeeze(scaler.inverse_transform(y_predicted))
# Pad the start (N_STEPS - 1 = 6 rows) and the end (3 rows) with known
# target prices so the array length matches the table
first_seq = scaler.inverse_transform(np.expand_dims(y_train[:6], axis=1))
last_seq = scaler.inverse_transform(np.expand_dims(y_train[-3:], axis=1))
y_predicted_transformed = np.append(first_seq, y_predicted_transformed)
y_predicted_transformed = np.append(y_predicted_transformed, last_seq)
copy_df['predicted_close'] = y_predicted_transformed
# Add predicted results to the table
date_now = dt.date.today()
date_tomorrow = dt.date.today() + dt.timedelta(days=1)
date_after_tomorrow = dt.date.today() + dt.timedelta(days=2)
# One value per column: close, date, predicted_close
copy_df.loc[date_now] = [predictions[0], f'{date_now}', 0]
copy_df.loc[date_tomorrow] = [predictions[1], f'{date_tomorrow}', 0]
copy_df.loc[date_after_tomorrow] = [predictions[2], f'{date_after_tomorrow}', 0]
# Result chart
plt.style.use(style='ggplot')
plt.figure(figsize=(16,10))
plt.plot(copy_df['close'][-150:].head(147))
plt.plot(copy_df['predicted_close'][-150:].head(147), linewidth=1, linestyle='dashed')
plt.plot(copy_df['close'][-150:].tail(4))
plt.xlabel('days')
plt.ylabel('price')
plt.legend([f'Actual price for {STOCK}',
            f'Predicted price for {STOCK}',
            'Predicted price for future 3 days'])
plt.show()
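
The code above labels the three predictions with today, tomorrow, and the day after tomorrow. Since markets are closed on weekends, you may prefer to label them with the next weekdays instead. Here is one possible sketch; `next_weekdays` is a hypothetical helper that ignores exchange holidays, and the start date is fixed only to make the example reproducible.

```python
import datetime as dt

def next_weekdays(start, count):
    """Return the next `count` Monday-to-Friday dates strictly after `start`."""
    result = []
    day = start
    while len(result) < count:
        day += dt.timedelta(days=1)
        if day.weekday() < 5:  # skip Saturday (5) and Sunday (6)
            result.append(day)
    return result

# Starting from a Thursday, the three labels skip over the weekend
labels = next_weekdays(dt.date(2023, 4, 20), 3)
for step, label in enumerate(labels, start=1):
    print(step, label.isoformat())
```

You could pass `dt.date.today()` as the start and use the returned dates as row labels for the three predicted prices.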

On the next chart, you can see predictions for the next 3 days (small purple line), predictions for the entire period (dashed blue line), and actual prices (solid orange line).

Tensorflow and LSTM neural networks for stock market forecasting (8)

With a ‘solid red line’, I’ve highlighted predictions for future days.

At the start, I promised to tell you whether or not my attempt to predict the market was successful. According to the charts above, the predictions track historical prices closely, at roughly 98% accuracy by my estimate. However, that’s not enough for trading. You can’t depend on knowing the price for the next 3 days for regular trading.

Knowing a price target is not the same as being able to trade on it. Success takes a little more. According to my experiments with automated trading algorithms, you have to consider ‘stop loss’, ‘take profit’, ‘trading amount’, etc., in order to trade effectively.

Download the code from here