Imports libraries for data preprocessing.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
from sklearn import preprocessing
from operator import itemgetter
from sklearn.metrics import mean_squared_error
import keras
import seaborn as sns
sns.set()
Various libraries are imported for data manipulation, visualization, and machine learning in Python. Here is an overview of each library’s purpose:
NumPy (imported as np): It is a fundamental package for scientific computing, offering support for large multi-dimensional arrays and matrices, along with various mathematical functions.
Matplotlib.pyplot (imported as plt): Matplotlib is a plotting library used for creating diverse visualizations in Python. The pyplot module provides a MATLAB-like interface for plotting.
Pandas (imported as pd): Pandas is a powerful data manipulation library built on top of NumPy, providing fast, flexible data structures (such as the DataFrame) for data analysis in Python.
Datetime: Python’s datetime module facilitates manipulation of dates and times.
sklearn.preprocessing: This module from scikit-learn offers utility functions and transformer classes to preprocess and normalize data for machine learning algorithms.
operator.itemgetter: Used to create a callable that retrieves items at specified indices from iterable objects like lists.
sklearn.metrics.mean_squared_error: Computes the mean squared error (MSE) to evaluate the performance of regression models.
Keras: A high-level neural networks API that simplifies the creation of deep learning models, running on top of TensorFlow (and, historically, CNTK or Theano).
Seaborn: A Python data visualization library built on Matplotlib, offering a high-level interface for creating visually appealing statistical graphics.
sns.set(): Applies Seaborn's default theme to all subsequent Matplotlib plots.
These libraries enable efficient data analysis, visualization, and machine learning tasks in Python by providing specific functionalities to handle data, develop models, and present results effectively.
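Two of the less familiar utilities above, operator.itemgetter and sklearn.metrics.mean_squared_error, can be illustrated with a tiny self-contained demo (the tuple and the prediction values below are invented for illustration):

```python
from operator import itemgetter
from sklearn.metrics import mean_squared_error

# itemgetter builds a callable that picks items at the given indices.
row = ('AAPL', 118.25, 120.10)
get_prices = itemgetter(1, 2)
print(get_prices(row))                     # (118.25, 120.1)

# mean_squared_error averages the squared differences between
# true values and predictions, a standard regression metric.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y_true, y_pred))  # 0.375
```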
Read the data and convert it into a pandas dataframe.
A link to download the source code appears at the end of this article.
Reads, manipulates, and displays the stock data.
df = pd.read_csv("https://raw.githubusercontent.com/ashishpatel26/NYSE-STOCK_MARKET-ANALYSIS-USING-LSTM/master/nyse/prices-split-adjusted.csv", index_col=0)
df["adj close"] = df.close                # copy close into a new last column
df.drop(columns=['close'], inplace=True)  # drop the original close column
df.head()
This snippet reads a CSV file from the given URL into a pandas DataFrame. It then copies the 'close' column into a new column named 'adj close' (adjusted close), which pandas appends as the last column, and drops the original 'close' column. The net effect is that the closing price is renamed and moved to the end of the DataFrame. Finally, df.head() displays the first few rows.
Such a procedure is important for updating and organizing the DataFrame to enhance its usability for analysis or processing. It includes actions like renaming columns and improving the column order to facilitate data interpretation and manipulation.
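The same "rename and move to last" effect can be achieved by reordering the columns directly, which some find clearer than copy-then-drop. Here is a hedged sketch on a tiny made-up DataFrame (the symbols and prices are invented):

```python
import pandas as pd

# Toy stand-in for the stock DataFrame (values invented).
df = pd.DataFrame({
    'symbol': ['AAPL', 'MSFT'],
    'close':  [120.1, 210.5],
    'open':   [118.0, 208.9],
})

# Push 'close' to the end, then rename it to 'adj close'.
cols = [c for c in df.columns if c != 'close'] + ['close']
df = df[cols].rename(columns={'close': 'adj close'})
print(list(df.columns))   # ['symbol', 'open', 'adj close']
```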
Reads and displays data from a URL.
df2 = pd.read_csv("https://raw.githubusercontent.com/ashishpatel26/NYSE-STOCK_MARKET-ANALYSIS-USING-LSTM/master/nyse/fundamentals.csv")
df2.head()
This script fetches a CSV file from a URL using the Pandas library in Python. The CSV file is then loaded into a Pandas DataFrame using pd.read_csv(). The URL points to a CSV file hosted on a GitHub repository.
Once the data is loaded into the DataFrame, the df2.head() method is utilized to display the initial rows of the DataFrame. This is useful for quickly examining the data’s structure, column names, and example values.
This pattern is a convenient way to access data stored in CSV files hosted online: pd.read_csv loads the data directly into a structured DataFrame, ready for manipulation and analysis in Python.
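Beyond head(), a few other quick inspection calls are worth knowing after any read_csv. To keep this sketch self-contained (no network access), it parses a small stand-in CSV from a string; the column names and values are invented for illustration, not taken from the real fundamentals file:

```python
import io
import pandas as pd

# Stand-in CSV text (invented rows) mimicking a file loaded by read_csv.
csv_text = """symbol,period,accounts_payable
AAL,2012-12-31,3068000000.0
AAL,2013-12-31,4975000000.0
"""
df2 = pd.read_csv(io.StringIO(csv_text))

print(df2.shape)      # (2, 3): rows x columns
print(df2.columns)    # the parsed column names
print(df2.head(1))    # first row, same idea as df2.head()
```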
Retrieve all symbols from the given list.
Finds unique symbols in a DataFrame.
symbols = list(set(df.symbol))
len(symbols)
501
This process retrieves the unique symbols found in the DataFrame df and saves them into a list named ‘symbols’.
Here’s the explanation:
By using set(df.symbol), a set of unique elements is created by isolating the ‘symbol’ column from the DataFrame df.
This set of unique symbols is then converted into a list using the list() function.
len(symbols) provides the count of unique symbols in the list.
This snippet is beneficial for removing duplicate symbols from a dataset, allowing for analysis or processing with only distinct values. It aids in eliminating repetition and guaranteeing that each unique symbol is only considered once in subsequent tasks.
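Pandas also offers built-in equivalents for this step, nunique() and unique(), which additionally preserve first-seen order (unlike set()). A sketch on a toy column of invented ticker symbols:

```python
import pandas as pd

# Toy stand-in for df with duplicate symbols (invented values).
df = pd.DataFrame({'symbol': ['NFLX', 'MON', 'NFLX', 'MUR', 'MON']})

# Approach used above: order of the resulting list is not guaranteed.
symbols = list(set(df.symbol))
print(len(symbols))               # 3

# Idiomatic pandas equivalents:
print(df['symbol'].nunique())     # 3, count of distinct symbols
print(df['symbol'].unique())      # distinct symbols in first-seen order
```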
Selects the first 11 elements.
symbols[:11]
['ES', 'NLSN', 'PNW', 'SYY', 'NTRS', 'MTB', 'HP', 'DPS', 'NFLX', 'MON', 'MUR']
This expression extracts the elements from index 0 up to (but not including) index 11 of the symbols list. Slicing is useful whenever you need only a subset of a sequence: it lets you inspect or manipulate a small portion of the data without handling the entire collection at once.
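The slicing semantics can be verified in plain Python. Note that a slice never raises an IndexError, even when the sequence is shorter than the requested range (list contents below are the symbols shown above plus one invented extra):

```python
symbols = ['ES', 'NLSN', 'PNW', 'SYY', 'NTRS', 'MTB',
           'HP', 'DPS', 'NFLX', 'MON', 'MUR', 'EXTRA']

# seq[:11] takes indices 0 through 10 inclusive.
first_eleven = symbols[:11]
print(len(first_eleven))                    # 11
print(first_eleven[0], first_eleven[-1])    # ES MUR

# A slice past the end simply returns what exists:
print(['a', 'b'][:11])                      # ['a', 'b']
```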