Instead of the dataset provided with the tutorial, I am using the dataset required for my assignment.
The code I am using is:
#import packages
import pandas as pd
import numpy as np
#to plot within notebook
import matplotlib.pyplot as plt
%matplotlib inline
#setting figure size
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 20,10
#for normalizing data
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
#read the file
df = pd.read_csv('C:/Users/Usert/Downloads/stock-20050101-to-20171231/stock-20050101-to-20171231/IBM_2006-01-01_to_2018-01-01.csv')
#print the head
df.head()
#setting index as date
df['Date'] = pd.to_datetime(df.Date,format='%Y-%m-%d')
df.index = df['Date']
#plot
plt.figure(figsize=(16,8))
plt.plot(df['Close'], label='Close Price history')
#creating dataframe with date and the target variable
data = df.sort_index(ascending=True, axis=0)
new_data = pd.DataFrame(index=range(0,len(df)),columns=['Date', 'Close'])
for i in range(0,len(data)):
    new_data['Date'][i] = data['Date'][i]
    new_data['Close'][i] = data['Close'][i]
#splitting into train and validation
train = new_data[:987]
valid = new_data[987:]
new_data.shape, train.shape, valid.shape
((1235, 2), (987, 2), (248, 2))
train['Date'].min(), train['Date'].max(), valid['Date'].min(), valid['Date'].max()
#make predictions
preds = []
for i in range(0,248):
    a = train['Close'][len(train)-248+i:].sum() + sum(preds)
    b = a/248
    preds.append(b)
#calculate rmse
rms=np.sqrt(np.mean(np.power((np.array(valid['Close'])-preds),2)))
rms
#plot
valid['Predictions'] = 0
valid['Predictions'] = preds
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])
It runs fine until it hits an error at "#calculate rmse":
File "<ipython-input-92-1256d885493e>", line 65, in <module>
rms=np.sqrt(np.mean(np.power((np.array(valid['Close'])-preds),2)))
ValueError: operands could not be broadcast together with shapes (2033,) (248,)
As requested, "print(valid.shape)" and "print(len(preds))" return "(604, 3)" and "248".
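For reference, the same error can be reproduced in isolation. The snippet below is only a minimal sketch: the two zero-filled arrays are stand-ins (not the real data) with the lengths reported in the traceback, and it shows that NumPy refuses to subtract 1-D arrays of different lengths unless one of them has length 1.

```python
import numpy as np

# Stand-ins for the real data: only the lengths matter here.
actual = np.zeros(2033)  # same length as valid['Close'] in the traceback
preds = np.zeros(248)    # same length as the hard-coded prediction loop

try:
    actual - preds
    message = ""
except ValueError as err:
    # NumPy cannot broadcast shapes (2033,) and (248,) against each other.
    message = str(err)
    print(message)
```

So the RMSE line itself is probably fine; the number of predictions just has to equal the number of validation rows.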
Does anyone know how to set these numbers so that they fit my dataset, without having to change them by hand every time?
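FYI:
The dataset I am using has 7 columns, "Date, Open, High, Low, Close, Volume and Name", and 3021 rows including the header.
The tutorial's dataset, by contrast, has 8 columns, "Date, Open, High, Low, Last, Close, total_trade_quantity and Turnover", and 1236 rows including the header.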
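One possible way to avoid the hard-coded 987 and 248 is to derive both numbers from the length of the data. The following is a rough sketch, not the tutorial's code: the function name moving_average_forecast and the train_fraction parameter are hypothetical, and the 80/20 split is an assumption. It keeps the same moving-average idea as the loop above, with the window set to the size of the validation set.

```python
import numpy as np

def moving_average_forecast(close, train_fraction=0.8):
    """Split `close` chronologically and forecast the validation part.

    Hypothetical helper: `train_fraction` is an assumed parameter.
    """
    close = np.asarray(close, dtype=float)
    split = int(len(close) * train_fraction)  # replaces the hard-coded 987
    train, valid = close[:split], close[split:]
    window = len(valid)                       # replaces the hard-coded 248
    if window == 0 or window > len(train):
        raise ValueError("validation set must be non-empty and no longer than train")

    # Each prediction is the mean of the last `window` values, where earlier
    # predictions are fed back in, as in the original loop.
    history = list(train[-window:])
    preds = []
    for _ in range(window):
        next_value = sum(history[-window:]) / window
        preds.append(next_value)
        history.append(next_value)

    rms = np.sqrt(np.mean((valid - np.array(preds)) ** 2))
    return train, valid, preds, rms

# Tiny usage example with made-up numbers.
train, valid, preds, rms = moving_average_forecast(np.arange(10.0))
print(len(train), len(valid), len(preds), rms)
```

With the real data this could be called as moving_average_forecast(new_data['Close']). For 3020 data rows an 80/20 split gives 2416 training rows and 604 validation rows, and len(preds) then always equals len(valid), so the subtraction in the RMSE line has matching shapes.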