How to feed data from a new CSV file to a trained LSTM model in Python to predict the next future value

Time: 2019-11-25 04:41:29

Tags: python tensorflow keras lstm

I have a CSV data file with four inputs, and I want to predict the next value using an LSTM model. First, I trained the LSTM model on the data. Here is my code:

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Read three CSV files that share the same columns and stack them row-wise
cols = ['date', 'x1', 'x2', 'x3', 'x4']
data5 = pd.read_csv('data27.csv', sep=',')[cols]
data6 = pd.read_csv('data33.csv', sep=',')[cols]
data7 = pd.read_csv('data40.csv', sep=',')[cols]
data8 = pd.concat([data5, data6, data7])

data8.set_index('date', inplace=True)

data8 = data8.values

sc = MinMaxScaler(feature_range=(0, 1))
train_data = sc.fit_transform(data8)

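# Build sliding windows: the previous 60 scaled x1 values predict the next one
x_train = []
y_train = []
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
x_train, y_train = np.array(x_train), np.array(y_train)
# Reshape to (samples, timesteps, features), as the LSTM layer expects
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))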


model = Sequential()
model.add(LSTM(units=10, return_sequences=True, input_shape=(x_train.shape[1],1)))
model.add(LSTM(units=10))
model.add(Dense(units=1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(x_train, y_train, epochs=10, batch_size=32)

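After training the model, I tried to get predicted values for the x1 column of a new CSV file that has the same input columns (date, x1, x2, x3, x4). I wrote this code for that: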

dataset_test = pd.read_csv('data56.csv')
dataset_total = pd.concat((data8['x1'], dataset_test['x1']),axis=0)
inputs =dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1,1)
inputs = sc.transform(inputs)

Then I got an error:

ValueError                                Traceback (most recent call last)
<ipython-input-62-0bcaba4a7ad4> in <module>()
----> 1 inputs = sc.transform(inputs)

~\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in transform(self, X)
    367         X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES)
    368 
--> 369         X *= self.scale_
    370         X += self.min_
    371         return X

ValueError: non-broadcastable output operand with shape (1153,1) doesn't match the broadcast shape (1153,4)

The CSV files used to train my model:

My csv files for training

My next CSV file, used for testing after training the model:

new csv file for test

I got another error when applying the scaler's inverse transform. Here is my code:

X_test = []
for i in range(3, inputs.shape[0]):
    X_test.append(inputs[i-3:i, 0])
X_test = np.array(X_test)

X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

new_output = model.predict(X_test)
new_output = sc.inverse_transform(new_output)

Error:

ValueError                                Traceback (most recent call last)
<ipython-input-45-489f3f23c5d3> in <module>()
----> 1 glucose = sc.inverse_transform(glucose)

~\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in inverse_transform(self, X)
    383         X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES)
    384 
--> 385         X -= self.min_
    386         X /= self.scale_
    387         return X

ValueError: non-broadcastable output operand with shape (43,1) doesn't match the broadcast shape (43,4)

Can anyone help me solve this error?

I changed the code and then got this error. Code:

X_test = []
for i in range(60, 80):
    X_test.append(inputs[i-60:i, 0])

X_test = np.array(X_test)

X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

new_output = model.predict(X_test)
new_output = sc.inverse_transform(new_output)

Error:

ValueError                                Traceback (most recent call last)
<ipython-input-174-8e8d9c47ce3d> in <module>()
     17 
     18 new_output = model.predict(X_test)
---> 19 new_output =  sc.inverse_transform( new_output)

~\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in inverse_transform(self, X)
    383         X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES)
    384 
--> 385         X -= self.min_
    386         X /= self.scale_
    387         return X

ValueError: non-broadcastable output operand with shape (20,1) doesn't match the broadcast shape (20,8)

1 answer:

Answer 0 (score: 1)

Why are you reshaping your inputs so that the final dimension has size 1?

dataset_test = pd.read_csv('data56.csv')
dataset_total = pd.concat((data8['x1'], dataset_test['x1']),axis=0)
inputs =dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1,1)
inputs = sc.transform(inputs)

Your scaler expects the last dimension of that data to have size 4.

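So when you index with data8['x1'], you are taking only one column. Once you have created and trained a model with 4 inputs, you can't change that. I suspect you should either remove the ['x1'] part from this code or fix 'data56.csv' so that it has the five columns (date, x1, x2, x3, x4).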
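To illustrate the point, here is a minimal sketch with synthetic data. The arrays `train` and `test` are made-up stand-ins for the asker's files, not the real data. It shows that a MinMaxScaler fitted on four columns rejects a one-column array and accepts a four-column one. The exact error message depends on the scikit-learn version.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-ins (assumption): 4 feature columns x1..x4
rng = np.random.default_rng(0)
train = rng.random((100, 4))   # plays the role of data8
test = rng.random((20, 4))     # plays the role of data56.csv

sc = MinMaxScaler(feature_range=(0, 1))
sc.fit(train)                  # the scaler now remembers 4 features

# Passing a single column fails because the scaler was fitted on 4
try:
    sc.transform(test[:, [0]])         # shape (20, 1)
    one_column_ok = True
except ValueError as exc:
    one_column_ok = False
    print("ValueError:", exc)

# Passing all four columns works
scaled = sc.transform(test)            # shape (20, 4)
print(scaled.shape)
```

The same rule applies to `inverse_transform`: whatever is passed to the scaler must have as many columns as the data it was fitted on.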

Edit

So, I would change your code to:

dataset_test = pd.read_csv('data56.csv')
dataset_total = pd.concat((data8[['x1','x2','x3','x4']], 
                           dataset_test[['x1','x2','x3','x4']]),axis=0)
inputs =dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1,4)
inputs = sc.transform(inputs)
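The question also hits the same shape mismatch in `sc.inverse_transform`, because the model's prediction has one column while the scaler was fitted on four. One possible way around it, sketched below on synthetic data, is to invert only column 0 (x1) by hand using the scaler's fitted `min_` and `scale_` attributes. The helpers `make_windows` and `inverse_x1` are hypothetical names introduced for this example, and the "prediction" is a placeholder rather than a real `model.predict` call.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

LOOKBACK = 60  # window length used when the model in the question was trained

def make_windows(scaled, lookback=LOOKBACK):
    """Slice column 0 (x1) into windows of shape (n, lookback, 1)."""
    windows = [scaled[i - lookback:i, 0] for i in range(lookback, len(scaled))]
    return np.array(windows).reshape(-1, lookback, 1)

def inverse_x1(sc, scaled_pred):
    """Undo MinMaxScaler for column 0 only; scaled_pred has shape (n, 1)."""
    return (scaled_pred[:, 0] - sc.min_[0]) / sc.scale_[0]

# Synthetic stand-ins (assumption) for the training data and data56.csv
rng = np.random.default_rng(0)
col_ranges = np.array([10.0, 50.0, 5.0, 1.0])
train = rng.random((200, 4)) * col_ranges
test = rng.random((20, 4)) * col_ranges

sc = MinMaxScaler(feature_range=(0, 1)).fit(train)

# Last LOOKBACK training rows plus all test rows, scaled with all 4 columns
inputs = sc.transform(np.vstack([train[-LOOKBACK:], test]))
X_test = make_windows(inputs)            # shape (20, 60, 1)

# Placeholder for model.predict(X_test): reuse the last value of each window
scaled_pred = X_test[:, -1, :]           # shape (20, 1)

x1_pred = inverse_x1(sc, scaled_pred)    # back in the original x1 units
print(X_test.shape, x1_pred.shape)
```

An alternative with the same result is to copy the predictions into column 0 of a zero-filled array with four columns, call `sc.inverse_transform` on it, and keep only column 0. Fitting a separate scaler on x1 alone would also work, but it requires retraining with that scaler.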