Model suggestions for Keras regression

Asked: 2018-03-02 15:35:54

Tags: python machine-learning neural-network keras regression

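I am trying to solve a regression problem with Keras, but the MSE is very large: 29346217.6819.

I'm really new to this, so do you have any suggestions for making the model reasonable? I'm not sure whether my data is fine or has problems, but it is actual sales data.

Data (about 3000 rows; I use 2000 for training and 1000 for testing). The full data is here.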

ProductNo,Day,Month,CartonSales
1,6,02,2374
1,3,02,2374
1,6,04,2374
1,6,04,2374
1,3,06,2374
1,6,09,2374
1,1,09,2374
1,6,09,2374
1,6,10,2374

Code

from keras import optimizers
from keras.callbacks import Callback
from numpy import array
from keras.models import Sequential
from keras.layers import Dense, Dropout
from matplotlib import pyplot
import pandas as pds
# prepare sequence


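class TestCallback(Callback):
    """Evaluate the model on held-out test data after every epoch."""

    def __init__(self, test_data):
        super().__init__()
        self.test_data = test_data

    def on_epoch_end(self, epoch, logs=None):
        x, y = self.test_data
        # evaluate() returns [loss, metric]; the only compiled metric is MSE, not accuracy
        loss, mse = self.model.evaluate(x, y, verbose=0)
        print('\nTesting loss: {}, mse: {}\n'.format(loss, mse))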

dataframe = pds.read_csv('pmidata.csv', usecols=[0, 1, 2, 3])
dataframe = dataframe.sample(frac=1)

dataframeX_train = dataframe.iloc[0:2000][['ProductNo', 'Day', 'Month']]
dataframeY_train = dataframe.iloc[0:2000][['CartonSales']]

# iloc end bounds are exclusive, so the test slice starts at row 2000 to avoid skipping a row
dataframeX_test = dataframe.iloc[2000:3000][['ProductNo', 'Day', 'Month']]
dataframeY_test = dataframe.iloc[2000:3000][['CartonSales']]

# create model
model = Sequential()
model.add(Dense(3, input_dim=3, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam', metrics=['mse'])
#sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
#model.compile(loss='mse', optimizer=sgd, metrics=['mse'])
# train model
#history = model.fit(dataframe, dataframe, epochs=500, batch_size=len(X), verbose=2)
history = model.fit(dataframeX_train, dataframeY_train, epochs=100, batch_size=4, verbose=2, callbacks=[TestCallback((dataframeX_test, dataframeY_test))])
# plot metrics
pyplot.plot(history.history['mean_squared_error'])
pyplot.show()

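1 Answer:

Answer 0 (score: 1)

As far as I can tell, your y value is CartonSales. Sales figures can be large and span a wide range, which is probably why you are seeing such a high error. You could use mean_squared_logarithmic_error instead of mean squared error, but I would suggest a different approach.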
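To see why a log-based loss is less sensitive to the scale of the target, the sketch below compares plain MSE with a NumPy approximation of mean squared logarithmic error. The helper names `mse` and `msle` and the toy numbers are illustrative assumptions, and the `log(1 + y)` form reflects my understanding of how Keras defines this loss rather than a copy of its implementation.

```python
import numpy as np


def mse(y_true, y_pred):
    """Plain mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def msle(y_true, y_pred):
    """Mean squared error between log(1 + y_true) and log(1 + y_pred)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))


# Two predictions that are both off by a factor of two, at different scales.
small_true, small_pred = [10.0], [20.0]
large_true, large_pred = [1000.0], [2000.0]

mse_small = mse(small_true, small_pred)
mse_large = mse(large_true, large_pred)
msle_small = msle(small_true, small_pred)
msle_large = msle(large_true, large_pred)

print("MSE :", mse_small, mse_large)    # grows with the square of the scale
print("MSLE:", msle_small, msle_large)  # stays in the same ballpark
```

The same relative miss costs 10,000 times more under MSE at the larger scale, while the log-based error barely moves. A huge MSE on sales counts in the thousands therefore does not by itself mean the model is broken.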

Keep using mean squared error, but log-transform your target values before training, then exp-transform your predictions:

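import numpy as np

# Train and evaluate on log-transformed targets
dataframeY_train = np.log(dataframeY_train)
dataframeY_test = np.log(dataframeY_test)
....
# Predictions come out in log space; exp maps them back to carton counts
predictions = model.predict(dataframeX_test)[:, 0]
predictions = np.exp(predictions)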
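The round trip can be tried end to end without Keras. This is a minimal sketch under stated assumptions: the data is synthetic (not the asker's CSV), a least-squares line stands in for the neural network, and the helper names `fit_linear` and `predict_linear` are made up for illustration. It also takes the square root of the reported MSE to express it in the units of CartonSales.

```python
import numpy as np

# Synthetic stand-in for a sales target with a wide range (an assumption).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(500, 1))
y = np.exp(5.0 + 3.0 * x[:, 0] + rng.normal(0.0, 0.1, size=500))


def fit_linear(features, target):
    """Least-squares fit of target ~ features + intercept (stand-in for model.fit)."""
    design = np.hstack([features, np.ones((len(features), 1))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_linear(coef, features):
    """Apply the fitted coefficients (stand-in for model.predict)."""
    design = np.hstack([features, np.ones((len(features), 1))])
    return design @ coef


# 1. Log-transform the target and fit in log space.
log_y = np.log(y)
coef = fit_linear(x, log_y)

# 2. Predict in log space, then exp-transform back to the original units.
log_pred = predict_linear(coef, x)
pred = np.exp(log_pred)

mse_log = float(np.mean((log_pred - log_y) ** 2))  # the loss the model would see
mse_raw = float(np.mean((pred - y) ** 2))          # error in squared original units
rmse_raw = float(np.sqrt(mse_raw))                 # error in original units

# The MSE reported in the question, expressed as an RMSE in cartons.
reported_rmse = float(np.sqrt(29346217.6819))

print("MSE in log space:", mse_log)
print("MSE / RMSE in original units:", mse_raw, rmse_raw)
print("RMSE implied by the reported MSE:", reported_rmse)
```

Note that `np.log` returns `-inf` for a target of zero, so if CartonSales can be 0, `np.log1p` together with `np.expm1` is a common substitute. Comparing RMSE in original units, rather than raw MSE, also makes it easier to judge whether the error is large relative to typical sales values.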