How do I normalize test data for use with Keras?

Time: 2019-07-14 01:07:32

Tags: python keras neural-network normalization

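I've started dabbling in neural networks and time series forecasting, and my searching led me to using Keras with Python. It has been easy, and I feel like I've learned a lot, but some details about normalization are tripping me up.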

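I'm working through the tutorials on MachineLearningMastery.com, but they don't specifically cover normalization for time series data. I've tried to combine information from tutorials there and elsewhere, but I'm still fairly confused about normalizing the label data (as opposed to the feature data), which most of the Keras and sklearn API docs call Y and X respectively.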

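I think the problem is related to the format of the data I'm using. Because it is a time series, I'm following the tutorials and splitting it into groups of numbers plus a result. Say my sequence is 4,6,7,4,5,6,7,3,5; I can break it into [4,6,7] 4, [6,7,4] 5, [7,4,5] 6 and so on, where the values in [] go into the X array and the results go into the Y array. Normalization works fine on X, but I can't get the target array (Y) normalized.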
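To make the windowing concrete, here is a minimal sketch that applies it to the example sequence above with a window of 3. The names `series`, `n_steps`, `X` and `y` are chosen for illustration only; the full script further down uses its own `split_sequence` helper for the same job.

```python
import numpy as np

# The example sequence from the paragraph above
series = np.array([4, 6, 7, 4, 5, 6, 7, 3, 5])
n_steps = 3  # window length (illustrative; the full script uses 20)

# Each row of X is a window of n_steps consecutive values;
# the matching entry of y is the value that immediately follows that window.
X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
y = series[n_steps:]

print(X[:3])  # first three windows
print(y[:3])  # the values that follow them
```

X ends up two-dimensional (one column per time step), while y is one-dimensional, which is where the shape mismatch in scaling comes from.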

# univariate lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Dropout  
from keras.layers import Bidirectional
from sklearn.preprocessing import MinMaxScaler

import pandas as pd  

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = pd.read_csv(r'.\collection.csv') 
raw_processed = raw_seq.iloc[:, 1].values  #This gets all rows, and the second column

# choose a number of time steps
n_steps = 20
n_layers = 1
n_neurons = 100
s_actfunc = 'relu'

# split into samples
X, y = split_sequence(raw_processed, n_steps)
# keep X as [samples, timesteps] for the Dense model (no extra features axis is added here)
n_features = 1
X = X.reshape((X.shape[0], X.shape[1]))
y = y.reshape((y.shape[0]))
scaler = MinMaxScaler(feature_range = (0, 1))
scaler.fit(X)
X_scaled = scaler.transform(X)
y_shaped = y.reshape(-1, 1) #***This is where I'm losing things I think.
y_scaled = scaler.transform(y_shaped)
input_dim = X.shape[1]

# define model
model = Sequential()
for i in range(n_layers):
    model.add(Dense(n_neurons, activation=s_actfunc, input_dim=input_dim))  # hidden layer
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# fit model
model.fit(X_scaled, y_scaled, epochs=200, batch_size=1,  verbose=2)

# demonstrate prediction
x_input = raw_seq.iloc[-n_steps:, 1].values  # last n_steps values of the second column
x_input = x_input.reshape((1, n_steps))
x_input_scaled = scaler.transform(x_input)
yhat = model.predict(x_input_scaled, verbose=1)
yhat = scaler.inverse_transform(yhat)
print(yhat)

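When the code above is run without normalizing y, the most obvious result is that the predicted yhat comes out unnormalized, which is consistent with what was fed in.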
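For reference, one possible way around the label-scaling step, offered as a sketch under assumptions rather than something taken from the tutorials: MinMaxScaler learns one min/max per column, so a scaler fitted on X (n_steps columns) cannot transform a single-column y. Fitting a second scaler on y reshaped to (-1, 1), and using that same scaler to invert predictions, avoids the mismatch. The toy arrays and the names `x_scaler` and `y_scaler` below are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy stand-ins for the arrays returned by split_sequence (hypothetical values)
X = np.array([[4, 6, 7], [6, 7, 4], [7, 4, 5], [4, 5, 6]], dtype=float)
y = np.array([4, 5, 6, 7], dtype=float)

# One scaler per array: each learns min/max for its own column count
x_scaler = MinMaxScaler(feature_range=(0, 1))
y_scaler = MinMaxScaler(feature_range=(0, 1))

X_scaled = x_scaler.fit_transform(X)                 # shape (samples, n_steps)
y_scaled = y_scaler.fit_transform(y.reshape(-1, 1))  # shape (samples, 1)

# Stand-in for model.predict(...), which returns scaled values of shape (n, 1)
yhat_scaled = y_scaled[:1]

# Invert with the y scaler, not the X scaler, to get back to original units
yhat = y_scaler.inverse_transform(yhat_scaled)
print(yhat)
```

An alternative, also an assumption here, is to scale the raw series once before windowing, so that X and y share a single scaler.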

Thanks!

0 answers:

No answers