Why bother with recurrent neural networks for structured data?

Asked: 2018-08-25 19:50:51

Tags: python keras prediction recurrent-neural-network

I have been developing feed-forward neural networks (FNNs) and recurrent neural networks (RNNs) in Keras with structured data of the shape [instances, time, features], and the performance of the FNNs and RNNs has been identical (except that the RNNs require more computation time).

I have also simulated tabular data (code below) where I expected the RNN to outperform the FNN, because the next value in the series depends on the previous value in the series; however, both architectures predict it correctly.

With NLP data I have seen RNNs outperform FNNs, but not with tabular data. Generally, when would one expect an RNN to outperform an FNN with tabular data? Specifically, could someone post simulation code with tabular data demonstrating an RNN outperforming an FNN?

Thank you! If my simulation code is not ideal for my question, please adapt it or share more ideal code!

from keras import models
from keras import layers

from keras.layers import Dense, LSTM

import numpy as np
import matplotlib.pyplot as plt

Two features are simulated over 10 time steps, where the value of the second feature depends on the values of both features in the prior time step.

## Simulate data.

np.random.seed(20180825)

X = np.random.randint(50, 70, size = (11000, 1)) / 100

X = np.concatenate((X, X), axis = 1)

# Each iteration appends two columns: a new random feature, and a feature
# that is the average of the two features from the previous time step.
for i in range(10):

    X_next = np.random.randint(50, 70, size = (11000, 1)) / 100

    X = np.concatenate((X, X_next, (0.50 * X[:, -1].reshape(len(X), 1))
        + (0.50 * X[:, -2].reshape(len(X), 1))), axis = 1)

print(X.shape)

## Training and validation data.

split = 10000

Y_train = X[:split, -1:].reshape(split, 1)
Y_valid = X[split:, -1:].reshape(len(X) - split, 1)
X_train = X[:split, :-2]
X_valid = X[split:, :-2]

print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)

FNN:

## FNN model.

# Define model.

network_fnn = models.Sequential()
network_fnn.add(layers.Dense(64, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))

# Compile model.

network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_fnn = network_fnn.fit(X_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
    validation_data = (X_valid, Y_valid))

plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

LSTM:

## LSTM model.

# Reshape the flat features into (samples, 10 time steps, 2 features) for the LSTM.
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1] // 2, 2)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1] // 2, 2)

# Define model.

network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(64, activation = 'relu', input_shape = (X_lstm_train.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.

network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
    validation_data = (X_lstm_valid, Y_valid))

plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

1 Answer:

Answer 0 (score: 7):

In practice, even in NLP, you see that RNNs and CNNs are often competitive. Here's a review paper from 2017 that covers this in more detail. Theoretically, it might be the case that RNNs can better handle the full complexity and sequential nature of language, but in practice the bigger obstacle is usually training the network properly, and RNNs are finicky.

Another problem that might work would be to look at something like the balanced parenthesis problem (either with parentheses alone in the string, or with parentheses along with other distractor characters). This requires processing the input sequentially and tracking some state, and might be easier to learn with an LSTM than with an FFN; see the sketch below.
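For concreteness, here is a minimal sketch (my own, not part of the original answer) of generating balanced-parenthesis data; the helper name make_paren_dataset is hypothetical:

import numpy as np

def make_paren_dataset(n_samples, length):
    # Encode '(' as +1 and ')' as -1. A string is balanced iff the running
    # sum never dips below zero and ends at exactly zero.
    X = np.random.choice([1, -1], size = (n_samples, length))
    running = np.cumsum(X, axis = 1)
    y = ((running >= 0).all(axis = 1) & (running[:, -1] == 0)).astype(int)
    return X.reshape(n_samples, length, 1), y

X_paren, y_paren = make_paren_dataset(10000, 20)
print(X_paren.shape, y_paren.mean())  # LSTM-ready shape, plus fraction balanced

Note that uniformly random strings are rarely balanced (for length 20, roughly 1.6% are), so in practice you would want to oversample the balanced class.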

Update: Some data that looks sequential may not actually have to be processed sequentially. For example, even if you provide a sequence of numbers to add, since addition is commutative, an FFN will do just as well as an RNN. This could also be true of many health problems where the dominating information is not of a sequential nature. Suppose a patient's smoking habits are measured every year. From a behavioral standpoint the trajectory matters, but if you're predicting whether the patient will develop lung cancer, the prediction will be dominated simply by the number of years the patient smoked (perhaps restricted to the last 10 years for the FFN).
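To see why ordering carries no information for a commutative target, here is a tiny check (mine, not the answer's): scrambling each sequence leaves the sum, and therefore the label, unchanged.

import numpy as np

np.random.seed(0)
seq = np.random.randint(0, 10, size = (5, 8))   # 5 sequences of 8 numbers

shuffled = seq.copy()
for row in shuffled:
    np.random.shuffle(row)                      # scramble each sequence in place

print(seq.sum(axis = 1))       # targets from the ordered sequences
print(shuffled.sum(axis = 1))  # identical targets from the scrambled ones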

So you want to make the toy problem more complex and require that the ordering of the data be taken into account. Perhaps some kind of simulated time series where you want to predict whether there was a spike in the data, where you don't care about the absolute values, just the relative nature of the spike.
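One way to set that up (a sketch of my own, assuming a multiplicative spike on a randomly scaled series is what "relative" means here; make_spike_dataset is a hypothetical name):

import numpy as np

def make_spike_dataset(n_samples, length, spike_factor = 3.0):
    # Give every series its own random scale so absolute values are
    # uninformative; only the jump relative to the series' own level matters.
    scale = np.random.uniform(1, 100, size = (n_samples, 1))
    X = scale * np.random.uniform(0.8, 1.2, size = (n_samples, length))
    y = np.random.randint(0, 2, size = n_samples)    # 1 = series gets a spike
    pos = np.random.randint(0, length, size = n_samples)
    X[y == 1, pos[y == 1]] *= spike_factor           # multiplicative spike
    return X.reshape(n_samples, length, 1), y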

Update 2:

I've modified your code to show a case where the RNN performs better. The trick is to use more complex conditional logic, which is modeled more naturally in LSTMs than in FFNs. The code is below. With 8 columns, we see that the FFN trains in 1 minute and reaches a validation loss of 6.3. The LSTM takes 3x longer to train, but its final validation loss is 6x lower, at 1.06.

As we increase the number of columns, the LSTM gains a larger and larger advantage, especially as we add more complicated conditions in. With 16 columns, the FFN's validation loss is 19 (and you can see the training curve more clearly, as the model is unable to instantly fit the data). In comparison, the LSTM takes 11 times longer to train, but its validation loss of 0.31 is over 30 times smaller than the FFN's! You can play around with even larger matrices to see how far this trend extends.

from keras import models
from keras import layers

from keras.layers import Dense, LSTM

import numpy as np
import matplotlib
matplotlib.use('Agg')  # choose the backend before importing pyplot so it takes effect
import matplotlib.pyplot as plt
import time

np.random.seed(20180908)

rows = 20500
cols = 10

# Randomly generate Z
Z = 100*np.random.uniform(0.05, 1.0, size = (rows, cols))

larger = np.max(Z[:, :cols // 2], axis=1).reshape((rows, 1))   # integer division for Python 3 indexing
larger2 = np.max(Z[:, cols // 2:], axis=1).reshape((rows, 1))
smaller = np.min((larger, larger2), axis=0)
# Z is now the max of the first half of the array.
Z = np.append(Z, larger, axis=1)
# Z is now the min of the max of each half of the array.
# Z = np.append(Z, smaller, axis=1)

# Combine and shuffle.

#Z = np.concatenate((Z_sum, Z_avg), axis = 0)

np.random.shuffle(Z)

## Training and validation data.

split = 10000

X_train = Z[:split, :-1]
X_valid = Z[split:, :-1]
Y_train = Z[:split, -1:].reshape(split, 1)
Y_valid = Z[split:, -1:].reshape(rows - split, 1)

print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)

print("Now setting up the FNN")

## FNN model.

tick = time.time()

# Define model.

network_fnn = models.Sequential()
network_fnn.add(layers.Dense(32, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))

# Compile model.

network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_fnn = network_fnn.fit(X_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
    validation_data = (X_valid, Y_valid))

tock = time.time()

print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')

print("Now evaluating the FNN")

loss_fnn = history_fnn.history['loss']
val_loss_fnn = history_fnn.history['val_loss']
epochs_fnn = range(1, len(loss_fnn) + 1)
print("train loss: ", loss_fnn[-1])
print("validation loss: ", val_loss_fnn[-1])

plt.plot(epochs_fnn, loss_fnn, 'black', label = 'Training Loss')
plt.plot(epochs_fnn, val_loss_fnn, 'red', label = 'Validation Loss')
plt.title('FNN: Training and Validation Loss')
plt.legend()
plt.show()

plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training points')
plt.show()

plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('valid points')
plt.show()

print("LSTM")

## LSTM model.

# Treat each column as one time step with a single feature: (samples, steps, 1).
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1], 1)

tick = time.time()

# Define model.

network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(32, activation = 'relu', input_shape = (X_lstm_train.shape[1], 1)))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.

network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
    validation_data = (X_lstm_valid, Y_valid))

tock = time.time()

print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')

print("now eval")

loss_lstm = history_lstm.history['loss']
val_loss_lstm = history_lstm.history['val_loss']
epochs_lstm = range(1, len(loss_lstm) + 1)
print("train loss: ", loss_lstm[-1])
print("validation loss: ", val_loss_lstm[-1])

plt.plot(epochs_lstm, loss_lstm, 'black', label = 'Training Loss')
plt.plot(epochs_lstm, val_loss_lstm, 'red', label = 'Validation Loss')
plt.title('LSTM: Training and Validation Loss')
plt.legend()
plt.show()

plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training')
plt.show()

plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title("validation")
plt.show()