TensorFlow LSTM stateful option does not maintain state between batches

Date: 2019-10-05 00:50:49

Tags: python tensorflow keras lstm

I'm new to TensorFlow and wanted to understand the Keras LSTM layer, so I wrote this test program to identify the behavior of the stateful option.

#Tensorflow 1.x version
import tensorflow as tf
import numpy as np

NUM_UNITS=1
NUM_TIME_STEPS=5
NUM_FEATURES=1
BATCH_SIZE=4

STATEFUL=True
STATEFUL_BETWEEN_BATCHES=True

lstm = tf.keras.layers.LSTM(units=NUM_UNITS, stateful=STATEFUL,
            return_state=True, return_sequences=True,
            batch_input_shape=(BATCH_SIZE, NUM_TIME_STEPS, NUM_FEATURES),
            kernel_initializer='ones', bias_initializer='ones',
            recurrent_initializer='ones')
x = tf.keras.Input((NUM_TIME_STEPS, NUM_FEATURES), batch_size=BATCH_SIZE)
result = lstm(x)

I = tf.compat.v1.global_variables_initializer()
sess = tf.compat.v1.Session()
sess.run(I)

X_input = np.array([[[3.14*(0.01)] for t in range(NUM_TIME_STEPS)] for b in range(BATCH_SIZE)])
feed_dict={x: X_input}

def matprint(run, mat):
    print('Batch = ', run)
    for b in range(mat.shape[0]):
        print('Batch Sample:', b, ', per-timestep output')
        print(mat[b].squeeze())

print('BATCH_SIZE = ', BATCH_SIZE, ', T = ', NUM_TIME_STEPS, ', stateful =', STATEFUL)
if STATEFUL:
    print('STATEFUL_BETWEEN_BATCHES = ', STATEFUL_BETWEEN_BATCHES)

for r in range(2):
    feed_dict = {x: X_input}
    OUTPUT_NEXTSTATES = sess.run({'result': result}, feed_dict=feed_dict)
    OUTPUT = OUTPUT_NEXTSTATES['result'][0]
    NEXT_STATES = OUTPUT_NEXTSTATES['result'][1:]
    matprint(r, OUTPUT)
    if STATEFUL:
        if STATEFUL_BETWEEN_BATCHES:
            # For TF version 1.x, manually re-assigning states from
            # the last batch IS required for some reason ...
            # seems like a bug
            sess.run(lstm.states[0].assign(NEXT_STATES[0]))
            sess.run(lstm.states[1].assign(NEXT_STATES[1]))
        else:
            lstm.reset_states()

Note that all of the LSTM's weights are initialized to ones and the input is held constant, for consistency.
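
As a sanity check, the first per-timestep output can be reproduced by hand from the standard LSTM equations. Here is a minimal NumPy sketch (my own addition, not part of the test program), assuming Keras's usual gate formulation with every kernel, recurrent kernel, and bias set to one, and zero initial states:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = 3.14 * 0.01            # the constant input fed above
h_prev, c_prev = 0.0, 0.0  # states start at zero

# With all weights and biases equal to 1, every gate sees the same pre-activation.
z = 1.0 * x + 1.0 * h_prev + 1.0
i, f, o = sigmoid(z), sigmoid(z), sigmoid(z)  # input, forget, output gates
g = np.tanh(z)                                # candidate cell state

c = f * c_prev + i * g     # new cell state
h = o * np.tanh(c)         # new hidden state = the per-timestep output
print(h)                   # ~0.380419, matching the first output below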

As expected, with stateful=False the script's output shows no sample-, time-, or inter-batch dependence:

BATCH_SIZE =  4 , T =  5 , stateful = False
Batch =  0
Batch Sample: 0 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch Sample: 1 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch Sample: 2 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch Sample: 3 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch =  1
Batch Sample: 0 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch Sample: 1 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch Sample: 2 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch Sample: 3 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]

When setting stateful=True, I expected the samples within each batch to produce different outputs (presumably because the TF graph maintains state between a batch's samples). However, this is not the case:

BATCH_SIZE =  4 , T =  5 , stateful = True
STATEFUL_BETWEEN_BATCHES =  True
Batch =  0
Batch Sample: 0 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch Sample: 1 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch Sample: 2 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch Sample: 3 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch =  1
Batch Sample: 0 , per-timestep output
[0.86686385 0.8686781  0.8693927  0.8697042  0.869853  ]
Batch Sample: 1 , per-timestep output
[0.86686385 0.8686781  0.8693927  0.8697042  0.869853  ]
Batch Sample: 2 , per-timestep output
[0.86686385 0.8686781  0.8693927  0.8697042  0.869853  ]
Batch Sample: 3 , per-timestep output
[0.86686385 0.8686781  0.8693927  0.8697042  0.869853  ]

In particular, note that the outputs for the first two samples of the same batch are identical.

EDIT: OverlordGoldDragon has informed me that this behavior is expected, and that my confusion lay in the distinction between a Batch - (samples, timesteps, features) - and a Sample within a batch (a single 'row' of that batch), as represented by the figure below:

So this raises the question of what dependence, if any, exists between the individual samples of a given batch. From my script's output, I'm led to believe that each sample is fed to a (logically) separate LSTM block, and that the LSTM states of different samples are independent. I've drawn this here:

Is my understanding correct?
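
One way to check the per-sample independence claim - a small TF 2.x sketch of my own, separate from the scripts here - is to look at the shapes of the returned states: with return_state=True the layer returns one (h, c) pair with one row per sample slot:

import tensorflow as tf
import numpy as np

lstm = tf.keras.layers.LSTM(units=1, stateful=True, return_state=True,
                            return_sequences=True, batch_input_shape=(4, 5, 1))
out, h, c = lstm(np.zeros((4, 5, 1), dtype=np.float32))
print(h.shape, c.shape)  # (4, 1) and (4, 1): one hidden/cell state row per sample

The states being tracked per sample row is consistent with the picture of logically independent per-sample LSTM blocks.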

Incidentally, it seems that stateful=True is broken in TensorFlow 1.x, because if I remove the explicit assignment of the previous batch's states:

         sess.run(lstm.states[0].assign(NEXT_STATES[0]))
         sess.run(lstm.states[1].assign(NEXT_STATES[1]))

it stops working, i.e. the output of the second batch is identical to that of the first.

I rewrote the script above using TensorFlow 2.0 syntax, and it behaves exactly as I expected (with no need to manually carry the LSTM state between batches):

#Tensorflow 2.0 implementation
import tensorflow as tf
import numpy as np

NUM_UNITS=1
NUM_TIME_STEPS=5
NUM_FEATURES=1
BATCH_SIZE=4

STATEFUL=True
STATEFUL_BETWEEN_BATCHES=True

lstm = tf.keras.layers.LSTM(units=NUM_UNITS, stateful=STATEFUL,
            return_state=True, return_sequences=True,
            batch_input_shape=(BATCH_SIZE, NUM_TIME_STEPS, NUM_FEATURES),
            kernel_initializer='ones', bias_initializer='ones',
            recurrent_initializer='ones')
X_input = np.array([[[3.14*(0.01)]
                     for t in range(NUM_TIME_STEPS)]
                     for b in range(BATCH_SIZE)])
@tf.function
def forward(x):
    return lstm(x)

def matprint(run, mat):
    print('Batch = ', run)
    for b in range(mat.shape[0]):
        print('Batch Sample:', b, ', per-timestep output')
        print(mat[b].squeeze())

print('BATCH_SIZE = ', BATCH_SIZE, ', T = ', NUM_TIME_STEPS, ', stateful =', STATEFUL)
if STATEFUL:
    print('STATEFUL_BETWEEN_BATCHES = ', STATEFUL_BETWEEN_BATCHES)

for r in range(2):
    OUTPUT_NEXTSTATES = forward(X_input)
    OUTPUT = OUTPUT_NEXTSTATES[0].numpy()
    NEXT_STATES = OUTPUT_NEXTSTATES[1:]
    matprint(r, OUTPUT)
    if STATEFUL:
        if STATEFUL_BETWEEN_BATCHES:
            pass
            # Explicitly re-assigning states from the last batch isn't
            # required, as the model maintains inter-batch history.
            # This is NOT the same behavior for TF version < 2.0.
            #lstm.states[0].assign(NEXT_STATES[0].numpy())
            #lstm.states[1].assign(NEXT_STATES[1].numpy())
        else:
            lstm.reset_states()

Here is the output:

BATCH_SIZE =  4 , T =  5 , stateful = True
STATEFUL_BETWEEN_BATCHES =  True
Batch =  0
Batch Sample: 0 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch Sample: 1 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch Sample: 2 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch Sample: 3 , per-timestep output
[0.38041887 0.663519   0.79821336 0.84627265 0.8617684 ]
Batch =  1
Batch Sample: 0 , per-timestep output
[0.86686385 0.8686781  0.8693927  0.8697042  0.869853  ]
Batch Sample: 1 , per-timestep output
[0.86686385 0.8686781  0.8693927  0.8697042  0.869853  ]
Batch Sample: 2 , per-timestep output
[0.86686385 0.8686781  0.8693927  0.8697042  0.869853  ]
Batch Sample: 3 , per-timestep output
[0.86686385 0.8686781  0.8693927  0.8697042  0.869853  ]
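
As a further check (my own addition, run as a continuation of the TF 2.0 script above), resetting the states between runs makes the next run reproduce the Batch = 0 outputs, confirming that the differences come purely from the carried-over state:

lstm.reset_states()                   # zero out the layer's (h, c) state variables
OUTPUT = forward(X_input)[0].numpy()
matprint('after reset', OUTPUT)       # prints the Batch = 0 outputs again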

1 Answer:

Answer 0 (score: 2):

Everything seems to work as expected - but the code needs a major revision:

  • Sample: 0 should read Batch: 0; your Sample: 0 actually contains 4 samples, 5 timesteps, and 1 feature / channel. In your case, batch_shape=(4, 5, 1) is what marks the actual batch
  • Each sample is treated as an independent sequence, so it's as if sample 1 were fed first, then sample 2 - except during learning, where the loss is averaged over the batch's samples to compute the gradient
  • Every one of your samples is identical - so it makes sense that each yields the same output; run print(X_input) to verify
  • Statefulness works as expected: given the same inputs, stateful=False yields the same outputs (since no internal state is kept) - whereas stateful=True yields different outputs even for identical inputs (due to memory)
  • As-is, your lstm isn't learning, so its weights stay the same - and for identical inputs all of its outputs will be exactly the same
  • Initializing all weights to the same value is strongly discouraged - use a random seed instead; see the sketch below
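
For instance, a minimal sketch (my suggestion, not spelled out in the answer above) replacing the constant initializers with seeded random ones:

lstm = tf.keras.layers.LSTM(
    units=NUM_UNITS, stateful=STATEFUL,
    return_state=True, return_sequences=True,
    batch_input_shape=(BATCH_SIZE, NUM_TIME_STEPS, NUM_FEATURES),
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=0),
    recurrent_initializer=tf.keras.initializers.Orthogonal(seed=0),
    bias_initializer='zeros')

The seed keeps runs reproducible while avoiding the degenerate symmetry that identical constant weights create.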