LSTM中的状态

Question

我使用Keras和TensorFlow作为后端创建了一个LSTM。在对num_step为96的小批量进行训练之前，将LSTM的隐藏状态设置为上一个时间步的真实值。

首先是参数和数据：

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    package="ek97.fhict.theapp">
    <!-- To auto-complete the email text field in the login form with the user's emails -->
    <uses-permission android:name="android.permission.GET_ACCOUNTS" />
    <uses-permission android:name="android.permission.READ_PROFILE" />
    <uses-permission android:name="android.permission.READ_CONTACTS" />
    <!--
         The ACCESS_COARSE/FINE_LOCATION permissions are not required to use
         Google Maps Android API v2, but you must specify either coarse or fine
         location permissions for the 'MyLocation' functionality.
    -->
    <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION" />
    <uses-permission android:name="android.permission. ACCESS_COARSE_LOCATION" />
    <uses-permission android:name="android.permission.INTERNET" />

    <application
        android:allowBackup="true"
        android:icon="@drawable/loqate_logo"
        android:label="@string/app_name"
        android:roundIcon="@drawable/loqate_logo"
        android:supportsRtl="true"
        android:theme="@style/AppTheme"
        tools:ignore="GoogleAppIndexingWarning">
        <activity
            android:name=".bottomnavigationfinal"
            android:label="@string/title_activity_bottomnavigationfinal"/>
        <activity android:name="Settings" />
        <activity
            android:name=".Login"
            android:label="@string/title_activity_login" />
        <activity
            android:name=".Registration"
            android:exported="true"
            android:label="@string/title_activity_registration" />
        <activity android:name=".Navigation" />
        <activity android:name=".SplashScreen">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />

                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
        <!--
             The API key for Google Maps-based APIs is defined as a string resource.
             (See the file "res/values/google_maps_api.xml").
             Note that the API key is linked to the encryption key used to sign the APK.
             You need a different API key for each encryption key, including the release key that is used to
             sign the APK for publishing.
             You can define the keys for the debug and release targets in src/debug/ and src/release/.
        -->
        <activity
            android:name=".GoogleMaps"
            android:label="@string/title_activity_google_maps" >
            <meta-data
                android:name="com.google.android.geo.API_KEY"
                android:value="@string/google_maps_key" />
        </activity>
        <activity android:name=".Profile" />
    </application>

</manifest>

Keras模型由两层LSTM层和一层将输出修整为num_output，即2：

batch_size = 10
num_steps = 96
num_input = num_output = 2
hidden_size = 8
X_train = np.array(X_train).reshape(-1, num_steps, num_input)
Y_train = np.array(Y_train).reshape(-1, num_steps, num_output)
X_test = np.array(X_test).reshape(-1, num_steps, num_input)
Y_test = np.array(Y_test).reshape(-1, num_steps, num_output)

生成器以及训练（hidden_states [x]的形状为（2，））：

model = Sequential()
model.add(LSTM(hidden_size, batch_input_shape=((batch_size, num_steps, num_input)),
               return_sequences=True, stateful = True)))
model.add(LSTM(hidden_size, return_sequences=True)))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(num_output, activation='softmax')))

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])

这段代码不会给我一个错误，但是对此我有两个问题：

1）在生成器内部，我将LSTM def gen_data(): x = np.zeros((batch_size, num_steps, num_input)) y = np.zeros((batch_size, num_steps, num_output)) while True: for i in range(batch_size): model.layers[0].states[0] = K.variable(value=hidden_states[gen_data.current_idx]) # hidden_states[x] has shape (2,) x[i, :, :] = X_train[gen_data.current_idx] y[i, :, :] = Y_train[gen_data.current_idx] gen_data.current_idx += 1 yield x, y gen_data.current_idx = 0 for epoch in range(100): model.fit_generator(generate_data(), len(X_train)//batch_size, 1, validation_data=None, max_queue_size=1, shuffle=False) gen_data.current_idx = 0的隐藏状态设置为model.layers[0].states[0]上形状为（2，）的变量。为什么对于隐藏大小大于2的LSTM可能如此？

2）hidden_states[gen_data.current_idx]中的值也可以是Keras模型的输出。两层LSTM以这种方式设置隐藏状态是否有意义？

Answer 1

LSTM中的状态

LSTM由计算cell state和hidden state的门组成。

在该图中，LSTM右侧的顶部箭头是单元状态（c_t），底部箭头是隐藏状态（h_t）。单元状态是门控操作的结果，状态大小与LSTM的hidden_size相同。每次展开（及其相应的输入X）都会导致其自身的单元状态。对于LSTM，单元状态由（batch_size x hidden_size）的值hidden_state（{h_t）和（batch_size x hidden_size）的cell_state（c_t）组成。

batch_size = 2
num_steps = 5
num_input = num_output = 1
hidden_size = 8

inputs = Input(batch_shape=(batch_size,num_steps, num_input))
lstm, state_h, state_c = LSTM(hidden_size, return_state=True, return_sequences=True)(inputs)
model = Model(inputs=inputs, outputs=[state_h, state_c])

print (model.predict(np.zeros((batch_size, num_steps, num_input))))
print (model.layers[1].cell.state_size)

注意：如果是GRU / RNN，则没有单元状态，只有隐藏状态，因此，单元状态只有h_t大小（batch_size，hidden_size） >

参考：

LSTM的Keras实现

Keras Docs:

状态张量的数量为1（对于RNN和GRU）或2（对于LSTM）。

Illustrated Guide to LSTM and GRU

进纸状态

在您的示例中，layers[0]引用1 LSTM，而layers[1]引用第二个LSTM。如果要初始化第n个批次的电池状态（c_t，从（n-1）的电池状态开始，即前一个批次，则有两个选择

您在生成器中的工作方式，但如果需要states[1]使用c_t和states[0]，请使用h_t。同样，对于第一个LSTM使用layers[0]，对于第二个LSTM使用layers[1]。但是请改用set_value方法。请参阅下面的编辑。
使用keras Stateful=True：将有状态设置为true时，每批LSTM状态不会重置。因此，如果您有一个包含5个数据样本（每个都有一些序列长度）的批处理，您将获得5个数据样本中每个样本的单元状态。将stateful设置为true时，这些状态用于初始化下一个批次的下一个批次单元格状态。

编辑：

应该使用方法set_value来设置张量变量的值。代码model.layers[0].states[0] = K.variable(value=hidden_states[gen_data.current_idx])是有效的，因为它正在做的是将指向大小变量（batch_size X hidden_size）的state [0]更改为大小变量（batch_size x 2）。它不是在改变张量变量的值，而是使其指向不同维度的新张量变量。

测试代码：

 print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))
 model.layers[0].states[0]= K.variable(np.random.randn(10,2))
 print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))

输出

<tf.Variable 'lstm_18/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f8812e6ee10
<tf.Variable 'Variable_2:0' shape=(10, 2) dtype=float32_ref> 0x7f881269afd0

如您所见，它们是两个不同的变量。正确的方法是

 print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))
 K.set_value(model.layers[0].states[0], np.random.randn(10,8))
 print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))

输出

<tf.Variable 'lstm_20/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f881138eb70
<tf.Variable 'lstm_20/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f881138eb70

如果您的代码是固定的，则

K.set_value(model.layers[0].states[0], np.random.randn(10,2))

由于张量的大小和您设置为不匹配的值的大小，将引发错误。

在Keras中为每个具有不同隐藏大小和多个LSTM层的微型批处理设置隐藏状态

1 个答案:

LSTM中的状态

参考：

进纸状态

编辑：