Question

Consider the following two models:

from tensorflow.keras.layers import Input, GRU, Dense, TimeDistributed
from tensorflow.keras.models import Model

inputs = Input(batch_shape=(None, None, 100)) 
gru_out = GRU(32, return_sequences=True)(inputs)
dense = Dense(200, activation='softmax')
decoder_pred = TimeDistributed(dense)(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()

Output:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, None, 100)         0         
_________________________________________________________________
gru (GRU)                    (None, None, 32)          12768     
_________________________________________________________________
time_distributed (TimeDistri (None, None, 200)         6600      
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________

The second model:

from tensorflow.keras.layers import Input, GRU, Dense
from tensorflow.keras.models import Model

inputs = Input(batch_shape=(None, None, 100)) 
gru_out = GRU(32, return_sequences=True)(inputs)
decoder_pred = Dense(200, activation='softmax')(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()

Output:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, None, 100)         0         
_________________________________________________________________
gru_1 (GRU)                  (None, None, 32)          12768     
_________________________________________________________________
dense_1 (Dense)              (None, None, 200)         6600      
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________
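
For reference, the parameter counts in both summaries can be reproduced by hand. The sketch below is plain arithmetic, not a Keras call. It assumes the GRU uses a single bias vector per gate (the older `reset_after=False` layout), which is the assumption under which the formula matches the 12768 shown above. The variable names are mine.

```python
# Hand-computed parameter counts for the two summaries above.
input_dim, units, classes = 100, 32, 200

# GRU: 3 weight blocks (update gate, reset gate, candidate state), each with
# an input kernel, a recurrent kernel, and one bias vector.
# Assumption: single bias per block (reset_after=False style layout).
gru_params = 3 * (units * input_dim + units * units + units)

# Dense: one kernel of shape (units, classes) plus one bias per output class.
dense_params = units * classes + classes

print(gru_params, dense_params, gru_params + dense_params)
# 12768 6600 19368
```

The Dense count does not depend on the number of timesteps, which is consistent with the same 6600 weights being reused at every step in both models.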

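My question is: does the TimeDistributed layer wrapper actually do anything in the first model? Do the two models differ in any way, given that they have the same total number of parameters?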

What does wrapping a layer in TimeDistributed do?
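
To make the comparison concrete, here is a minimal pure-Python sketch of the two ways a dense layer could be applied to a (batch, time, features) input: once per timestep, and once over the flattened (batch * time) axis. It is only an illustration and not Keras code. The helpers `dense`, `time_distributed`, and `dense_last_axis` are hypothetical names, and the softmax activation is left out for brevity. It relies on the documented behavior that Keras `Dense` operates along the last axis when its input has rank greater than 2.

```python
import random

def dense(vec, kernel, bias):
    """Apply y = vec @ kernel + bias to one feature vector (no activation)."""
    return [sum(v * kernel[i][j] for i, v in enumerate(vec)) + bias[j]
            for j in range(len(bias))]

def time_distributed(layer_fn, batch):
    """Apply layer_fn independently to every timestep of every sequence."""
    return [[layer_fn(step) for step in seq] for seq in batch]

def dense_last_axis(batch, kernel, bias):
    """Flatten (batch, time) into one axis, apply dense, restore the shape."""
    steps = len(batch[0])
    flat = [step for seq in batch for step in seq]
    out = [dense(step, kernel, bias) for step in flat]
    return [out[i:i + steps] for i in range(0, len(out), steps)]

random.seed(0)
features, classes = 4, 3
kernel = [[random.uniform(-1, 1) for _ in range(classes)]
          for _ in range(features)]
bias = [random.uniform(-1, 1) for _ in range(classes)]
# Toy batch: 2 sequences, 5 timesteps each, 4 features per timestep.
batch = [[[random.uniform(-1, 1) for _ in range(features)]
          for _ in range(5)]
         for _ in range(2)]

wrapped = time_distributed(lambda step: dense(step, kernel, bias), batch)
plain = dense_last_axis(batch, kernel, bias)
print(wrapped == plain)  # True: same weights, same per-timestep result
```

In this toy setting both paths share one kernel and one bias and produce identical outputs, which would be consistent with the two summaries showing the same shapes and parameter counts. Whether the real Keras layers behave identically in every respect (masking, for example) is something the sketch does not cover.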

0 answers: