Question

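I'm new to Keras and have been reading the documentation. My question refers to the last example ("Video question answering model") in the introduction to the Keras functional API: https://keras.io/getting-started/functional-api-guide/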

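In short, this example takes a natural-language question about a video and produces a softmax over the possible answers. A vision_model is built to encode a single video frame. It is wrapped in TimeDistributed() and applied to the video sequence, the result is passed to an LSTM to get one vector for the whole video sequence, and that vector is concatenated with the encoded question.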

Original code:

video_input = Input(shape=(100, 3, 224, 224))
# This is our video encoded via the previously trained vision_model (weights are reused)
encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors
encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

# This is a model-level representation of the question encoder, reusing the same weights as before:
question_encoder = Model(inputs=question_input, outputs=encoded_question)

# Let's use it to encode the question:
video_question_input = Input(shape=(100,), dtype='int32')
encoded_video_question = question_encoder(video_question_input)

# And this is our video question answering model:
merged = keras.layers.concatenate([encoded_video, encoded_video_question])
output = Dense(1000, activation='softmax')(merged)
video_qa_model = Model(inputs=[video_input, video_question_input], outputs=output)

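What if I wanted to drop the LSTM over the video sequence, ask the same question about every frame, and perform a softmax over the potential outputs for all 100 frames? I can think of two approaches conceptually, but I don't really know how to implement either one, or whether there is a better way to handle what seems like a common use case.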

Approach 1

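I suspect this is the way to go: concatenate the encoded question with the output of vision_model, which encodes a single image. This is done in the previous example on the same page ("Visual question answering model"), where the resulting model is called vqa_model.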

vqa_model = Model(inputs=[image_input, question_input], outputs=output)

Now, can I pass it to a TimeDistributed wrapper?

approach_1 = TimeDistributed(vqa_model)([video_input, question_input])

This produces an assertion error, for (I think) obvious reasons.

AssertionError
----> 1 approach_1 = TimeDistributed(vqa_model)([video_input, question_input])

Path\Continuum\Anaconda3\lib\site-packages\keras\engine\topology.py in __call__(self, inputs, **kwargs)
    558                     self.build(input_shapes[0])
    559                 else:
--> 560                     self.build(input_shapes)
    561                 self.built = True
    562 

Path\Continuum\Anaconda3\lib\site-packages\keras\layers\wrappers.py in build(self, input_shape)
    139 
    140     def build(self, input_shape):
--> 141         assert len(input_shape) >= 3
    142         self.input_spec = InputSpec(shape=input_shape)
    143         child_input_shape = (input_shape[0],) + input_shape[2:]

AssertionError:
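
To make the failure concrete, here is a minimal sketch in plain Python of the check shown in the traceback (`assert len(input_shape) >= 3`). It assumes, as the `self.build(input_shapes)` branch in the traceback suggests, that a list of two inputs reaches `build` as a list of two shape tuples. The helper name `passes_build_check` is hypothetical and is not part of Keras.

```python
# Shapes as Keras would report them (None is the batch dimension).
video_shape = (None, 100, 3, 224, 224)   # shape of video_input
question_shape = (None, 100)             # shape of question_input

def passes_build_check(input_shape):
    """Hypothetical mirror of the assertion in the traceback."""
    return len(input_shape) >= 3

# A single input: the shape tuple itself has 5 entries, so the check passes.
single_input = video_shape
# Two inputs: a list of 2 shape tuples, so len() is 2 and the check fails.
list_of_inputs = [video_shape, question_shape]

print(passes_build_check(single_input))    # True
print(passes_build_check(list_of_inputs))  # False
```

In other words, under this assumption the wrapper measures the length of the list of shapes rather than the rank of each input, so it fails before it ever considers the missing time dimension on question_input.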

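I need to pass two inputs, only one of which (video_input) actually has a time dimension (question_input does not). Is there a way to make this work, or do I need to broadcast the embedded question 100x to match the time dimension of the video input? That seems very inefficient.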
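
For reference, the broadcasting idea could be sketched roughly as follows. This is only an illustration under several assumptions: it imports Keras through `tensorflow.keras`, it uses small placeholder inputs (`frame_features`, `question_vector`) in place of the real outputs of TimeDistributed(vision_model) and the question encoder, the sizes `FRAME_DIM`, `QUESTION_DIM` and `NUM_ANSWERS` are made up, and it reads "softmax for all 100 frames" as one softmax per frame.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes only; none of these come from the original example.
NUM_FRAMES, FRAME_DIM, QUESTION_DIM, NUM_ANSWERS = 100, 64, 32, 10

# Stand-in for TimeDistributed(vision_model)(video_input): one vector per frame.
frame_features = keras.Input(shape=(NUM_FRAMES, FRAME_DIM))
# Stand-in for the encoded question: one vector, no time dimension.
question_vector = keras.Input(shape=(QUESTION_DIM,))

# Repeat the question vector once per frame: (batch, 100, QUESTION_DIM).
repeated_question = layers.RepeatVector(NUM_FRAMES)(question_vector)

# Concatenate along the feature axis: (batch, 100, FRAME_DIM + QUESTION_DIM).
merged = layers.concatenate([frame_features, repeated_question])

# Apply the same softmax classifier to every frame.
per_frame_answer = layers.TimeDistributed(
    layers.Dense(NUM_ANSWERS, activation='softmax'))(merged)

model = keras.Model([frame_features, question_vector], per_frame_answer)
```

Note that the question is encoded once and only the resulting vector is repeated, so the question encoder itself does not run 100 times in this sketch.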

Approach 2

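This idea seems super clunky, but what if I made vqa_model a shared layer and manually called it 100x, each time with the same question and the next frame of video?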

vqa_1 = vqa_model([frame_1, question_input])
vqa_2 = vqa_model([frame_2, question_input])
...
vqa_100 = vqa_model([frame_100, question_input])
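
Written as a loop instead of 100 separate lines, the idea might look like the sketch below. It is a rough illustration, not a tested recipe for the real model: it assumes `tensorflow.keras`, replaces vqa_model with a tiny hypothetical `toy_vqa_model`, uses made-up sizes with only 4 frames, and slices each frame out with a Lambda layer.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes only; the real example has 100 frames of shape (3, 224, 224).
NUM_FRAMES, FRAME_DIM, QUESTION_DIM, NUM_ANSWERS = 4, 8, 5, 3

# Hypothetical stand-in for vqa_model: takes (frame, question), returns a softmax.
frame_in = keras.Input(shape=(FRAME_DIM,))
q_in = keras.Input(shape=(QUESTION_DIM,))
answer = layers.Dense(NUM_ANSWERS, activation='softmax')(
    layers.concatenate([frame_in, q_in]))
toy_vqa_model = keras.Model([frame_in, q_in], answer)

video_in = keras.Input(shape=(NUM_FRAMES, FRAME_DIM))
question_in = keras.Input(shape=(QUESTION_DIM,))

per_frame = []
for t in range(NUM_FRAMES):
    # Slice out frame t; the default argument pins t for this iteration.
    frame_t = layers.Lambda(lambda v, t=t: v[:, t])(video_in)
    # Every call reuses the same toy_vqa_model, so the weights are shared.
    out_t = toy_vqa_model([frame_t, question_in])
    # Add a time axis so the outputs can be joined: (batch, 1, NUM_ANSWERS).
    per_frame.append(layers.Reshape((1, NUM_ANSWERS))(out_t))

# Join the per-frame outputs: (batch, NUM_FRAMES, NUM_ANSWERS).
stacked = layers.concatenate(per_frame, axis=1)
model = keras.Model([video_in, question_in], stacked)
```

The graph gets one slice and one model call per frame, which is the clunkiness described above, but only a single set of weights is created.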

Or, more likely, am I going about this all wrong?

Thanks in advance.

Passing multiple inputs to a Keras TimeDistributed wrapper when only one has a time dimension

0 answers: