How to deploy trigger word detection with TensorFlow

Asked: 2020-04-01 14:26:19

Tags: tensorflow keras lstm recurrent-neural-network

I am working on a "trigger word detection" model and decided to deploy it to a phone.

The input shape of the model is (None, 5511, 101). The output shape is (None, 1375, 1).

However, in the deployed app the model cannot receive all 5511 timesteps at once; instead, the phone's sensor produces audio frames one at a time.

How can I feed these pieces of data into the model one by one and get an output at every timestep?

The model is recurrent, but the first argument of `model.predict()` has shape (None, 5511, 101). What I intend to do is:

output = []
for i in range(5511):
    # pseudocode: feed one timestep at a time
    a = model.func(i, (None, 1, 101))
    output.append(a)

The structure of the model:

[image: structure of the model]

3 Answers:

Answer 0 (score: 2):

This can be solved by making the timestep axis dynamic; in other words, set the number of timesteps to None when defining the model. Here is an example showing how this works on a simplified version of your model:

from keras.layers import GRU, Input, Conv1D
from keras.models import Model
import numpy as np

x = Input(shape=(None, 101))
h = Conv1D(196, 15, strides=4)(x)
h = GRU(1, return_sequences=True)(h)
model = Model(x, h)


# The model works for the original number of timesteps (5511)
batch_size = 2
out = model.predict(np.random.rand(batch_size, 5511, 101))
print(out.shape)


# ... but also for fewer timesteps (say 32)
out = model.predict(np.random.rand(batch_size, 32, 101))
print(out.shape)


# However, it will not work if timesteps < Conv1D kernel size (15)!
# The following call raises an error:
# out = model.predict(np.random.rand(batch_size, 14, 101))

Note, however, that you will not be able to feed fewer than 15 timesteps (the Conv1D kernel size) unless you pad the input sequence up to 15.
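One way to handle chunks shorter than the kernel size, sketched below with a hypothetical `pad_to_min_length` helper (not from the answer), is to zero-pad them along the time axis before calling `predict`:

```python
import numpy as np

def pad_to_min_length(chunk, min_steps=15):
    # Hypothetical helper: zero-pad a (batch, time, features) array
    # along the time axis so it has at least `min_steps` timesteps.
    t = chunk.shape[1]
    if t >= min_steps:
        return chunk
    pad = np.zeros((chunk.shape[0], min_steps - t, chunk.shape[2]),
                   dtype=chunk.dtype)
    return np.concatenate([chunk, pad], axis=1)

short = np.random.rand(2, 14, 101)     # one step too short for the Conv1D
print(pad_to_min_length(short).shape)  # (2, 15, 101)
```

Note that zero-padding changes what the convolution sees at the boundary, so predictions on the padded steps should be treated with care.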

Answer 1 (score: 1):

You should either change the model so it works in a recurrent fashion and can be fed data one step at a time, or redesign it to work on (overlapping) windows, applying the model every few pieces of data and getting partial outputs.

Depending on the model, you may only get the output you want at the very end; you should design it accordingly.

Here is an example: https://hacks.mozilla.org/2018/09/speech-recognition-deepspeech/
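The windowed approach can be sketched as follows; the window and hop sizes here are illustrative choices, not values from the answer or the linked article:

```python
import numpy as np

def window_stream(stream, window, hop):
    # Yield overlapping (window, features) slices of a (time, features) stream.
    for start in range(0, stream.shape[0] - window + 1, hop):
        yield stream[start:start + window]

stream = np.random.rand(10000, 101)  # fake audio-feature stream
chunks = list(window_stream(stream, window=5511, hop=2755))
print(len(chunks), chunks[0].shape)  # 2 (5511, 101)
```

Each chunk can then be fed to the model independently (e.g. batched with `np.stack`), and the overlapping output regions merged or averaged.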

Answer 2 (score: 1):

To pass inputs step by step, you need recurrent layers with stateful=True.

The convolutional layer will certainly prevent you from achieving what you want. Either remove it, or pass inputs in groups of 15 steps (where 15 is the kernel size of the convolution).

You would need to coordinate those 15 steps with the stride of 4, and may need padding. If I may make a suggestion: to avoid mathematical headaches, use kernel_size=16, strides=4, and input_steps = 5512, which is a multiple of 4, your stride value. (This avoids padding and makes the calculations easier.) Your output will then round off perfectly to 1375 steps.
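The arithmetic behind both variants can be checked with the standard formula for the output length of an unpadded ("valid") Conv1D:

```python
def conv1d_out_len(timesteps, kernel, stride):
    # Output length of a Conv1D with 'valid' padding (no padding).
    return (timesteps - kernel) // stride + 1

print(conv1d_out_len(5511, 15, 4))  # 1375 (the original model)
print(conv1d_out_len(5512, 16, 4))  # 1375 (the suggested kernel_size=16 variant)
```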

Your model would then be:

inputs = Input(batch_shape=(batch_size, None, 101)) #you will always feed inputs of shape (batch_size, 16, 101)
out = Conv1D(196, 16, strides=4)(inputs)
...
...
out = GRU(..., stateful=True)(out)
...
out = GRU(..., stateful=True)(out)
...
...

model = Model(inputs, out)

With stateful=True, the model must have a fixed batch size. It can be 1, but to optimize processing speed, use a bigger batch size if you want to process several sequences in parallel (and they are independent of each other).

To work step by step, you first need to reset the model's states (do this every time you feed a new sequence, or a new batch of parallel sequences, into a stateful=True model).

So:

#will start a new batch containing a number of sequences equal to batch_size:
model.reset_states()

#received 16 steps from batch_size sequences:
steps = an_array_shaped((batch_size, 16, 101))

#for training:
model.train_on_batch(steps, something_for_y_shaped((batch_size, 1, 1)), ...)
    #I don't recommend training like this because of the batch normalizations.
    #If you can train on the entire length at once, do it.
    #Never forget: for full-length training, you need model.reset_states() every batch.

#for predicting:
predictions = model.predict_on_batch(steps, ...)

#received 4 new steps from the same batch_size sequences:
steps = np.concatenate([steps[:, 4:], new_steps], axis=1)  #new_steps: (batch_size, 4, 101)

#these new steps belong to the "same" batch_size sequences! Don't call reset states!
#repeat one of the above for training or predicting
new_predictions = model.predict_on_batch(steps, ...)
predictions = np.concatenate([predictions, new_predictions], axis=1)

#keep repeating this loop until you reach the last step

Finally, when you reach the last step, call `model.reset_states()` again for safety; everything you input after that will be treated as "new" sequences, not as new "steps" of the previous sequences.

------------

# Training hint

If you are able to train with the full sequences (not step by step), use a `stateful=False` model and train normally with `model.fit(...)`. Then recreate the model exactly, but with `stateful=True`, copy the weights with `new_model.set_weights(old_model.get_weights())`, and use the new model for predicting as above.
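A minimal sketch of that recipe, reusing the simplified architecture from the first answer (layer sizes are illustrative, not your real model):

```python
from keras.layers import GRU, Input, Conv1D
from keras.models import Model

def build(stateful, batch_size=None):
    # Same architecture either way; only statefulness and batch shape differ.
    if stateful:
        x = Input(batch_shape=(batch_size, None, 101))
    else:
        x = Input(shape=(None, 101))
    h = Conv1D(196, 15, strides=4)(x)
    h = GRU(1, return_sequences=True, stateful=stateful)(h)
    return Model(x, h)

trained = build(stateful=False)                # train this one with model.fit(...)
streaming = build(stateful=True, batch_size=1)
streaming.set_weights(trained.get_weights())   # architectures match, so weights transfer
```

Because the two models have identical layers and weight shapes, `set_weights` transfers the trained parameters directly into the stateful copy used for step-by-step prediction.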