I'm working on a trigger-word detection model and want to deploy it to a phone.
The model's input shape is (None, 5511, 101).
Its output shape is (None, 1375, 1).
In the deployed app, however, the model will never receive all 5511 timesteps at once; the phone's sensors produce audio frames one at a time.
How can I feed these fragments into the model one by one and get an output at every timestep?
The model is recurrent, but the first argument of `model.predict()` must have shape (None, 5511, 101). What I intend to do is something like:

output = []
for i in range(5511):
    a = model.func(i, (None, 1, 101))  # pseudocode
    output.append(a)
The model's structure: (architecture diagram omitted)
Answer 0 (score: 2)
This can be solved by making the timestep axis dynamic. In other words, when you define the model, set the number of timesteps to None. Here is an example of how this works on a simplified version of your model:
from keras.layers import GRU, Input, Conv1D
from keras.models import Model
import numpy as np

x = Input(shape=(None, 101))
h = Conv1D(196, 15, strides=4)(x)
h = GRU(1, return_sequences=True)(h)
model = Model(x, h)

# The model works for the original number of timesteps (5511)...
batch_size = 2
out = model.predict(np.random.rand(batch_size, 5511, 101))
print(out.shape)  # (2, 1375, 1)

# ...but also for fewer timesteps (say 32)
out = model.predict(np.random.rand(batch_size, 32, 101))
print(out.shape)  # (2, 5, 1)

# However, it will raise an error if timesteps < the Conv1D kernel size (15):
# out = model.predict(np.random.rand(batch_size, 14, 101))
Note, however, that you will not be able to feed in fewer than 15 timesteps (the size of the Conv1D kernel) unless you pad the input sequence up to length 15.
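To illustrate that padding caveat, here is a small NumPy-only sketch (the kernel size of 15 and the 101 features come from the answer above; the 10-step chunk is a made-up example). It zero-pads a too-short chunk along the time axis so it becomes long enough to pass to `model.predict`:

```python
import numpy as np

kernel_size = 15                    # Conv1D kernel size from the model above
chunk = np.random.rand(1, 10, 101)  # only 10 timesteps -- shorter than the kernel

# Zero-pad the time axis (axis 1) up to the kernel size; the batch and
# feature axes are left untouched.
pad_len = max(0, kernel_size - chunk.shape[1])
padded = np.pad(chunk, ((0, 0), (0, pad_len), (0, 0)), mode="constant")
print(padded.shape)  # (1, 15, 101) -- now long enough for model.predict(padded)
```

Whether zero frames are a sensible filler depends on your feature extraction; for spectrogram-like features they usually act as silence.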
Answer 1 (score: 1)
You should either restructure the model in a recurrent fashion so that it can be fed one step of data at a time, or change your approach and apply the model to (overlapping) windows of the input, running it every few frames of data and collecting partial outputs.
Depending on the model, you may only get the output you need at the very end, so design it accordingly.
Here is an example: https://hacks.mozilla.org/2018/09/speech-recognition-deepspeech/
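As a sketch of the windowed approach (the hop size of 2048 and the 12000-frame stream below are illustrative assumptions, not values from the original model; the window of 5511 matches the model's input length):

```python
import numpy as np

def sliding_windows(frames, window=5511, hop=2048):
    """Yield overlapping windows over a stream of feature frames.

    frames: array of shape (total_steps, n_features).
    Each yielded window has shape (window, n_features) and could be fed
    to the model as window[None, ...] (adding the batch axis).
    """
    for start in range(0, frames.shape[0] - window + 1, hop):
        yield frames[start:start + window]

frames = np.random.rand(12000, 101)    # a hypothetical buffered frame stream
windows = list(sliding_windows(frames))
print(len(windows), windows[0].shape)  # 4 (5511, 101)
```

Consecutive windows overlap by window - hop frames, so each audio frame is scored several times; you then have to merge or de-duplicate the overlapping predictions.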
Answer 2 (score: 1)
To pass inputs step by step, you need recurrent layers with stateful=True.
The convolutional layer is what prevents you from doing exactly what you want. Either remove it, or feed the input in groups of 15 steps (where 15 is the kernel size of the convolution).
You would need to coordinate those groups of 15 steps with the stride of 4, and you may need padding. If I may suggest, to avoid difficulties with the arithmetic, use kernel_size=16, stride=4 and input_steps=5512, which is a multiple of 4 (your stride value). This avoids padding and makes the calculations easier, and your output will come out to exactly 1375 steps.
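The arithmetic behind those numbers can be checked directly; for a 'valid' (unpadded) 1D convolution the output length is (input - kernel) // stride + 1:

```python
def conv1d_output_steps(input_steps, kernel_size, stride):
    # Output length of a 'valid' (unpadded) 1D convolution.
    return (input_steps - kernel_size) // stride + 1

print(conv1d_output_steps(5511, 15, 4))  # 1375  (the original model)
print(conv1d_output_steps(5512, 16, 4))  # 1375  (the suggested kernel_size=16 variant)
```

Both configurations produce the same 1375 output steps, but with kernel_size=16 and input_steps=5512 every quantity divides evenly by the stride.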
Your model would then be:
inputs = Input(batch_shape=(batch_size, None, 101))  # you will always feed input shapes of (batch_size, 16, 101)
out = Conv1D(196, 16, strides=4)(inputs)
...
out = GRU(..., stateful=True)(out)
...
out = GRU(..., stateful=True)(out)
...
model = Model(inputs, out)
With stateful=True, the model must have a fixed batch size. It can be 1, but if you want to process several sequences in parallel (and independently of each other), a larger batch size will give you better processing speed.
To work step by step, you first need to reset the states (every time you start feeding a new sequence, or a new batch of parallel sequences, into a stateful=True model).
So:
# Start a new batch containing batch_size sequences:
model.reset_states()

# Received 16 steps from batch_size sequences:
steps = an_array_shaped((batch_size, 16, 101))

# For training:
model.train_on_batch(steps, something_for_y_shaped((batch_size, 1, 1)), ...)
# I don't recommend training like this because of the batch normalizations.
# If you can train on the entire length at once, do it.
# Never forget: for full-length training, you need model.reset_states() every batch.

# For predicting:
predictions = model.predict_on_batch(steps)

# Received 4 new steps from the same sequences:
steps = np.concatenate([steps[:, 4:], new_steps], axis=1)
# These new steps belong to the *same* batch_size sequences -- don't call reset_states!

# Repeat one of the above for training or predicting:
new_predictions = model.predict_on_batch(steps)
predictions = np.concatenate([predictions, new_predictions], axis=1)

# Keep repeating this loop until you reach the last step.
Finally, when you reach the last step, call `model.reset_states()` again for safety; everything you input after that will be treated as "new" sequences, not as new "steps" of the previous sequences.
------------
# Training hint
If you are able to train with the full sequences (instead of step by step), use a `stateful=False` model, train it normally with `model.fit(...)`, then recreate exactly the same model but with `stateful=True`, copy the weights with `new_model.set_weights(old_model.get_weights())`, and use the new model for predicting as above.
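A minimal sketch of that weight transfer, using a simplified placeholder architecture (the GRU width of 8 and the single recurrent layer are assumptions for illustration, not the real model):

```python
from keras.layers import Input, Conv1D, GRU
from keras.models import Model

def build_model(stateful, batch_size=None):
    # Placeholder architecture -- substitute your real layers.
    # The stateful variant needs a fixed batch size in its input shape.
    if stateful:
        x = Input(batch_shape=(batch_size, None, 101))
    else:
        x = Input(shape=(None, 101))
    h = Conv1D(196, 16, strides=4)(x)
    h = GRU(8, return_sequences=True, stateful=stateful)(h)
    return Model(x, h)

train_model = build_model(stateful=False)
# ... train normally here, e.g. train_model.fit(...) ...

infer_model = build_model(stateful=True, batch_size=1)
infer_model.set_weights(train_model.get_weights())
# infer_model can now run the step-by-step prediction loop above.
```

Because both models are built from the same layer stack, `get_weights()`/`set_weights()` line up one to one; only the batch size and statefulness differ.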