How to deploy trigger word detection with TensorFlow

Asked: 2020-04-01 14:26:19

Tags: tensorflow keras lstm recurrent-neural-network

I am working on a "trigger word detection" model and decided to deploy it to a phone.

The input shape of the model is (None, 5511, 101). The output shape is (None, 1375, 1).

However, in the deployed app the model cannot receive all 5511 timesteps at once; instead, the phone's sensor produces audio frames one at a time.

How can I feed these pieces of data into the model one by one and get an output at every timestep?

The model is recurrent, but the first argument of `model.predict()` has shape (None, 5511, 101). What I intend to do is:

output = []
for i in range(5511):
    # pseudocode: feed one timestep at a time
    a = model.func(i, (None, 1, 101))
    output.append(a)

The structure of the model:

[image: structure of the model]

3 Answers:

Answer 0 (score: 2):

This can be solved by making the timestep axis dynamic; in other words, set the number of timesteps to None when defining the model. Here is an example showing how this works on a simplified version of your model:

from keras.layers import GRU, Input, Conv1D
from keras.models import Model
import numpy as np

x = Input(shape=(None, 101))
h = Conv1D(196, 15, strides=4)(x)
h = GRU(1, return_sequences=True)(h)
model = Model(x, h)


# The model works for the original number of timesteps (5511)
batch_size = 2
out = model.predict(np.random.rand(batch_size, 5511, 101))
print(out.shape)


# ... but also for fewer timesteps (say 32)
out = model.predict(np.random.rand(batch_size, 32, 101))
print(out.shape)


# However, it will not work if timesteps < Conv1D kernel size (15)!
# The following call raises an error:
# out = model.predict(np.random.rand(batch_size, 14, 101))

Note, however, that you will not be able to feed fewer than 15 timesteps (the Conv1D kernel size) unless you pad the input sequence up to 15.
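One way to handle chunks shorter than the kernel size, sketched below with a hypothetical `pad_to_min_length` helper (not from the answer), is to zero-pad them along the time axis before calling `predict`:

```python
import numpy as np

def pad_to_min_length(chunk, min_steps=15):
    # Hypothetical helper: zero-pad a (batch, time, features) array
    # along the time axis so it has at least `min_steps` timesteps.
    t = chunk.shape[1]
    if t >= min_steps:
        return chunk
    pad = np.zeros((chunk.shape[0], min_steps - t, chunk.shape[2]),
                   dtype=chunk.dtype)
    return np.concatenate([chunk, pad], axis=1)

short = np.random.rand(2, 14, 101)     # one step too short for the Conv1D
print(pad_to_min_length(short).shape)  # (2, 15, 101)
```

Note that zero-padding changes what the convolution sees at the boundary, so predictions on the padded steps should be treated with care.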

Answer 1 (score: 1):

You should either change the model so it works in a recurrent fashion and can be fed data one step at a time, or redesign it to work on (overlapping) windows, applying the model every few pieces of data and getting partial outputs.

Depending on the model, you may only get the output you want at the very end; you should design it accordingly.

Here is an example: https://hacks.mozilla.org/2018/09/speech-recognition-deepspeech/
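The windowed approach can be sketched as follows; the window and hop sizes here are illustrative choices, not values from the answer or the linked article:

```python
import numpy as np

def window_stream(stream, window, hop):
    # Yield overlapping (window, features) slices of a (time, features) stream.
    for start in range(0, stream.shape[0] - window + 1, hop):
        yield stream[start:start + window]

stream = np.random.rand(10000, 101)  # fake audio-feature stream
chunks = list(window_stream(stream, window=5511, hop=2755))
print(len(chunks), chunks[0].shape)  # 2 (5511, 101)
```

Each chunk can then be fed to the model independently (e.g. batched with `np.stack`), and the overlapping output regions merged or averaged.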

Answer 2 (score: 1):

To pass inputs step by step, you need recurrent layers with stateful=True.

The convolutional layer will certainly prevent you from achieving what you want. Either remove it, or pass inputs in groups of 15 steps (where 15 is the kernel size of the convolution).

You would need to coordinate those 15 steps with the stride of 4, and may need padding. If I may make a suggestion: to avoid mathematical headaches, use kernel_size=16, strides=4, and input_steps = 5512, which is a multiple of 4, your stride value. (This avoids padding and makes the calculations easier.) Your output will then round off perfectly to 1375 steps.
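The arithmetic behind both variants can be checked with the standard formula for the output length of an unpadded ("valid") Conv1D:

```python
def conv1d_out_len(timesteps, kernel, stride):
    # Output length of a Conv1D with 'valid' padding (no padding).
    return (timesteps - kernel) // stride + 1

print(conv1d_out_len(5511, 15, 4))  # 1375 (the original model)
print(conv1d_out_len(5512, 16, 4))  # 1375 (the suggested kernel_size=16 variant)
```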

Your model would then be:

inputs = Input(batch_shape=(batch_size, None, 101)) #you will always feed inputs of shape (batch_size, 16, 101)
out = Conv1D(196, 16, strides=4)(inputs)
...
...
out = GRU(..., stateful=True)(out)
...
out = GRU(..., stateful=True)(out)
...
...

model = Model(inputs, out)

With stateful=True, the model must have a fixed batch size. It can be 1, but to optimize processing speed, use a bigger batch size if you want to process several sequences in parallel (and they are independent of each other).

To work step by step, you first need to reset the model's states (do this every time you feed a new sequence, or a new batch of parallel sequences, into a stateful=True model).

So:

#will start a new batch containing a number of sequences equal to batch_size:
model.reset_states()

#received 16 steps from batch_size sequences:
steps = an_array_shaped((batch_size, 16, 101))

#for training:
model.train_on_batch(steps, something_for_y_shaped((batch_size, 1, 1)), ...)
    #I don't recommend training like this because of the batch normalizations.
    #If you can train on the entire length at once, do it.
    #Never forget: for full-length training, you need model.reset_states() every batch.

#for predicting:
predictions = model.predict_on_batch(steps, ...)

#received 4 new steps from the same batch_size sequences:
steps = np.concatenate([steps[:, 4:], new_steps], axis=1)  #new_steps: (batch_size, 4, 101)

#these new steps belong to the "same" batch_size sequences! Don't call reset states!
#repeat one of the above for training or predicting
new_predictions = model.predict_on_batch(steps, ...)
predictions = np.concatenate([predictions, new_predictions], axis=1)

#keep repeating this loop until you reach the last step

Finally, when you reach the last step, call `model.reset_states()` again for safety; everything you input after that will be treated as "new" sequences, not as new "steps" of the previous sequences.

------------

# Training hint

If you are able to train with the full sequences (not step by step), use a `stateful=False` model and train normally with `model.fit(...)`. Then recreate the model exactly, but with `stateful=True`, copy the weights with `new_model.set_weights(old_model.get_weights())`, and use the new model for predicting as above.
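A minimal sketch of that recipe, reusing the simplified architecture from the first answer (layer sizes are illustrative, not your real model):

```python
from keras.layers import GRU, Input, Conv1D
from keras.models import Model

def build(stateful, batch_size=None):
    # Same architecture either way; only statefulness and batch shape differ.
    if stateful:
        x = Input(batch_shape=(batch_size, None, 101))
    else:
        x = Input(shape=(None, 101))
    h = Conv1D(196, 15, strides=4)(x)
    h = GRU(1, return_sequences=True, stateful=stateful)(h)
    return Model(x, h)

trained = build(stateful=False)                # train this one with model.fit(...)
streaming = build(stateful=True, batch_size=1)
streaming.set_weights(trained.get_weights())   # architectures match, so weights transfer
```

Because the two models have identical layers and weight shapes, `set_weights` transfers the trained parameters directly into the stateful copy used for step-by-step prediction.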