Question

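I have a working application that uses the Bluemix Speech to Text API to provide closed captions for an HTTP Live Streaming (HLS) source. However, there is some delay while the audio in the ts files is parsed. My code is as follows: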

// Transcribe the audio of the media element that plays the HLS stream
videoProps.stream = WatsonSpeechToText.recognizeElement({
    element: myMediaElement,
    token: videoProps.ctx.token,
    muteSource: false,
    autoPlay: false,
    model: videoProps.ctx.currentModel,
    timestamps: true,
    profanity_filter: true,
    inactivity_timeout: -1, // never time out on silence
    continuous: true
})
.pipe(new WatsonSpeechToText.FormatStream());

videoProps.stream.on("result", function(result) {
    // do something
});

Is there a faster API that would get me closer to real time?

Thanks

Answer 1

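Open-source implementations based on Kaldi (for example, CloudASR) can run much faster than real time, and you can tune the system to trade off speed against accuracy. However, you have to maintain the server cloud yourself.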

Answer 2

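The Watson Speech to Text service API offers different input models with different performance characteristics. Depending on the quality of the audio, BroadbandModel runs slightly faster than real time, while NarrowBand runs slightly slower than real time. Which model are you using? If you are not already using BroadbandModel, try it, since it should be a better fit for a captioning application, assuming the audio is also being streamed in real time.

You can find more information about these models and their characteristics in the documentation at http://www.ibm.com/watson/developercloud/doc/speech-to-text/input.shtml#models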
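
As a rough sketch of the model choice described above: the helper below picks a model name from the audio sample rate. `pickWatsonModel` is a hypothetical function, not part of the Watson SDK, and it assumes the usual convention that broadband models expect audio sampled at 16 kHz or higher while narrowband models target 8 kHz telephony audio. The `en-US` default is also an assumption; substitute the language of your stream.

```javascript
// Hypothetical helper (not part of the Watson SDK): choose a Speech to Text
// model name from the sample rate of the audio being transcribed.
// Assumption: audio at 16 kHz or above suits the broadband model,
// anything lower suits the narrowband model.
function pickWatsonModel(sampleRateHz, language) {
    if (typeof sampleRateHz !== "number" || !(sampleRateHz > 0)) {
        throw new RangeError("sampleRateHz must be a positive number");
    }
    var lang = language || "en-US"; // assumed default language
    var band = sampleRateHz >= 16000 ? "Broadband" : "Narrowband";
    return lang + "_" + band + "Model";
}
```

The result could be passed as the `model` option in the question's `recognizeElement` call, for example `model: pickWatsonModel(44100)`. HLS audio tracks are often encoded at 44.1 or 48 kHz, in which case this sketch selects the broadband model.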

Bluemix real-time speech to text using HLS
