Question

I am trying to use the IBM Watson Speech to Text service, but I am running into problems getting speech converted to text.

Can you help me with the following situation?

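I have set up a VoIP server (Asterisk / FreeSWITCH) on which SIP client A and SIP client B are registered. A call from A to B has been established, and they are talking over the G.711 μ-law (ULAW) codec.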

I have a WebSocket application that connects to IBM Watson Speech to Text and establishes a session. I get the reply "state": "listening" from the Watson server.

Now I am trying to send the raw RTP packet data from the VoIP server to the Watson server, but Watson returns a "session timeout" error.

I am using the following configuration parameters.

Since this is a real-time RTP audio call, I am using en-US_NarrowbandModel with 'content-type': audio/l16;rate=16000

I continuously send the raw data of the RTP packets over the WebSocket connection to the Watson server.
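For reference, each raw RTP packet starts with a header (at least 12 bytes, per RFC 3550) that is not audio, so forwarding whole packets mixes header bytes into the stream. The following is a minimal sketch of stripping that header to get the codec payload; the function name `rtp_payload` is hypothetical, and the sketch assumes one complete RTP packet per call.

```python
import struct


def rtp_payload(packet: bytes) -> tuple[int, bytes]:
    """Return (payload_type, payload) for one RTP packet (RFC 3550 layout)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed 12-byte RTP header")
    b0, b1 = packet[0], packet[1]
    if b0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    has_padding = bool(b0 & 0x20)
    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    payload_type = b1 & 0x7F  # 0 = PCMU (G.711 mu-law)

    # Skip the fixed header plus any 32-bit CSRC identifiers.
    offset = 12 + 4 * csrc_count
    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        # Extension: 16-bit profile id, then 16-bit length in 32-bit words.
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words

    end = len(packet)
    if has_padding:
        end -= packet[-1]  # last byte counts the padding bytes, itself included
    if offset > end:
        raise ValueError("header and padding exceed packet length")
    return payload_type, packet[offset:end]
```

Only the returned payload bytes would then be candidates for sending to the recognizer, and only after matching them to a content type the service accepts.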

Please help me fix this setup error.

Answer 1

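Hello @ram, what do you mean by "I am trying to send raw RTP packet data"? The Watson STT service does not support RTP packets directly; you need to convert them to a supported audio format. Do you convert the RTP packets to audio/l16;rate=16000 before feeding them to the WebSocket?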
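As a rough illustration of that conversion: G.711 μ-law payloads are 8-bit samples at 8 kHz, and decoding them yields 16-bit linear PCM. The sketch below uses the standard G.711 μ-law expansion; the function names are hypothetical. Because G.711 is sampled at 8000 Hz, the decoded audio would be labelled `audio/l16;rate=8000` (or resampled before claiming `rate=16000`), and the content type here states `endianness=little-endian` explicitly to match the byte order written.

```python
import struct


def ulaw_to_linear(u: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def ulaw_payload_to_l16(payload: bytes) -> bytes:
    """Decode a mu-law payload to little-endian 16-bit PCM (same sample rate)."""
    samples = [ulaw_to_linear(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)


# Content type matching the decoded bytes: 8 kHz, little-endian L16.
L16_CONTENT_TYPE = "audio/l16;rate=8000;endianness=little-endian"
```

Each decoded chunk doubles in size (one byte in, two bytes out) but keeps the 8 kHz sample rate.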

Here is the list of supported formats:

audio/basic (Use only with narrowband models.)
audio/flac
audio/l16 (Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
audio/mp3
audio/mpeg
audio/mulaw (Specify the sampling rate of the audio.)
audio/ogg (The service automatically detects the codec of the input audio.)
audio/ogg;codecs=opus
audio/ogg;codecs=vorbis
audio/wav (Provide audio with a maximum of nine channels.)
audio/webm (The service automatically detects the codec of the input audio.)
audio/webm;codecs=opus
audio/webm;codecs=vorbis
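Since `audio/mulaw` is in this list, another option is to skip decoding and send the stripped G.711 payloads as-is, declaring their real 8 kHz rate. The sketch below only builds the JSON text messages that open and close a recognition request on the WebSocket; it assumes the `action`/`content-type`/`interim_results` fields described in the linked WebSocket documentation, and the helper names are hypothetical. The WebSocket transport and authentication are omitted.

```python
import json


def start_message(content_type: str, interim_results: bool = True) -> str:
    """JSON text frame sent before any binary audio frames."""
    return json.dumps({
        "action": "start",
        "content-type": content_type,
        "interim_results": interim_results,
    })


# JSON text frame sent after the last binary audio frame.
STOP_MESSAGE = json.dumps({"action": "stop"})
```

With this approach the start message would carry `start_message("audio/mulaw;rate=8000")`, followed by the payload bytes as binary frames, so the declared format and rate match what is actually sent.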

https://www.ibm.com/watson/developercloud/speech-to-text/api/v1/#recognize_audio_websockets

Speech to text for real-time RTP / VoIP / audio calls with IBM Watson
