Question

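I am using the Google Speech API from Google Cloud Platform to get speech-to-text for streaming audio. I have already made REST API calls to GCP with curl, sending POST requests for a short audio file.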

I have read the Google Streaming Recognize documentation, which says "Streaming speech recognition is available via gRPC only."

I have installed gRPC (and protobuf) on openSUSE Leap 15.0. Here is a screenshot of the directory.

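Next, I tried to run the streaming_transcribe example from this link. I found that the sample program takes a local file as input but treats it as simulated microphone input (reading 64K chunks sequentially) and then sends the data to the Google servers.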
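The "simulated microphone" behaviour described above can be sketched roughly as follows. This is not the sample's actual code: `ForEachChunk` and its callback are hypothetical names, and the callback stands in for whatever writes the bytes to the gRPC stream in the real program.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads `input` sequentially in blocks of at most `chunk_size` bytes and hands
// each block to `on_chunk`. Returns the number of blocks delivered.
// Assumption: in the real sample the callback would write the bytes into the
// gRPC stream; here it is a plain callback so the sketch has no gRPC dependency.
std::size_t ForEachChunk(
    std::istream& input, std::size_t chunk_size,
    const std::function<void(const std::string&)>& on_chunk) {
  assert(chunk_size > 0);
  std::vector<char> buffer(chunk_size);
  std::size_t chunks = 0;
  while (input) {
    input.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    const std::streamsize got = input.gcount();
    if (got <= 0) break;  // nothing left to deliver
    // The last block may be shorter than chunk_size.
    on_chunk(std::string(buffer.data(), static_cast<std::size_t>(got)));
    ++chunks;
  }
  return chunks;
}
```

Passing an `std::ifstream` opened in binary mode together with a chunk size of `64 * 1024` would mimic the 64K blocks mentioned above. Any `std::istream` works, which is what makes it possible to swap the file for another audio source later.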

As an initial test to check whether gRPC is set up correctly on my system, I ran make run_tests. I changed the Makefile to:

...
...Some text as original Makefile
...
.PHONY: all
all: streaming_transcribe
googleapis.ar: $(GOOGLEAPIS_CCS:.cc=.o)
	ar r $@ $?
streaming_transcribe: streaming_transcribe.o parse_arguments.o googleapis.ar
	$(CXX) $^ $(LDFLAGS) -o $@
run_tests:
	./streaming_transcribe -b 16000 resources/audio.raw
	./streaming_transcribe --bitrate 16000 resources/audio2.raw
	./streaming_transcribe resources/audio.flac
	./streaming_transcribe resources/quit.raw
clean:
	rm -f *.o streaming_transcribe \
	  googleapis.ar \
	  $(GOOGLEAPIS_CCS:.cc=.o)

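This did not work (and neither did the original Makefile). However, the streaming_transcribe.o file was created after running make, so I ran the program manually and got the following response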

Any suggestions on how to run the tests, and on how to use GStreamer instead of the function that simulates microphone audio?

Answer 1

How to run the tests

Follow the instructions on cpp-docs-samples. Prerequisites: install grpc, protobuf and googleapis, and set up the environment as described in the link above.

GStreamer instead of the function that simulates microphone audio

For this program, I created the following pipelines

gst-launch-1.0 filesrc location=/path/to/file/FOO.wav ! wavparse ! audioconvert ! audio/x-raw,channels=1,depth=16,width=16,rate=44100 ! rtpL16pay  ! udpsink host=xxx.xxx.xxx.xxx port=yyyy

The audio file can be changed to flac or mp3 by changing the appropriate elements in the pipeline

gst-launch-1.0 udpsrc port=yyyy ! "application/x-rtp,media=(string)audio, clock-rate=(int)44100, width=16, height=16, encoding-name=(string)L16, encoding-params=(string)1, channels=(int)1, channel-positions=(int)1, payload=(int)96" ! rtpL16depay ! audioconvert ! audio/x-raw,format=S16LE ! filesink location=/path/to/where/you/want/to/dump/the/rtp/payloads/ABC.raw
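Both pipelines carry 16-bit mono PCM at 44100 Hz, so it can help to know how much audio a given number of raw bytes represents. The helpers below are hypothetical names and only a sketch of the arithmetic. Note also that the test commands in the question pass 16000 to the sample's `-b` / `--bitrate` option, so presumably that value would have to match the rate of the dumped file (44100 here).

```cpp
#include <cassert>
#include <cstddef>

// Bytes per second of uncompressed 16-bit PCM (LINEAR16 / S16LE).
inline std::size_t Pcm16BytesPerSecond(std::size_t sample_rate_hz,
                                       std::size_t channels) {
  assert(sample_rate_hz > 0 && channels > 0);
  return sample_rate_hz * channels * 2;  // 2 bytes per 16-bit sample
}

// Duration in whole milliseconds represented by `byte_count` bytes of that
// audio (rounded down).
inline std::size_t Pcm16DurationMs(std::size_t byte_count,
                                   std::size_t sample_rate_hz,
                                   std::size_t channels) {
  return byte_count * 1000 / Pcm16BytesPerSecond(sample_rate_hz, channels);
}
```

With the caps above, `Pcm16BytesPerSecond(44100, 1)` is 88200, so a 64 KiB block holds roughly 0.74 s of audio, whereas the same block at 16000 Hz mono holds about 2 s.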

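Getting the payloads from the RTP stream and writing them to the file is done in a separate thread from the one that sends the data to Google and reads the responses.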
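The two-thread split can be illustrated with a small producer/consumer sketch. It is an assumption-laden stand-in, not the answer's code: the answer hands data over through a file written by the receiving pipeline, whereas this sketch uses an in-memory queue, and `ChunkQueue` and `RunPipeline` are hypothetical names. The producer plays the role of the thread receiving RTP payloads, and the consumer plays the role of the thread that would send audio to Google.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Thread-safe FIFO of audio chunks shared by the two threads.
class ChunkQueue {
 public:
  void Push(std::string chunk) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      chunks_.push_back(std::move(chunk));
    }
    cv_.notify_one();
  }

  // Signals that no more chunks will arrive.
  void Close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until a chunk is available; returns false once the queue is
  // closed and fully drained.
  bool Pop(std::string* out) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !chunks_.empty() || closed_; });
    if (chunks_.empty()) return false;
    *out = std::move(chunks_.front());
    chunks_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::string> chunks_;
  bool closed_ = false;
};

// Starts one producer and one consumer thread and returns the number of
// bytes the consumer saw. Silence (zero bytes) stands in for real audio.
std::size_t RunPipeline(std::size_t count, std::size_t chunk_size) {
  ChunkQueue queue;
  std::size_t consumed = 0;
  std::thread producer([&] {
    for (std::size_t i = 0; i < count; ++i) {
      queue.Push(std::string(chunk_size, '\0'));
    }
    queue.Close();
  });
  std::thread consumer([&] {
    std::string chunk;
    // Assumption: a real consumer would write each chunk to the gRPC stream.
    while (queue.Pop(&chunk)) consumed += chunk.size();
  });
  producer.join();
  consumer.join();
  return consumed;
}
```

A queue avoids the consumer reading a file that is still being written, at the cost of keeping unsent audio in memory. Whether that trade-off suits a given setup depends on how far the sender can fall behind the receiver.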

Answer 2

Perhaps a dedicated sound card could listen to the RTSP stream?

try (SpeechClient speechClient = SpeechClient.create()) {
  // Describe the incoming audio: 16-bit linear PCM, 44.1 kHz, two channels,
  // with each channel recognized separately.
  RecognitionConfig config =
      RecognitionConfig.newBuilder()
          .setEncoding(AudioEncoding.LINEAR16)
          .setLanguageCode("en-US")
          .setSampleRateHertz(44100)
          .setAudioChannelCount(2)
          .setEnableSeparateRecognitionPerChannel(true)
          .build();
}

Using GStreamer with the Google Speech API (Streaming Transcribe) in C++
