Question

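To recognize speech through Google's servers, I use the SpeechRecognizer class together with a RecognitionListener, as described in Stephan's answer to this question. In addition, I try to capture the audio signal being recognized by using the onBufferReceived() callback of RecognitionListener, like this: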

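byte[] sig = new byte[500000];  // fixed-size buffer for the captured audio
int sigPos = 0;                 // next write position in sig
...
public void onBufferReceived(byte[] buffer) {
  // Copy only what still fits, so a long utterance cannot overflow sig.
  int n = Math.min(buffer.length, sig.length - sigPos);
  System.arraycopy(buffer, 0, sig, sigPos, n);
  sigPos += n;
}
...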

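This seems to work fine, except when SpeechRecognizer fails to connect to the Google servers: an HTTP connection timeout exception is thrown, and a chunk of audio is not copied into the sig array above. SpeechRecognizer eventually connects to the Google servers, and the recognition results indicate that the complete audio signal was received; only the sig array is missing some audio chunks.

Has anybody run into the same problem? Any hints toward a solution? Thank you!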

Answer 1

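I tend to say this might be an inconsistency in the behavior of the recognition service, maybe even a bug in the Android version you are using. However, the documentation states that this method is not guaranteed to be called, so the behavior would still fit the specification. What I have noticed so far (on Android 2.3.4) is this: I get the bytes while recording, but if there is, for example, a SocketTimeout, the service tries to resend the data to the server after some time without calling onBufferReceived again for the same data. The code I used for testing was the same as the code you linked in your post.

Why do you think some chunks are missing from the audio you received in the method? If only a few chunks were missing, it could even be that recognition still worked despite those chunks being lost.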
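One way to check whether chunks really are missing is to log when each buffer arrives and look for unusually long pauses between callbacks. The following is only a minimal sketch in plain Java: the class name CapturedAudio and its methods are hypothetical and not part of the Android API, and the arrival time is passed in by the caller (on a device this could be, for example, SystemClock.elapsedRealtime() read inside onBufferReceived). It also uses a growable buffer, so there is no fixed 500000-byte limit.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an Android class): accumulates the buffers handed
// to onBufferReceived() and remembers when each one arrived.
final class CapturedAudio {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // One entry per callback: {arrival time in ms, number of bytes}.
    private final List<long[]> chunks = new ArrayList<>();

    // Intended to be called from onBufferReceived(byte[] buffer).
    void append(byte[] buffer, long arrivalMillis) {
        if (buffer == null || buffer.length == 0) {
            return; // nothing to store
        }
        bytes.write(buffer, 0, buffer.length);
        chunks.add(new long[] {arrivalMillis, buffer.length});
    }

    // Total number of audio bytes captured so far.
    int size() {
        return bytes.size();
    }

    // Copy of everything captured so far.
    byte[] toByteArray() {
        return bytes.toByteArray();
    }

    // Longest pause between two consecutive callbacks, in milliseconds.
    // Returns 0 if fewer than two buffers have arrived.
    long longestGapMillis() {
        long longest = 0;
        for (int i = 1; i < chunks.size(); i++) {
            long gap = chunks.get(i)[0] - chunks.get(i - 1)[0];
            longest = Math.max(longest, gap);
        }
        return longest;
    }
}
```

A pause that is much longer than the usual spacing between callbacks, and that coincides with the connection timeout, would suggest that audio was recorded during that time but never delivered to the listener.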

Answer 2

In modern versions, onBufferReceived does not work; you can instead record/save audio from voice recognition intent.

Answer 3

The best way to achieve this is the other way around. Capture the audio data yourself using [...] (I recommend using [...] rather than [...] as the input, so that you get very clean audio), and then pass it to { {1}}. :)

Capturing audio sent to the Google speech recognition server

3 answers: