How to correctly use StreamingRecognizeRequest with an HTTP audio stream

Date: 2018-12-01 13:38:26

Tags: java google-api speech-recognition audio-streaming google-speech-api

Why am I not getting any responses (to my StreamingRecognizeRequests) in the responseObserver.onResponse() method below?

Any ideas on how to proceed with this?

Code (simplified to highlight the problem; adapted from Google's sample code):

<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-speech</artifactId>
  <version>0.72.0-beta</version>
</dependency>

<dependency>
  <groupId>org.jflac</groupId>
  <artifactId>jflac-codec</artifactId>
  <version>1.5.2</version>
</dependency>

import com.google.api.gax.rpc.ClientStream;
import com.google.api.gax.rpc.ResponseObserver;
import com.google.api.gax.rpc.StreamController;
import com.google.cloud.speech.v1.*;
import com.google.protobuf.ByteString;
import org.jflac.sound.spi.FlacAudioFileReader;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.net.URL;
import java.util.ArrayList;

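I have confirmed that this is not an audio-format problem: when I instead write the buffer out to a ByteArrayOutputStream (baos) for a set period of time (e.g. 10 seconds), the input is recognized and I get the expected response.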
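For reference, here is a minimal sketch of that buffering step. It assumes the decoded stream has already been converted to the 16 kHz, 16-bit, mono, signed little-endian PCM format used in the code below, and that the format's frame rate is specified. The class `PcmBuffer` and method `bufferSeconds` are hypothetical names. The buffer size comes from the stream's `AudioFormat` (frame size × frame rate × seconds) rather than from wall-clock time, and only the byte count that `read` actually returns is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;

public class PcmBuffer {

  /**
   * Reads up to {@code seconds} of audio from {@code in} into a byte array.
   * Returns fewer bytes if the stream ends first.
   */
  public static byte[] bufferSeconds(AudioInputStream in, int seconds) throws IOException {
    AudioFormat format = in.getFormat();
    // Bytes per second = bytes per frame * frames per second (32000 for 16 kHz, 16-bit mono).
    long target = (long) format.getFrameSize() * (long) format.getFrameRate() * seconds;

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] chunk = new byte[6400]; // 0.2 s of 16 kHz, 16-bit mono audio
    long total = 0;
    while (total < target) {
      int want = (int) Math.min(chunk.length, target - total);
      int n = in.read(chunk, 0, want);
      if (n == -1) {
        break; // end of stream reached before the target duration
      }
      baos.write(chunk, 0, n); // write only the bytes actually read
      total += n;
    }
    return baos.toByteArray();
  }
}
```

With the setup in the question, `bufferSeconds(audioInputStream, 10)` would yield the bytes that are then wrapped with `ByteString.copyFrom(...)` for a `RecognitionAudio` request.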

public static void streamingUrlRecognize(String url) {

  ResponseObserver<StreamingRecognizeResponse> responseObserver = null;
  try (SpeechClient client = SpeechClient.create()) {

    responseObserver =
        new ResponseObserver<StreamingRecognizeResponse>() {
          ArrayList<StreamingRecognizeResponse> responses = new ArrayList<>();

          public void onStart(StreamController controller) {}

          public void onResponse(StreamingRecognizeResponse response) {
            responses.add(response);
          }

          public void onComplete() {
            for (StreamingRecognizeResponse response : responses) {
              StreamingRecognitionResult result = response.getResultsList().get(0);
              SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
              System.out.printf("Transcript : %s\n", alternative.getTranscript());
            }
          }

          public void onError(Throwable t) {
            System.out.println(t);
          }
        };

    ClientStream<StreamingRecognizeRequest> clientStream =
        client.streamingRecognizeCallable().splitCall(responseObserver);

    RecognitionConfig recognitionConfig =
        RecognitionConfig.newBuilder()
            .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
            .setLanguageCode("en-US")
            .setSampleRateHertz(16000)
            .build();

    StreamingRecognitionConfig streamingRecognitionConfig =
        StreamingRecognitionConfig.newBuilder().setConfig(recognitionConfig).build();

    // The first request in a streaming call has to be a config
    StreamingRecognizeRequest request =
        StreamingRecognizeRequest.newBuilder()
            .setStreamingConfig(streamingRecognitionConfig)
            .build();

    // Decode the FLAC HTTP stream and convert it to 16 kHz, 16-bit, mono, signed little-endian PCM
    FlacAudioFileReader mp = new FlacAudioFileReader();
    AudioInputStream in = mp.getAudioInputStream(new URL(url));
    AudioFormat targetFormat = new AudioFormat(16000, 16, 1, true, false);
    AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(targetFormat, in);

    clientStream.send(request);

    long startTime = System.currentTimeMillis();

    while (true) {
      long estimatedTime = System.currentTimeMillis() - startTime;
      byte[] data = new byte[6400];
      int bytesRead = audioInputStream.read(data);
      if (bytesRead == -1 || estimatedTime > 10000) { // end of stream or 10 seconds elapsed
        break;
      }
      request =
          StreamingRecognizeRequest.newBuilder()
              .setAudioContent(ByteString.copyFrom(data, 0, bytesRead)) // only the bytes read
              .build();
      clientStream.send(request);
    }
  } catch (Exception e) {
    System.out.println(e);
  }
  responseObserver.onComplete();
}
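A possible contributing factor, offered as an assumption rather than a confirmed diagnosis: the method never calls `clientStream.closeSend()` to half-close the request stream, and `responseObserver.onComplete()` is invoked by hand right after the loop, so the response list is likely printed before the server has sent anything. The library itself only calls `onComplete()` once the server finishes the call. The sketch below shows the general shape of a send loop that (1) sends only the bytes that `read` returned, (2) half-closes the stream when done, and (3) blocks until the observer signals completion before the client would be closed. It uses only the JDK: the `send` and `closeSend` callbacks stand in for `clientStream.send(...)` and `clientStream.closeSend()`, and the `CountDownLatch` stands in for an observer whose `onComplete()`/`onError()` count it down. `StreamPump`, `pump`, and `pumpAndAwait` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class StreamPump {

  /**
   * Copies audio from {@code in} to {@code send} in chunks of at most {@code chunkSize} bytes
   * until the stream ends or {@code maxMillis} has elapsed, then calls {@code closeSend}.
   * Returns the number of bytes sent.
   */
  public static long pump(InputStream in, int chunkSize, long maxMillis,
      Consumer<byte[]> send, Runnable closeSend) throws IOException {
    long deadline = System.currentTimeMillis() + maxMillis;
    byte[] buf = new byte[chunkSize];
    long sent = 0;
    while (System.currentTimeMillis() < deadline) {
      int n = in.read(buf);
      if (n == -1) {
        break; // end of the audio stream
      }
      if (n > 0) {
        send.accept(Arrays.copyOf(buf, n)); // send only the bytes actually read
        sent += n;
      }
    }
    closeSend.run(); // half-close: no more audio will be sent
    return sent;
  }

  /** Pumps the audio, then waits until the observer counts {@code done} down (or the timeout). */
  public static boolean pumpAndAwait(InputStream in, int chunkSize, long maxMillis,
      Consumer<byte[]> send, Runnable closeSend, CountDownLatch done, long timeoutSeconds)
      throws IOException, InterruptedException {
    pump(in, chunkSize, maxMillis, send, closeSend);
    // Block so the caller does not close the client while responses may still be in flight.
    return done.await(timeoutSeconds, TimeUnit.SECONDS);
  }
}
```

In the question's method, `send` would presumably be `chunk -> clientStream.send(StreamingRecognizeRequest.newBuilder().setAudioContent(ByteString.copyFrom(chunk)).build())`, `closeSend` would be `clientStream::closeSend`, the observer's `onComplete()` and `onError()` would call `done.countDown()`, and the wait would happen inside the try-with-resources block instead of calling `onComplete()` manually afterwards.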

Background

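The goal here is to write a method that listens to an open-ended (not time-limited) FLAC HTTP audio stream and performs streaming speech recognition with StreamingRecognizeRequest requests. For comparison, the working 10-second test described above buffers the audio in baos and uses the non-streaming recognize call:

RecognitionAudio audio = RecognitionAudio.newBuilder()
    .setContent(ByteString.copyFrom(baos.toByteArray()))
    .build();
RecognizeResponse response = client.recognize(recognitionConfig, audio);
List<SpeechRecognitionResult> results = response.getResultsList();

I am using the latest com.google.cloud.speech (0.72.0-beta).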

0 answers