Google Cloud Platform: Speech-to-Text conversion for a large media file

Date: 2018-11-14 19:43:31

Tags: google-cloud-platform speech-recognition speech-to-text google-speech-api google-cloud-speech

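I am trying to extract text from an mp4 media file downloaded from YouTube. Since I am already using Google Cloud Platform, I wanted to try Google Cloud Speech.

After completing all the installation and configuration, I copied the following code snippet to get started: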

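import io
from google.cloud import speech
from google.cloud.speech import enums, types

client = speech.SpeechClient()

with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

config = types.RecognitionConfig(encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16, sample_rate_hertz=16000, language_code='en-US')

operation = client.long_running_recognize(config, audio)
response = operation.result()  # wait for the long-running operation to finish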

However, I got the following error about the file size:

InvalidArgument: 400 Inline audio exceeds duration limit. Please use a GCS URI.

Then I read that I should use streaming for large media files, so I tried the following code snippet:

import io
from google.cloud import speech
from google.cloud.speech import enums, types

client = speech.SpeechClient()

with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()

# In practice, stream should be a generator yielding chunks of audio data.
stream = [content]
requests = (types.StreamingRecognizeRequest(audio_content=chunk) for chunk in stream)

config = types.RecognitionConfig(encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16, sample_rate_hertz=16000, language_code='en-US')
streaming_config = types.StreamingRecognitionConfig(config=config)

responses = client.streaming_recognize(streaming_config, requests)

But I still got the following error:

InvalidArgument: 400 Invalid audio content: too long.
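The second error is consistent with the snippet sending the entire file as a single StreamingRecognizeRequest: stream = [content] is a one-element list, while the comment above it says stream should yield chunks. A minimal sketch of such a generator follows, assuming headerless LINEAR16 input. iter_chunks is a hypothetical helper name, and the 3200-byte default (100 ms of 16 kHz, 16-bit mono audio) is an illustrative choice, not an API requirement.

```python
from typing import Iterator

# 100 ms of 16 kHz, 16-bit (2-byte) mono LINEAR16: 16000 * 2 * 0.1 bytes.
# Illustrative default only; not a value mandated by the API.
DEFAULT_CHUNK_BYTES = 3200


def iter_chunks(data: bytes, chunk_size: int = DEFAULT_CHUNK_BYTES) -> Iterator[bytes]:
    """Yield consecutive slices of data, each at most chunk_size bytes long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
```

With it, the request generator in the snippet would become requests = (types.StreamingRecognizeRequest(audio_content=chunk) for chunk in iter_chunks(content)). A streaming session also has a total duration limit of a few minutes, so chunking alone may not be enough for a 10-15 minute file.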

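So, can anyone suggest a way to transcribe an mp4 file and extract the text? I don't have any complicated requirements for large media files; the files are at most 10-15 minutes long. Thanks.

1 Answer:

Answer 0 (score: 3)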

The error message means the file is too large to send inline. You need to copy the media file to Google Cloud Storage first and then specify a Cloud Storage URI, such as gs://bucket/path/mediafile.
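To make the limit concrete: for uncompressed LINEAR16 audio the duration follows from the byte count as bytes / (sample rate * 2 bytes per sample * channels). A minimal sketch, assuming headerless 16-bit PCM; the roughly 60-second inline limit is an assumption to check against the current quota documentation, and both helper names are hypothetical.

```python
# Approximate limit for inline (content=...) audio; an assumption to verify
# against the current Speech-to-Text quota documentation.
INLINE_LIMIT_SECONDS = 60


def linear16_duration_seconds(num_bytes: int, sample_rate_hertz: int = 16000,
                              channels: int = 1) -> float:
    """Duration of headerless 16-bit PCM (LINEAR16) audio of the given size."""
    bytes_per_second = sample_rate_hertz * 2 * channels  # 2 bytes per 16-bit sample
    return num_bytes / bytes_per_second


def needs_gcs_uri(num_bytes: int, sample_rate_hertz: int = 16000,
                  channels: int = 1) -> bool:
    """True if the audio is too long to send inline and should go through GCS."""
    duration = linear16_duration_seconds(num_bytes, sample_rate_hertz, channels)
    return duration > INLINE_LIMIT_SECONDS


if __name__ == "__main__":
    ten_minutes = 16000 * 2 * 600  # bytes in 10 minutes of 16 kHz mono LINEAR16
    print(ten_minutes, linear16_duration_seconds(ten_minutes), needs_gcs_uri(ten_minutes))
```

A 10-minute, 16 kHz mono file is 19,200,000 bytes, i.e. 600 seconds, far beyond an inline limit of about a minute, which is why the audio has to be referenced by a GCS URI instead.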

The key to using a Cloud Storage URI is this line:

  RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();
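In the question's Python client, the counterpart of this Java line is passing uri= instead of content= when building types.RecognitionAudio. Since a malformed URI (for example one with stray spaces, as in gs:// bucket / path) is an easy slip, a small check before sending the request can help. parse_gcs_uri below is a hypothetical helper, not part of any Google library.

```python
from typing import Tuple


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/object' into ('bucket', 'path/object').

    Raises ValueError if the string is not a gs:// URI naming both a bucket
    and an object.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    bucket, sep, blob = uri[len(prefix):].partition("/")
    # Bucket names cannot contain whitespace; an object name is required.
    if not bucket or any(ch.isspace() for ch in bucket) or not sep or not blob:
        raise ValueError(f"URI must name a bucket and an object: {uri!r}")
    return bucket, blob
```

Once the URI passes the check, it can be handed to the client as types.RecognitionAudio(uri=gcs_uri).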

The following code shows how to specify a GCS URI as the input. Google has a complete example on GitHub.

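  import com.google.api.gax.longrunning.OperationFuture;
  import com.google.cloud.speech.v1.LongRunningRecognizeMetadata;
  import com.google.cloud.speech.v1.LongRunningRecognizeResponse;
  import com.google.cloud.speech.v1.RecognitionAudio;
  import com.google.cloud.speech.v1.RecognitionConfig;
  import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
  import com.google.cloud.speech.v1.SpeechClient;
  import com.google.cloud.speech.v1.SpeechRecognitionAlternative;
  import com.google.cloud.speech.v1.SpeechRecognitionResult;
  import java.util.List;

  public static void asyncRecognizeGcs(String gcsUri) throws Exception {
    // Instantiates a client with GOOGLE_APPLICATION_CREDENTIALS
    try (SpeechClient speech = SpeechClient.create()) {
      // Builds the request for a remote FLAC file stored in Cloud Storage
      RecognitionConfig config =
          RecognitionConfig.newBuilder()
              .setEncoding(AudioEncoding.FLAC)
              .setLanguageCode("en-US")
              .setSampleRateHertz(16000)
              .build();
      RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

      // The synchronous recognize() call only accepts about one minute of audio,
      // so a 10-15 minute file needs the long-running call; get() blocks until done.
      OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> operation =
          speech.longRunningRecognizeAsync(config, audio);
      List<SpeechRecognitionResult> results = operation.get().getResultsList();

      for (SpeechRecognitionResult result : results) {
        // There can be several alternative transcripts for a given chunk of speech. Just use the
        // first (most likely) one here.
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        System.out.printf("Transcription: %s%n", alternative.getTranscript());
      }
    }
  }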
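The Java sample configures FLAC audio at 16000 Hz, but the question's file is an mp4 video, and mp4 is not one of the audio encodings Speech-to-Text accepts, so the audio track has to be extracted first. A minimal sketch, assuming ffmpeg is installed and on PATH; the helper names and file names are hypothetical. ffmpeg picks the output format from the destination's extension, so a .flac name yields FLAC.

```python
import subprocess
from typing import List


def ffmpeg_extract_audio_cmd(src: str, dst: str, sample_rate_hertz: int = 16000) -> List[str]:
    """Build an ffmpeg command that drops the video and writes mono audio to dst."""
    return [
        "ffmpeg",
        "-i", src,                      # input container, e.g. the downloaded mp4
        "-vn",                          # ignore the video stream
        "-ac", "1",                     # downmix to a single channel
        "-ar", str(sample_rate_hertz),  # resample to match sampleRateHertz in the config
        dst,                            # output format inferred from the extension
    ]


def extract_audio(src: str, dst: str, sample_rate_hertz: int = 16000) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_extract_audio_cmd(src, dst, sample_rate_hertz), check=True)
```

For example, extract_audio("video.mp4", "audio.flac") produces a file that can be copied to a bucket (for instance with gsutil cp audio.flac gs://bucket/path/) and passed to the long-running call by its gs:// URI.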