Question

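I am trying to use the Google Cloud Speech-to-Text API.

Based on my understanding of the API documentation, I converted an mp3 audio file to .raw and uploaded it to a storage bucket.

Here is my code: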

def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code='en-US')

    operation = client.long_running_recognize(config, audio)

    print('Waiting for operation to complete...')
    response = operation.result()

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(u'Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))

transcribe_gcs("gs://cloudh3-200314.appspot.com/cs.raw")

What am I doing wrong?

Answer 1

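I ran into a similar problem, and it came down to the accepted formats. Even if you have converted the file to RAW, the format can still be wrong, and if the service cannot read the file, it will not give you any output.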
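As a rough way to check what was actually uploaded, the sketch below looks at the first bytes of a local copy of the file. It relies only on well-known magic numbers (FLAC files start with `fLaC`, WAV files with `RIFF....WAVE`, many mp3 files with an `ID3` tag or a frame sync). The function names are hypothetical, and a headerless raw PCM file has no signature at all, so it comes back as "unknown". Note that the code in the question declares `FLAC` as the encoding while uploading a `.raw` file; for headerless 16-bit little-endian PCM, `LINEAR16` is usually the matching encoding, though that is an assumption about how the file was converted.

```python
def sniff_audio_container(header: bytes) -> str:
    """Guess the container format from the first bytes of an audio file.

    Returns "flac", "wav", "mp3", or "unknown". Headerless raw PCM has no
    signature, so it is reported as "unknown" (this is a heuristic only).
    """
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # mp3: either an ID3v2 tag or an MPEG frame sync (11 set bits).
    if header[:3] == b"ID3":
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of a local file and guess its container."""
    with open(path, "rb") as f:
        return sniff_audio_container(f.read(12))
```

If `sniff_file` reports anything other than "flac" for a file that the config declares as `FLAC`, the declared encoding and the actual data disagree, which fits the "no output" behaviour described above.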

I recently processed a 56-minute audio file, and it took 17 minutes, so that should give you an idea of how long the operation should run.
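The single data point above (56 minutes of audio in 17 minutes) can be turned into a rough upper bound on how long to wait, instead of blocking forever. This is only a sketch: the helper name, the safety factor of 2, and the idea that processing time scales linearly with audio length are all assumptions, not documented behaviour of the service.

```python
def estimate_timeout_seconds(audio_minutes: float,
                             minutes_per_audio_minute: float = 17 / 56,
                             safety_factor: float = 2.0) -> int:
    """Estimate a wait limit in seconds from one observed processing ratio.

    The default ratio comes from the answer's example (17 minutes of
    processing for 56 minutes of audio); linear scaling is an assumption.
    """
    expected_minutes = audio_minutes * minutes_per_audio_minute
    return round(expected_minutes * 60 * safety_factor)


# Hypothetical use with the question's code, assuming the operation's
# result() accepts a timeout in seconds as google-api-core futures do:
#     response = operation.result(timeout=estimate_timeout_seconds(56))
```

With a limit in place, a wait that runs far past the estimate fails with a timeout error rather than hanging silently, which makes a format problem easier to notice.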

Process the file with sox. I found conversion parameters that work by using the command -
