Question

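I recently discovered that Google's Vision API can accept an external image URL, and I'm curious whether anyone knows if Google's Speech API can accept an external video URL (e.g. a YouTube video)?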

The code I have in mind looks something like this:

def transcribe_gcs(youtube_url):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=youtube_url)  # swapped out gcs_uri with youtube_url
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        # sample_rate_hertz=16000,
        language_code='en-US')

    operation = client.long_running_recognize(config, audio)

    print('Waiting for operation to complete...')
    response = operation.result(timeout=90)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(u'Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))

Answer 1

I'm curious whether anyone knows if Google's Speech API can accept an external video URL (e.g. a YouTube video)?

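It has to be either a local path to your audio file (for audio shorter than 1 minute) or a GCS URI (for audio longer than 1 minute). What you have in mind is not possible: an external URL such as a YouTube link is not accepted, and the audio/video file must reside in GCS.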
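To make the constraint concrete, here is a minimal sketch of a guard that could run before building the RecognitionAudio object. The helper name `require_gcs_uri` is hypothetical (it is not part of the Google client library); it only checks that the string is a `gs://` URI naming a bucket and an object, so a YouTube link fails early instead of at request time.

```python
from urllib.parse import urlparse


def require_gcs_uri(uri):
    """Return (bucket, object_name) for a gs:// URI, or raise ValueError.

    Hypothetical helper for illustration; not part of google-cloud-speech.
    """
    parsed = urlparse(uri)
    # Anything that is not gs:// (e.g. an https YouTube link) is rejected.
    if parsed.scheme != "gs":
        raise ValueError("expected a gs:// URI, got: {!r}".format(uri))
    bucket = parsed.netloc
    object_name = parsed.path.lstrip("/")
    if not bucket or not object_name:
        raise ValueError(
            "gs:// URI must name a bucket and an object: {!r}".format(uri)
        )
    return bucket, object_name
```

Calling `require_gcs_uri(youtube_url)` at the top of `transcribe_gcs` would raise a ValueError for the YouTube case in the question, which matches the behaviour described in this answer.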

Answer 2

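I think you can achieve this by streaming the video yourself (for example from Wowza or any server of your choice), then using ffmpeg to extract the audio and passing that to Google. It should work. Use StreamingRecognizeRequest instead of RecognitionAudio.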
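A rough sketch of the ffmpeg side of that idea, under stated assumptions: the function names `build_ffmpeg_command` and `iter_audio_chunks` are hypothetical, the 16 kHz mono 16-bit PCM output is an assumed choice intended to match a LINEAR16 streaming config, and the source is whatever stream or file ffmpeg can read. The Google streaming call itself is left out.

```python
import io


def build_ffmpeg_command(source, sample_rate_hertz=16000):
    """Build an ffmpeg command that writes raw mono 16-bit PCM to stdout."""
    return [
        "ffmpeg",
        "-i", source,                    # input stream or file
        "-vn",                           # drop the video track
        "-ac", "1",                      # downmix to mono
        "-ar", str(sample_rate_hertz),   # resample
        "-acodec", "pcm_s16le",          # 16-bit little-endian PCM
        "-f", "s16le",                   # raw output, no container
        "pipe:1",                        # write to stdout
    ]


def iter_audio_chunks(stream, chunk_size=4096):
    """Yield successive byte chunks from a binary file-like object."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


if __name__ == "__main__":
    # Stand-in for ffmpeg's stdout so the sketch runs without ffmpeg.
    fake_stdout = io.BytesIO(b"\x00" * 10000)
    print(build_ffmpeg_command("input.mp4"))
    print([len(c) for c in iter_audio_chunks(fake_stdout)])
```

In practice the command would be started with `subprocess.Popen(cmd, stdout=subprocess.PIPE)` and `iter_audio_chunks(proc.stdout)` would supply the bytes; each chunk would then go into the `audio_content` field of a StreamingRecognizeRequest.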

Can Google's Speech API accept an external video URL?
