Progress bar for the Google Speech-to-Text API long_running_recognize operation

Asked: 2019-08-19 14:46:54

Tags: python speech-to-text google-speech-api google-cloud-speech

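My question is similar to this one that someone asked on SO, but I am asking again because the most recent answer is over a year old and (I believe) the API has changed a lot since then.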

I am running a long_running_recognize operation and would like to know its progress.

from google.cloud import speech_v1 as speech
from google.cloud.speech_v1 import enums
from google.cloud.speech_v1 import types

gcs_uri = 'gs://my-new-videos/a49e0bf49a2e4d95b322bbf802e09d0e.wav'
client = speech.SpeechClient()

audio = types.RecognitionAudio(uri=gcs_uri)
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=44100,
    language_code='en-US',
    audio_channel_count=2,
    enable_separate_recognition_per_channel=False,
    model='video',
    enable_word_time_offsets=False)

# ideally a way to get some sort of progress bar to know how long to wait.
operation = client.long_running_recognize(config, audio) 
print('Waiting for operation to complete...')
response = operation.result(timeout=90)

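Apparently one can call operation.running() and operation.done() to get the status of the operation, but I can't figure out how to use them to tell me how long I need to wait or how much has already been completed. Any help would be appreciated.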

1 Answer:

Answer 0 (score: 1)

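I tried your example, but processing did not start until response = operation.result(timeout=90) ran, and that call then seemed to block code execution. If we use a callback instead, such as the approach shown here, Operation.metadata.progress_percent can be read while waiting for the operation to complete. As an example, I check the progress every 5 seconds: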

import time

from google.cloud import speech_v1
from google.cloud.speech_v1 import enums


client = speech_v1.SpeechClient()

encoding = enums.RecognitionConfig.AudioEncoding.FLAC
sample_rate_hertz = 16000
language_code = 'en-US'
config = {'encoding': encoding, 'sample_rate_hertz': sample_rate_hertz, 'language_code': language_code}
uri = 'gs://gcs-test-data/vr.flac'
audio = {'uri': uri}

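# long_running_recognize returns an Operation future, not the final response.
operation = client.long_running_recognize(config, audio)

def callback(operation_future):
    # Runs once the operation has finished; print the transcription.
    result = operation_future.result()
    print(result)

# Registering the callback appears to start polling in the background,
# which is what keeps operation.metadata up to date.
operation.add_done_callback(callback)

progress = 0

while progress < 100:
    try:
        progress = operation.metadata.progress_percent
        print('Progress: {}%'.format(progress))
    except Exception:
        # metadata may not be available yet (e.g. None before the first poll).
        pass
    finally:
        time.sleep(5)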

Note that in this case I used a short public audio file, so the progress jumps straight from 0 to 100%, but it seems to work:

Progress: 0%
...
Progress: 0%
results {
  alternatives {
    transcript: "it\'s okay so what am I doing here why am I here at GDC talking about VR video it\'s because I believe my favorite games I love games I believe in games my favorite games are the ones that are all about the stories I love narrative game design I love narrative-based games and I think that when it comes to telling stories in VR bring together capturing the world with narrative based games and narrative based game design is going to unlock some of the killer apps and killer stories of the medium"
    confidence: 0.959626555443
  }
}
results {
  alternatives {
    transcript: "so I\'m really here looking for people who are interested in telling us or two stories that are planning projects around telling those types of stories and I would love to talk to you so if this sounds like your project if you\'re looking at blending VR video and interactivity to tell a story I want to talk to you I want to help you so if this sounds like you please get in touch please come find me I\'ll be here all week I have pink hair I work for Google and I would love to talk with you further about VR video interactivity and storytelling"
    confidence: 0.954977035522
  }
}

Progress: 100%
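
If you want something closer to an actual progress bar, and a loop that also ends when the operation fails or a deadline passes, the same idea can be wrapped in a small helper. This is only a sketch: render_bar and wait_with_progress are hypothetical names of mine, not part of the client library. It assumes only that the operation object exposes done(), metadata and result(), as the Operation returned by long_running_recognize does, and that metadata may be None before the first update. I have not run it against a long audio file.

```python
import time


def render_bar(percent, width=30):
    """Return a text bar of `width` cells followed by the percentage."""
    percent = max(0, min(100, int(percent)))  # clamp to the 0..100 range
    filled = width * percent // 100
    return '[{}{}] {:3d}%'.format('#' * filled, '-' * (width - filled), percent)


def wait_with_progress(operation, poll_interval=5.0, timeout=None,
                       sleep=time.sleep, report=print):
    """Poll `operation` until done, reporting progress_percent as a bar.

    Hypothetical helper: returns operation.result() once done() is true and
    raises TimeoutError if `timeout` seconds pass first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not operation.done():
        metadata = operation.metadata  # may be None before the first update
        percent = getattr(metadata, 'progress_percent', 0) or 0
        report(render_bar(percent))
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(
                'operation still running after {} s'.format(timeout))
        sleep(poll_interval)
    result = operation.result()  # re-raises here if the operation failed
    report(render_bar(100))
    return result
```

With the code above it would be called as response = wait_with_progress(operation, poll_interval=5, timeout=600). The loop condition is done() rather than progress < 100, so it cannot spin forever if the operation ends with an error before reaching 100%.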