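I want to use the Google Speech API V1 from Python.
So far I have gotten the Google URI sample to work and received transcribed content back. When I modify the code to use a custom recorded audio file, I still get a response from Google, but it contains no transcription.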
I set up the request as follows:
"""Transcribe the given raw audio file asynchronously.
Args:
    audio_file: the raw audio file.
"""
import base64
import json
import time
from datetime import datetime

# get_speech_service() is the helper from Google's sample; it is not shown here.
audio_file = 'audioFiles/test.raw'
with open(audio_file, 'rb') as speech:
    speech_content = base64.b64encode(speech.read())

service = get_speech_service()
service_request = service.speech().asyncrecognize(
    body={
        'config': {
            'encoding': 'LINEAR16',
            'sampleRate': 16000,
            'languageCode': 'en-US',
        },
        'audio': {
            'content': speech_content.decode('utf-8', 'ignore')
        }
    })
response = service_request.execute()
print(json.dumps(response))

name = response['name']
service = get_speech_service()
service_request = service.operations().get(name=name)
while True:
    # Get the long running operation with response.
    response = service_request.execute()
    if 'done' in response and response['done']:
        break
    else:
        # Give the server a few seconds to process.
        print('%s, waiting for results from job, %s' % (
            datetime.now().replace(second=0, microsecond=0), name))
        time.sleep(60)
print(json.dumps(response))
This gives me the following response:
kayl@kayl-Surface-Pro-3:~/audioConversion$ python speechToText.py
{"name": "527788331906219767"}
2017-03-30 20:10:00, waiting for results from job, 527788331906219767
{"response": {"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"},"done": true, "name": "527788331906219767", "metadata": {"lastUpdateTime": "2017-03-31T03:11:16.391628Z", "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata", "startTime": "2017-03-31T03:10:52.351004Z", "progressPercent": 100}}
I should be getting a response of the following form:
{"response": {"@type":"type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse", "results":{...}}...
This is with the following raw audio file:
To record this audio, I ran:
arecord -f cd -d 65 -r 16000 -t raw test.raw
Any advice that could point me in the right direction would be greatly appreciated.
Answer 0 (score: 2)
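Your example is basically the same as this sample, which uses the test audio files.
Does your code work for you with the test sample audio.raw? If so, it is most likely an encoding issue. I have had the most success using FLAC files and recording audio as recommended in the best practices. In the past I have also used Audacity to take some of the guesswork out of recording.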
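One way to test the encoding theory on a headerless file is to compare its size with what the declared format implies. The sketch below is only arithmetic: `implied_duration_seconds` is a hypothetical helper, and its defaults (16000 Hz, 16-bit, mono) are taken from the request config in the question. According to the arecord man page, `-f cd` is shorthand for 16-bit little-endian, 44100 Hz, stereo, so the recording may still contain two channels even with `-r 16000`. That is an assumption about the cause, not something confirmed in this thread.

```python
def implied_duration_seconds(num_bytes, sample_rate=16000, channels=1,
                             bytes_per_sample=2):
    """Seconds of audio that num_bytes of headerless PCM would hold."""
    return num_bytes / float(sample_rate * channels * bytes_per_sample)


# Example with computed sizes (not measured from the file in the question).
mono_size = 65 * 16000 * 1 * 2    # 65 s of 16 kHz, 16-bit mono
stereo_size = 65 * 16000 * 2 * 2  # the same 65 s captured with two channels

print(implied_duration_seconds(mono_size))                # 65.0
print(implied_duration_seconds(stereo_size))              # 130.0
print(implied_duration_seconds(stereo_size, channels=2))  # 65.0
```

In practice you would pass `os.path.getsize('audioFiles/test.raw')` as `num_bytes`. If a 65-second recording comes out at roughly 130 seconds under the mono assumption, the file probably holds two channels.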
On Mac OS X, the following shell command can be used to record 65 seconds of audio:
rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 65
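Because `rec` writes a WAV file here, the header can be read back with Python's standard `wave` module to confirm that the values passed to the API match the recording. This is a minimal sketch, assuming the file is plain PCM WAV (the only kind `wave` reads); `describe_wav` is a hypothetical helper name, not part of any Google library.

```python
import wave


def describe_wav(path):
    """Return the basic format parameters stored in a PCM WAV header."""
    with wave.open(path, 'rb') as wav:
        sample_rate = wav.getframerate()
        return {
            'channels': wav.getnchannels(),
            'bits_per_sample': wav.getsampwidth() * 8,
            'sample_rate': sample_rate,
            'duration_seconds': wav.getnframes() / float(sample_rate),
        }
```

For the command above, `describe_wav('audio.wav')` would be expected to report one channel, 16 bits per sample, and a sample rate of 44100, which is the rate to pass when transcribing.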
I then use the following code to transcribe the audio:
import io
import time


# The enclosing function is not shown in the post; the name transcribe_file is assumed.
def transcribe_file(speech_file):
    from google.cloud import speech
    speech_client = speech.Client()
    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
        audio_sample = speech_client.sample(
            content,
            source_uri=None,
            encoding='LINEAR16',
            sample_rate=44100)
    operation = speech_client.speech_api.async_recognize(audio_sample)
    # Poll every 2 seconds, giving up after 100 attempts.
    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()
    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return
    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
Note that in my example I use the new client library, which makes the API easier to access. This sample code was the starting point for my example.
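If a headerless recording turns out to contain two interleaved 16-bit channels (not confirmed for the file in the question), it could be downmixed to mono without re-recording. The sketch below uses only the standard library; `stereo_to_mono` is a hypothetical helper, and it assumes signed 16-bit little-endian samples in left/right order.

```python
import array
import sys


def stereo_to_mono(pcm_bytes):
    """Average interleaved 16-bit little-endian stereo samples into mono."""
    samples = array.array('h')
    samples.frombytes(pcm_bytes)
    if sys.byteorder == 'big':
        samples.byteswap()  # the input is assumed to be little-endian
    if len(samples) % 2:
        raise ValueError('expected an even number of samples (left/right pairs)')
    # Average each left/right pair; the result always fits in 16 bits.
    mono = array.array('h', (
        (samples[i] + samples[i + 1]) // 2
        for i in range(0, len(samples), 2)))
    if sys.byteorder == 'big':
        mono.byteswap()  # write the output back as little-endian
    return mono.tobytes()
```

The returned bytes can be written to a new `.raw` file and sent with the same `LINEAR16` settings. The sample rate is unchanged by the downmix.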