Question

I want to use the Google Speech API v1 from Python.

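So far I have run the Google URI sample and got transcribed content back. When I modify the code to use an audio file I recorded myself, I still get a response from Google, but it contains no transcription.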

I set up the request like this:

"""Transcribe the given raw audio file asynchronously.
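Args:
    audio_file: the raw audio file.
"""
import base64
import json
import time
from datetime import datetime

# get_speech_service() is not shown here; it is assumed to be the helper
# from Google's sample that builds the authenticated API client.
audio_file = 'audioFiles/test.raw'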

with open(audio_file, 'rb') as speech:
    speech_content = base64.b64encode(speech.read())

service = get_speech_service()
service_request = service.speech().asyncrecognize(
    body={
        'config': {
            'encoding': 'LINEAR16',
            'sampleRate': 16000, 
            'languageCode': 'en-US',
        },
        'audio': {
            # base64 output is plain ASCII, so a strict decode is safe.
            'content': speech_content.decode('ascii')
        }
    })
response = service_request.execute()

print(json.dumps(response))

name = response['name']

service = get_speech_service()
service_request = service.operations().get(name=name)

while True:
    # Get the long running operation with response.
    response = service_request.execute()

    if response.get('done'):
        break
    # Give the server a minute to process before polling again.
    print('%s, waiting for results from job, %s' % (
        datetime.now().replace(second=0, microsecond=0), name))
    time.sleep(60)

print(json.dumps(response))

This gave me the following response:

kayl@kayl-Surface-Pro-3:~/audioConversion$ python speechToText.py
{"name": "527788331906219767"}
2017-03-30 20:10:00, waiting for results from job, 527788331906219767
{"response": {"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"},"done": true, "name": "527788331906219767", "metadata": {"lastUpdateTime": "2017-03-31T03:11:16.391628Z", "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata", "startTime": "2017-03-31T03:10:52.351004Z", "progressPercent": 100}}

I should be getting a response of this form instead:

{"response": {"@type":"type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse", "results":{...}}...
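For anyone comparing the two responses programmatically, here is a minimal sketch that pulls transcripts out of the operation JSON and returns an empty list when `results` is missing. The helper name `extract_transcripts` is made up for illustration, and the shape of `results` (a list of objects, each with an `alternatives` list carrying a `transcript` field) is an assumption on my part, since the expected response above elides it.

```python
def extract_transcripts(operation):
    """Return the first transcript of each result, or [] if there are none.

    Assumes (not confirmed by the post) that response['results'] is a list
    of objects, each holding an 'alternatives' list with a 'transcript'.
    """
    results = operation.get('response', {}).get('results', [])
    transcripts = []
    for result in results:
        alternatives = result.get('alternatives', [])
        if alternatives:
            transcripts.append(alternatives[0].get('transcript', ''))
    return transcripts


# Same shape as the blank response shown above: done, but no results.
empty_operation = {
    "response": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
    },
    "done": True,
    "name": "527788331906219767",
}

print(extract_transcripts(empty_operation))  # prints []
```

An empty list here means the job finished but the service recognised nothing, which matches the symptom described.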

I am using a raw audio file with these properties:

16000 Hz sample rate (also tried 41000 Hz)
16-bit little endian
Signed
65 seconds long

To record this audio I ran:

arecord -f cd -d 65 -r 16000 -t raw test.raw
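Because a raw file has no header, one rough way to check what was actually recorded is to compare its size with the size the stated parameters imply. The sketch below does that arithmetic; the function names are hypothetical. A ratio of about 2.0 would suggest two channels were captured, which may be worth ruling out since arecord documents `-f cd` as a 16-bit, 44100 Hz, stereo preset and only the rate is overridden in the command above.

```python
import os


def expected_raw_size(sample_rate, seconds, channels=1, bytes_per_sample=2):
    """Bytes of headerless PCM implied by the given parameters."""
    return sample_rate * seconds * channels * bytes_per_sample


def size_ratio(path, sample_rate, seconds, bytes_per_sample=2):
    """Actual file size divided by the expected size of a mono recording."""
    mono_size = expected_raw_size(sample_rate, seconds, 1, bytes_per_sample)
    return os.path.getsize(path) / float(mono_size)


# 16 kHz, 16-bit, 65 seconds, one channel:
print(expected_raw_size(16000, 65))  # prints 2080000
```

For the file in question, `size_ratio('audioFiles/test.raw', 16000, 65)` should come out close to 1.0 if the recording really is mono.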

Any suggestions that point me in the right direction would be much appreciated.

Answer 1

Your example is essentially the same as this sample, which uses the test audio files.

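Does your code work with the test sample audio.raw? If so, this is most likely an encoding problem. I have had the most success using FLAC files and recording audio as recommended in the best practices. I have also used Audacity in the past to take some of the guesswork out of recording.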
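If the recording is saved as WAV rather than headerless raw, the header can be read to take some of that guesswork out. The following is a small sketch using Python's standard `wave` module; `describe_wav` and `linear16_problems` are names I made up, and the checks assume the API wants single-channel, 16-bit samples at the rate declared in the request, which is my reading of the best-practices guidance rather than something stated in this thread.

```python
import wave


def describe_wav(path):
    """Read channel count, sample width, rate and duration from a WAV header."""
    with wave.open(path, 'rb') as wav:
        rate = wav.getframerate()
        return {
            'channels': wav.getnchannels(),
            'sample_width_bytes': wav.getsampwidth(),
            'sample_rate': rate,
            'duration_seconds': wav.getnframes() / float(rate),
        }


def linear16_problems(info, expected_rate):
    """List mismatches against mono 16-bit audio at expected_rate (assumed)."""
    problems = []
    if info['channels'] != 1:
        problems.append('expected 1 channel, found %d' % info['channels'])
    if info['sample_width_bytes'] != 2:
        problems.append('expected 16-bit samples, found %d-bit'
                        % (info['sample_width_bytes'] * 8))
    if info['sample_rate'] != expected_rate:
        problems.append('expected %d Hz, found %d Hz'
                        % (expected_rate, info['sample_rate']))
    return problems
```

Calling `linear16_problems(describe_wav('audio.wav'), 44100)` returns an empty list when the header matches what the request declares.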

On Mac OS X, the following shell command can be used to capture 65 seconds of audio:

  rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 65

I then transcribed the audio with the following code:

import io
import time

from google.cloud import speech


def transcribe(speech_file):
    """Transcribe a 44.1 kHz LINEAR16 recording asynchronously."""
    speech_client = speech.Client()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
        audio_sample = speech_client.sample(
            content,
            source_uri=None,
            encoding='LINEAR16',
            sample_rate=44100)

    operation = speech_client.speech_api.async_recognize(audio_sample)

    # Poll every 2 seconds, giving up after 100 attempts.
    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))

Note that my example uses the new client library, which makes the API easier to access. This sample code is the starting point I modeled my example on.

Google Speech API returns blank JSON response
