How do I transcribe large files with the Google Speech API?

Asked: 2017-06-27 11:59:12

Tags: python asynchronous audio google-speech-api transcription

How can I transcribe a large audio file with the Google Speech API's asynchronous recognition without running into the error "Operation not complete and retry limit reached."?

Possible solution

If the operation has not completed, you can poll the endpoint by repeatedly making the GET request until the done property of the response is true.
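The polling advice quoted above could be sketched in Python roughly as follows. This is a minimal sketch, not the library's own mechanism: `wait_until_done`, its parameters, and the default timeout and interval are hypothetical names and values chosen for illustration. It replaces a fixed retry count with a wall-clock deadline, so the wait can be sized to the audio rather than to a number of attempts.

```python
import time


def wait_until_done(poll, is_done, timeout_s=3600.0, interval_s=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call poll() repeatedly until is_done() is true or timeout_s elapses.

    Returns True if the operation finished, False if the deadline passed.
    All names and defaults here are illustrative assumptions.
    """
    deadline = clock() + timeout_s
    while not is_done():
        if clock() >= deadline:
            return False          # gave up: deadline reached
        sleep(interval_s)         # wait before asking again
        poll()                    # refresh the operation's state
    return True
```

With an operation object that exposes `poll()` and `complete`, as in the script further down, this would be called as `wait_until_done(operation.poll, lambda: operation.complete)`.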

  

Is this feasible in Python? Or should I split the file into smaller files and retry?
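For the splitting option, here is a minimal sketch of how the chunk boundaries and the matching ffmpeg commands could be computed. The 300-second chunk length, the `chunk_NNN.flac` output names, and both function names are assumptions made for illustration. The commands only use ffmpeg's `-ss` (start offset), `-t` (duration), and `-ac` (channel count) options and are built as argument lists without being executed.

```python
import math


def chunk_bounds(total_s, chunk_s):
    """Return (start, length) pairs in seconds that cover total_s."""
    count = math.ceil(total_s / chunk_s)
    return [(i * chunk_s, min(chunk_s, total_s - i * chunk_s))
            for i in range(count)]


def ffmpeg_split_commands(src, total_s, chunk_s=300):
    """Build one ffmpeg argument list per chunk (hypothetical file names)."""
    commands = []
    for index, (start, length) in enumerate(chunk_bounds(total_s, chunk_s)):
        out = "chunk_{:03d}.flac".format(index)
        commands.append([
            "ffmpeg", "-i", src,
            "-ss", "{:.2f}".format(start),   # where this chunk starts
            "-t", "{:.2f}".format(length),   # how long this chunk is
            "-ac", "1",                      # downmix to mono
            out,
        ])
    return commands


# 1278.4 s is the duration of the recording described in this question.
for command in ffmpeg_split_commands("mono.flac", 1278.4):
    print(" ".join(command))
```

Each list could be passed to `subprocess.run`. Note that cutting at fixed offsets can split a word across two chunks, so the transcripts may need some care at the boundaries.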

Known issues with the Speech API

  • Encoding.

What I have done so far

Encoding command

ffmpeg -i 2017-06-13-17_48_51.flac -ac 1 mono.flac

Why ffmpeg over sox?

  

I chose ffmpeg because when I ran sox

sox 2017-06-13-17_48_51.flac --channels=1 --bits=16 2017-06-13-17_48_51_more_stable.flac

it reported this:

sox WARN dither: dither clipped 55 samples; decrease volume?

Input audio file

Input File     : '2017-06-13-17_48_51.flac'
Channels       : 2
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:21:18.40 = 61363200 samples ~ 95880 CDDA sectors
File Size      : 60.7M
Bit Rate       : 380k
Sample Encoding: 16-bit FLAC

After running this command

ffmpeg -i 2017-06-13-17_48_51.flac -ac 1 mono.flac

Output audio file

Input File     : 'mono.flac'
Channels       : 1
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:21:18.40 = 61363200 samples ~ 95880 CDDA sectors
File Size      : 59.9M
Bit Rate       : 375k
Sample Encoding: 16-bit FLAC
Comment        : 'encoder=Lavf56.40.101'
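As a sanity check on the listing above, the reported duration follows directly from the sample count and the sample rate. A minimal sketch (the function names are illustrative, not part of any tool):

```python
def duration_seconds(samples, sample_rate):
    """Length of the audio in seconds."""
    return samples / sample_rate


def format_duration(samples, sample_rate):
    """Format the length as HH:MM:SS.ss, the layout used in the listing."""
    whole, remainder = divmod(samples, sample_rate)
    minutes, seconds = divmod(whole, 60)
    hours, minutes = divmod(minutes, 60)
    return "{:02d}:{:02d}:{:05.2f}".format(
        hours, minutes, seconds + remainder / sample_rate)


print(duration_seconds(61363200, 48000))   # 1278.4
print(format_duration(61363200, 48000))    # 00:21:18.40
```

So the mono file is still about 21 minutes (1278.4 seconds) of audio; only the channel count changed.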

Python file

  

Google Speech API asynchronous example with explicit credentials

     
    

I changed the FLAC sample rate to 48000 and set an explicit path to the credentials file in the environment.

  
import argparse
import io
import time
import os

# Use an explicit service-account key file for authentication.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "cloud_speech_service_keys.json"

def transcribe_file(speech_file):
    """Transcribe the given audio file asynchronously."""
    from google.cloud import speech
    speech_client = speech.Client()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
        audio_sample = speech_client.sample(
            content,
            source_uri=None,
            encoding='LINEAR16',
            sample_rate_hertz=16000)

    operation = audio_sample.long_running_recognize('en-US')

    # Poll every 2 seconds, at most 100 times (about 200 seconds in total).
    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}'.format(alternative.confidence))
    # [END send_request]
def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    speech_client = speech.Client()

    audio_sample = speech_client.sample(
        content=None,
        source_uri=gcs_uri,
        encoding='FLAC',
        sample_rate_hertz=48000)

    operation = audio_sample.long_running_recognize('en-US')

    # Same polling limit as above: 100 attempts, 2 seconds apart.
    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}'.format(alternative.confidence))
    # [END send_request_gcs]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument(
        'path', help='File or GCS path for audio file to be recognized')
    args = parser.parse_args()
    if args.path.startswith('gs://'):
        transcribe_gcs(args.path)
    else:
        transcribe_file(args.path)
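One thing worth noting about the script: its polling loop waits for at most 100 attempts of 2 seconds each, while the recording is about 21 minutes long. The arithmetic is sketched below. How long the service actually needs for a file of this length is not known here, so using the audio duration as the waiting target is purely an illustrative assumption, and both function names are hypothetical.

```python
import math


def polling_budget_seconds(retry_count, sleep_seconds):
    """Total time the retry loop is willing to wait."""
    return retry_count * sleep_seconds


def retries_needed(wait_seconds, sleep_seconds):
    """Attempts required to keep polling for at least wait_seconds."""
    return math.ceil(wait_seconds / sleep_seconds)


budget = polling_budget_seconds(100, 2)   # values from the script
audio = 61363200 / 48000                  # samples / sample rate
print(budget, audio)                      # 200 1278.4
print(retries_needed(audio, 2))           # 640
```

If the operation takes longer than 200 seconds, the loop exits with "Operation not complete and retry limit reached." even though the operation itself may still be running.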

0 answers:

No answers