Question

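Using Google Speech-to-Text, I can transcribe an audio clip with the default parameters. However, when I use the enable_speaker_diarization flag to tell the individual speakers in the clip apart, I get an error message. Google documents the flag here. The clip needs long-running recognition, so I am using the asynchronous request that Google recommends here.

My code: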

def transcribe_gcs(gcs_uri):
    from google.cloud import speech
    from google.cloud import speech_v1 as speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = speech.types.RecognitionConfig(
        encoding=speech.enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code='en-US',
        enable_speaker_diarization=True,
        diarization_speaker_count=2)

    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    response = operation.result(timeout=3000)
    result = response.results[-1]

    words_info = result.alternatives[0].words

    for word_info in words_info:
        print("word: '{}', speaker_tag: {}".format(word_info.word, word_info.speaker_tag))

I call it with:

transcribe_gcs('gs://bucket_name/filename.flac')

and get this error:

ValueError: Protocol message RecognitionConfig has no "enable_speaker_diarization" field.

I am sure this is related to the library. I have tried every import variant I could find, for example:

from google.cloud import speech_v1p1beta1 as speech
from google.cloud import speech

But I keep getting the same error. Note: I had already authenticated with the JSON key file before running this code.

Answer 1

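The enable_speaker_diarization=True parameter of speech.types.RecognitionConfig is currently available only in the speech_v1p1beta1 library, so you need to import that library to use it, rather than the default speech one. I made a few modifications to your code and it works fine for me. Keep in mind that you need a service account to run this code.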

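def transcribe_gcs(gcs_uri):
    # Import the beta client: it is the one whose RecognitionConfig
    # has the enable_speaker_diarization field.
    # The enums/types modules assume a pre-2.0 google-cloud-speech release.
    from google.cloud import speech_v1p1beta1 as speech
    from google.cloud.speech_v1p1beta1 import enums
    from google.cloud.speech_v1p1beta1 import types
    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = speech.types.RecognitionConfig(
        language_code='en-US',
        enable_speaker_diarization=True,
        diarization_speaker_count=2)
    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    response = operation.result(timeout=3000)
    # The last result holds the words of the whole clip with their speaker tags.
    result = response.results[-1]

    words_info = result.alternatives[0].words

    tag = 1
    speaker = ""

    # Collect consecutive words from the same speaker and print each run.
    for word_info in words_info:
        if word_info.speaker_tag == tag:
            speaker = speaker + " " + word_info.word
        else:
            if speaker:
                print("speaker {}: {}".format(tag, speaker))
            tag = word_info.speaker_tag
            speaker = word_info.word

    # Print the final run, which the loop above never reaches.
    if speaker:
        print("speaker {}: {}".format(tag, speaker))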
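The grouping loop above can also be pulled out into a small helper that is easy to check without calling the API. This is only a sketch: Word is a hypothetical stand-in for the word objects in the response (it carries just the word and speaker_tag attributes used above), group_by_speaker is a name made up for this example, and the sample words are invented, not real transcription output.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for the API's word objects; it has only the
# two attributes the answer's loop reads.
Word = namedtuple("Word", ["word", "speaker_tag"])


def group_by_speaker(words_info):
    """Collapse consecutive words with the same speaker_tag into (tag, text) pairs."""
    segments = []
    # groupby only merges adjacent items, so a speaker who talks twice
    # yields two separate segments, in order of appearance.
    for tag, run in groupby(words_info, key=lambda w: w.speaker_tag):
        segments.append((tag, " ".join(w.word for w in run)))
    return segments


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [Word("hello", 1), Word("there", 1), Word("hi", 2), Word("bye", 1)]
    for tag, text in group_by_speaker(sample):
        print("speaker {}: {}".format(tag, text))
```

Because the helper returns the segments instead of printing them, the last speaker's words are never dropped, and the same list can be reused for other output formats.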

The result should look like:

Answer 2

The cause of the error is similar for Node.js users. Import the beta features with this call, then use the speaker diarization feature.

const speech = require('@google-cloud/speech').v1p1beta1;

Answer 3

The error occurs because you have not imported the required modules. To fix it, import the following:

from google.cloud import speech_v1p1beta1 as speech
from google.cloud.speech_v1p1beta1 import enums
from google.cloud.speech_v1p1beta1 import types

enable_speaker_diarization flag error in Google Cloud Speech-to-Text
