Speaker diarization when using speech recognition in Python

Time: 2019-11-26 14:15:52

Tags: python speech-recognition google-speech-api

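When using the imported speech_recognition package in Python, is it possible to diarize the output?

I would appreciate any advice on this, or on whether it is possible at all.

Also, any suggestions on writing the result to a text file with a line between each new speaker would be greatly appreciated.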

import speech_recognition as sr

from os import path

from pprint import pprint

audio_file = path.join(path.dirname(path.realpath(__file__)), "RobertP.wav")

r = sr.Recognizer()
with sr.AudioFile(audio_file) as source:
    audio = r.record(source)

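try:
    # show_all=True returns the raw API response instead of a single string.
    txt = r.recognize_google(audio, show_all=True)
except (sr.UnknownValueError, sr.RequestError) as err:
    # Stop here: txt would be undefined further down.
    raise SystemExit("Didn't work: {}".format(err))

text = str(txt)

with open("tester.txt", "w") as f:
    f.write(text)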

Note: apologies for being a beginner.

1 answer:

Answer 0: (score: 0)

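Speaker diarization is currently in beta in the Google Speech-to-Text API. The feature is described in the Google Cloud Speech-to-Text documentation. The output can be processed in several ways. Here is one example (based on a Medium article):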

import io

def transcribe_file_with_diarization(speech_file):
    """Transcribe the given audio file synchronously with diarization."""

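    # Written against google-cloud-speech releases before 2.0, which still
    # provide speech.enums and accept positional recognize() arguments.
    from google.cloud import speech_v1p1beta1 as speech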
    client = speech.SpeechClient()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
    audio = {"content": content}

    encoding = speech.enums.RecognitionConfig.AudioEncoding.LINEAR16
    sample_rate_hertz = 48000
    language_code = 'en-US'
    enable_speaker_diarization = True
    enable_automatic_punctuation = True
    diarization_speaker_count = 4

    config = {
        "encoding": encoding,
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language_code,
        "enable_speaker_diarization": enable_speaker_diarization,
        "enable_automatic_punctuation": enable_automatic_punctuation,
        # Optional:
        "diarization_speaker_count": diarization_speaker_count
    }

    print('Waiting for operation to complete...')
    response = client.recognize(config, audio)

    # The transcript within each result is separate and sequential per result.
    # However, the words list within an alternative includes all the words
    # from all the results thus far. Thus, to get all the words with speaker
    # tags, you only have to take the words list from the last result:

    result = response.results[-1]
    words_info = result.alternatives[0].words

    # One transcript per expected speaker tag (1 to 4).
    transcripts = {tag: "" for tag in range(1, 5)}

    # Append each word to the transcript of the speaker it is tagged with.
    for word_info in words_info:
        if word_info.speaker_tag in transcripts:
            transcripts[word_info.speaker_tag] += word_info.word + " "

    # Printing out the output:
    for tag in sorted(transcripts):
        print("speaker{}: '{}'".format(tag, transcripts[tag]))
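
The question also asked for a text file with a line between each new speaker. The function above collects everything one speaker said into a single string, which loses the order of the conversation. A minimal sketch of an alternative, assuming each word object exposes `.word` and `.speaker_tag` as in the code above: merge consecutive words with the same tag into turns, then write the turns separated by blank lines. The names `Word`, `group_into_turns`, and `write_turns` are hypothetical helpers for illustration, not part of the Google client library.

```python
from collections import namedtuple

# Hypothetical stand-in for the API's word objects, handy for trying the
# helpers without calling the service; only the two fields used below.
Word = namedtuple("Word", ["word", "speaker_tag"])


def group_into_turns(words_info):
    """Merge consecutive words with the same speaker_tag into (tag, text) turns."""
    turns = []
    for info in words_info:
        if turns and turns[-1][0] == info.speaker_tag:
            # Same speaker is still talking: extend the current turn.
            turns[-1][1].append(info.word)
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append((info.speaker_tag, [info.word]))
    return [(tag, " ".join(words)) for tag, words in turns]


def write_turns(turns, out_path):
    """Write one 'Speaker N: text' paragraph per turn, separated by blank lines."""
    lines = ["Speaker {}: {}".format(tag, text) for tag, text in turns]
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(lines))
        f.write("\n")
```

Inside `transcribe_file_with_diarization`, this could be called as `write_turns(group_into_turns(words_info), "tester.txt")` once `words_info` has been read from the last result.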