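Is it possible to diarize the output when using the imported speech_recognition package in Python?
I would appreciate any advice on this, or on whether it is feasible at all.
In addition, any suggestions for then writing the result to a text file, with a blank line between each new speaker, would be much appreciated.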
import speech_recognition as sr
from os import path

audio_file = path.join(path.dirname(path.realpath(__file__)), "RobertP.wav")
r = sr.Recognizer()
with sr.AudioFile(audio_file) as source:
    audio = r.record(source)  # read the entire audio file

try:
    # show_all=True returns the raw API response rather than only the best transcript
    txt = r.recognize_google(audio, show_all=True)
except (sr.UnknownValueError, sr.RequestError):
    print("Didn't work.")
else:
    # Only write the file if recognition succeeded, so txt is always defined here
    with open("tester.txt", "w") as f:
        f.write(str(txt))
Note: apologies for being such a beginner.
Answer 0 (score: 0)
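Speaker diarization is currently in beta in the Google Speech-to-Text API. You can find the documentation for this feature here. The output can be processed in several ways. The following is one example (based on this Medium post):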
import io


def transcribe_file_with_diarization(speech_file):
    """Transcribe the given audio file synchronously with diarization."""
    # As written, this targets the older (pre-2.0) google-cloud-speech client,
    # which exposed speech.enums and accepted positional recognize() arguments.
    from google.cloud import speech_v1p1beta1 as speech

    client = speech.SpeechClient()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()

    audio = {"content": content}
    encoding = speech.enums.RecognitionConfig.AudioEncoding.LINEAR16
    sample_rate_hertz = 48000
    language_code = 'en-US'
    enable_speaker_diarization = True
    enable_automatic_punctuation = True
    diarization_speaker_count = 4

    config = {
        "encoding": encoding,
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language_code,
        "enable_speaker_diarization": enable_speaker_diarization,
        "enable_automatic_punctuation": enable_automatic_punctuation,
        # Optional:
        "diarization_speaker_count": diarization_speaker_count
    }

    print('Waiting for operation to complete…')
    response = client.recognize(config, audio)

    # The transcript within each result is separate and sequential per result.
    # However, the words list within an alternative includes all the words
    # from all the results thus far. Thus, to get all the words with speaker
    # tags, you only have to take the words list from the last result:
    result = response.results[-1]
    words_info = result.alternatives[0].words

    speaker1_transcript = ""
    speaker2_transcript = ""
    speaker3_transcript = ""
    speaker4_transcript = ""

    # Collect each speaker's words, then print out the output:
    for word_info in words_info:
        if word_info.speaker_tag == 1:
            speaker1_transcript = speaker1_transcript + word_info.word + ' '
        if word_info.speaker_tag == 2:
            speaker2_transcript = speaker2_transcript + word_info.word + ' '
        if word_info.speaker_tag == 3:
            speaker3_transcript = speaker3_transcript + word_info.word + ' '
        if word_info.speaker_tag == 4:
            speaker4_transcript = speaker4_transcript + word_info.word + ' '

    print("speaker1: '{}'".format(speaker1_transcript))
    print("speaker2: '{}'".format(speaker2_transcript))
    print("speaker3: '{}'".format(speaker3_transcript))
    print("speaker4: '{}'".format(speaker4_transcript))
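The example above gathers everything each speaker said into one string, which loses the order of the conversation. The question also asks for a text file with a blank line between each new speaker, so here is a minimal sketch of one way to do that. It assumes you have already reduced the word list to plain (word, speaker_tag) pairs. The helper names group_by_speaker, format_turns and write_turns are made up for this illustration and are not part of any library.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def group_by_speaker(words: Iterable[Tuple[str, int]]) -> List[Tuple[int, str]]:
    """Collapse consecutive words that share a speaker tag into one turn."""
    turns = []
    # groupby only merges *adjacent* items with the same key, which is what
    # keeps the turns in conversation order.
    for tag, run in groupby(words, key=lambda pair: pair[1]):
        turns.append((tag, " ".join(word for word, _ in run)))
    return turns


def format_turns(turns: List[Tuple[int, str]]) -> str:
    """Render each turn as 'speakerN: text', separated by a blank line."""
    return "\n\n".join(f"speaker{tag}: {text}" for tag, text in turns) + "\n"


def write_turns(turns: List[Tuple[int, str]], out_path: str) -> None:
    """Write the formatted turns to a text file."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(format_turns(turns))


if __name__ == "__main__":
    # Hypothetical sample data standing in for real recognition output.
    sample = [("hello", 1), ("there", 1), ("hi", 2), ("how", 1), ("are", 1), ("you", 1)]
    print(format_turns(group_by_speaker(sample)), end="")
```

With the function from the answer, the pairs could presumably be built as [(w.word, w.speaker_tag) for w in words_info] and then passed through group_by_speaker and write_turns(turns, "tester.txt"). Unlike the four fixed transcript variables, this works for any number of speakers.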