I successfully converted speech to text with the Google Speech API using the following code.
import speech_recognition as sr
import os

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# recognize speech using Google Cloud Speech
# INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""{KEY}
"""
try:
    speechOutput = r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, language="si-LK")
except sr.UnknownValueError:
    speechOutput = "Google Cloud Speech could not understand audio"
except sr.RequestError as e:
    speechOutput = "Could not request results from Google Cloud Speech service; {0}".format(e)
print(speechOutput)
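I would like to know whether I can use the same API to convert text to speech. If not, which API should I use, and is there sample Python code for it? Thanks!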
Answer 0 (score: 0)
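For this you need the new Text-to-Speech API, which is currently in beta. You can find a Python quickstart in the "Client Libraries" section of the documentation. The sample is part of the python-docs-samples repo. The relevant part of the sample is reproduced here for better visibility: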
def synthesize_text(text):
    """Synthesizes speech from the input string of text."""
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.types.SynthesisInput(text=text)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.MP3)

    response = client.synthesize_speech(input_text, voice, audio_config)

    # The response's audio_content is binary.
    with open('output.mp3', 'wb') as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')
Update: rate and pitch configuration
You can wrap text elements in <prosody> tags to modify rate and pitch. For example:
<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>
The possible values follow the W3C SSML specification. The SSML docs for the Text-to-Speech API explain this in detail and also provide some examples.
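As a rough illustration of the <prosody> approach, here is a minimal sketch that builds such an SSML string in Python using only the standard library. The helper name build_prosody_ssml is hypothetical and not part of any Google library, and wrapping the result in <speak> assumes the API expects a complete SSML document.

```python
from xml.sax.saxutils import escape, quoteattr


def build_prosody_ssml(text, rate, pitch):
    """Wrap plain text in <speak><prosody> with the given rate and pitch.

    Hypothetical helper for illustration; not part of the Google client library.
    """
    # escape() protects &, < and > in the spoken text;
    # quoteattr() quotes and escapes the attribute values.
    return "<speak><prosody rate={} pitch={}>{}</prosody></speak>".format(
        quoteattr(rate), quoteattr(pitch), escape(text)
    )


ssml = build_prosody_ssml("Can you hear me now?", rate="slow", pitch="-2st")
print(ssml)
```

Assuming the same client library version as the sample above, the resulting string would probably be passed as texttospeech.types.SynthesisInput(ssml=ssml) instead of SynthesisInput(text=text).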
In addition, you can control the overall audio playback rate with the speed option of <audio>, which currently accepts values from 50% to 200% (in 1% increments).
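To make the speed range concrete, the sketch below builds an <audio> element and rejects values outside the 50% to 200% range stated above. The helper name build_audio_ssml and the example.com URL are placeholders invented for illustration, and the validation reflects only the limits quoted in this answer, not a check performed by the API client.

```python
from xml.sax.saxutils import quoteattr


def build_audio_ssml(src, speed_percent):
    """Return a <speak><audio> snippet with a validated speed attribute.

    Hypothetical helper for illustration; not part of the Google client library.
    """
    # Range and step come from the answer text: 50% to 200%, in 1% increments,
    # so only whole-number percentages in that range are accepted here.
    if not isinstance(speed_percent, int) or not 50 <= speed_percent <= 200:
        raise ValueError("speed must be a whole percentage from 50 to 200")
    return "<speak><audio src={} speed={}/></speak>".format(
        quoteattr(src), quoteattr("{}%".format(speed_percent))
    )


# The URL below is a placeholder, not a real audio file.
print(build_audio_ssml("https://example.com/clip.ogg", 80))
```

Validating on the client side is a design choice that surfaces a bad value before any request is sent. It could be dropped if the service's own error message is preferred.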