Can the Google Speech API convert text to speech?

Asked: 2018-05-06 10:48:08

Tags: python-3.x google-api google-speech-api

I have successfully converted speech to text with the Google Speech API using the following code.

import speech_recognition as sr

# Obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# Recognize speech using Google Cloud Speech
# INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""{KEY}
"""
try:
    speechOutput = r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, language="si-LK")
except sr.UnknownValueError:
    speechOutput = "Google Cloud Speech could not understand audio"
except sr.RequestError as e:
    speechOutput = "Could not request results from Google Cloud Speech service; {0}".format(e)
print(speechOutput)

Can I use the same API to convert text to speech? If not, which API should I use, and is there sample Python code for it? Thanks!

1 Answer:

Answer 0 (score: 0)

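For this you need the new Text-to-Speech API, which is currently in beta. There is a Python quickstart in the "Client Libraries" section of the documentation. The sample is part of the python-docs-samples repo. The relevant part of the sample is reproduced here for better visibility: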

def synthesize_text(text):
    """Synthesizes speech from the input string of text."""
    from google.cloud import texttospeech
    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.types.SynthesisInput(text=text)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.MP3)

    response = client.synthesize_speech(input_text, voice, audio_config)

    # The response's audio_content is binary.
    with open('output.mp3', 'wb') as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')
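
The sample above reflects the client library as it was during the beta. As a hedged sketch, assuming google-cloud-texttospeech 2.0 or later (where the message types and enums are exposed directly on the texttospeech module and synthesize_speech takes keyword arguments), the same function might look like this. The name synthesize_text_v2 and the out_path parameter are illustrative and not part of the original sample.

```python
def synthesize_text_v2(text, out_path="output.mp3"):
    """Synthesize speech from plain text and write it to an MP3 file."""
    # Imported lazily, as in the sample above. Assumes the
    # google-cloud-texttospeech package (2.0+) is installed and that
    # application default credentials are configured.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    synthesis_input = texttospeech.SynthesisInput(text=text)

    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
    )

    # Keyword arguments are required in this version of the library.
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # The response's audio_content is binary.
    with open(out_path, "wb") as out:
        out.write(response.audio_content)
    return out_path
```

Calling synthesize_text_v2("Hello") would write output.mp3 to the current directory, provided the credentials are valid.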

Update: rate and pitch configuration

You can wrap text in a <prosody> tag to modify its rate and pitch. For example:

<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>

The possible values follow the W3C SSML specification. The SSML docs for the Text-to-Speech API describe them in detail and also provide some examples.
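
As a minimal sketch of how such markup could be generated, the helper below wraps text in a <prosody> element inside the <speak> root that SSML requires, escaping any XML special characters in the text. The function name build_prosody_ssml and its defaults are hypothetical. The resulting string would be passed to the API as SSML rather than plain text, for example SynthesisInput(ssml=...) in place of SynthesisInput(text=...).

```python
from xml.sax.saxutils import escape, quoteattr


def build_prosody_ssml(text, rate="medium", pitch="medium"):
    """Wrap text in <speak><prosody ...> with XML-escaped content."""
    # rate and pitch are passed through unchanged; valid values are the
    # ones defined for <prosody> in the W3C SSML specification.
    return "<speak><prosody rate={} pitch={}>{}</prosody></speak>".format(
        quoteattr(rate), quoteattr(pitch), escape(text)
    )


ssml = build_prosody_ssml("Can you hear me now?", rate="slow", pitch="-2st")
print(ssml)
```

Escaping matters because characters such as & or < in the input text would otherwise produce invalid SSML.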

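In addition, you can control the general audio playback rate with the speed attribute of <audio>, which currently accepts values from 50% to 200% (in 1% increments).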
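
A small sketch of building such an element, with the 50% to 200% range and 1% step from the text enforced in code. The helper name build_audio_ssml, the example URL, and the fallback text are all hypothetical, and whether a given API version honors the speed attribute is an assumption to check against the SSML docs.

```python
from xml.sax.saxutils import escape, quoteattr


def build_audio_ssml(src, speed_percent=100, fallback=""):
    """Return a <speak> document with one <audio> element at the given speed."""
    # Whole percentages only, which gives the 1% step; range as stated above.
    if not isinstance(speed_percent, int) or not 50 <= speed_percent <= 200:
        raise ValueError("speed_percent must be an integer from 50 to 200")
    return "<speak><audio src={} speed={}>{}</audio></speak>".format(
        quoteattr(src),
        quoteattr("{}%".format(speed_percent)),
        escape(fallback),  # text used if the clip cannot be played
    )


# Hypothetical clip URL, played at 1.5x speed.
print(build_audio_ssml("https://example.com/clip.ogg", 150, "clip"))
```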