如何在Python中使用Google的语音转换API

时间:2019-02-14 21:45:27

标签: python api text-to-speech

我的钥匙已准备就绪,可以发出请求并从Google文本中获取语音。
我尝试了这些命令等等。
对于我发现的Python入门,文档没有提供直接的解决方案。我不知道我的API密钥与JSON和URL一起放在哪里

One solution in their docs here is for CURL.。但是涉及到在必须将请求发送回给他们以获取文件之后将其下载的txt文件。有没有一种方法可以在Python中完成,而该方法不包含我必须返回的txt? 我只希望将我的字符串列表作为音频文件返回。

My Code

(我将我的实际密钥放在上面的块中。我只是不想在这里共享它。)

3 个答案:

答案 0 :(得分:2)

为JSON文件配置Python应用并安装客户端库

  1. 创建服务帐户
  2. 使用服务帐户here创建服务帐户密钥
  3. JSON文件下载并安全保存
  4. 在您的Python应用程序中包含Google应用程序凭据
  5. 安装库:pip install --upgrade google-cloud-texttospeech

使用Google的Python示例: https://cloud.google.com/text-to-speech/docs/reference/libraries 注意:在Google的示例中,它没有正确包含name参数。 和 https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/texttospeech/cloud-client/quickstart.py

下面是使用Google应用凭据和女性的wavenet语音对示例进行的修改。

os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/home/yourproject-12345.json"

from google.cloud import texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Do no evil!")

# Build the voice request, select the language code ("en-US") 
# ****** the NAME
# and the ssml voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

语音,姓名,语言代码,SSML性别等等

声音列表:https://cloud.google.com/text-to-speech/docs/voices

在上面的代码示例中,我将语音从Google的示例代码中更改为包括name参数,并使用Wavenet语音(功能已大大改进,但价格昂贵的$ 16 /百万字符)和SSML性别改为FEMALE。

voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        name='en-US-Wavenet-C',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

答案 1 :(得分:1)

如果您想避免使用Google Python API,只需执行以下操作:

import requests 
import json

url = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"

text = "This is a text"

data = {
        "input": {"text": text},
        "voice": {"name":  "fr-FR-Wavenet-A", "languageCode": "fr-FR"},
        "audioConfig": {"audioEncoding": "MP3"}
      };

headers = {"content-type": "application/json", "X-Goog-Api-Key": "YOUR_API_KEY" }

r = requests.post(url=url, json=data, headers=headers)
content = json.loads(r.content)

它与您所做的相似,但是您需要包括您的API密钥。

答案 2 :(得分:0)

找到了答案,却失去了我打开的150个Google文档页面之间的链接。

#(Since I'm using a Jupyter Notebook)
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/Path/to/JSON/file/jsonfile.json"
from google.cloud import texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Hello, World!")

# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL)

# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

我费时的追求是尝试使用Python通过JSON发送请求,但这似乎是通过自己的模块进行的,效果很好。 请注意,默认语音性别为“中性”。