Question

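I'm working with an existing program that reads XML from a socket, converts the text to a WAV file, and then plays it over an audio output device.

I'd like to strip it down so that it plays the text directly as audio.

Right now I'm having a hard time working out whether I have the right code, and whether it actually creates the WAV file.

The function that calls the text-to-speech function: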

def generate_audio(self, language, voice=None):
    info = self.get_first_info(language, bestmatch=False)
    if info is None:
        self.media_info[language] = None
        return False

    truncate = not self.broadcast_immediately() and bcastplayer.Config.setting('alerts_truncate')
    message_text = info.get_message_text(truncate)

    location = bcastplayer.ObData.get_datadir() + "/alerts"
    if os.access(location, os.F_OK) == False:
        os.mkdir(location)
    filename = self.reference(self.sent, self.identifier) + "-" + language + ".wav"

    resources = info.get_resources('audio')
    if resources:
        if resources[0].write_file(os.path.join(location, filename)) is False:
            return False

    elif message_text:
        self.write_tts_file(os.path.join(location, filename), message_text, voice)

    else:
        return False

Can this function be modified to play the audio directly?

def write_tts_file(self, path, message_text, voice=None):
    if not voice:
        voice = 'en'
    proc = subprocess.Popen([ 'espeak', '-m', '-v', voice, '-s', '130', '--stdout' ], stdin=subprocess.PIPE, stdout=subprocess.PIPE, close_fds=True)
    (stdout, stderr) = proc.communicate(message_text.encode('utf-8') + b" <break time=\"2s\" /> " + message_text.encode('utf-8') + b" <break time=\"3s\" /> ")
    proc.wait()

    with open(path, 'wb') as f:
        f.write(stdout)

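I have never seen code that uses process, subprocess, stdout, or PIPE before.

Is it easy to change the subprocess code so that it just pipes or redirects the output to aplay, without creating the WAV file?

There is another answer that might give a clue, but again, with my newbie understanding I'm not sure how to convert this code to match that answer: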

How to use python Popen with a espeak and aplay

Answer 1

You can link the two processes together using subprocess.PIPE. Here is a modified version of the write_tts_file function:

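def write_tts_file(self, path, message_text, voice=None):
    # 'path' is kept so existing callers still work; no file is written.
    if not voice:
        voice = 'en'
    # espeak reads SSML from stdin (-m) and writes WAV data to stdout.
    proc = subprocess.Popen(['espeak', '-m', '-v', voice, '-s', '130', '--stdout'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, close_fds=True)
    # aplay takes its input straight from espeak's stdout.
    aplay = subprocess.Popen(['aplay', '-D', 'sysdefault'], stdin=proc.stdout)
    proc.stdout.close()  # parent's copy is not needed once aplay holds the pipe
    proc.stdin.write(message_text.encode('utf-8') + b" <break time=\"2s\" /> " + message_text.encode('utf-8') + b" <break time=\"3s\" /> \n")
    proc.stdin.close()  # end of input, so espeak can finish
    proc.wait()
    aplay.wait()  # block until playback has finished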

It is important to close proc's stdin after you have sent the message to be spoken. This makes proc exit once it has processed its data and close its output to aplay, which in turn exits when it has finished playing. If proc's input is not closed, neither of them will exit.
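For what it's worth, the same chaining can be sketched as a standalone helper. This is only an illustration, not the program's actual method: the name `speak` and the `synth_cmd` / `player_cmd` parameters are hypothetical, added so the pipeline can be tried with stand-in commands on a machine without espeak or aplay. The defaults reuse the command lines from the code above.

```python
import subprocess


def speak(text, voice="en", synth_cmd=None, player_cmd=None):
    """Pipe synthesized speech straight into a player; no WAV file is written.

    Returns a tuple of (synthesizer exit code, player exit code).
    synth_cmd and player_cmd are hypothetical overrides for illustration.
    """
    if synth_cmd is None:
        synth_cmd = ["espeak", "-m", "-v", voice, "-s", "130", "--stdout"]
    if player_cmd is None:
        player_cmd = ["aplay", "-D", "sysdefault"]

    # The synthesizer reads text on stdin and writes audio to stdout.
    synth = subprocess.Popen(synth_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # The player reads that audio directly from the synthesizer's stdout.
    player = subprocess.Popen(player_cmd, stdin=synth.stdout)
    # The parent does not read from this pipe, so it closes its own copy.
    synth.stdout.close()

    try:
        synth.stdin.write(text.encode("utf-8"))
    finally:
        # Closing stdin sends end-of-file; without it neither process exits.
        synth.stdin.close()

    return synth.wait(), player.wait()
```

Calling `speak("Hello there", voice="en")` should play the text through the default ALSA device, assuming espeak and aplay are installed. Both exit codes are returned so the caller can tell which side of the pipe failed.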

Convert Python espeak + subprocess code to play the output audio directly
