Question: Send audio data represented as a numpy array from Python to JavaScript

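I have a TTS (text-to-speech) system that produces audio as a numpy array with data type np.float32. The system runs on the backend, and I want to transfer the data from the backend to the frontend so it can be played when a particular event occurs.

The obvious solution is to write the audio data to disk as a wav file and then pass the path to the frontend to play. That works, but for administrative reasons I don't want to do it. I only want to transfer the audio data (the numpy array) to the frontend.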

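What I have done so far:

Backend

import flask

# inside a Flask route; app and tts_model are defined elsewhere
text = "Hello"
wav, sr = tts_model.synthesize(text)
data = {"snd": wav.tolist()}  # a dict, so json.dumps emits {"snd": [...]}
flask_response = app.response_class(response=flask.json.dumps(data),
                                    status=200,
                                    mimetype='application/json')
# then return flask_response

Frontend

// gets wav from backend
let arrayData = new Float32Array(wav);
let blob = new Blob([ arrayData ]);
let url = URL.createObjectURL(blob);
let snd = new Audio(url);
snd.play();

This is what I have so far, but JavaScript throws the following error:

Uncaught (in promise) DOMException: Failed to load because no supported source was found.

That is the gist of what I'm trying to do. Sorry that I can't reduce this to a reproducible example, since you don't have the TTS system; instead, here is an audio file it generated that you can use to see what I'm doing wrong.

Other things I've tried:

Changing the audio data type to np.int8 and np.int16, respectively, and converting it with Int8Array() and Int16Array() in JavaScript.
Trying different types when creating the blob, such as {"type": "application/text;charset=utf-8;"}.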
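One plausible reading of the error, sketched in Python with numpy: the frontend wraps bare samples in a Blob, and neither the float32 version nor the Int8Array/Int16Array variants carry a container header (such as the RIFF/WAVE header of a .wav file) or the sample rate, so the browser has nothing it recognizes as a supported source. The one second of silence and the 22050 Hz rate below are hypothetical stand-ins for the TTS output.

```python
import numpy as np

sr = 22050                            # hypothetical sample rate; the question's code never sends it
wav = np.zeros(sr, dtype=np.float32)  # one second of silence standing in for TTS output

# Roughly what Float32Array(wav) wrapped in a Blob contains: samples only.
raw = wav.tobytes()

print(len(raw))  # 88200: 22050 samples * 4 bytes, and nothing else
print(raw[:4])   # b'\x00\x00\x00\x00', not the b'RIFF' signature a .wav file starts with
```

Changing the dtype only changes how many bytes each sample takes; the header is still missing, which is what both answers below supply by sending a complete .wav file.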

I've been struggling with this problem for a long time, so any help is appreciated!!

Answer 1

Your sample doesn't work as-is (it doesn't play).

However:

StarWars3.wav: OK. From cs.uic.edu
Your sample, encoded as PCM16 instead of PCM32: OK (check the WAV metadata)
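The "check the WAV metadata" step can be done with a small RIFF header parser. This is a sketch using only the standard library; wav_format is a hypothetical helper name. In the fmt chunk, format code 1 means integer PCM and 3 means IEEE float, and bits_per_sample tells 16-bit from 32-bit samples.

```python
import io
import struct
import wave

def wav_format(data: bytes) -> dict:
    """Return the main fields of the 'fmt ' chunk of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # chunks start right after the 12-byte RIFF header
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        if chunk_id == b"fmt ":
            fmt_code, channels, rate, _byte_rate, _block_align, bits = struct.unpack(
                "<HHIIHH", data[pos + 8:pos + 24])
            return {"format_code": fmt_code, "channels": channels,
                    "sample_rate": rate, "bits_per_sample": bits}
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    raise ValueError("no 'fmt ' chunk found")

# Demo: a tiny 16-bit PCM file built in memory with the standard library.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(22050)
    w.writeframes(bytes(200))  # 100 frames of silence
info = wav_format(buf.getvalue())
print(info)  # {'format_code': 1, 'channels': 1, 'sample_rate': 22050, 'bits_per_sample': 16}
```

Passing the bytes of a file such as sample_16.wav should likewise report bits_per_sample 16. A format code of 0xFFFE means WAVE_FORMAT_EXTENSIBLE, where the actual format is stored further into the chunk.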

Flask

from flask import Flask, render_template, json
import base64

app = Flask(__name__)

with open("sample_16.wav", "rb") as binary_file:
    # Read the whole file at once
    data = binary_file.read()
    wav_file = base64.b64encode(data).decode('UTF-8')

@app.route('/wav')
def hello_world():
    data = {"snd": wav_file}
    res = app.response_class(response=json.dumps(data),
        status=200,
        mimetype='application/json')
    return res

@app.route('/')
def stat():
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug = True)

js


  <audio controls></audio>
  <script>
    ;(async _ => {
      const res = await fetch('/wav')
      let {snd: b64buf} = await res.json()
      document.querySelector('audio').src="data:audio/wav;base64, "+b64buf;
    })()
  </script>
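The data URL assembled in the script above can also be built, or checked, on the Python side. A minimal standard-library sketch; wav_to_data_url and data_url_to_bytes are hypothetical helper names, and the 44 bytes of payload are a stand-in for a real .wav file.

```python
import base64

def wav_to_data_url(wav_bytes: bytes) -> str:
    """Build the same kind of URL the <audio> snippet assigns to .src."""
    return "data:audio/wav;base64," + base64.b64encode(wav_bytes).decode("ascii")

def data_url_to_bytes(url: str) -> bytes:
    """Inverse of wav_to_data_url, useful for checking what the browser will decode."""
    header, sep, payload = url.partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)

payload = b"RIFF" + bytes(40)  # hypothetical 44-byte stand-in for real WAV bytes
url = wav_to_data_url(payload)
print(len(url) - len("data:audio/wav;base64,"))  # 60: every 3 input bytes become 4 characters
```

Because base64 encodes every 3 bytes as 4 characters, the JSON response is about a third larger than the .wav file itself; that is the cost of carrying binary audio inside JSON.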

Edit by the original poster

Before switching to this solution, this is what I ended up doing:

First, change the data type from np.float32 to np.int16:

wav = (wav * np.iinfo(np.int16).max).astype(np.int16)
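A slightly more defensive version of this conversion, as a sketch: it clips to [-1, 1] first, on the assumption that the TTS output may occasionally exceed that range, because casting a float outside the int16 range does not saturate at 32767 and the result is not well defined. float_to_int16 is a hypothetical name.

```python
import numpy as np

def float_to_int16(wav: np.ndarray) -> np.ndarray:
    """Scale float samples in [-1, 1] to int16, as in the line above."""
    # Clip first so out-of-range samples map to the int16 limits instead of overflowing.
    clipped = np.clip(wav, -1.0, 1.0)
    # astype truncates toward zero, so 0.5 * 32767 = 16383.5 becomes 16383.
    return (clipped * np.iinfo(np.int16).max).astype(np.int16)

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5], dtype=np.float32)
print(float_to_int16(x).tolist())  # [-32767, -16383, 0, 16383, 32767, 32767]
```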

Write the numpy array to a temporary wav file with scipy.io.wavfile:

from scipy.io import wavfile
wavfile.write(".tmp.wav", sr, wav)

Read the bytes from the temporary file:

# read the bytes
with open(".tmp.wav", "rb") as fin:
    wav = fin.read()

Delete the temporary file:

import os
os.remove(".tmp.wav")
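The write, read-back and delete steps above can be collapsed into an in-memory buffer, which also avoids two concurrent requests sharing the same .tmp.wav path. This sketch swaps in the standard-library wave module for scipy (not what the OP used) and assumes mono int16 samples as produced by the conversion step; int16_to_wav_bytes is a hypothetical name.

```python
import io
import wave
import numpy as np

def int16_to_wav_bytes(samples: np.ndarray, sr: int) -> bytes:
    """Return a complete mono 16-bit PCM .wav file as bytes, without touching disk."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:  # closing w finalizes the header but leaves buf open
        w.setnchannels(1)            # assumption: mono TTS output
        w.setsampwidth(2)            # int16 = 2 bytes per sample
        w.setframerate(sr)
        # wave expects native byte order and converts to little-endian itself if needed
        w.writeframes(samples.astype(np.int16).tobytes())
    return buf.getvalue()

samples = np.array([0, 1000, -1000, 32767], dtype=np.int16)
wav_bytes = int16_to_wav_bytes(samples, 22050)
print(len(wav_bytes))  # 52: a 44-byte header plus 4 samples * 2 bytes
```

The returned bytes can then be base64-encoded and sent exactly like the bytes read back from .tmp.wav.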

Answer 2

Convert the array of wav values to bytes

After synthesis, you can convert the wav numpy array to a bytes object and then encode it with base64.

import base64
import io
from scipy.io.wavfile import write

byte_io = io.BytesIO()
write(byte_io, sr, wav)         # a complete .wav file (header + samples) in memory
wav_bytes = byte_io.getvalue()  # the whole buffer, regardless of the stream position

audio_data = base64.b64encode(wav_bytes).decode('UTF-8')
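A note on reading the buffer back: io.BytesIO.read() starts at the current stream position, so whether byte_io.read() returns the WAV bytes depends on the writer having rewound the buffer afterwards, while getvalue() always returns everything. A stdlib-only sketch; the 12 bytes written are a hypothetical stand-in for what a WAV writer emits.

```python
import io

buf = io.BytesIO()
buf.write(b"RIFF\x00\x00\x00\x00WAVE")  # stand-in for a WAV writer's output

after_write = buf.read()  # b'': the position is now at the end of the buffer
whole = buf.getvalue()    # the entire buffer, independent of the position
buf.seek(0)
after_seek = buf.read()   # rewinding first also works

print(after_write, whole == after_seek)  # b'' True
```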

This can be used directly as the src of an HTML audio tag (with Flask):

<audio controls src="data:audio/wav;base64, {{ audio_data }}"></audio>

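So all you need to do is convert wav, sr into audio_data, which represents the raw .wav file, and pass it as a parameter to render_template in your Flask app. (This way nothing has to be sent separately.)

Alternatively, if you do send audio_data, then in the .js file where you receive the response, build the URL from audio_data (it is the same value you would put in the src attribute in the HTML):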

// get audio_data from response

let snd = new Audio("data:audio/wav;base64, " + audio_data);
snd.play()

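Because:

"Audio(url) Return value: A new HTMLAudioElement object, configured to be used for playing back the audio from the file specified by url. The new object's preload property is set to auto and its src property is set to the specified URL, or null if no URL is given. If a URL is specified, the browser begins to asynchronously load the media resource before returning the new object."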
