Question

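For my artistic research project I am developing an image-to-sound synthesizer. Basically, it is a "microphone" that translates images into a sound output, to render the sound of what a neural network "sees".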

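The basic gist: I extract the feature maps of a convolutional neural network, based on this repository https://github.com/Axionable/Axionaut with a custom model, opencv and a webcam. These images are then converted to a spectrogram with matplotlib and librosa, and finally to sound. This all works satisfactorily, but I am currently struggling to prototype it as a realtime process (a small latency is acceptable).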

[Figures: Image, Spectrogram]

What I would like to have is an input stream (images) and an output stream (sound), similar to the callback stream of the python-sounddevice library.

My idea is to continuously feed in a spectrogram (maybe this could be a growing numpy array?) and output a waveform.
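A minimal sketch of how this idea could be structured, not a tested realtime implementation: a producer pushes fixed-size waveform blocks into a queue, and an output callback drains it. The callback uses the `(outdata, frames, time, status)` signature of a python-sounddevice output callback. The names `BLOCKSIZE`, `audio_blocks`, `audio_callback` and `push_waveform` are hypothetical, and a single output channel is assumed.

```python
import queue

import numpy as np

BLOCKSIZE = 1024  # samples per audio callback (assumed value)

# The producer side pushes mono float32 blocks of BLOCKSIZE samples.
audio_blocks = queue.Queue(maxsize=32)


def audio_callback(outdata, frames, time, status):
    """Fill one output buffer; channel 0 only (mono output assumed)."""
    outdata.fill(0)  # default to silence
    try:
        block = audio_blocks.get_nowait()
    except queue.Empty:
        # The producer fell behind: keep the silence rather than blocking.
        return
    n = min(frames, len(block))
    outdata[:n, 0] = block[:n]


def push_waveform(waveform):
    """Split a 1-D waveform into BLOCKSIZE chunks and queue them.

    The last chunk is zero-padded. put() blocks while the queue is full,
    which throttles the producer to the playback speed.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    pad = (-len(waveform)) % BLOCKSIZE
    padded = np.pad(waveform, (0, pad))
    for start in range(0, len(padded), BLOCKSIZE):
        audio_blocks.put(padded[start:start + BLOCKSIZE])
```

With python-sounddevice, the callback could then be passed to `sd.OutputStream(samplerate=44100, channels=1, blocksize=BLOCKSIZE, dtype='float32', callback=audio_callback)`, while a separate thread converts each new spectrogram chunk and calls `push_waveform`.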

Any help in the right direction would be highly appreciated.

I am not sure whether code is necessary, but here are some parts of the functionality:


import numpy as np

# image is an ndarray of shape (1696, 1024) with values from -4.0 to 0.0
print(image.shape)

# Rescale the values linearly so that they span the range [-4.0, 0.0]
values = np.interp(values, (values.min(), values.max()), (-4.0, 0.0))
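To illustrate what this rescaling does, here is a small worked example on made-up numbers. The `pixels` array is hypothetical, and the last step assumes that the range -4.0 to 0.0 represents log10 magnitudes, which the `log=True` argument used further down suggests but which I have not verified.

```python
import numpy as np

# Hypothetical 8-bit feature-map values standing in for the real image.
pixels = np.array([0.0, 127.5, 255.0])

# Linear rescale: the minimum maps to -4.0 and the maximum to 0.0.
log_values = np.interp(pixels, (pixels.min(), pixels.max()), (-4.0, 0.0))

# Assumption: the values are log10 magnitudes, so undoing the log
# gives linear magnitudes between 1e-4 and 1.
magnitudes = np.power(10.0, log_values)
```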

The spectrogram conversion is based on this example

# Invert from the spectrogram back to a waveform
recovered_audio_orig = invert_pretty_spectrogram(
    values, fft_size=fft_size, step_size=step_size, log=True, n_iter=10
)
print(recovered_audio_orig.shape)
# output is (218112,)
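The call above receives the complete array in one go. For a realtime variant, one possible direction is to invert a single spectrogram column at a time with overlap-add. The sketch below is not the method of the example I used: it swaps the iterative phase estimation for random phase, so it will sound rougher, and it applies no gain normalization. `StreamingInverter` is a hypothetical name; `n_fft=800` and `hop_length=200` follow the values used in the librosa snippet below.

```python
import numpy as np


class StreamingInverter:
    """Turn magnitude-spectrogram columns into audio, one hop at a time."""

    def __init__(self, n_fft=800, hop_length=200, seed=0):
        self.n_fft = n_fft
        self.hop = hop_length
        self.window = np.hanning(n_fft)
        self.tail = np.zeros(n_fft)  # overlap-add accumulator
        self.rng = np.random.default_rng(seed)

    def process(self, magnitude):
        """Take one column of shape (n_fft // 2 + 1,), return hop_length samples."""
        # Random phase stands in for a proper phase estimate (assumption).
        phase = self.rng.uniform(0.0, 2.0 * np.pi, size=magnitude.shape)
        frame = np.fft.irfft(magnitude * np.exp(1j * phase), n=self.n_fft)
        # Add the windowed frame onto whatever earlier frames left behind.
        self.tail += frame * self.window
        out = self.tail[:self.hop].copy()
        # Shift the accumulator left by one hop and clear the freed region.
        self.tail = np.concatenate([self.tail[self.hop:], np.zeros(self.hop)])
        return out
```

Each call returns `hop_length` finished samples, so the latency is roughly one FFT frame plus the time needed to produce the next column.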

Converting to WAV format

import IPython.display
import librosa
import librosa.core as lc
import librosa.display
import librosa.util
import matplotlib.pyplot as plt
import numpy as np
import scipy.io.wavfile


_n_fft = 800
print(_n_fft)
_hop_length = _n_fft // 4
print(_hop_length)
plt.figure(figsize=(12, 8))

# Effect parameters (defined here, but not applied in this snippet)
pitch_shift = -60.0
rate_change = 0.2

# Converts the 2D array (spectrogram) to a waveform
iStftMat = lc.istft(values, hop_length=_hop_length)

# Convert amplitudes to decibels (note: this is not a pitch shift)
iStftMat = librosa.amplitude_to_db(iStftMat, ref=1.0, amin=1e-05, top_db=80.0)

scipy.io.wavfile.write("testOut.wav", 44100, iStftMat)

powerMat = np.abs(iStftMat)
print("powerMat shape = " + str(powerMat.shape))
print("First item is: ", powerMat[0])

IPython.display.Audio(data=iStftMat, rate=44100)  # play the audio at the sample rate used for the WAV file
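One thing that may matter when writing the file: `scipy.io.wavfile.write` chooses the sample format from the array's dtype, and many players expect 16-bit PCM. A possible helper for that conversion, as a sketch; `to_int16` and its `peak` parameter are hypothetical names, and a non-empty input is assumed.

```python
import numpy as np


def to_int16(waveform, peak=0.9):
    """Scale a non-empty float waveform to the given peak, return 16-bit PCM."""
    waveform = np.asarray(waveform, dtype=np.float64)
    max_abs = np.max(np.abs(waveform))
    if max_abs > 0:
        # Normalize so the loudest sample sits at `peak` of full scale.
        waveform = waveform / max_abs * peak
    return np.round(waveform * 32767).astype(np.int16)
```

The result could then be written with `scipy.io.wavfile.write("testOut.wav", 44100, to_int16(iStftMat))`.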

Realtime processing of spectrogram to audio stream

0 answers: