Question

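I have a series of audio sources and generate a spectrogram from each of them. Every source has the same sampling rate, so running an FFT over each audio sequence gives me arrays of equal height (rows) but different widths (cols), where the height spans the frequency bins and the width spans the time domain.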

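So I normalize the dimensions while keeping the aspect ratio, and create a figure with a height of 1.0 and a width of col / row, as in the following example: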

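import matplotlib.pyplot as plt
import numpy as np
from pydub import AudioSegment
from scipy import signal

# audio_source, sampling_rate and image_source are assumed to be defined earlier
if audio_source.endswith((".flac", ".wav")):
    raw = AudioSegment.from_file(audio_source)
else:
    raw = AudioSegment.from_mp3(audio_source)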

# single channel
raw = raw.set_channels(1)

# downsampling the audio source
raw = raw.set_frame_rate(sampling_rate)

# retrieving data
data = raw.get_array_of_samples()

# data to numpy array
data = np.array(data)

# f: sample frequencies, t: segment times (0 to the audio's length),
# Sxx: spectrogram values (the last axis indexes the segment times)
f, t, Sxx = signal.spectrogram(data, sampling_rate)

# dimensions
row, col = Sxx.shape[0], Sxx.shape[1]

# normalizing dimensions
row, col = 1.0, (col / row)

fig = plt.figure(figsize = (col, row), dpi = 300)
plt.set_cmap('hot')

ax = fig.add_subplot(1, 1, 1, frameon = False)
ax.pcolormesh(t, f, Sxx)
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)

plt.savefig(image_source, bbox_inches = "tight", pad_inches = -0.1)

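As expected, the saved images all have the same height. The widths, however, are surprisingly not proportional to the column counts at all, not even approximately: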

Audio Source 1 : (129, 190) -> (1.0, 1.4728) -> reality (388px, 247px) -> expectation (363px, 247px)
Audio Source 2 : (129, 59)  -> (1.0, 0.4573) -> reality (84px, 247px)
Audio Source 3 : (129, 121) -> (1.0, 0.9379) -> reality (228px, 247px)

388 / 247 -> 1.5708
84 / 247 -> 0.3400
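
To make the mismatch explicit, here is a small sketch that recomputes the width I would expect for each source from its (rows, cols) shape and the saved height. The helper name `expected_width` is only for illustration; the numbers are the measurements listed above.

```python
# ((rows, cols) of the spectrogram, (width_px, height_px) of the saved image)
measurements = [
    ((129, 190), (388, 247)),
    ((129, 59), (84, 247)),
    ((129, 121), (228, 247)),
]


def expected_width(shape, height_px):
    """Width in pixels that keeps cols / rows equal to width / height."""
    rows, cols = shape
    return height_px * cols / rows


for shape, (width_px, height_px) in measurements:
    want = expected_width(shape, height_px)
    print(f"{shape}: expected {want:.1f}px, got {width_px}px, "
          f"error {width_px - want:+.1f}px")
```

The 363px expectation quoted for source 1 is this value truncated. The errors come out at roughly +24, -29 and -4 pixels, so the discrepancy does not look like a constant offset or a constant scale factor.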

What am I doing wrong?

How can I generate spectrograms with matplotlib that all have the same height in pixels while preserving their aspect ratio?
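
One direction that may be worth trying, as a sketch rather than a confirmed fix: saving without bbox_inches = "tight" writes the whole canvas, whose size is figsize * dpi pixels, so the pixel dimensions can be chosen directly if the axes fill the entire figure. With bbox_inches = "tight" and a negative pad_inches, the crop seems to depend on the rendered layout rather than on figsize alone. The names `save_spectrogram` and `png_size`, the 247px default height and the random test data below are assumptions for illustration only.

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def save_spectrogram(Sxx, target, height_px=247, dpi=300):
    """Save Sxx as a PNG that is height_px tall, with width scaled by cols / rows."""
    rows, cols = Sxx.shape
    width_px = round(height_px * cols / rows)
    # inches * dpi gives pixels; the 0.1px slack guards against the float
    # product landing just below the integer and being truncated.
    figsize = ((width_px + 0.1) / dpi, (height_px + 0.1) / dpi)
    fig = plt.figure(figsize=figsize, dpi=dpi)
    # The axes span the whole figure, so there are no margins to crop.
    ax = fig.add_axes([0, 0, 1, 1], frameon=False)
    ax.set_axis_off()
    ax.pcolormesh(Sxx, cmap="hot")
    # No bbox_inches="tight": the canvas is written at its full size.
    fig.savefig(target, dpi=dpi, format="png")
    plt.close(fig)
    return width_px, height_px


def png_size(buffer):
    """Read (width, height) from the IHDR chunk of an in-memory PNG."""
    return struct.unpack(">II", buffer.getvalue()[16:24])


for shape in [(129, 190), (129, 59), (129, 121)]:
    buf = io.BytesIO()
    requested = save_spectrogram(np.random.rand(*shape), buf)
    print(shape, "requested", requested, "saved", png_size(buf))
```

Here the width is rounded rather than truncated, so the 190-column source is requested at 364px instead of the 363px quoted above. In the real pipeline, the `Sxx` returned by `signal.spectrogram` would be passed in place of the random array, and `target` would be `image_source`.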

0 answers: