Question

I have a mel spectrogram with shape (256, 512):

array([[7.62326422e-01, 7.51822486e-01, 7.29588015e-01, ...,
    1.03616300e-01, 1.04682689e-01, 1.05411769e-01],
   [1.54055389e-01, 1.38559084e-01, 1.01564167e-01, ...,
    4.19907116e-02, 4.86430404e-02, 5.27331798e-02],
   [2.21608739e-01, 2.09934399e-01, 1.79158230e-01, ...,
    2.42307431e-01, 3.18662338e-01, 3.67405599e-01],
   ...,
   [8.55346431e-07, 8.10918028e-07, 6.93184704e-07, ...,
    5.57131738e-07, 7.08016727e-07, 8.06897948e-07],
   [8.13035033e-07, 7.62384222e-07, 6.28683206e-07, ...,
    4.56901031e-07, 6.36117335e-07, 7.53681318e-07],
   [5.51771037e-07, 5.11662726e-07, 4.05818844e-07, ...,
    4.43868592e-07, 6.27753429e-07, 7.48416680e-07]])

I want to use this spectrogram with TensorFlow's tfa.image.sparse_image_warp.

tfa.image.sparse_image_warp(
image: tfa.image.color_ops.TensorLike,
source_control_point_locations: tfa.image.color_ops.TensorLike,
dest_control_point_locations: tfa.image.color_ops.TensorLike,
interpolation_order: int = 2,
regularization_weight: tfa.image.filters.FloatTensorLike = 0.0,
num_boundary_points: int = 0,
name: str = 'sparse_image_warp'

) -> tf.Tensor

This function takes the image as a [batch, height, width, channels] float Tensor.

How do I convert the audio spectrogram into an image tensor? If I pass the spectrogram in directly, it raises an error:

warped_image = sparse_image_warp(mel_spectrogram, source_control_point_locations, dest_control_point_locations, interpolation_order=1, num_boundary_points=0)

ValueError: not enough values to unpack (expected 4, got 2)
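The error message is consistent with the function expecting a 4-D shape while the spectrogram is only 2-D. A minimal sketch of the reshaping, assuming a batch of one and a single channel; the random array is only a stand-in for the real spectrogram, and this does not call sparse_image_warp itself:

```python
import numpy as np

# Stand-in for the (256, 512) mel spectrogram from the question (random data, an assumption).
mel_spectrogram = np.random.rand(256, 512)

# Add a leading batch axis and a trailing channel axis:
# (256, 512) -> (1, 256, 512, 1), cast to float32 for use as a float tensor.
image = mel_spectrogram[np.newaxis, :, :, np.newaxis].astype(np.float32)

# After warping, the singleton axes can be dropped to recover a 2-D spectrogram.
restored = image[0, :, :, 0]

print(image.shape)     # (1, 256, 512, 1)
print(restored.shape)  # (256, 512)
```

An array shaped like `image` could then be passed in place of mel_spectrogram. The control point arguments would presumably also need a leading batch axis, i.e. shape [batch, num_control_points, 2].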

The mel spectrogram can be generated as follows.

import librosa
audio, sampling_rate = librosa.load('/content/61-70968-0002.wav')
mel_spectrogram = librosa.feature.melspectrogram(y=audio, sr=sampling_rate, n_mels=256, hop_length=128, fmax=8000)

Audio spectrogram to image tensor

0 answers: