I have a mel spectrogram with shape (256, 512):
array([[7.62326422e-01, 7.51822486e-01, 7.29588015e-01, ...,
1.03616300e-01, 1.04682689e-01, 1.05411769e-01],
[1.54055389e-01, 1.38559084e-01, 1.01564167e-01, ...,
4.19907116e-02, 4.86430404e-02, 5.27331798e-02],
[2.21608739e-01, 2.09934399e-01, 1.79158230e-01, ...,
2.42307431e-01, 3.18662338e-01, 3.67405599e-01],
...,
[8.55346431e-07, 8.10918028e-07, 6.93184704e-07, ...,
5.57131738e-07, 7.08016727e-07, 8.06897948e-07],
[8.13035033e-07, 7.62384222e-07, 6.28683206e-07, ...,
4.56901031e-07, 6.36117335e-07, 7.53681318e-07],
[5.51771037e-07, 5.11662726e-07, 4.05818844e-07, ...,
4.43868592e-07, 6.27753429e-07, 7.48416680e-07]])
I want to use this spectrogram with tfa.image.sparse_image_warp from TensorFlow Addons.
tfa.image.sparse_image_warp(
image: tfa.image.color_ops.TensorLike,
source_control_point_locations: tfa.image.color_ops.TensorLike,
dest_control_point_locations: tfa.image.color_ops.TensorLike,
interpolation_order: int = 2,
regularization_weight: tfa.image.filters.FloatTensorLike = 0.0,
num_boundary_points: int = 0,
name: str = 'sparse_image_warp'
) -> tf.Tensor
This function expects the image as a [batch, height, width, channels] float Tensor.
How can I convert the audio spectrogram into an image tensor? If I pass the spectrogram in directly, it raises an error:
warped_image = sparse_image_warp(mel_spectrogram,source_control_point_locations,dest_control_point_locations,interpolation_order = 1,num_boundary_points = 0)
ValueError: not enough values to unpack (expected 4, got 2)
The mel spectrogram can be generated as follows:
import librosa
audio, sampling_rate = librosa.load('/content/61-70968-0002.wav')
mel_spectrogram = librosa.feature.melspectrogram(y=audio,sr=sampling_rate,n_mels=256,hop_length=128,fmax=8000)
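The error suggests that the function tries to unpack four dimensions while the spectrogram has only two. A minimal sketch of one possible fix is to add a batch axis and a channel axis, so that (256, 512) becomes (1, 256, 512, 1). The helper name to_image_batch is hypothetical, the random array is only a stand-in for the real librosa output, and the cast to float32 is an assumption about what the warp function prefers.

```python
import numpy as np

# Stand-in for the real mel spectrogram (assumption: random data of the same shape).
mel_spectrogram = np.random.rand(256, 512)


def to_image_batch(spec):
    """Hypothetical helper: turn a 2-D (height, width) array into a
    4-D (batch, height, width, channels) float32 array with batch=1, channels=1."""
    spec = np.asarray(spec, dtype=np.float32)
    if spec.ndim != 2:
        raise ValueError(f"expected a 2-D spectrogram, got {spec.ndim} dimensions")
    # np.newaxis at the front adds the batch axis, at the end the channel axis.
    return spec[np.newaxis, :, :, np.newaxis]


image = to_image_batch(mel_spectrogram)
print(image.shape)  # (1, 256, 512, 1)

# After warping, the extra axes can be dropped again to get back a 2-D array.
restored = np.squeeze(image, axis=(0, 3))
print(restored.shape)  # (256, 512)
```

The resulting array could then be passed as the image argument in place of mel_spectrogram. The control point arguments would presumably need a matching leading batch dimension, i.e. shape [batch, num_control_points, 2], which is worth checking against the tfa documentation.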