STFT implemented in PyTorch gives slightly different results from the Librosa STFT

Time: 2019-06-02 15:37:01

Tags: python python-3.x fft pytorch librosa

I am trying to implement an STFT in PyTorch. However, its output deviates slightly from the output of the Librosa implementation.

Librosa version

import numpy as np
from librosa.core import stft
import matplotlib.pyplot as plt

np.random.seed(3)
y = np.sin(2*np.pi*50*np.linspace(0,10,2048))+np.sin(2*np.pi*20*np.linspace(0,10,2048)) + np.random.normal(scale=1,size=2048)

S_stft = np.abs(stft(y, hop_length=512, n_fft=2048,center=False))

plt.plot(S_stft)

PyTorch version

import torch
from torch.autograd import Variable
from torch.nn.functional import conv1d

from scipy.signal.windows import hann

stride = 512

def create_filters(d,k,low=50,high=6000):
    x = np.arange(0, d, 1)
    wsin = np.empty((k,1,d), dtype=np.float32)
    wcos = np.empty((k,1,d), dtype=np.float32)
    start_freq = low
    end_freq = high
    # num_cycles = start_freq*d/44000.
    # scaling_ind = np.log(end_freq/start_freq)/k

    window_mask = hann(2048, sym=False) # same as 0.5-0.5*np.cos(2*np.pi*x/(k))
    for ind in range(k):
        wsin[ind,0,:] = window_mask*np.sin(2*np.pi*ind/k*x)
        wcos[ind,0,:] = window_mask*np.cos(2*np.pi*ind/k*x)

    return wsin,wcos

wsin, wcos = create_filters(2048,2048)

wsin_var = Variable(torch.from_numpy(wsin), requires_grad=False)
wcos_var = Variable(torch.from_numpy(wcos),requires_grad=False)

network_input = torch.from_numpy(y).float()
network_input = network_input.reshape(1,-1)

zx = torch.sqrt(conv1d(network_input[:,None,:], wsin_var, stride=stride).pow(2)+conv1d(network_input[:,None,:], wcos_var, stride=stride).pow(2))
pytorch_Xs = zx.cpu().numpy()
plt.plot(pytorch_Xs[0,:1025,0])

My question

The two plots may look identical, but checking the two outputs with np.allclose shows that they differ slightly.

np.allclose(S_stft, pytorch_Xs[0,:1025,0].reshape(1025,1))
output >>> False

It returns True only when I loosen the absolute tolerance to 1e-5.

np.allclose(S_stft, pytorch_Xs[0,:1025,0].reshape(1025,1),atol=1e-5)
output >>> True

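What causes the difference in values? Is it the data conversion done by torch.from_numpy(y).float()?

I would like the difference to be below 1e-7; 1e-8 would be even better.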

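1 Answer:

Answer 0: (score: 0)

The difference comes from the default floating-point precision. NumPy floats are 64-bit by default, while PyTorch floats default to 32-bit. In the question's code the filters are created with dtype=np.float32 and the input is converted with .float(), so the whole convolution runs in single precision.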
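
To see how much of the gap single precision can account for, here is a minimal NumPy-only sketch. It assumes the discrepancy is dominated by float32 rounding, and it uses np.fft.rfft as the double-precision reference instead of calling Librosa. The helper windowed_dft_magnitude is a hypothetical name introduced for illustration. It mimics the sine/cosine filter bank from the question for a single frame and is not part of either library.

```python
import numpy as np

np.random.seed(3)
n_fft = 2048
t = np.linspace(0, 10, n_fft)
y = (np.sin(2 * np.pi * 50 * t)
     + np.sin(2 * np.pi * 20 * t)
     + np.random.normal(scale=1, size=n_fft))


def windowed_dft_magnitude(signal, dtype):
    """Magnitude of one Hann-windowed DFT frame (hypothetical helper).

    The sine/cosine filters and the signal are cast to `dtype` before
    the dot products, so the arithmetic runs at that precision.
    """
    n = signal.shape[0]
    x = np.arange(n)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * x / n)   # periodic Hann window
    k = np.arange(n // 2 + 1)[:, None]               # frequency bins 0..n/2
    phase = 2 * np.pi * k * x[None, :] / n
    wsin = (window * np.sin(phase)).astype(dtype)
    wcos = (window * np.cos(phase)).astype(dtype)
    s = signal.astype(dtype)
    return np.sqrt((wsin @ s) ** 2 + (wcos @ s) ** 2)


# Double-precision reference computed with NumPy's FFT.
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
reference = np.abs(np.fft.rfft(window * y))

mag32 = windowed_dft_magnitude(y, np.float32)
mag64 = windowed_dft_magnitude(y, np.float64)

err32 = np.max(np.abs(mag32 - reference))
err64 = np.max(np.abs(mag64 - reference))
print("max abs error, float32 filters:", err32)
print("max abs error, float64 filters:", err64)
```

If the float64 error is far below the float32 error, the analogous change in the PyTorch code would be to build wsin and wcos with dtype=np.float64 and convert the input with .double() instead of .float(). Whether that reaches 1e-7 against Librosa also depends on the precision Librosa uses internally, which this sketch does not check.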