Question

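I'm trying to use Conda Accelerate to speed up some data preprocessing, but initial benchmarks suggest that either I'm not using it correctly or it has no effect on the execution time of the FFT and linear algebra routines in numpy and librosa. After re-reading the documentation: does this mean I have to decorate and recode every ndarray operation, as in NumbaPro's batch-matmul example? I had assumed that simply installing it would make numpy faster, but that does not seem to be the case.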

The benchmark and code are below. I installed Accelerate via conda install accelerate, and I am able to import it.
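For anyone reproducing this, a minimal sketch of how the environment could be recorded alongside the timings. It assumes Python 3 (the code in this question is Python 2), and `describe_environment` is a hypothetical helper name, not part of Accelerate or numpy. It only reports whether a package named `accelerate` can be located and which libraries numpy says it was built against; it does not by itself show whether Accelerate changes any timings.

```python
import contextlib
import importlib.util
import io


def describe_environment():
    """Return a text summary of the installed numerical stack (hypothetical helper)."""
    lines = []
    # find_spec returns None when no top-level package of that name can be located.
    has_accelerate = importlib.util.find_spec("accelerate") is not None
    lines.append("accelerate importable: %s" % has_accelerate)
    try:
        import numpy as np
    except ImportError:
        lines.append("numpy is not installed")
        return "\n".join(lines)
    lines.append("numpy version: %s" % np.__version__)
    # np.show_config() prints the build configuration; capture it as text.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    lines.append(buf.getvalue())
    return "\n".join(lines)


report = describe_environment()
print(report)
```

Saving this output next to each benchmark run makes it easier to tell whether the "before" and "after" runs really used different builds.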

Thanks!

Results - conda install accelerate

The difference between the timings before and after installing it is negligible:

Total time was 25.356
Total load time was 1.6743
Total math time was 22.1599
Total save time was 1.5139
Total stft math time was 12.9219
Total other numpy math time was 9.1886
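To put these figures in proportion, here is a short sketch that computes each stage's share of the total. The numbers are copied from the output above; the variable names are illustrative, and the snippet assumes Python 3. By these figures the STFT calls account for roughly half of the run.

```python
# Figures copied from the benchmark output above (seconds).
total = 25.356
stages = {
    "load": 1.6743,
    "stft": 12.9219,
    "other numpy": 9.1886,
    "save": 1.5139,
}

# Fraction of the total wall-clock time spent in each stage.
shares = {name: secs / total for name, secs in stages.items()}

# Time not covered by any of the four timed stages.
unaccounted = total - sum(stages.values())

for name, share in shares.items():
    print("%-12s %5.1f%%" % (name, 100 * share))
print("unaccounted  %.4f s" % unaccounted)
```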

Relevant code:

loads, maths, saves = [], [], []
stfts, nps = [], []
# now we have a dict of all source files grouped by voice
for i in range(30):
    v0_fn = v0_list[i]
    v1_fn = v1_list[i]
    tl0 = time.time()
    # Process v0 & v1 file
    v0_fn = signal_dir+v0_fn
    v0, fs_s =  librosa.load(v0_fn, sr=None)
    v1_fn = signal_dir+v1_fn
    v1, fs_s =  librosa.load(v1_fn, sr=None)
    tl1 = time.time()
    loads.append((tl1-tl0))
    mix = v0 + v1
    # Capture the magnitude and phase of signal and signal + noise
    tm0 = time.time()
    v0_stft = librosa.stft(v0, int(frame_size*fs), int(step_size*fs)).transpose()
    tm1 = time.time()
    v0_mag = (v0_stft.real**2 + v0_stft.imag**2)**0.5
    v0_pha = np.arctan2(v0_stft.imag, v0_stft.real)
    v0_rtheta = np.stack((v0_mag, v0_pha), axis=0)
    tm2 = time.time()
    v1_stft = librosa.stft(v1, int(frame_size*fs), int(step_size*fs)).transpose()
    tm3 = time.time()
    v1_mag = (v1_stft.real**2 + v1_stft.imag**2)**0.5
    v1_pha = np.arctan2(v1_stft.imag, v1_stft.real)
    v1_rtheta = np.stack((v1_mag, v1_pha), axis=0)
    tm4 = time.time()
    mix_stft = librosa.stft(mix, int(frame_size*fs), int(step_size*fs)).transpose()
    tm5 = time.time()
    mix_mag = (mix_stft.real**2 + mix_stft.imag**2)**0.5
    mix_pha = np.arctan2(mix_stft.imag, mix_stft.real)
    mix_rtheta = np.stack((mix_mag, mix_pha), axis=0)
    tm6 = time.time()
    stfts += [tm1-tm0, tm3-tm2, tm5-tm4]
    nps += [tm2-tm1, tm4-tm3, tm6-tm5]
    data['sig_rtheta'] = v0_rtheta
    data['noi_rtheta'] = v1_rtheta
    data['mix_rtheta'] = mix_rtheta
    tl2 = time.time()
    maths.append(tl2-tl1)
    with open(write_name, 'wb') as f:  # binary mode: protocol=-1 selects a binary pickle format
        cPickle.dump(all_info, f, protocol=-1)
    tl3 = time.time()
    saves.append(tl3-tl2)

t1 = time.time()
print 'Total time was %.3f' % (t1-t0)
print 'Total load time was %.4f' % np.sum(loads)
print 'Total math time was %.4f' % np.sum(maths)
print 'Total save time was %.4f' % np.sum(saves)
print 'Total stft math time was %.4f' % np.sum(stfts)
print 'Total other numpy math time was %.4f' % np.sum(nps)
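The manual tm0 to tm6 bookkeeping above could be written more compactly with a small timing helper. The following is only a sketch under stated assumptions: it targets Python 3 (for `time.perf_counter`), `StageTimer` is a hypothetical name, and the workloads are plain-Python stand-ins rather than the actual librosa and numpy calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        # Time the body of the `with` block and add it to the stage's running total.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def report(self):
        # One summary line per stage, in the style of the prints above.
        return ["Total %s time was %.4f" % (name, secs)
                for name, secs in self.totals.items()]


timer = StageTimer()
for _ in range(3):
    with timer.stage("load"):
        samples = [float(i) for i in range(10000)]  # stand-in for librosa.load
    with timer.stage("math"):
        energy = sum(x * x for x in samples)  # stand-in for the STFT and numpy math

for line in timer.report():
    print(line)
```

Wrapping each `librosa.stft` call and each magnitude/phase block in its own `timer.stage(...)` would give the same per-stage breakdown without the numbered timestamp variables.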

How do I use / benchmark Conda Accelerate?

0 answers: