Multiprocessing pool slows down when calling an external module

Asked: 2019-02-17 09:28:14

Tags: python performance audio multiprocessing librosa

My script calls the librosa module to compute Mel-frequency cepstral coefficients (MFCCs) for short audio snippets. After loading the audio, I want to compute these (and some other audio features) as fast as possible, so I turned to multiprocessing.

Problem: the multiprocessing variant is much slower than the sequential one. Profiling shows that more than 90% of the time is spent in <method 'acquire' of '_thread.lock' objects>. That would be unsurprising with many small tasks, but in one test case I split the audio into just 4 chunks and process each in its own process. There I expected the overhead to be minimal, yet it performs just as badly as the many-small-tasks case.

As I understand it, the multiprocessing module forks pretty much everything, so the workers should not be fighting over locks. Yet the results suggest otherwise. Could it be that the librosa module holds some kind of internal lock under the hood?

My profiling results in plain text: https://drive.google.com/open?id=17DHfmwtVOJOZVnwIueeoWClUaWkvhTPc

As a picture: https://drive.google.com/open?id=1KuZyo0CurHd9GjXge5CYQhdWn2Q6OG8Z
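For reference, a profile like the ones linked above can be collected with the standard-library cProfile module. A minimal sketch (the helper name profile_call and its arguments are illustrative, not from my actual script):

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile and return the top entries as text."""
    pr = cProfile.Profile()
    pr.enable()
    fn(*args, **kwargs)
    pr.disable()
    buf = io.StringIO()
    # Sorting by cumulative time surfaces hot spots such as
    # <method 'acquire' of '_thread.lock' objects>
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(15)
    return buf.getvalue()

report = profile_call(sorted, range(1000), key=lambda x: -x)
print(report)
```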

Code to reproduce the "problem":

import time
import numpy as np
import librosa
from functools import partial
from multiprocessing import Pool

n_proc = 4

y, sr = librosa.load(librosa.util.example_audio_file(), duration=60) # load audio sample
y = np.repeat(y, 10) # repeat signal so that we can get more reliable measurements
sample_len = int(sr * 0.2) # We will compute MFCC for short pieces of audio

def get_mfcc_in_loop(audio, sr, sample_len):
    # We split the long array into small ones of length sample_len
    y_windowed = np.array_split(audio, np.arange(sample_len, len(audio), sample_len))
    for sample in y_windowed:
        mfcc = librosa.feature.mfcc(y=sample, sr=sr)

start = time.time()
get_mfcc_in_loop(y, sr, sample_len)
print('Time single process:', time.time() - start)

# Let's now test feeding these small arrays to a pool of 4 workers. Since computing
# MFCCs for such small arrays is fast, I'd expect the pool overhead to be noticeable here
start = time.time()
y_windowed = np.array_split(y, np.arange(sample_len, len(y), sample_len))
with Pool(n_proc) as pool:
    func = partial(librosa.feature.mfcc, sr=sr)
    result = pool.map(func, y_windowed)
print('Time multiprocessing (many small tasks):', time.time() - start)

# Here we split the audio into 4 chunks and process them separately. This I'd expect
# to be fast and somehow it isn't. What could be the cause? Anything to do about it?
start = time.time()
y_split = np.array_split(y, n_proc)
with Pool(n_proc) as pool:
    func = partial(get_mfcc_in_loop, sr=sr, sample_len=sample_len)
    result = pool.map(func, y_split)
print('Time multiprocessing (a few large tasks):', time.time() - start)

Results on my machine:

  • Time single process: 8.48s
  • Time multiprocessing (many small tasks): 44.20s
  • Time multiprocessing (a few large tasks): 41.99s

Any idea what is causing this? Better yet, how can I make it better?

1 Answer:

Answer 0 (score: 3)

To investigate what was happening, I ran top -H and noticed 60+ threads being spawned! That was it. It turns out that librosa and its dependencies spawn many extra threads, which together wreck the parallelism.
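As a quick sanity check without leaving Python, the number of native (OS-level) threads in the current process can be read from /proc on Linux. A sketch (the helper name native_thread_count is mine, and this is Linux-only):

```python
import os

def native_thread_count():
    """Count OS-level threads of this process.

    On Linux, each thread appears as one entry under /proc/self/task,
    so the directory listing length equals the thread count.
    """
    return len(os.listdir("/proc/self/task"))

# After importing librosa and its dependencies, this number can jump
# far above the handful of threads you'd expect from your own code.
print(native_thread_count())
```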

Solution

This over-subscription problem is described nicely in the joblib docs, so let's use joblib instead.

import time
import numpy as np
import librosa
from joblib import Parallel, delayed

n_proc = 4

y, sr = librosa.load(librosa.util.example_audio_file(), duration=60) # load audio sample
y = np.repeat(y, 10) # repeat signal so that we can get more reliable measurements
sample_len = int(sr * 0.2) # We will compute MFCC for short pieces of audio

def get_mfcc_in_loop(audio, sr, sample_len):
    # We split the long array into small ones of length sample_len
    y_windowed = np.array_split(audio, np.arange(sample_len, len(audio), sample_len))
    for sample in y_windowed:
        mfcc = librosa.feature.mfcc(y=sample, sr=sr)

start = time.time()
y_windowed = np.array_split(y, np.arange(sample_len, len(y), sample_len))
Parallel(n_jobs=n_proc, backend='multiprocessing')(delayed(get_mfcc_in_loop)(audio=data, sr=sr, sample_len=sample_len) for data in y_windowed)
print('Time multiprocessing with joblib (many small tasks):', time.time() - start)


y_split = np.array_split(y, n_proc)
start = time.time()
Parallel(n_jobs=n_proc, backend='multiprocessing')(delayed(get_mfcc_in_loop)(audio=data, sr=sr, sample_len=sample_len) for data in y_split)
print('Time multiprocessing with joblib (a few large tasks):', time.time() - start)

Results:

  • Time multiprocessing with joblib (many small tasks): 2.66
  • Time multiprocessing with joblib (a few large tasks): 2.65

15x faster than with the multiprocessing module.
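An alternative (or complement) to joblib is to cap the native thread pools yourself by setting the usual BLAS/OpenMP environment variables before numpy/librosa are imported, so each worker process stays single-threaded. A sketch, assuming one thread per worker is what you want:

```python
import os

# These caps must be set BEFORE importing numpy/librosa, because the
# BLAS and OpenMP runtimes size their thread pools at import time.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS",
            "MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = "1"

# ... now import numpy and librosa, and build the Pool exactly as in
# the question; the 4 workers no longer oversubscribe the CPU.
```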