
RFECV with parallel jobs

Date: 2016-07-11 18:12:07

Tags: parallel-processing scikit-learn

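I want to run recursive feature elimination with cross-validation (RFECV). My problem is that, even though I have already heavily subsampled my data, I still have a large number of features (278), and the process is so slow that it may not finish within the time allotted for my experiments.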
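To see why 278 features make RFECV slow, here is a rough count of estimator fits. This is a back-of-the-envelope sketch, not scikit-learn's code: it assumes that each elimination round refits the estimator once on the surviving features, removes `step` features (never going below the minimum), and then makes one final fit. The helper `count_rfe_fits` is hypothetical, only handles an integer `step`, and ignores the extra RFE pass RFECV runs on the full data after cross-validation.

```python
def count_rfe_fits(n_features: int, step: int = 1, min_features: int = 1) -> int:
    """Hypothetical estimate of estimator fits made by one RFE pass.

    Assumes one fit per elimination round (to rank the current features),
    `step` features removed per round without dropping below
    `min_features`, plus one final fit on the surviving subset.
    """
    remaining = n_features
    fits = 0
    while remaining > min_features:
        fits += 1  # fit on the current subset to rank its features
        remaining -= min(step, remaining - min_features)  # eliminate the weakest
    return fits + 1  # final fit on the remaining subset


n_features, cv = 278, 5
per_fold = count_rfe_fits(n_features, step=1)
print(per_fold)       # 278
print(per_fold * cv)  # 1390
```

With step=1, every CV fold walks all the way from 278 features down to 1, so roughly 1390 fits happen before the final refit. Because the folds are independent of each other, they are a natural unit to spread across processes.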

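I have seen that the usual cross-validation utilities in scikit-learn support parallelization by letting you set the number of jobs to run in parallel. Is it possible to parallelize RFECV as well?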
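For reference, this is the kind of parallel cross-validation the question refers to. The sketch assumes `cross_val_score` from `sklearn.model_selection` (the module name from scikit-learn 0.18 onward; older releases used `sklearn.cross_validation`). The dataset sizes are arbitrary and chosen only for illustration.

```python
from sklearn.datasets import make_friedman1
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

# Small synthetic regression problem (sizes chosen only for illustration).
X, y = make_friedman1(n_samples=200, n_features=10, random_state=0)

# n_jobs=-1 asks joblib to evaluate the 5 folds on all available CPU cores.
scores = cross_val_score(SVR(kernel="linear"), X, y, cv=5, n_jobs=-1)
print(scores.shape)  # (5,)
```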

1 answer:

Answer 0 (score: 0)

The changelog for the version 0.18 release indicates that RFECV now supports n_jobs.
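Before relying on this, you may want to confirm that the installed scikit-learn is recent enough. Here is a minimal check. It assumes only the standard-library `inspect` module and that the parameter is named n_jobs, as the changelog states.

```python
import inspect

import sklearn
from sklearn.feature_selection import RFECV

# RFECV gained n_jobs in 0.18; inspect the constructor's parameters to be sure.
has_n_jobs = "n_jobs" in inspect.signature(RFECV).parameters
print(sklearn.__version__, has_n_jobs)
```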

Following the example in the RFECV documentation (I changed n_samples from 50 to 5000):

from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

# Synthetic regression data; n_samples raised from the documentation's 50 to 5000
X, y = make_friedman1(n_samples=5000, n_features=5, random_state=0)
estimator = SVR(kernel="linear")

1 job: 22.5 s

%%time
# Serial baseline: the 5 CV folds are evaluated one after another
selector = RFECV(estimator, step=1, cv=5, n_jobs=1)
selector = selector.fit(X, y)

CPU times: user 23.1 s, sys: 2.71 s, total: 25.8 s
Wall time: 22.5 s

4 jobs: 11.8 s
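Outside Jupyter, where the %%time magic is not available, you can run the same 1-job vs. 4-job comparison as a plain script. This is a sketch under a few assumptions. It uses `time.perf_counter` for wall-clock timing. It also uses a smaller dataset (n_samples=500) so it runs quickly. The times it prints depend on your machine and are not the 22.5 s / 11.8 s reported above.

```python
import time

import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

# Smaller than the 5000 samples above so the sketch finishes quickly.
X, y = make_friedman1(n_samples=500, n_features=5, random_state=0)
estimator = SVR(kernel="linear")

results = {}
for n_jobs in (1, 4):
    start = time.perf_counter()
    selector = RFECV(estimator, step=1, cv=5, n_jobs=n_jobs).fit(X, y)
    elapsed = time.perf_counter() - start
    results[n_jobs] = selector
    print(f"n_jobs={n_jobs}: {elapsed:.2f} s, {selector.n_features_} features kept")

# n_jobs only changes how the CV folds are scheduled, so the selection should match.
same_support = np.array_equal(results[1].support_, results[4].support_)
print(same_support)
```

On a dataset this small, process start-up overhead can cancel out the speed-up. The benefit shows up when each fold is expensive, as with the 5000-sample run above.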