Question

I'm using scikit-learn on Python 3.6, and I'm running into runtime problems when predicting in a loop with a random forest that I load with pickle.

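When I predict with a random forest that has 1 estimator (tree), the runtime is about 0.002 s per sample. However, when I increase the number of estimators (> 1), the runtime jumps to about 0.1 s per sample, no matter how many trees there are.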
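To make the comparison reproducible, a small timing harness can measure the average cost of one single-row predict() call for a 1-tree and a 10-tree forest, both with n_jobs=-1 as in the saving code. This is a sketch under assumptions: make_classification data stands in for the real X_train/y_train, the helper name per_sample_predict_time is hypothetical, and absolute timings will differ by machine and scikit-learn version.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def per_sample_predict_time(clf, X, n_samples=20):
    """Average wall-clock seconds per single-row predict() call (hypothetical helper)."""
    start = time.perf_counter()
    for row in X[:n_samples]:
        # predict() expects a 2D array of shape (n_samples, n_features)
        clf.predict(row.reshape(1, -1))
    return (time.perf_counter() - start) / n_samples


# Synthetic stand-in for X_train / y_train (assumption: the real data is not shown)
X, y = make_classification(n_samples=500, n_features=20, random_state=2)

timings = {}
for n_estimators in (1, 10):
    clf = RandomForestClassifier(n_estimators=n_estimators, n_jobs=-1,
                                 random_state=2, max_depth=15).fit(X, y)
    timings[n_estimators] = per_sample_predict_time(clf, X)

for n_estimators, t in timings.items():
    print(f"n_estimators={n_estimators}: {t:.4f} s per sample")
```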

Here is the code I use to save the model:

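import pickle  # cPickle exists only in Python 2; Python 3's pickle uses the C implementation automatically

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=1,  # or > 1
                             n_jobs=-1,
                             random_state=2,
                             max_depth=15,
                             min_samples_leaf=1,
                             verbose=0,
                             max_features='sqrt'  # same as the old 'auto' for classifiers; 'auto' was removed in newer scikit-learn
                             )

clf.fit(X_train, y_train)

with open('classifier.pkl', 'wb') as fid:
    pickle.dump(clf, fid)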
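A quick way to check whether pickling itself changes anything is to round-trip a forest through pickle and compare it with the original. A minimal sketch, assuming synthetic make_classification data in place of the real training set and an in-memory pickle.dumps/pickle.loads instead of classifier.pkl: it shows that the restored model predicts the same labels and that it keeps all its constructor parameters, including n_jobs=-1, which is therefore still in effect at prediction time after loading.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in data (assumption: replaces X_train / y_train)
X, y = make_classification(n_samples=200, n_features=10, random_state=2)
clf = RandomForestClassifier(n_estimators=10, n_jobs=-1, random_state=2,
                             max_depth=15).fit(X, y)

# Round-trip through pickle in memory instead of through a file
restored = pickle.loads(pickle.dumps(clf))

# The restored forest keeps its parameters, including n_jobs=-1 ...
restored_n_jobs = restored.get_params()["n_jobs"]
# ... and makes exactly the same predictions as the original
same_predictions = np.array_equal(clf.predict(X), restored.predict(X))

print(restored_n_jobs, same_predictions)
```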

Here is the code I use to load the model and predict in a loop:

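import pickle

with open('classifier.pkl', 'rb') as fid:
    clf = pickle.load(fid)

for s in samples:
    # my feature extraction method (computes `feature` for sample s)
    # feature is a 1D np array containing the features computed for sample s;
    # predict() expects a 2D array, so pass it as a single row
    pred = clf.predict(feature.reshape(1, -1))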

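I don't understand why the runtime jumps like this instead of growing in proportion to the number of trees, and why the jump is so large. Is this a bug, or am I using pickle the wrong way?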
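One plausible explanation, not confirmed in the question: the loaded model still carries n_jobs=-1, so with more than one tree every predict() call goes through joblib's parallel dispatch, a roughly fixed per-call cost that is much larger than evaluating a few trees on one row. With n_estimators=1, scikit-learn caps the number of jobs at the number of trees, so nothing is dispatched in parallel. A minimal sketch of two workarounds under that assumption: set n_jobs=1 on the loaded model before a row-by-row loop, or collect the feature vectors and call predict() once. The make_classification data, the in-memory pickle round trip, and the `features` list are hypothetical stand-ins for classifier.pkl and the real feature-extraction step.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in model and samples (assumption: replaces classifier.pkl and real features)
X, y = make_classification(n_samples=300, n_features=10, random_state=2)
clf = pickle.loads(pickle.dumps(
    RandomForestClassifier(n_estimators=10, n_jobs=-1, random_state=2).fit(X, y)))

# Hypothetical: one 1D feature vector per sample, as produced inside the loop
features = [row for row in X]

# Option 1: keep the row-by-row loop, but avoid parallel dispatch on every call
clf.set_params(n_jobs=1)
preds_loop = np.array([clf.predict(f.reshape(1, -1))[0] for f in features])

# Option 2: stack all feature vectors into one 2D array and predict once
clf.set_params(n_jobs=-1)
preds_batch = clf.predict(np.vstack(features))
```

Option 2 pays the parallel setup cost only once for all samples; Option 1 fits cases where each sample must be classified as soon as its features are computed.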

Could you please help me?

CB

Predicting in a loop with a scikit-learn random forest loaded with pickle
