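I'm using sklearn on Python 3.6, and I've noticed that with a random forest, predicting a single sample passed as a 1D numpy array takes about the same time (~0.1 s) as predicting n samples passed as a 2D numpy array. It looks as if sklearn spends a fixed amount of time setting up the trees on every predict call and then produces the predictions almost instantly. Would that explain why the runtime for a large 2D array is the same as for a 1D array?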
Here is the code I use to train the model:
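import pickle  # Python 3 has no cPickle module; pickle replaces it

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=1,  # or > 1
                             n_jobs=-1,
                             random_state=2,
                             max_depth=15,
                             min_samples_leaf=1,
                             verbose=0,
                             max_features='sqrt')  # spelled 'auto' in older versions; same meaning for classifiers
clf.fit(X_train, y_train)

# Save the fitted model to disk
with open('classifier.pkl', 'wb') as fid:
    pickle.dump(clf, fid)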
In my case, I have to predict one sample at a time, in real time, inside a loop:
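import pickle  # Python 3 has no cPickle module; pickle replaces it

with open('classifier.pkl', 'rb') as fid:
    clf = pickle.load(fid)

for s in samples:
    # my feature extraction method
    # feature is a 1D np array containing the features computed for the sample s
    # (newer scikit-learn versions expect shape (1, n_features), i.e. feature.reshape(1, -1))
    pred = clf.predict(feature)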
Is this because I'm using it the wrong way? Or is sklearn simply not optimized for predicting one sample at a time?
Answer 0: (score: 0)
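You're right, sklearn is heavily optimized for vector operations. You are using it correctly. If you collect the features first and predict them in a single call, like this, you should see a significant speedup:
import numpy as np

# n_features is the length of the feature vector returned by feature_extraction
features = np.zeros((len(samples), n_features))
for i, s in enumerate(samples):
    features[i] = feature_extraction(s)
preds = clf.predict(features)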
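To check the difference on your own machine, a rough timing comparison along these lines may help. This is only a sketch under stated assumptions: the data is synthetic (generated with `make_classification`, since the question's `X_train` and `y_train` are not shown), the forest size is arbitrary, and the names `X_new`, `preds_loop`, and `preds_batch` are made up for illustration.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption; the question's real data is not shown).
X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
X_train, y_train = X[:1500], y[:1500]
X_new = X[1500:1700]  # 200 samples to predict

clf = RandomForestClassifier(n_estimators=10, max_depth=15,
                             n_jobs=-1, random_state=2)
clf.fit(X_train, y_train)

# One predict() call per sample: any fixed per-call cost is paid 200 times.
start = time.perf_counter()
preds_loop = np.array([clf.predict(row.reshape(1, -1))[0] for row in X_new])
loop_seconds = time.perf_counter() - start

# One predict() call for the whole batch: the fixed cost is paid once.
start = time.perf_counter()
preds_batch = clf.predict(X_new)
batch_seconds = time.perf_counter() - start

print(f"loop:  {loop_seconds:.3f} s for {len(X_new)} samples")
print(f"batch: {batch_seconds:.3f} s for {len(X_new)} samples")
```

The absolute numbers depend on the machine, the forest size, and the scikit-learn version, so treat the output as a relative comparison only. If the samples really do arrive one by one, collecting them into small batches before calling `predict` is one way to spread the per-call cost over several samples.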