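I am working on a metaheuristics exercise with scikit-learn and have a question. I need to use k-nearest neighbors, so I have a k-nearest-neighbors classifier object (knn) created with n_jobs = -1. As the documentation says, I have to set the multiprocessing start method to forkserver. However, knn is slower with n_jobs = -1 than with n_jobs = 1.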
Here is part of the code:

    def main():
        ### Some initialization here ###
        # Legacy sklearn.cross_validation signature: labels first, then n_folds.
        skf = StratifiedKFold(target, n_folds=2, shuffle=True)
        for train_index, test_index in skf:
            data_train, data_test = data[train_index], data[test_index]
            target_train, target_test = target[train_index], target[test_index]
            # Time one full feature-selection run on this fold.
            start = time()
            selected_features, score = SFS(data_train, data_test, target_train, target_test, knn)
            end = time()
            logger.info("SFS - Time elapsed: " + str(end-start) + ". Score: " + str(score) + ". Selected features: " + str(sum(selected_features)))

    if __name__ == "__main__":
        import multiprocessing as mp
        mp.set_start_method('forkserver', force=True)
        main()

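For readers on a recent scikit-learn release: the loop above uses the `StratifiedKFold(target, n_folds=...)` signature from the older `sklearn.cross_validation` module. With `sklearn.model_selection`, the equivalent split would look roughly like the sketch below. The `data` and `target` arrays here are small synthetic stand-ins (an assumption), since the original initialization is not shown.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-ins for the `data` and `target` arrays (assumed shapes).
rng = np.random.RandomState(0)
data = rng.rand(20, 4)
target = np.array([0, 1] * 10)

# The splitter is configured first; the arrays are passed to split().
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
folds = []
for train_index, test_index in skf.split(data, target):
    data_train, data_test = data[train_index], data[test_index]
    target_train, target_test = target[train_index], target[test_index]
    folds.append((len(train_index), len(test_index)))

print(folds)
```

The body of the loop is unchanged; only the construction of the splitter and the call to `split` differ.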
Here is the SFS function:

    def SFS(data_train, data_test, target_train, target_test, classifier):
        rowsize = len(data_train[0])
        # Boolean mask of the features selected so far.
        selected_features = np.zeros(rowsize, dtype=bool)
        best_score = 0
        best_feature = 0
        # Keep adding features until no candidate improves the score.
        while best_feature is not None:
            best_feature = None
            for idx in range(rowsize):
                if selected_features[idx]:
                    continue
                # Temporarily add the candidate feature and evaluate it.
                selected_features[idx] = True
                classifier.fit(data_train[:,selected_features], target_train)
                score = classifier.score(data_test[:,selected_features], target_test)
                selected_features[idx] = False
                if score > best_score:
                    best_score = score
                    best_feature = idx
            if best_feature is not None:
                selected_features[best_feature] = True
        return selected_features, best_score

I don't understand how n_jobs > 1 can be slower than n_jobs = 1. Can anyone explain? I have tried 3 datasets.
Answer 0 (score: 1)
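I have found that many people run into the same problem as you: n_jobs appears to have no effect in sklearn's k-nearest-neighbors estimators. They also report that only one CPU core is loaded.
In my experiments, the fitting step uses only a single core whether or not n_jobs > 1. So however large you set n_jobs, if the training set is large, the training time will be long and will not decrease.
n_jobs > 1 ends up even slower than n_jobs = 1 because of the overhead of allocating resources for multiprocessing.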
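One way to check this on your own machine is to time `fit` and `predict` separately for each setting. The sketch below uses synthetic data with hypothetical sizes (2000 training rows, 10 features), not the asker's datasets, and the helper name `time_knn` is made up for illustration. The scikit-learn documentation describes `n_jobs` for `KNeighborsClassifier` as the number of parallel jobs for the neighbors search, which is consistent with `fit` not getting faster. Timings will vary with hardware and scikit-learn version.

```python
from time import perf_counter

import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def time_knn(n_jobs, X_train, y_train, X_test):
    """Return (fit_seconds, predict_seconds, predictions) for one n_jobs value."""
    knn = KNeighborsClassifier(n_neighbors=5, n_jobs=n_jobs)
    t0 = perf_counter()
    knn.fit(X_train, y_train)
    t1 = perf_counter()
    pred = knn.predict(X_test)
    t2 = perf_counter()
    return t1 - t0, t2 - t1, pred


# Synthetic data; sizes are arbitrary assumptions for the demonstration.
rng = np.random.RandomState(0)
X_train = rng.rand(2000, 10)
y_train = rng.randint(0, 2, size=2000)
X_test = rng.rand(500, 10)

results = {n: time_knn(n, X_train, y_train, X_test) for n in (1, -1)}
for n_jobs, (fit_s, pred_s, _) in results.items():
    print(f"n_jobs={n_jobs}: fit {fit_s:.4f}s, predict {pred_s:.4f}s")
```

On small inputs like the ones a feature-selection loop feeds to the classifier, the fixed cost of setting up parallel workers on every call can outweigh any gain in the neighbor search, so comparing the two `predict` timings at your real data size is the more useful measurement.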