cross_val_predict never finishes and shows no error message

Time: 2020-06-24 11:40:23

Tags: python numpy scikit-learn knn

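I am trying to use a k-nearest-neighbors classifier (KNeighborsClassifier) on the MNIST sample dataset.

When I call cross_val_predict, the script just keeps running, no matter how long I leave it.

Is there something I am missing?

Any feedback is appreciated.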

from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version=1)  # Download the MNIST dataset from OpenML

X, y = mnist["data"], mnist["target"]

y = y.astype(np.uint8)  # The labels are loaded as strings, so cast them to integers
X = X.astype(np.uint8)  # Pixel values are in the range 0-255, so uint8 is sufficient

X.shape, y.shape

X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]  # Separate the data into training and testing sets

from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_train)

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

y_train_knn_pred = cross_val_predict(knn_clf, X_train, y_train, cv=3)

f1_score(y_train, y_train_knn_pred, average="macro")

2 answers:

Answer 0: (score: 1)

Use n_jobs=-1. From the documentation:

The number of CPUs to use to do the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version=1)  # Download the MNIST dataset from OpenML

X, y = mnist["data"], mnist["target"]
y = y.astype(np.uint8)  # The labels are loaded as strings, so cast them to integers
X = X.astype(np.uint8)  # Pixel values are in the range 0-255, so uint8 is sufficient

X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]  # Separate the data into training and testing sets

from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier(n_jobs=-1) # HERE
knn_clf.fit(X_train, y_train)  # this took seconds on my MacBook Pro

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

y_train_knn_pred = cross_val_predict(knn_clf, X_train, y_train, cv=3, n_jobs=-1) # AND HERE

f1_score(y_train, y_train_knn_pred, average="macro")
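
The documentation excerpt quoted above also mentions joblib.parallel_backend. As a rough sketch of what that could look like (not part of the original answer), the snippet below leaves n_jobs at its default of None and sets the worker count through the context manager instead. The small load_digits dataset, the "threading" backend, and the choice of two workers are all assumptions made so the example runs quickly; they are not recommendations for MNIST.

```python
from joblib import parallel_backend
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Assumption: a small bundled dataset stands in for MNIST so this runs in seconds.
X_small, y_small = load_digits(return_X_y=True)

knn_default = KNeighborsClassifier()  # n_jobs is left at None on purpose

# Inside the context, n_jobs=None resolves to the context's setting (2 workers here).
with parallel_backend("threading", n_jobs=2):
    y_pred_ctx = cross_val_predict(knn_default, X_small, y_small, cv=3)

print(y_pred_ctx.shape)
```

Passing n_jobs=-1 directly, as in the code above, is the simpler option when only one call needs to be parallelized.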

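Answer 1: (score: 1)

I think the confusion comes from the fact that, for the KNN algorithm, the fit call is much faster than predict. From another SO post: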

Why is cross_val_predict so much slower than fit for KNeighborsClassifier?

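KNN is also called a lazy algorithm because during fitting it does nothing but save the input data; there is no learning at all.

The actual distance calculation happens during predict, for each test data point. So when you use cross_val_predict, KNN has to predict on the validation data points, which is what makes the computation time so much higher!

So, given the size of your input data, a lot of computational power is needed. Using multiple CPUs or reducing the dimensionality may help.

If you want to use multiple CPU cores, you can pass the parameter "n_jobs" to both cross_val_predict and KNeighborsClassifier to set the number of cores to use. Set it to -1 to use all available cores.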
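
To see the difference described in the quote, one can time fit and predict separately. This is only a minimal sketch: the random data (2,000 samples with 784 features, mimicking the MNIST shape) and the variable names are assumptions, and algorithm="brute" is set explicitly so that fit really does nothing more than store the data. Actual timings depend on the machine, so no numbers are claimed here.

```python
import time

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Assumption: random data with MNIST-like dimensionality, small enough to run quickly.
rng = np.random.default_rng(0)
X_demo = rng.random((2000, 784))
y_demo = rng.integers(0, 10, size=2000)

# "brute" makes fit store the training data without building a tree.
knn_demo = KNeighborsClassifier(algorithm="brute")

start = time.perf_counter()
knn_demo.fit(X_demo, y_demo)
fit_seconds = time.perf_counter() - start

start = time.perf_counter()
pred_demo = knn_demo.predict(X_demo)  # distances to every training point are computed here
predict_seconds = time.perf_counter() - start

print(f"fit: {fit_seconds:.4f} s, predict: {predict_seconds:.4f} s")
```

With cv=3 on the 60,000 MNIST training images, each fold predicts 20,000 points against 40,000 stored ones, which is where the time goes.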
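
The answer does not say how to reduce the dimensionality. One possible reading is to put PCA in front of the classifier, sketched below. The use of PCA, the 95% explained-variance threshold, and the small load_digits dataset are all assumptions chosen for illustration; on MNIST the right number of components would need to be checked.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Assumption: a small bundled dataset stands in for MNIST.
X_digits, y_digits = load_digits(return_X_y=True)

# Keep enough principal components to explain 95% of the variance,
# then run KNN in the reduced space.
pca_knn = make_pipeline(PCA(n_components=0.95), KNeighborsClassifier())

# The pipeline is refit on each training fold, so PCA never sees the validation fold.
y_digits_pred = cross_val_predict(pca_knn, X_digits, y_digits, cv=3)
macro_f1 = f1_score(y_digits, y_digits_pred, average="macro")

print(f"macro F1 with PCA + KNN: {macro_f1:.3f}")
```

Fewer features mean cheaper distance computations during predict, at the cost of discarding some information.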