I have computed X_train, X_test, y_train, and y_test, but I cannot work out how to compute y_train_true, y_train_prob, y_test_true, and y_test_prob.
How can I compute y_train_true, y_train_prob, y_test_true, and y_test_prob from the code below?
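N.B.:
y_train_true: the true binary labels (0 or 1) in the training dataset
y_train_prob: the probabilities, in the range [0, 1], predicted by the model for the training dataset
y_test_true: the true binary labels (0 or 1) in the test dataset
y_test_prob: the probabilities, in the range [0, 1], predicted by the model for the test dataset
Code: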
# Split test and train data
import numpy as np
from sklearn.model_selection import train_test_split
# .ix has been removed from pandas; .iloc selects columns 1 through 9 by position
X = np.array(dataset.iloc[:, 1:10])
y = np.array(dataset['benign_malignant'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Define the classifier and fit it to the training data
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
# knn = KNeighborsClassifier(n_neighbors=11)
knn.fit(X_train, y_train)
# Predict the test set results (the original passed X_train here)
y_pred = knn.predict(X_test)
Answer 0: (score: 1)
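In this case, y_train and y_test already are y_train_true and y_test_true. To get y_train_prob and y_test_prob, you need a fitted model. I don't know which dataset you are using, but it looks like a binary classification problem, so you could use logistic regression for this; the same predict_proba call also works with the KNN classifier from your question: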
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
knn.fit(X_train, y_train)
# predict_proba returns one column per class, in the order given by knn.classes_
y_train_prob = knn.predict_proba(X_train)
y_test_prob = knn.predict_proba(X_test)
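If a single probability per sample is wanted (the probability of the positive class), one column of the predict_proba output can be selected. The following is only a minimal sketch: the original dataset is not available here, so it assumes a synthetic stand-in generated with make_classification (9 features, labels 0 and 1), and pos_col is a hypothetical helper name introduced for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic data standing in for the real dataset (9 features, binary 0/1 label)
X, y = make_classification(n_samples=200, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
knn.fit(X_train, y_train)

# The true labels are simply the labels produced by the split
y_train_true, y_test_true = y_train, y_test

# Find which predict_proba column corresponds to the positive class (label 1)
pos_col = list(knn.classes_).index(1)

# One probability per sample: the estimated probability of class 1
y_train_prob = knn.predict_proba(X_train)[:, pos_col]
y_test_prob = knn.predict_proba(X_test)[:, pos_col]

print(y_train_prob.shape, y_test_prob.shape)
```

With the default uniform weights, KNN's probability for a class is the fraction of the 5 nearest neighbours that belong to it, so the values here are multiples of 0.2. On the training set each point usually counts itself among its own neighbours, so y_train_prob tends to look more confident than y_test_prob.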