I am using scikit-learn to build a prediction model that combines stratified K-fold sampling with KNeighborsClassifier.
The dummy dataset is:
import pandas as pd
import numpy as np

data = pd.DataFrame(
    {'A': [4, 5, 6, 7, 1, 3, 4, 9, 1, 8],
     'B': [10, 20, 30, 40, 90, 55, 68, 25, 19, 97],
     'C': [100, 50, 30, 89, 54, 23, 13, 67, 93, 84],
     'y': [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]}).astype(float)
data1 = data.drop(['y'], axis=1, inplace=False)
# as_matrix() and np.float no longer exist; to_numpy() and float replace them
X = data1.to_numpy().astype(float)
X
y = data['y'].to_numpy().astype(int)
y
For stratified K-fold sampling, the code is:
from sklearn.model_selection import StratifiedKFold

def stratifiedkfold_cv(X, y, clf_class, shuffle=True, n_folds=2, **kwargs):
    # sklearn.cross_validation was removed; model_selection takes n_splits and split(X, y)
    stratifiedk_fold = StratifiedKFold(n_splits=n_folds, shuffle=shuffle)
    y_pred = y.copy()
    for train_index, test_index in stratifiedk_fold.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train = y[train_index]
        # clf_class is called here, so it must be a class (or other callable), not an instance
        clf = clf_class(**kwargs)
        clf.fit(X_train, y_train)
        y_pred[test_index] = clf.predict(X_test)
    return y_pred
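Because StratifiedKFold splits on the labels, each test fold keeps roughly the class ratio of y. A minimal sketch using the same dummy labels (five 0s and five 1s) and the sklearn.model_selection API; the dummy X of zeros and random_state=0 are assumptions added only to make the demo self-contained and reproducible:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Same labels as the dummy dataset: five 0s and five 1s
y = np.array([1, 1, 0, 0, 0, 1, 0, 1, 1, 0])
# Placeholder features; stratification depends only on y
X = np.zeros((len(y), 1))

skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
fold_counts = []
for train_index, test_index in skf.split(X, y):
    # np.bincount gives [number of class 0, number of class 1] in this test fold
    counts = np.bincount(y[test_index], minlength=2)
    fold_counts.append(counts.tolist())
    print("test fold class counts:", counts.tolist())
```

Five samples per class cannot be split evenly into two folds, so each test fold gets 2 or 3 samples of each class rather than exactly half.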
I am trying to tune sklearn.neighbors.KNeighborsClassifier by finding the n_neighbors value that gives the best accuracy_score. The code is:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

k_range = range(1, 31)
k_scores = []
for k in k_range:
    knn = KNeighborsClassifier
    y_pred = stratifiedkfold_cv(X, y, knn(n_neighbors = k))
    # accuracy_score already returns a single number, so no .mean() is needed
    score = accuracy_score(y, y_pred)
    k_scores.append(score)
print(k_scores)
But I get this error:
----> 7 y_pred = stratifiedkfold_cv(X, y, knn(n_neighbors = k))
----> 7 clf = clf_class(**kwargs)
TypeError: 'KNeighborsClassifier' object is not callable
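I believe something in the way I defined the stratifiedkfold_cv function is inconsistent with how I call it, but I can't figure out how to modify it.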
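The error comes from calling an object that is not callable: a class such as KNeighborsClassifier can be called to build an estimator, but the estimator instance returned by knn(n_neighbors = k) cannot. If you would rather pass a configured instance, one option is to copy it per fold with sklearn.base.clone. A minimal sketch; the function name stratifiedkfold_cv_instance and its random_state parameter are hypothetical additions, not part of the original code:

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Calling a class builds a new estimator; calling an estimator instance fails,
# which is what clf_class(**kwargs) does when handed knn(n_neighbors=k)
print(callable(KNeighborsClassifier))                 # True
print(callable(KNeighborsClassifier(n_neighbors=3)))  # False


def stratifiedkfold_cv_instance(X, y, estimator, n_folds=2, shuffle=True, random_state=None):
    """Hypothetical variant of stratifiedkfold_cv that takes a configured estimator."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=shuffle, random_state=random_state)
    y_pred = y.copy()
    for train_index, test_index in skf.split(X, y):
        # clone() returns an unfitted copy with the same parameters for each fold
        clf = clone(estimator)
        clf.fit(X[train_index], y[train_index])
        y_pred[test_index] = clf.predict(X[test_index])
    return y_pred


# The dummy dataset from the question, built directly with NumPy
X = np.column_stack([
    [4, 5, 6, 7, 1, 3, 4, 9, 1, 8],
    [10, 20, 30, 40, 90, 55, 68, 25, 19, 97],
    [100, 50, 30, 89, 54, 23, 13, 67, 93, 84],
]).astype(float)
y = np.array([1, 1, 0, 0, 0, 1, 0, 1, 1, 0])

y_pred = stratifiedkfold_cv_instance(X, y, KNeighborsClassifier(n_neighbors=3), random_state=0)
print(y_pred)
```

Cloning keeps each fold independent: no fold reuses an estimator that was already fitted on a different training split.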
Answer 0 (score: 1)
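knn(n_neighbors = k) already creates a KNeighborsClassifier instance, and stratifiedkfold_cv then tries to call that instance again with clf_class(**kwargs). Pass the class itself and give n_neighbors to the function as a keyword argument, which it forwards through **kwargs:
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels
    return np.mean(y_true == y_pred)

# With n_folds=2 each training split holds only 5 of the 10 rows, and
# KNeighborsClassifier needs n_neighbors <= number of training samples
k_range = range(1, 6)
k_scores = []
for k in k_range:
    knn = KNeighborsClassifier  # the class itself, not an instance
    k_scores.append(accuracy(y, stratifiedkfold_cv(X, y, knn, n_neighbors=k)))
print(k_scores)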
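For a parameter sweep like this, scikit-learn's GridSearchCV can run the stratified folds and the n_neighbors search in one step. This is an alternative to the hand-written loop, not what the answer itself does. A minimal sketch, assuming a recent scikit-learn and pandas; random_state=0 is an arbitrary seed, and the grid stops at 5 because each 2-fold training split contains only 5 rows:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

data = pd.DataFrame({
    'A': [4, 5, 6, 7, 1, 3, 4, 9, 1, 8],
    'B': [10, 20, 30, 40, 90, 55, 68, 25, 19, 97],
    'C': [100, 50, 30, 89, 54, 23, 13, 67, 93, 84],
    'y': [1, 1, 0, 0, 0, 1, 0, 1, 1, 0],
})
X = data.drop(columns='y').to_numpy(dtype=float)
y = data['y'].to_numpy(dtype=int)

# Each training split holds 5 of the 10 rows, so n_neighbors cannot exceed 5
param_grid = {'n_neighbors': list(range(1, 6))}
# A fixed random_state makes every candidate k use the same two folds
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)

search = GridSearchCV(KNeighborsClassifier(), param_grid, scoring='accuracy', cv=cv)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

GridSearchCV averages the per-fold accuracies, whereas the loop scores the pooled out-of-fold predictions; with two test folds of 5 rows each, these give the same number.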