When I run this cross-validation with leave-one-out, it does nothing at all: no output, not even an error message. I don't know what I'm missing. I'm using this CSV from Kaggle: https://www.kaggle.com/dileep070/heart-disease-prediction-using-logistic-regression/downloads/heart-disease-prediction-using-logistic-regression.zip/1
import csv
from sklearn.model_selection import LeaveOneOut
from sklearn import svm
from sklearn.impute import SimpleImputer
import numpy as np
import pandas as pd
from pandas import read_csv
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn import metrics
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
#replace missing values with mean
dataset = read_csv("//Users/crystalfortress/Desktop/CompGenetics/Final_Project_Comp/framingham.csv")
dataset.fillna(dataset.mean(), inplace=True)
print(dataset.isnull().sum())
# features: every column except the last; target: the final (16th) column
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 15].values
# linear SVM classifier
model = svm.SVC(kernel='linear', C=10, gamma=0.1)
loo = LeaveOneOut()
scores = cross_val_score(model, X, y, cv=loo, scoring='accuracy')
print('Accuracy after cross validation:', scores.mean())
predictions = cross_val_predict(model, X, y, cv=loo)
accuracy = metrics.r2_score(y, predictions)  # note: r2_score is the R^2 regression metric
print('Prediction accuracy:', accuracy)
x = metrics.classification_report(y, predictions)
print(x)
cf = metrics.confusion_matrix(y, predictions)
print(cf)
Answer 0 (score: 0)
I tried running it on my machine and your code looks like it works fine (though I did comment out a lot of the unnecessary imports). Leave-one-out training just takes a very long time, since it builds n training sets from your n data points, so you have to wait for it to finish training before you get any results. Changing cv= to a number (I believe the default is 3) will train the model much faster, and will most likely also give it less variance. Also, adding n_jobs=-1 to your cross_val_score call lets Python use all of your processors. For example:
scores = cross_val_score(model, X, y, cv=loo, scoring='accuracy', n_jobs=-1)
You can also set the verbose parameter of cross_val_score to watch the progress (though, heads up, it still won't be fast). I believe the highest useful value is 3 (their online docs don't say), but passing a higher value is fine. So the final cross_val_score call would look like this:
scores = cross_val_score(model, X, y, cv=loo, scoring='accuracy', n_jobs=-1, verbose=10)
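For comparison, here is a minimal sketch of the faster k-fold variant suggested above. It assumes the same model, X, and y defined in the question; cv=5 is just an illustrative choice, not something from the original post:
# plain k-fold cross-validation: only 5 fits instead of one fit per data point
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy', n_jobs=-1, verbose=1)
print('Mean accuracy over 5 folds:', scores.mean())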