Here is my code:
import matplotlib.pyplot as plt,matplotlib.colors as clr
import pandas as pd,csv,numpy as np
from sklearn import linear_model
from sklearn.model_selection import ShuffleSplit as ss, learning_curve as lc,StratifiedKFold as skf
from sklearn.utils import shuffle
file=open('C:\\Users\\Anil Satya\\Desktop\\Internship_projects\\BD Influenza\\BD_Influenza_revised_imputed.csv','r+')
flu_data=pd.read_csv(file)
flu_num=flu_data.ix[:,5:13]
features=np.array(flu_num.ix[:,0:7])
label=np.array(flu_num.ix[:,7])
splt=skf(n_splits=2,shuffle=True,random_state=None)
clf=linear_model.LogisticRegression()
model=clf.fit(features,label)
def classifier(clf,x,y):
    accuracy=clf.score(x,y)
    return accuracy
lc(estimator=clf,X=features,y=label,train_sizes=0.75,cv=splt,
   scoring=classifier(clf,features,label))
When I run it, it shows the following error:
Traceback (most recent call last):
  File "C:/Ankur/Python36/Python Files/BD_influenza_learningcurve.py", line 26, in <module>
    lc(estimator=clf,X=features,y=label,train_sizes=0.75,cv=splt,scoring=classifier(clf,features,label))
  File "C:\Ankur\Python36\lib\site-packages\sklearn\model_selection\_validation.py", line 756, in learning_curve
    n_max_training_samples)
  File "C:\Ankur\Python36\lib\site-packages\sklearn\model_selection\_validation.py", line 808, in _translate_train_sizes
    n_ticks = train_sizes_abs.shape[0]
IndexError: tuple index out of range
I can't pin down the problem. However, I think it lies in the learning_curve call, because the program runs fine without it.
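Answer 0 (score: 0)
Either the scoring or the train_sizes argument is causing the problem. The traceback stops at train_sizes_abs.shape[0], which points at train_sizes: a bare float such as 0.75 has no first dimension to index. Separately, scoring=classifier(clf,features,label) passes the float returned by that call rather than a scorer name or a callable.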
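The IndexError can be reproduced without scikit-learn. This is a minimal sketch, assuming (from the traceback line n_ticks = train_sizes_abs.shape[0]) that train_sizes is converted to a NumPy array before its shape is read; the variable names are illustrative and not taken from the library.

```python
import numpy as np

# A bare float becomes a zero-dimensional array, so its shape is an empty tuple.
scalar_sizes = np.asarray(0.75)
print(scalar_sizes.shape)

# Indexing the empty shape tuple raises the error seen in the traceback.
try:
    scalar_sizes.shape[0]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# Wrapping the value in a list gives a one-dimensional array with one entry.
list_sizes = np.asarray([0.75])
n_ticks = list_sizes.shape[0]
print(n_ticks)
```

Passing train_sizes=[0.75] or np.array([0.75]) therefore gives the function a sequence with one entry to iterate over.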
Try replacing:
lc(estimator=clf,X=features,y=label,train_sizes=0.75,cv=splt,
   scoring=classifier(clf,features,label))
with
1) a scorer name instead of a function result:
lc(estimator=clf,X=features,y=label,train_sizes=0.75,cv=splt,
   scoring="accuracy")
or 2) a scorer name and train_sizes passed as an array:
import numpy as np
lc(estimator=clf,X=features,y=label,train_sizes=np.array([0.75]),cv=splt,
   scoring="accuracy")
Finally, for the scoring argument, you can see the available strings here: The scoring parameter
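For reference, here is a self-contained sketch of a learning_curve call with an array for train_sizes and a scorer name for scoring. It uses synthetic data from make_classification in place of the influenza CSV, which is an assumption made only so the example runs anywhere. It also shows how a custom scorer could be passed: as the function object itself, with the signature (estimator, X, y), rather than as the result of calling it. The name accuracy_scorer is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, learning_curve

# Synthetic stand-in for the original data: 200 samples, 7 features, 2 classes.
X, y = make_classification(n_samples=200, n_features=7, random_state=0)

clf = LogisticRegression()
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)

# train_sizes is an array with one entry; scoring is a scorer name.
sizes, train_scores, test_scores = learning_curve(
    estimator=clf, X=X, y=y,
    train_sizes=np.array([0.75]), cv=cv, scoring="accuracy",
)
print(sizes, train_scores.shape, test_scores.shape)

# Hypothetical custom scorer: a callable taking (estimator, X, y).
def accuracy_scorer(estimator, X, y):
    return estimator.score(X, y)

# Pass the function itself, not accuracy_scorer(clf, X, y).
sizes_c, train_scores_c, test_scores_c = learning_curve(
    estimator=clf, X=X, y=y,
    train_sizes=np.array([0.75]), cv=cv, scoring=accuracy_scorer,
)
print(test_scores_c)
```

learning_curve returns the absolute training-set sizes plus one row of scores per size and one column per cross-validation fold, so a single entry in train_sizes with two folds yields score arrays of shape (1, 2).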