Error: tuple index out of range when plotting a learning curve

Time: 2017-07-10 08:27:42

Tags: python-3.x machine-learning tuples logistic-regression

This is my code:

  import matplotlib.pyplot as plt,matplotlib.colors as clr
  import pandas as pd,csv,numpy as np
  from sklearn import linear_model
  from sklearn.model_selection import ShuffleSplit as ss, learning_curve as lc,StratifiedKFold as skf
  from sklearn.utils import shuffle

  file=open('C:\\Users\\Anil Satya\\Desktop\\Internship_projects\\BD Influenza\\BD_Influenza_revised_imputed.csv','r+')
  flu_data=pd.read_csv(file)

  flu_num=flu_data.ix[:,5:13]
  features=np.array(flu_num.ix[:,0:7])
  label=np.array(flu_num.ix[:,7])
  splt=skf(n_splits=2,shuffle=True,random_state=None)

  clf=linear_model.LogisticRegression()
  model=clf.fit(features,label)
  def classifier(clf,x,y):
    accuracy=clf.score(x,y)
    return accuracy

  lc(estimator=clf,X=features,y=label,train_sizes=0.75,cv=splt,
     scoring=classifier(clf,features,label))

When I run it, it shows the following error:

Traceback (most recent call last):
  File "C:/Ankur/Python36/Python Files/BD_influenza_learningcurve.py", line 26, in <module>
    lc(estimator=clf,X=features,y=label,train_sizes=0.75,cv=splt,
    scoring=classifier(clf,features,label))
  File "C:\Ankur\Python36\lib\site-packages\sklearn\model_selection\_validation.py", line 756, in learning_curve
    n_max_training_samples)
  File "C:\Ankur\Python36\lib\site-packages\sklearn\model_selection\_validation.py", line 808, in _translate_train_sizes
    n_ticks = train_sizes_abs.shape[0]
IndexError: tuple index out of range

I can't pin down the problem. However, I think it lies in the learning curve function, because the program runs fine when I leave that call out.

1 answer:

Answer 0 (score: 0)

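The scoring or train_sizes argument is causing the problem. The traceback points at train_sizes: `train_sizes_abs.shape[0]` fails because the scalar 0.75 becomes a 0-dimensional array whose shape is the empty tuple `()`. Separately, scoring=classifier(clf,features,label) passes the float returned by that call, not a scorer name or a callable.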
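To see why the traceback ends at `train_sizes_abs.shape[0]`, here is a minimal NumPy-only sketch. It assumes, going by the traceback, that `train_sizes` is converted to an array before its first dimension is read; the variable names are illustrative and not taken from scikit-learn.

```python
import numpy as np

# A bare float becomes a 0-dimensional array: its shape is the empty tuple.
scalar_sizes = np.asarray(0.75)
scalar_shape = scalar_sizes.shape  # ()

# Indexing the empty shape tuple reproduces the error from the question.
message = None
try:
    scalar_sizes.shape[0]
except IndexError as exc:
    message = str(exc)

# Wrapping the value in a list (or np.array([...])) gives a 1-D array,
# so the number of ticks can be read from shape[0].
array_sizes = np.asarray([0.75])
n_ticks = array_sizes.shape[0]

print(scalar_shape, message, n_ticks)
```

So any array-like, even one with a single element, avoids the IndexError.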

Try replacing:

lc(estimator=clf,X=features,y=label,train_sizes=0.75,cv=splt,
   scoring=classifier(clf,features,label))

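with either 1), which fixes only the scoring argument (train_sizes is still the scalar 0.75, so the IndexError persists until that is changed too):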

lc(estimator=clf,X=features,y=label,train_sizes=0.75,cv=splt,
   scoring="accuracy")

or 2), which also passes train_sizes as an array:

import numpy as np

lc(estimator=clf,X=features,y=label,train_sizes=np.array([0.75]),cv=splt,
   scoring="accuracy")

Finally, for the scoring parameter, the predefined strings you can pass are listed in the scikit-learn documentation under "The scoring parameter".
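Putting both fixes together, a self-contained sketch might look like the following. It uses synthetic data from `make_classification` as a stand-in for the influenza CSV, which is not available here, and the names `accuracy_scorer`, `X`, `y` and `cv` are hypothetical. It also shows that a custom scorer can be passed as the function itself (with the signature `(estimator, X, y)`) instead of the result of calling it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, learning_curve

# Synthetic stand-in for the 7 feature columns and the label column.
X, y = make_classification(n_samples=200, n_features=7,
                           n_informative=4, random_state=0)

clf = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)

# train_sizes must be array-like; fractions are relative to the
# largest training set available in each CV split.
train_sizes = np.array([0.25, 0.5, 0.75, 1.0])


def accuracy_scorer(estimator, X, y):
    """Custom scorer: called by learning_curve on each fitted estimator."""
    return estimator.score(X, y)


# Option A: a predefined scoring string.
sizes_abs, train_scores, test_scores = learning_curve(
    estimator=clf, X=X, y=y, train_sizes=train_sizes,
    cv=cv, scoring="accuracy")

# Option B: pass the callable itself, not classifier(clf, X, y).
_, train_scores_cb, test_scores_cb = learning_curve(
    estimator=clf, X=X, y=y, train_sizes=train_sizes,
    cv=cv, scoring=accuracy_scorer)

# One row per training size, one column per CV fold.
print(sizes_abs)
print(test_scores.mean(axis=1))
```

The mean of each row of `train_scores` and `test_scores` can then be plotted against `sizes_abs` with matplotlib to draw the learning curve.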