I am using logistic regression to plot an ROC curve. I load the data with this code:
import numpy as np
import pandas as pd
Diabetes=pd.read_csv('datasource/ScoringDatasheet.csv', sep=';')
Then I use iloc like this:
inputData=Diabetes.iloc[:,:60]
outputData=Diabetes.iloc[:,60]
Then I use logistic regression to analyze the data and plot the ROC curve:
from sklearn.linear_model import LogisticRegression
logit1=LogisticRegression()
logit1.fit(inputData,outputData)
logit1.score(inputData,outputData)
np.mean(logit1.predict(inputData)==outputData)
# .ix was removed in pandas 1.0; .loc accepts the same boolean mask
trueInput=Diabetes.loc[Diabetes['Outcome']==1].iloc[:,:62]
trueOutput=Diabetes.loc[Diabetes['Outcome']==1].iloc[:,62]
np.mean(logit1.predict(trueInput)==trueOutput)
falseInput=Diabetes.loc[Diabetes['Outcome']==0].iloc[:,:62]
falseOutput=Diabetes.loc[Diabetes['Outcome']==0].iloc[:,62]
np.mean(logit1.predict(falseInput)==falseOutput)
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score
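# scikit-learn metrics take the true labels first, then the predictions
confusion_matrix(outputData,logit1.predict(inputData))
# roc_curve expects continuous scores (class-1 probabilities), not hard 0/1 predictions
fpr, tpr,_=roc_curve(outputData,logit1.predict_proba(inputData)[:,1],drop_intermediate=False)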
import matplotlib.pyplot as plt
plt.figure()
plt.plot(fpr, tpr, color='red', lw=2, label='ROC curve')
plt.plot([0, 1], [0, 1], color='blue', lw=2, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC curve')
plt.show()
# true labels first, then class-1 probabilities as the score
roc_auc_score(outputData,logit1.predict_proba(inputData)[:,1])
coef_DF=pd.DataFrame(data={'Variable':list(inputData),
'value':(logit1.coef_[0])})
coef_DF_standardised=pd.DataFrame(data={'Variable':list(inputData),
'value':(logit1.coef_[0])*np.std(inputData,axis=0)/np.std(outputData)})
import matplotlib.pyplot as plt
plt.figure()
plt.scatter(inputData.iloc[:,1],inputData.iloc[:,5],c=logit1.predict_proba(inputData)[:,1],alpha=0.4)
plt.xlabel('Glucose level ')
plt.ylabel('BMI ')
plt.show()
plt.figure()
plt.scatter(inputData.iloc[:,1],inputData.iloc[:,5],c=outputData,alpha=0.4)
plt.xlabel('Glucose level ')
plt.ylabel('BMI ')
plt.show()
But when I run the code, I get the following error:
Traceback (most recent call last):
  File "index.py", line 13, in <module>
    logit1.fit(inputData,outputData)
  File "C:\Users\kulkaa\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\linear_model\logistic.py", line 1221, in fit
    check_classification_targets(y)
  File "C:\Users\kulkaa\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\multiclass.py", line 171, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
According to this link, if I am using a classifier I should convert the floats to categorical values. But I am using regression here, so how do I resolve this error?
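For reference, the check that raises this error can be reproduced in isolation. The sketch below assumes only that scikit-learn is installed; it calls `type_of_target`, the helper whose result `check_classification_targets` inspects, on the first four values of the last column of the sample data (SmokHis4) and on a made-up 0/1 column. The variable names are illustrative.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# First four SmokHis4 values from the sample rows shown in the question.
smoking_like = np.array([0.145833333, 0.0, 0.0625, 0.104166667])

# A made-up 0/1 column, standing in for a real class label.
binary_like = np.array([0, 1, 1, 0])

kind_continuous = type_of_target(smoking_like)
kind_binary = type_of_target(binary_like)

print(kind_continuous)  # the label type a classifier rejects
print(kind_binary)      # a label type a classifier accepts
```

Running the same call on `outputData` should show which label type the model is actually being given.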
Part of the dataset I am using is shown below:
Pat_ID Demo1 Demo2 Demo3 Demo4 Demo5 Demo6 DisHis1 DisHis1Times DisHis2 DisHis2Times DisHis3 DisHis3Times DisHis4 DisHis5 DisHis6 DisHis7 DisStage1 DisStage2 LungFun1 LungFun2 LungFun3 LungFun4 LungFun5 LungFun6 LungFun7 LungFun8 LungFun9 LungFun10 LungFun11 LungFun12 LungFun13 LungFun14 LungFun15 LungFun16 LungFun17 LungFun18 LungFun19 LungFun20 Dis1 Dis1Treat Dis2 Dis2Times Dis3 Dis3Times Dis4 Dis4Treat Dis5 Dis5Treat Dis6 Dis6Treat Dis7 RespQues1 ResQues1a ResQues1b ResQues1c ResQues2a SmokHis1 SmokHis2 SmokHis3 SmokHis4
6 0 0.430159833 0.596541787 0.323296661 0 0.867768595 0 0 0 0 0 0 0 0 0 0 0.8 0.714285714 0.447443182 0.280725319 0.392405063 0.315347722 0.442765731 0.35344 0.306497788 0.078249895 0.230895645 0 0.175430575 0.776595745 0.194322248 0.123935854 0.792696843 0.873987854 0.803933254 0.528064786 1 0.1 0 0 0 0 0.333333333 0.15 0 0 0 0 0.333333333 1 0 0.273565574 0.1074 0.7282 0.0469 0.3 0.082352941 0.085237258 0.724137931 0.145833333
9 0 0.218902015 0.484149856 0.177957923 0 0.225895317 0 0 0 0 0 0 0 0 0 0 0.6 0.142857143 0.899147727 0.441235729 0.620253165 0.708333333 0.69303235 0.55904 0.532922703 0.263357173 0.718707204 0.729159016 0.65096784 0.64893617 0.385594463 0.234804989 0.613921643 0.409665992 0.483313468 0.115610165 0 0.5 0 0 0 0 1 0 1 0 1 0 0.333333333 1 0 0.456557377 0.1791 0.7896 0.3212 0.2 0.176470588 0.144991213 0.620689655 0
15 0 0.628908965 0.433717579 0.594093804 1 0.363636364 0 0 0 0 0 0 0 0 0 0 0 0.142857143 0.970170455 0.396910678 0.746835443 0.575239808 0.478848205 0.36944 0.565368266 0.309002945 0.569433032 0.463643041 0.425392471 0.787234043 0.427004516 0.290833498 0.652339293 0.484311741 0.511323004 0.138788048 0 0.6 0 0 0 0 1 0 0 0 0 0 0.333333333 1 0 0.396413934 0.2596 0.8032 0.1836 0.2 0.058823529 0.052724077 0.637931034 0.0625
25 1 0.236275191 0.268011527 0.280254777 0 0.388429752 0 0 0 0 0 0 0 0 0 0 0.6 0 0.721590909 0.39758227 0.53164557 0.363309353 0.394063278 0.31088 0.224863364 0.096339924 0.321007943 0.351817848 0.361377839 0.521276596 0.213208986 0.059196199 0.728413846 0.497975709 0.62932062 0.147165596 1 0.6 0 0 1 0 0.333333333 0.05 0 0 0 0 0 0 0 0 0 0 0 0.1 0.176470588 0.118629174 0.517241379 0.104166667
27 1 0.397498263 0.327089337 0.425786528 0 0.063360882 0 0 0 0 0 0 0 0 0 0 0 0 0.950284091 0.358629953 0.82278481 0.580035971 0.462851049 0.33696 0.40426824 0.508834666 0.594631608 0.491737055 0.431489102 0.819148936 0.372514517 0.373589388 0.623430962 0.422823887 0.489272944 0.114493158 0 0.9 0 0 0.333333333 0.020833333 0.333333333 0.05 0 0 0 0 0 0 0 0.058709016 0 0.1847 0 0 0.176470588 0.087873462 0.396551724 0.0625
28 1 0.510771369 0.452449568 0.468249373 0 0.027548209 0 0 0 0 0 0 1 0 0 0 0 0.142857143 0.928977273 0.392209537 0.746835443 0.648081535 0.547813722 0.4232 0.46777132 0.379259571 0.675431389 0.581894969 0.502362445 0.79787234 0.351398909 0.388437933 0.597565614 0.441548583 0.472586412 0.122591455 0 0.9 0 0 0 0 0 0 0 0 0 0 0 0 1 0.480840164 0.5239 0.5354 0.4146 0.1 0.411764706 0.156414763 0.272413793 0
36 1 0.385684503 0.341498559 0.405134144 0 0.195592287 0 0 0 0 0 0 0 0 0 0 0.6 0.142857143 0.737215909 0.36937542 0.594936709 0.43735012 0.455563455 0.33952 0.259651254 0.165124106 0.447274719 0.432611091 0.384545039 0.691489362 0.212387823 0.159176401 0.647014074 0.504807692 0.511918951 0.148841106 0 0.8 1 0 0.333333333 0.041666667 0.333333333 0.1 0.333333333 0 1 0 1 0 1 0.453790984 0.5014 0.5946 0.3379 0.2 0.117647059 0.077768014 0.515517241 0
Answer 0 (score: 1)
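According to http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html, sklearn.linear_model.LogisticRegression is a classifier, not a regressor. Its target must therefore contain discrete class labels. In the sample shown, column index 60 (SmokHis4) holds continuous values such as 0.145833333, which is why fit raises "Unknown label type: 'continuous'".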
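A minimal sketch of the behaviour, on synthetic data rather than the asker's file: fitting on a float-valued target fails, while fitting on a 0/1 target works and yields class probabilities that can be passed to `roc_curve`. The 0.5 threshold used to create the binary target is purely an assumption for illustration. With the real data the usual fix is to select the actual class column (for example the 'Outcome' column the question refers to) instead of thresholding a feature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
X = rng.random((200, 5))  # synthetic features, a stand-in for inputData
y_continuous = X[:, 0] + 0.1 * rng.standard_normal(200)  # float-valued target

model = LogisticRegression()

# Fitting a classifier on a continuous target raises ValueError.
try:
    model.fit(X, y_continuous)
    fit_error = None
except ValueError as exc:
    fit_error = str(exc)
print(fit_error)

# Hypothetical binarisation: the 0.5 threshold is an assumption, not part of the question.
y_binary = (y_continuous > 0.5).astype(int)
model.fit(X, y_binary)

# ROC needs scores, so use the predicted probability of class 1.
scores = model.predict_proba(X)[:, 1]
fpr, tpr, _ = roc_curve(y_binary, scores)  # true labels first, then scores
auc = roc_auc_score(y_binary, scores)
print(auc)
```

If the quantity to be predicted really is continuous, a regressor such as `sklearn.linear_model.LinearRegression` would be the appropriate model, but an ROC curve is only defined for a binary target.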