How do I perform a Kolmogorov-Smirnov or Anderson-Darling test on my own dataset instead of random data?

Asked: 2018-11-22 10:55:27

Tags: python-3.x logistic-regression

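#Imports used by the plotting calls below (missing from the original snippet)
import matplotlib.pyplot as plt
import seaborn as sns

#DATA PREPARATION
#`dataset` is assumed to be a pandas DataFrame loaded earlier, with a binary 'Flag' column
x = dataset.drop('Flag', axis=1)  #independent variables
y = dataset['Flag']               #dependent variable
dataset.hist(figsize=(20, 15))
plt.savefig("hr_histogram_plots")
plt.show()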

#sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression()
logmodel.fit(x_train, y_train)
predictions = logmodel.predict(x_test)

sns.heatmap(dataset.corr())

#Classification report (precision, recall, F1)
from sklearn.metrics import classification_report
#Store the result under a new name so the imported function is not overwritten
report = classification_report(y_test, predictions)
print(report)

#Confusion matrix
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score
#Store the result as `cm` so the imported function is not overwritten
cm = confusion_matrix(y_test, predictions)
print('The Confusion Matrix Is:\n', cm)

#Plotting the confusion matrix
plt.matshow(cm)
plt.title('Confusion matrix')
plt.colorbar()
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()

#Accuracy score
from sklearn.metrics import accuracy_score
#Store the result as `accuracy` so the imported function is not overwritten
accuracy = accuracy_score(y_test, predictions)
print('Accuracy Score Is: ', accuracy)

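#ROC Curve
##Computing false and true positive rates on the held-out test set
##roc_curve expects (y_true, y_score); use class-1 probabilities, not hard 0/1 predictions
y_score = logmodel.predict_proba(x_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_score, drop_intermediate=False)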

plt.figure()
##Adding the ROC
plt.plot(fpr, tpr, color='red',lw=2, label='ROC CURVE')
##Random FPR and TPR
plt.plot([0, 1], [0, 1], color='blue', lw=2, linestyle='--')
##Title and label
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.title('ROC Curve', fontsize = 14)
plt.show()

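#AUC computation
##roc_auc_score expects (y_true, y_score), with predicted probabilities as the score
auc = roc_auc_score(y_test, logmodel.predict_proba(x_test)[:, 1])
print('AUC = ', auc)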

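Above is my logistic regression code. The dataset is in categorical or binary form. I need help with how to test for normality using the KS (Kolmogorov-Smirnov) test or the AD (Anderson-Darling) test. My dataset has 11 independent variables and 1 dependent variable.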
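For reference, here is a minimal sketch of how such a per-column normality check might look with `scipy.stats`. It is not taken from the question: the helper name `normality_report`, the fixed 5% level, and the synthetic `demo` frame are all assumptions made for illustration, and the real data would be passed in place of `demo`. Two caveats apply. A binary or categorical column will always be rejected as non-normal, so the test is only informative for continuous variables. Standardising a column with its own mean and standard deviation also makes the plain KS p-value conservative.

```python
import numpy as np
import pandas as pd
from scipy import stats


def normality_report(df):
    """Run KS and Anderson-Darling normality tests on each numeric column of df.

    Hypothetical helper for illustration; decisions use a fixed 5% level.
    """
    rows = []
    for name in df.select_dtypes(include="number").columns:
        values = df[name].dropna().to_numpy(dtype=float)
        if len(values) < 3 or values.std(ddof=1) == 0:
            continue  # too few points or a constant column: nothing to test

        # kstest compares against a standard normal, so standardise first.
        z = (values - values.mean()) / values.std(ddof=1)
        ks = stats.kstest(z, "norm")

        # Anderson-Darling estimates mean and std itself and returns
        # critical values for several significance levels (in percent).
        ad = stats.anderson(values, dist="norm")
        idx = int(np.argmin(np.abs(np.asarray(ad.significance_level) - 5.0)))
        crit_5pct = float(ad.critical_values[idx])

        rows.append({
            "column": name,
            "ks_stat": float(ks.statistic),
            "ks_p": float(ks.pvalue),
            "ks_reject": bool(ks.pvalue < 0.05),
            "ad_stat": float(ad.statistic),
            "ad_crit_5pct": crit_5pct,
            "ad_reject": bool(ad.statistic > crit_5pct),
        })
    return pd.DataFrame(rows)


# Synthetic stand-in for the real DataFrame (assumption, for demonstration only).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "normal_like": rng.normal(50, 10, 500),
    "skewed": rng.exponential(2.0, 500),
})
print(normality_report(demo))
```

With the DataFrame from the question, the call would presumably be `normality_report(dataset.drop('Flag', axis=1))`, which tests the 11 independent variables and leaves out the binary target.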

0 answers:

No answers