Question

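I am trying to predict the likelihood that the event IS_SLIGHT occurs, based on the time of day. However, I feel I have taken a wrong step somewhere, because the confusion matrices for both MultinomialNB and LogisticRegression show strange results: only false positives and true positives. I would expect some true negatives and false negatives as well. The low roc_auc_score also disappoints me; shouldn't it be higher? I know this is a lot to look through, but thank you for any suggestions.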

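import pandas as pd
from sklearn import metrics
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Set IS_SLIGHT to 1 if the word 'Slight' appears in Accident_Severity, else 0
df['IS_SLIGHT'] = df['Accident_Severity'].apply(lambda x: 1 if 'Slight' in x else 0)

X = df.Time
y = df.IS_SLIGHT
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Turn the time-of-day labels into one indicator column per label.
# Note: newer scikit-learn releases use get_feature_names_out() instead of get_feature_names().
cvec = CountVectorizer()
cvec.fit(X_train)
RT_train = pd.DataFrame(cvec.transform(X_train).todense(), columns=cvec.get_feature_names())
RT_test = pd.DataFrame(cvec.transform(X_test).todense(), columns=cvec.get_feature_names())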

print(RT_train)

Out:

      afternoon   evening   morning   night  
 --- ----------- --------- --------- ------- 
  0           0         1         0       0  
  1           0         1         0       0  
  2           0         0         1       0

...

print(RT_train.shape)

Out: (1390029, 4)

print(RT_test.shape)

Out: (463344, 4)

print(y_train.shape)

Out: (1390029,)

print(y_test.shape)

Out: (463344,)

lr = LogisticRegression()
lr.fit(RT_train,y_train)

lrypred = lr.predict(RT_test)
print(metrics.accuracy_score(y_test, lrypred))

Out: 0.8502969715805104

print(metrics.confusion_matrix(y_test, lrypred))

Out: [[0 69364] [0 393980]]
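For reference when reading this matrix: scikit-learn's confusion_matrix puts the true labels on the rows and the predicted labels on the columns, so for binary labels the cells read tn, fp on the first row and fn, tp on the second. The following is a minimal sketch in plain Python that uses only the counts printed above; the variable names are chosen here for illustration.

```python
# Counts copied from the confusion matrix printed above.
# Rows are true labels (0, 1); columns are predicted labels (0, 1).
cm = [[0, 69364],
      [0, 393980]]

tn, fp = cm[0]
fn, tp = cm[1]

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
positive_share = (tp + fn) / total  # fraction of test rows with IS_SLIGHT == 1
predicted_negative = tn + fn        # number of rows predicted as class 0

print(total)                       # 463344
print(predicted_negative)          # 0
print(accuracy == positive_share)  # True
```

Read this way, no test row is predicted as class 0, and the accuracy of about 0.85 equals the share of class 1 in the test set.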

nb = MultinomialNB()

nb.fit(RT_train,y_train)
ypred = nb.predict(RT_test)

print(metrics.accuracy_score(y_test,ypred))

Out: 0.8502969715805104

print(metrics.confusion_matrix(y_test,ypred))

Out: [[0 69364] [0 393980]]

y_pred_prob = nb.predict_proba(RT_test)[:,1]
lry_pred_prob = lr.predict_proba(RT_test)[:, 1]
print(y_pred_prob)

Out: [0.85453109 0.85453109 0.78504675 ... 0.85453109 0.78504675 0.86917669]
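Every probability visible in this output is above 0.5, which would be consistent with predict returning 1 for every row, since for a binary classifier predict picks the class with the larger probability. Below is a small sketch in plain Python using only the three distinct values shown above. The classify helper and the 0.85 cutoff are hypothetical and serve only to illustrate the effect of the threshold; they are not a recommendation.

```python
# The three distinct probabilities visible in the printed output above.
probs = [0.85453109, 0.78504675, 0.86917669]

def classify(p, threshold=0.5):
    """Return 1 if the positive-class probability exceeds the threshold, else 0."""
    return 1 if p > threshold else 0

# Default behaviour: a cutoff of 0.5 labels every row as class 1.
default_preds = [classify(p) for p in probs]

# Hypothetical higher cutoff, chosen only to show that the labels can differ.
shifted_preds = [classify(p, threshold=0.85) for p in probs]

print(default_preds)  # [1, 1, 1]
print(shifted_preds)  # [1, 0, 1]
```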

print(metrics.roc_auc_score(y_test,y_pred_prob))

Out: 0.5436548840102361

print(metrics.roc_auc_score(y_test,lry_pred_prob))

Out: 0.5436548840102361
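One possible reason the two AUC values coincide: ROC AUC depends only on how the scores rank the rows, not on the score values themselves. If both models order the four time-of-day categories the same way, they would give the same AUC. The sketch below shows this property with a hand-written pairwise AUC in plain Python. The labels and scores are toy values invented for illustration, not taken from the data above, and pairwise_auc is a hypothetical helper rather than a scikit-learn function.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and scores, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 1]
scores_a = [0.85, 0.85, 0.79, 0.87, 0.79, 0.85]
scores_b = [0.60, 0.60, 0.55, 0.70, 0.55, 0.60]  # different values, same ordering

auc_a = pairwise_auc(y_true, scores_a)
auc_b = pairwise_auc(y_true, scores_b)
print(auc_a == auc_b)  # True
```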

print(classification_report(y_test, lrypred))

Out:

                 precision   recall     f1-score      support  
 -------------- ----------- ---------- ------------ ---------- 
  0                   0.00       0.00         0.00      69364  
  1                   0.85       1.00         0.92     393980  
  micro avg           0.85       0.85         0.85     463344  
  macro avg           0.43       0.50         0.46     463344  
  weighted avg        0.72       0.85         0.78     463344

Sklearn Logistic Regression prediction model problem

0 answers: