UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples

Time: 2019-12-12 20:28:53

Tags: python svm cross-validation k-fold

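I am running 10-fold cross-validation for an SVM classifier and printing the precision, recall, F1 score, and accuracy for each fold. My dataset is a list of tweets, each paired with a label (0 or 1) that says whether the tweet contains hate speech. Here is a sample: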

'[('Hurray, saving us $$$ in so many ways @potus @realDonaldTrump #LockThemUp #BuildTheWall #EndDACA #BoycottNFL #BoycottNike', 1), ("Why would young fighting age men be the vast majority of the ones escaping a war & not those who cannot fight like women, children, and the elderly?It's because the majority of the refugees are not actually refugees they are economic migrants trying to get into Europe...", 1)]'

My script runs fine until fold 8, at which point the following warning appears:

UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use zero_division parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result))
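To make the message concrete, here is a minimal sketch that reproduces it. The arrays are made-up stand-ins for one fold, assuming the fold's gold labels contain only class 0 while the classifier predicts class 1 at least once. Recall for class 1 is then 0/0, which scikit-learn reports as 0.0 with this warning. The `zero_division` parameter named in the message (available in scikit-learn 0.22 and later) only changes the substituted value; it does not make that class measurable in the fold.

```python
import warnings

import numpy as np
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import recall_score

# Hypothetical fold: every gold label is 0, but one prediction is 1.
y_true = np.array([0, 0, 0, 0])
y_pred = np.array([0, 1, 0, 0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Class 1 has no true samples, so its recall is 0/0 and is set to 0.0.
    recall_default = recall_score(y_true, y_pred, labels=[0, 1], average=None)

warned = any(issubclass(w.category, UndefinedMetricWarning) for w in caught)

# zero_division picks the substituted value explicitly and silences the warning.
recall_filled = recall_score(
    y_true, y_pred, labels=[0, 1], average=None, zero_division=1
)

print(recall_default, warned, recall_filled)
```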

I don't want my scores to be incorrect. I tried the answers to this question, without any luck: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples
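One possible reason the linked answers do not help: they restrict `labels` to the classes that appear in the predictions, which addresses "no predicted samples" (a precision and F-score problem). Recall is undefined when a class has no true samples, so the relevant set is the gold labels. The sketch below uses made-up arrays to compare both choices; whether dropping a class from the average is acceptable depends on what the scores are meant to show.

```python
import warnings

import numpy as np
from sklearn.metrics import recall_score

# Hypothetical fold: every gold label is 0, but one prediction is 1.
y_true = np.array([0, 0, 0, 0])
y_pred = np.array([0, 1, 0, 0])

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # labels from the predictions still include class 1, which has no true
    # samples, so its recall is set to 0.0 and drags the macro average down.
    recall_pred_labels = recall_score(
        y_true, y_pred, average="macro", labels=np.unique(y_pred)
    )

# labels from the gold data contain only classes whose recall is defined.
recall_true_labels = recall_score(
    y_true, y_pred, average="macro", labels=np.unique(y_true)
)

print(recall_pred_labels, recall_true_labels)
```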

My code looks like this:

import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import KFold
from sklearn.utils import shuffle  # assumed source of shuffle(); it returns a shuffled copy

# get_vocabulary, train_svm_classifier and get_vector_text are helper
# functions defined elsewhere (not shown).
n_folds = 10
kf = KFold(n_splits=n_folds)
tweets = shuffle(tweets_all)  # note: the shuffled copy is never used below
accuracy_total = 0.0
fold = 0
for train_index, test_index in kf.split(tweets_all):
  # Split the (text, label) pairs into this fold's training and test sets.
  train_set_fold = [tweets_all[i] for i in train_index]
  test_set_fold = [tweets_all[i] for i in test_index]

  # Build the vocabulary and train the classifier on the training part only.
  vocabulary_fold = get_vocabulary(train_set_fold, 500)
  svm_clf_fold = train_svm_classifier(train_set_fold, vocabulary_fold)

  # Vectorize the test tweets and collect their gold labels.
  X_test_fold = np.asarray([get_vector_text(vocabulary_fold, text) for text, _ in test_set_fold])
  Y_test_fold_gold = np.asarray([label for _, label in test_set_fold])
  Y_test_predictions_fold = svm_clf_fold.predict(X_test_fold)
  fold += 1

  precision = precision_score(Y_test_fold_gold, Y_test_predictions_fold, average='macro')
  recall = recall_score(Y_test_fold_gold, Y_test_predictions_fold, average='macro',
                        labels=np.unique(Y_test_predictions_fold))
  f1 = f1_score(Y_test_fold_gold, Y_test_predictions_fold, average='macro')
  accuracy = accuracy_score(Y_test_fold_gold, Y_test_predictions_fold)
  accuracy_total += accuracy

  print(f"Fold {fold}: Precision: {round(precision, 3)}")
  print(f"Fold {fold}: Recall: {round(recall, 3)}")
  print(f"Fold {fold}: F1-Score: {round(f1, 3)}")
  print(f"Fold {fold}: Accuracy: {round(accuracy, 3)}")
  print("Fold completed.")

# Report the mean over all folds (the original printed the last fold's accuracy).
average_accuracy = accuracy_total / n_folds
print("\nAverage Accuracy: " + str(round(average_accuracy, 3)))

Can anyone explain why this happens and how I can get correct results?
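One way to check whether the split itself is the cause is to count the distinct gold classes in each test fold. The sketch below is only an illustration under a stated assumption: the label vector is made up so that the last 30 items all belong to one class, which would leave the later folds of an unshuffled `KFold` with a single class. `StratifiedKFold` with `shuffle=True` is shown as one alternative that keeps both classes in every fold; it is not necessarily the right choice for the real data.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical labels: 70 alternating 0/1 items, then 30 items of class 1 only.
labels = np.concatenate([np.tile([0, 1], 35), np.ones(30, dtype=int)])
placeholder_X = np.zeros((len(labels), 1))  # features do not affect the split


def classes_per_fold(splitter):
    """Return the number of distinct gold classes in each test fold."""
    return [
        len(np.unique(labels[test_idx]))
        for _, test_idx in splitter.split(placeholder_X, labels)
    ]


# Unshuffled KFold takes consecutive blocks, so the tail folds see one class.
plain = classes_per_fold(KFold(n_splits=10))

# Stratified, shuffled folds preserve the class ratio in every fold.
stratified = classes_per_fold(
    StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
)

print(plain)
print(stratified)
```

If the real folds show the same pattern as `plain`, recall for the missing class cannot be measured in those folds, whatever value the metric function substitutes.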

0 answers:

No answers yet.