我正在尝试使用NLTK.NaiveBayesClassifier中的nltk.metrics.score(http://www.nltk.org/_modules/nltk/metrics/scores.html)来计算精度和召回率。
但是,我偶然发现了该错误:
"unsupported operand type(s) for +: 'int' and 'NoneType".
我怀疑这是我的10倍交叉验证得出的,其中某些参考集中的负值为零(数据集有些不平衡,其中87%为正)。
根据nltk.metrics.score,
def precision(reference, test):
"Given a set of reference values and a set of test values, return
the fraction of test values that appear in the reference set.
In particular, return card(``reference`` intersection
``test``)/card(``test``).
If ``test`` is empty, then return None."
由于我的10折组中的参考组中没有负数,因此似乎将召回返回为None。关于如何解决此问题的任何想法吗?
我的完整代码如下:
trainfeats = negfeats + posfeats
n = 10 # 5-fold cross-validation
subset_size = len(trainfeats) // n
accuracy = []
pos_precision = []
pos_recall = []
neg_precision = []
neg_recall = []
pos_fmeasure = []
neg_fmeasure = []
cv_count = 1
for i in range(n):
testing_this_round = trainfeats[i*subset_size:][:subset_size]
training_this_round = trainfeats[:i*subset_size] + trainfeats[(i+1)*subset_size:]
classifier = NaiveBayesClassifier.train(training_this_round)
refsets = collections.defaultdict(set)
testsets = collections.defaultdict(set)
for i, (feats, label) in enumerate(testing_this_round):
refsets[label].add(i)
observed = classifier.classify(feats)
testsets[observed].add(i)
cv_accuracy = nltk.classify.util.accuracy(classifier, testing_this_round)
cv_pos_precision = precision(refsets['Positive'], testsets['Positive'])
cv_pos_recall = recall(refsets['Positive'], testsets['Positive'])
cv_pos_fmeasure = f_measure(refsets['Positive'], testsets['Positive'])
cv_neg_precision = precision(refsets['Negative'], testsets['Negative'])
cv_neg_recall = recall(refsets['Negative'], testsets['Negative'])
cv_neg_fmeasure = f_measure(refsets['Negative'], testsets['Negative'])
accuracy.append(cv_accuracy)
pos_precision.append(cv_pos_precision)
pos_recall.append(cv_pos_recall)
neg_precision.append(cv_neg_precision)
neg_recall.append(cv_neg_recall)
pos_fmeasure.append(cv_pos_fmeasure)
neg_fmeasure.append(cv_neg_fmeasure)
cv_count += 1
print('---------------------------------------')
print('N-FOLD CROSS VALIDATION RESULT ' + '(' + 'Naive Bayes' + ')')
print('---------------------------------------')
print('accuracy:', sum(accuracy) / n)
print('precision', (sum(pos_precision)/n + sum(neg_precision)/n) / 2)
print('recall', (sum(pos_recall)/n + sum(neg_recall)/n) / 2)
print('f-measure', (sum(pos_fmeasure)/n + sum(neg_fmeasure)/n) / 2)
print('')
答案 0 :(得分:0)
也许不是最优雅的,但猜测最简单的解决方法是将其设置为0,如果不设置则为实际值,例如:
cv_pos_precision = 0
if precision(refsets['Positive'], testsets['Positive']):
cv_pos_precision = precision(refsets['Positive'], testsets['Positive'])
当然还有其他人。