Call into nltk.metrics.scores returns None

Date: 2018-08-06 09:48:05

Tags: nltk

I am trying to compute precision and recall for an NLTK NaiveBayesClassifier using nltk.metrics.scores (http://www.nltk.org/_modules/nltk/metrics/scores.html).

However, I stumbled on this error:

"unsupported operand type(s) for +: 'int' and 'NoneType'"

I suspect this comes from my 10-fold cross-validation, where the reference set for the negative class is empty in some folds (the dataset is somewhat imbalanced, with 87% positive).

According to nltk.metrics.scores,

def precision(reference, test):
    """Given a set of reference values and a set of test values, return
    the fraction of test values that appear in the reference set.
    In particular, return card(``reference`` intersection
    ``test``)/card(``test``).
    If ``test`` is empty, then return None."""
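The documented behavior can be seen with a minimal standalone sketch of that function (a reimplementation mirroring the docstring, not the NLTK source itself):

```python
def precision_sketch(reference, test):
    """Sketch of nltk.metrics.scores.precision's documented behavior:
    |reference & test| / |test|, or None when the test set is empty."""
    if len(test) == 0:
        return None
    return len(reference & test) / len(test)

print(precision_sketch({1, 2, 3}, {2, 3, 4}))  # 2 of 3 test items are in the reference -> 0.666...
print(precision_sketch({1, 2, 3}, set()))      # empty test set -> None
```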

Since some of my 10 folds have no negatives in the reference set, it seems that recall is returned as None. Any ideas on how to get around this?
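The failure mode is easy to reproduce in isolation: once a single fold contributes None to a per-fold metric list, the final `sum(...) / n` averaging raises exactly the reported TypeError, because `sum` starts from the int 0 and then tries `0 + None`:

```python
# Hypothetical per-fold recall values; one fold had an empty set, so the metric was None
fold_recalls = [None, 0.9, 0.85]

try:
    average = sum(fold_recalls) / len(fold_recalls)
except TypeError as err:
    print(err)  # unsupported operand type(s) for +: 'int' and 'NoneType'
```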

My full code is below:

import collections

import nltk
from nltk.classify import NaiveBayesClassifier
from nltk.metrics.scores import precision, recall, f_measure

trainfeats = negfeats + posfeats
n = 10 # 10-fold cross-validation

subset_size = len(trainfeats) // n
accuracy = []
pos_precision = []
pos_recall = []
neg_precision = []
neg_recall = []
pos_fmeasure = []
neg_fmeasure = []
cv_count = 1
for i in range(n):        
    testing_this_round = trainfeats[i*subset_size:][:subset_size]
    training_this_round = trainfeats[:i*subset_size] + trainfeats[(i+1)*subset_size:]
    classifier = NaiveBayesClassifier.train(training_this_round)

    refsets = collections.defaultdict(set)
    testsets = collections.defaultdict(set)
    for i, (feats, label) in enumerate(testing_this_round):
        refsets[label].add(i)
        observed = classifier.classify(feats)
        testsets[observed].add(i)

    cv_accuracy = nltk.classify.util.accuracy(classifier, testing_this_round)
    cv_pos_precision = precision(refsets['Positive'], testsets['Positive'])
    cv_pos_recall = recall(refsets['Positive'], testsets['Positive'])
    cv_pos_fmeasure = f_measure(refsets['Positive'], testsets['Positive'])
    cv_neg_precision = precision(refsets['Negative'], testsets['Negative'])
    cv_neg_recall = recall(refsets['Negative'], testsets['Negative'])
    cv_neg_fmeasure =  f_measure(refsets['Negative'], testsets['Negative'])

    accuracy.append(cv_accuracy)
    pos_precision.append(cv_pos_precision)
    pos_recall.append(cv_pos_recall)
    neg_precision.append(cv_neg_precision)
    neg_recall.append(cv_neg_recall)
    pos_fmeasure.append(cv_pos_fmeasure)
    neg_fmeasure.append(cv_neg_fmeasure)

    cv_count += 1

print('---------------------------------------')
print('N-FOLD CROSS VALIDATION RESULT ' + '(' + 'Naive Bayes' + ')')
print('---------------------------------------')
print('accuracy:', sum(accuracy) / n)
print('precision', (sum(pos_precision)/n + sum(neg_precision)/n) / 2)
print('recall', (sum(pos_recall)/n + sum(neg_recall)/n) / 2)
print('f-measure', (sum(pos_fmeasure)/n + sum(neg_fmeasure)/n) / 2)
print('')

1 Answer:

Answer 0 (score: 0)

Maybe not the most elegant, but I'd guess the simplest fix is to default the metric to 0 and keep the actual value when one is returned, e.g.:

cv_pos_precision = precision(refsets['Positive'], testsets['Positive'])
if cv_pos_precision is None:
    cv_pos_precision = 0

And likewise for the other metrics, of course.
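One way to avoid repeating that check six times is a small helper (hypothetical, not part of NLTK) that coerces None to a default, plus an alternative that averages only the folds where the metric was actually defined:

```python
def coerce(value, default=0.0):
    """Replace a None metric with a default so per-fold lists stay summable."""
    return default if value is None else value

def mean_defined(values):
    """Average only the folds where the metric was defined (not None)."""
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined) if defined else 0.0

# Hypothetical per-fold values with one undefined fold
neg_recall = [0.5, None, 0.7]
print(sum(coerce(v) for v in neg_recall) / len(neg_recall))  # treats None as 0 -> 0.4
print(mean_defined(neg_recall))                              # ignores the None fold -> 0.6
```

Note the design choice: coercing None to 0 drags the average down for folds that simply had no negatives to score, while `mean_defined` reports the metric only over folds where it was computable. Which is fairer depends on how you want to account for the imbalanced folds.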