当我使用MASI作为距离函数计算NLTK中的协议时,我得到了Krippendorff alpha的非常低的值。
指示三个程序员(Inky,Blinky和Sue)根据文本的内容将主题标签(爱情,礼物,粘液或游戏)分配给两个文本(text01和text02)。每个文本可以是多个主题,因此编码器可以为每个文本分配多个标签。用于制作计算器的数据和代码如下所示:
import nltk
from nltk.metrics import agreement
from nltk.metrics.distance import masi_distance
from nltk.metrics.distance import jaccard_distance
#(coder, item, label)
data = [('inky','text01',frozenset(['love','gifts'])),
('blinky','text01',frozenset(['love','gifts'])),
('sue','text01',frozenset(['love','gifts'])),
('inky','text02',frozenset(['slime','gaming'])),
('blinky','text02',frozenset(['slime'])),
('sue','text02',frozenset(['slime','gaming']))]
jaccard_task = nltk.AnnotationTask(distance=jaccard_distance)
masi_task = nltk.AnnotationTask(distance=masi_distance)
tasks = [jaccard_task, masi_task]
for task in tasks:
task.load_array(data)
print("Statistics for dataset using {}".format(task.distance))
print("C: {}\nI: {}\nK: {}".format(task.C, task.I, task.K))
print("Pi: {}".format(task.pi()))
print("Kappa: {}".format(task.kappa()))
print("Multi-Kappa: {}".format(task.multi_kappa()))
print("Alpha: {}".format(task.alpha()))
print()
当我运行代码时,我得到以下结果:
Statistics for dataset using <function jaccard_distance at 0x09D26DB0>
C: {'inky', 'sue', 'blinky'}
I: {'text01', 'text02'}
K: {frozenset({'slime'}), frozenset({'love', 'gifts'}), frozenset ({'gaming', 'slime'})}
Pi: 0.7272727272727273
Kappa: 0.7777777777777777
Multi-Kappa: 0.7499999999999999
Alpha: 0.75
Statistics for dataset using <function masi_distance at 0x09D26DF8>
C: {'inky', 'sue', 'blinky'}
I: {'text01', 'text02'}
K: {frozenset({'slime'}), frozenset({'love', 'gifts'}), frozenset({'gaming', 'slime'})}
Pi: 0.8172727272727272
Kappa: 0.8511111111111113
Multi-Kappa: 0.8324999999999998
Alpha: -1.5
我的问题是,为什么使用MASI距离函数时的alpha值与Jaccard相比如此之低?
答案 0 :(得分:0)
运行提供的代码时,我无法重现该错误,并获得了正确的Krippendorff alpha和MASI距离值。我使用了Python 3.5.2,NumPy 1.18.2,NLTK 3.4.5。因此,最可能的答案是需要更新NLTK。