Question

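If I have Chinese word lists such as reference = ['我', '是', '好', '人'] and hypothesis = ['我', '是', '善良的', '人'], can I use nltk.translate.bleu_score.sentence_bleu(reference, hypothesis) for Chinese translation? Does it work the same way as for English? What about Japanese? I mean the case where I have word lists (Chinese and Japanese) just like the English ones. Thanks!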

Answer 1

TL; DR

Yes.

In Long

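The BLEU score measures n-gram overlap and is agnostic to the language, but it relies on sentences in that language being splittable into tokens. So yes, it can be used to compare Chinese/Japanese...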
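To make the tokenization point concrete, here is a minimal sketch in plain Python (no NLTK). The helper names `char_tokens` and `ngrams` are made up for this illustration. It assumes the text is either already segmented with spaces, as in the question, or falls back to one token per character; a real word segmenter would be a third option that is not shown here.

```python
def char_tokens(sentence):
    # Hypothetical fallback: every non-space character becomes one token.
    return [ch for ch in sentence if not ch.isspace()]

def ngrams(tokens, n):
    # All contiguous n-grams of a token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Pre-segmented input: whitespace marks the token boundaries.
word_hyp = '我 是 善良的 人'.split()

# Unsegmented input: fall back to character-level tokens.
char_hyp = char_tokens('我是善良的人')

print(word_hyp)             # ['我', '是', '善良的', '人']
print(char_hyp)             # ['我', '是', '善', '良', '的', '人']
print(ngrams(word_hyp, 2))  # three bigrams
```

Whichever tokenization you pick, use the same one for the reference and the hypothesis, since BLEU only compares the resulting token sequences.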

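Do note the caveats of using the BLEU score at the sentence level. BLEU was never designed for sentence-level comparison; there is a good discussion here: https://github.com/nltk/nltk/issues/1838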

Most probably, you will see the warning when your sentences are really short, e.g.:

>>> from nltk.translate import bleu
>>> ref = '我 是 好 人'.split()
>>> hyp = '我 是 善良的 人'.split()
>>> bleu([ref], hyp)
/usr/local/lib/python2.7/site-packages/nltk/translate/bleu_score.py:490: UserWarning: 
Corpus/Sentence contains 0 counts of 3-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().
  warnings.warn(_msg)
0.7071067811865475
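To see where the warning comes from, the clipped n-gram matches for this pair can be counted by hand. This is a sketch in plain Python, not NLTK's own code, and `clipped_matches` is a name invented for the illustration.

```python
from collections import Counter

def ngram_counts(tokens, n):
    # Count every contiguous n-gram in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_matches(reference, hypothesis, n):
    # Matches are clipped: a hypothesis n-gram counts at most as often
    # as it appears in the reference.
    hyp = ngram_counts(hypothesis, n)
    ref = ngram_counts(reference, n)
    matches = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matches, sum(hyp.values())

ref = '我 是 好 人'.split()
hyp = '我 是 善良的 人'.split()

overlaps = {n: clipped_matches(ref, hyp, n) for n in range(1, 5)}
for n, (matches, total) in overlaps.items():
    print(n, matches, total)
```

The counts come out as 3/4 unigrams, 1/3 bigrams, 0/2 trigrams and 0/1 4-grams, so the 3-gram order is the first one with no overlap, which matches the warning text. The score printed above also equals (3/4 · 1/3)^(1/4) ≈ 0.7071, which is consistent with the zero-overlap orders being left out of the product in that NLTK version; that is an inference from the number, not something stated in the output.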

You can use the smoothing functions in https://github.com/alvations/nltk/blob/develop/nltk/translate/bleu_score.py#L425 to cope with short sentences.

>>> from nltk.translate.bleu_score import SmoothingFunction
>>> smoothie = SmoothingFunction().method4
>>> bleu([ref], hyp, smoothing_function=smoothie)
0.2866227639866161
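For intuition about what smoothing does, here is a rough sketch of one simple scheme: replace a zero match count with a small epsilon so the geometric mean does not collapse. It is not `method4` and will not reproduce the number above. The function name `smoothed_bleu` and the default `epsilon=0.1` are assumptions made for this example, and it handles a single reference only.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def smoothed_bleu(reference, hypothesis, max_n=4, epsilon=0.1):
    # Illustrative single-reference BLEU with epsilon smoothing (an assumption,
    # not NLTK's implementation).
    log_sum = 0.0
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        total = sum(hyp.values())
        if total == 0:
            return 0.0  # hypothesis has fewer than n tokens
        matches = sum(min(count, ref[gram]) for gram, count in hyp.items())
        # Smoothing step: a zero match count becomes epsilon.
        numerator = matches if matches > 0 else epsilon
        log_sum += math.log(numerator / total) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    r, c = len(reference), len(hypothesis)
    bp = 1.0 if c >= r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)

ref = '我 是 好 人'.split()
hyp = '我 是 善良的 人'.split()
score = smoothed_bleu(ref, hyp)
print(score)
```

With smoothing, the missing 3-grams and 4-grams pull the score down instead of being ignored, which is why a smoothed score for a short sentence is typically lower than the unsmoothed one shown earlier.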

BLEU score: can I use nltk.translate.bleu_score.sentence_bleu to calculate the BLEU score for Chinese?
