Question

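I trained a KenLM language model on roughly 5,000 English sentences/paragraphs. I want to query this ARPA model with two or more segments to see whether they can be joined into a longer, hopefully more "grammatical", sentence. Below are the Python code and the "sentences" I use to get each segment's log score and the corresponding power of ten. I give two examples. Clearly, the sentence in the first example is more grammatical than the one in the second. My question, however, is not about that. It is about how to relate the language model score of a whole sentence to the scores of its constituents, that is, how to tell whether the sentence is grammatically better than its parts.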

import math
import kenlm as kl
model = kl.LanguageModel(r'D:\seg.arpa.bin')
print ('************')
sentence = 'Mr . Yamada was elected Chairperson of'
print(sentence)
p1=model.score(sentence)
p2=math.pow(10,p1)
print(p1)
print(p2)
sentence = 'the Drafting Committee by acclamation .'
print(sentence)
p3=model.score(sentence)
p4=math.pow(10,p3)
print(p3)
print(p4)
sentence = 'Mr . Yamada was elected Chairperson of the Drafting Committee by acclamation .'
print(sentence)
p5=model.score(sentence)
p6=math.pow(10,p5)
print(p5)
print(p6)
print ('-------------')
sentence = 'Cases cited in the present volume ix'
print(sentence)
p1=model.score(sentence)
p2=math.pow(10,p1)
print(p1)
print(p2)
sentence = 'Multilateral instruments cited in the present volume xiii'
print(sentence)
p3=model.score(sentence)
p4=math.pow(10,p3)
print(p3)
print(p4)
sentence = 'Cases cited in the present volume ix Multilateral instruments cited in the present volume xiii'
print(sentence)
p5=model.score(sentence)
p6=math.pow(10,p5)
print(p5)
print(p6)

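************
Mr . Yamada was elected Chairperson of
-34.0706558228
8.49853715087e-35
the Drafting Committee by acclamation .
-28.3745193481
4.22163470933e-29
Mr . Yamada was elected Chairperson of the Drafting Committee by acclamation .
-55.5128440857
3.07012398337e-56
-------------
Cases cited in the present volume ix
-27.7353248596
1.83939558773e-28
Multilateral instruments cited in the present volume xiii
-34.4523620605
3.52888852435e-35
Cases cited in the present volume ix Multilateral instruments cited in the present volume xiii
-60.7075233459
1.9609957573e-61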
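As a rough sanity check on the numbers above, the sketch below compares each joined score with the sum of its two segment scores. The dictionary keys and the helper name joining_gain are made up for illustration; the values are copied from the output. One assumption is baked into the comparison: model.score adds <s> and </s> by default, so each segment score already includes its own sentence-boundary terms, and the difference mixes genuine context effects with boundary effects.

```python
# log10 scores copied from the output above; the keys are illustrative labels
examples = {
    "yamada": {"first": -34.0706558228, "second": -28.3745193481, "joined": -55.5128440857},
    "volume": {"first": -27.7353248596, "second": -34.4523620605, "joined": -60.7075233459},
}


def joining_gain(first, second, joined):
    """log10 score of the joined sentence minus the sum of the segment scores."""
    return joined - (first + second)


for name, scores in examples.items():
    # A larger positive value means the joined sentence is more probable
    # than the two segments scored as separate sentences.
    print(name, round(joining_gain(**scores), 4))
```

By this measure the first pair gains about 6.93 log10 units from being joined and the second only about 1.48, which agrees with the intuition that the first join is the more natural one.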

Answer 1

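Use

list(model.full_scores(sent))

to get the details of the sentence's constituents, i.e. the individual words.

This returns a list that you can iterate over to access the details for each word. Each list item contains the log-probability, the n-gram length, and a flag saying whether the word is OOV (out of vocabulary), one item per word in the sentence.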
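A minimal sketch of how this could be applied to the question, assuming model is a kenlm.LanguageModel and using the bos/eos keyword arguments of score and full_scores. The helper names word_details and context_gain are hypothetical, not part of kenlm. The idea is the chain rule: log P(whole) = log P(first) + log P(second | first), so scoring the first segment without </s> and the second without <s> isolates how much the left context helps the second segment.

```python
def word_details(model, sentence, bos=True, eos=True):
    """Pair each scored token with (log10 prob, n-gram length, OOV flag).

    `model` is assumed to be a kenlm.LanguageModel, or any object with a
    compatible full_scores method.
    """
    tokens = sentence.split()
    if eos:
        tokens.append("</s>")  # full_scores yields one extra entry for </s>
    scores = model.full_scores(sentence, bos=bos, eos=eos)
    return [(tok, logp, n, oov) for tok, (logp, n, oov) in zip(tokens, scores)]


def context_gain(model, first, second):
    """log10 P(second | first) minus log10 P(second with no left context)."""
    whole = model.score(first + " " + second, bos=True, eos=True)
    head = model.score(first, bos=True, eos=False)   # <s> first, no </s>
    tail = model.score(second, bos=False, eos=True)  # second </s>, no <s>
    return whole - (head + tail)
```

With a loaded model, word_details(model, sentence) shows where the score is lost (short n-gram lengths and OOV flags near the join are a warning sign), and context_gain(model, first, second) gives a single number for the pair. Because an n-gram model only looks back a fixed number of words, any gain comes from the first few words after the join.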

How to relate the language model score of a whole sentence to the scores of its constituents
