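OK, I am tagging text with different taggers: default, unigram, bigram and trigram.
I have to check which combination of three of these four taggers is the most accurate.
To do that, I have to loop over all possible combinations, which I do like this: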
import itertools

permutaties = list(itertools.permutations(['default_tagger', 'unigram_tagger',
                                           'bigram_tagger', 'trigram_tagger'], 3))
resultaten = []
for element in permutaties:
    resultaten.append(accuracy(element))
So each element is a tuple of three tagger names, for example: ('default_tagger', 'bigram_tagger', 'trigram_tagger')
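As a quick sanity check on what `itertools.permutations` yields here, the following minimal sketch counts the tuples and prints the first one. It uses only the four name strings from the question; the comparison with `itertools.combinations` is added for illustration and is not part of the original code.

```python
import itertools

names = ["default_tagger", "unigram_tagger", "bigram_tagger", "trigram_tagger"]
permutaties = list(itertools.permutations(names, 3))

# Order matters for a backoff chain, so permutations (not combinations) apply.
print(len(permutaties))  # 24 ordered selections (4 * 3 * 2)
print(permutaties[0])    # ('default_tagger', 'unigram_tagger', 'bigram_tagger')

# If order did not matter there would only be 4 selections.
print(len(list(itertools.combinations(names, 3))))  # 4
```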
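Inside the accuracy function I now have to dynamically call the three taggers named in the tuple, and the problem is that I don't know how to do that.
The taggers are created like this: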
unigram_tagger = nltk.UnigramTagger(brown_train, backoff=backofff)
bigram_tagger = nltk.BigramTagger(brown_train, backoff=backofff)
trigram_tagger = nltk.TrigramTagger(brown_train, backoff=backofff)
default_tagger = nltk.DefaultTagger('NN')
So, for the example tuple above, the code should become:
t0 = nltk.DefaultTagger('NN')
t1 = nltk.BigramTagger(brown_train, backoff=t0)
t2 = nltk.TrigramTagger(brown_train, backoff=t1)
t2.evaluate(brown_test)
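So essentially the question is how to iterate over all 24 ordered selections of three taggers from a list of four functions.
Are there any Python gurus who can help me?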
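Answer 0 (score: 1)
I'm not really sure I understood what you need, but you can put the callables you want to call in the list instead of strings. Your code could look like this: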
import itertools
import nltk

def accuracy(element, brown_train, brown_test):
    # element is a tuple of three tagger classes, so build each one in turn
    for tagger_class in element:
        if tagger_class is nltk.DefaultTagger:
            evaluator = tagger_class("NN")
        else:
            # XXX is a placeholder: maybe insert more elif clauses to retrieve
            # the proper backoff parameter, or you could use a tuple in the
            # call to permutations so the appropriate backoff is available
            # for each function to be called
            evaluator = tagger_class(brown_train, backoff=XXX)
    return evaluator.evaluate(brown_test)  # ? I am not sure from your code if this is your intent

permutaties = itertools.permutations([nltk.UnigramTagger, nltk.BigramTagger, nltk.TrigramTagger, nltk.DefaultTagger], 3)
resultaten = []
for element in permutaties:
    resultaten.append(accuracy(element, brown_train, brown_test))
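Answer 1 (score: 0)
Starting from jsbueno's code, I suggest writing a wrapper function for each tagger so that they all have the same signature. Since you only need each wrapper once, I suggest using a lambda.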
permutaties = itertools.permutations([lambda: nltk.DefaultTagger("NN"),
                                      lambda: nltk.UnigramTagger(brown_train, backoff=backoff),
                                      lambda: nltk.BigramTagger(brown_train, backoff=backoff),
                                      lambda: nltk.TrigramTagger(brown_train, backoff=backoff)], 3)
This lets you call each one directly, without a special function that works out which tagger you are calling and which signature it needs.
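One way to make the uniform-signature idea chainable is to let every wrapper take the backoff tagger as its single argument. The sketch below is only an illustration under assumptions: `FakeTagger`, `factories` and `build_chain` are hypothetical names, and `FakeTagger` is a stand-in that records its backoff so the example runs without NLTK or the Brown corpus. With NLTK, each entry would instead construct the real tagger, as shown in the comment.

```python
import itertools


class FakeTagger:
    """Hypothetical stand-in for an NLTK tagger; it only records its backoff."""

    def __init__(self, name, backoff=None):
        self.name = name
        self.backoff = backoff

    def chain(self):
        """Return the tagger names from this tagger down through its backoffs."""
        names, tagger = [], self
        while tagger is not None:
            names.append(tagger.name)
            tagger = tagger.backoff
        return names


# Every factory takes the same single argument, the backoff tagger.
# With NLTK an entry would be, for example:
#   lambda backoff: nltk.BigramTagger(brown_train, backoff=backoff)
factories = {
    "default": lambda backoff: FakeTagger("default"),  # ignores its backoff
    "unigram": lambda backoff: FakeTagger("unigram", backoff),
    "bigram": lambda backoff: FakeTagger("bigram", backoff),
    "trigram": lambda backoff: FakeTagger("trigram", backoff),
}


def build_chain(names):
    """Build the taggers in order, each one backing off to the previous one."""
    tagger = None
    for name in names:
        tagger = factories[name](tagger)
    return tagger


# Iterating over a dict yields its keys, so this permutes the four names.
chains = {names: build_chain(names) for names in itertools.permutations(factories, 3)}
print(len(chains))                                      # 24
print(chains[("default", "bigram", "trigram")].chain())  # ['trigram', 'bigram', 'default']
```

Note that a default tagger in the middle of a tuple cuts the chain, because it takes no backoff: `("unigram", "default", "trigram")` ends up as trigram backing off to default only.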
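Answer 2 (score: 0)
Building on jsbueno's code, I think you want to reuse each evaluator as the backoff argument of the next one, so the code should be: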
import itertools
import nltk

def accuracy(element, brown_train, brown_test):
    evaluator = None  # the first tagger in the chain has no backoff
    for e in element:
        if e is nltk.DefaultTagger:
            # DefaultTagger takes only a tag, so it cannot back off to
            # the taggers built before it
            evaluator = e("NN")
        else:
            # each trained tagger backs off to the previously built one
            evaluator = e(brown_train, backoff=evaluator)
    return evaluator.evaluate(brown_test)  # ? I am not sure from your code if this is your intent

permutaties = itertools.permutations([nltk.UnigramTagger, nltk.BigramTagger, nltk.TrigramTagger, nltk.DefaultTagger], 3)
resultaten = []
for element in permutaties:
    resultaten.append(accuracy(element, brown_train, brown_test))
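Once every permutation has a score, picking the most accurate one is a matter of keeping the scores keyed by tuple and taking the maximum. The following is a rough sketch under assumptions: `best_permutation` and `toy_score` are hypothetical names, and `toy_score` is an arbitrary placeholder used only so the example runs without NLTK. In the real program the scoring callable would build the backoff chain and evaluate it on the test data.

```python
import itertools


def best_permutation(items, score, r=3):
    """Score every ordered selection of r items; return (best, all_scores)."""
    scores = {perm: score(perm) for perm in itertools.permutations(items, r)}
    best = max(scores, key=scores.get)
    return best, scores


def toy_score(perm):
    """Hypothetical placeholder scorer; the numbers carry no real meaning."""
    weights = {"default": 1, "unigram": 2, "bigram": 3, "trigram": 4}
    # Weight each name by its position so that different orders score differently.
    return sum(pos * weights[name] for pos, name in enumerate(perm, start=1))


best, scores = best_permutation(["default", "unigram", "bigram", "trigram"], toy_score)
print(len(scores))         # 24
print(best, scores[best])  # ('unigram', 'bigram', 'trigram') 20
```

Keeping the scores in a dict rather than a bare list means the winning tuple can be recovered directly, instead of matching list positions back to permutations afterwards.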