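I'm trying to "count" the bigrams in my corpus with NLTK, but there still seems to be an error in my script. I can't figure out what I'm doing wrong, so I'm hoping someone can give me at least some pointers. Please keep in mind that I'm quite new to this. Thanks!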
import collections
import nltk
from nltk.collocations import BigramCollocationFinder

tekst.collocations()
bgm = nltk.collocations.BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(mijn_corpus)  # mijn_corpus should be its loc
finder.apply_freq_filter(3)  # filter out bigrams that appear fewer than 3 times
finder.nbest(bgm.pmi, 10)
scored_bgm = finder.score_ngrams(bgm.likelihood_ratio)
prefix_keys = collections.defaultdict(list)
for key, scores in scored_bgm:  # group by the first word of the bigram
    prefix_keys[key[0]].append((key[1], scores))
for key in prefix_keys:  # strongest association first
    prefix_keys[key].sort(key=lambda x: -x[1])
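
For comparison, here is a minimal self-contained sketch of the same pipeline. It makes a few assumptions: the `tokens` list is a made-up toy corpus, not real data, and the names `top_pmi`, `bigram`, `score` and `first_word` are illustrative only. Two things to check against the snippet above: `BigramCollocationFinder.from_words` expects an iterable of word tokens, so a file location or a raw string would have to be read and tokenized first (a plain string would be iterated character by character), and the loop has to iterate over the same name the scores were assigned to (`scored_bgm` here).

```python
import collections

from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Hypothetical toy corpus: a flat list of word tokens, not a file location.
tokens = ("the quick brown fox jumps over the lazy dog " * 3
          + "the quick red fox sleeps").split()

bgm = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(3)  # drop bigrams that occur fewer than 3 times

# Up to 10 bigrams ranked by pointwise mutual information.
top_pmi = finder.nbest(bgm.pmi, 10)

# (bigram, score) pairs scored by likelihood ratio.
scored_bgm = finder.score_ngrams(bgm.likelihood_ratio)

# Group by the first word of each bigram: first word -> [(second word, score), ...]
prefix_keys = collections.defaultdict(list)
for bigram, score in scored_bgm:
    prefix_keys[bigram[0]].append((bigram[1], score))

# Within each group, put the strongest association first.
for first_word in prefix_keys:
    prefix_keys[first_word].sort(key=lambda pair: -pair[1])

print(top_pmi)
print(prefix_keys["the"])
```

With a real corpus, `tokens` would come from whatever tokenizer you already use; the rest should stay the same. The result of `nbest` is stored here only because calling it without using the return value has no visible effect in a script.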