Bigrams with NLTK: script problem

Time: 2013-05-02 10:49:25

Tags: python nltk

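I am trying to "count" the bigrams in my corpus with NLTK. However, there still seems to be a bug in my script. I can't figure out what I'm doing wrong, so I hope someone can give me at least a few pointers. Please bear in mind that I'm very new to this. Thanks!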

import collections
import nltk
from nltk.collocations import BigramCollocationFinder

tekst.collocations()  # tekst is assumed to be an nltk.Text built from the corpus
bgm = nltk.collocations.BigramAssocMeasures()
# mijn_corpus must be a sequence of word tokens, not a file location
finder = BigramCollocationFinder.from_words(mijn_corpus)
finder.apply_freq_filter(3)  # drop bigrams that occur fewer than 3 times
top10 = finder.nbest(bgm.pmi, 10)  # keep the result; nbest returns a list
scored_bgm = finder.score_ngrams(bgm.likelihood_ratio)
prefix_keys = collections.defaultdict(list)
for key, score in scored_bgm:  # was "scored", an undefined name
    prefix_keys[key[0]].append((key[1], score))  # group by first word of bigram
for key in prefix_keys:  # strongest association first
    prefix_keys[key].sort(key=lambda x: -x[1])
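
For anyone trying to follow what the counting and grouping steps are meant to do, here is a minimal sketch that uses only the standard library. It is not NLTK's implementation: it ranks by raw bigram counts rather than by PMI or likelihood ratio, and the function name `group_bigrams`, the `min_freq` parameter, and the toy token list are all hypothetical, made up for illustration.

```python
import collections

def group_bigrams(tokens, min_freq=3):
    """Count adjacent word pairs and group them by their first word."""
    # Pair each token with its successor to form the bigrams.
    counts = collections.Counter(zip(tokens, tokens[1:]))
    grouped = collections.defaultdict(list)
    for (first, second), freq in counts.items():
        if freq >= min_freq:  # plays the same role as apply_freq_filter
            grouped[first].append((second, freq))
    for first in grouped:
        # Highest count first; ties broken alphabetically for stable output.
        grouped[first].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(grouped)

# Hypothetical toy token list standing in for mijn_corpus.
tokens = "de kat zit de kat zit de kat ligt de hond zit".split()
result = group_bigrams(tokens, min_freq=2)
print(result)
```

The important point it shows is that the input has to be a list of word tokens, so a corpus stored on disk would need to be read and tokenized before being passed to something like `BigramCollocationFinder.from_words`.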

0 answers:

No answers