
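I want to compare the frequencies of uni-grams and bi-grams. If the two frequencies are the same, we replace the uni-gram with the bi-gram. If the uni-gram's frequency is greater than the bi-gram's, we compute the difference between the two and assign that difference to the uni-gram as its new frequency. My code so far is shown below, but I need help turning this logic into code.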
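One way to express these rules is sketched below, assuming plain dictionaries that map each uni-gram (a single word) and each bi-gram (two words joined by one space) to an integer count. The function name `merge_ngram_freqs` is hypothetical. The question does not say what should happen when the uni-gram is *less* frequent than the bi-gram, so this sketch leaves such uni-grams unchanged.

```python
from typing import Dict


def merge_ngram_freqs(unigrams: Dict[str, int], bigrams: Dict[str, int]) -> Dict[str, int]:
    """Apply the uni-gram/bi-gram comparison rules (hypothetical helper).

    Assumptions: bi-gram keys are two words separated by a single space,
    and a uni-gram rarer than the bi-gram is left as-is (unspecified case).
    """
    result = dict(unigrams)  # copy so the caller's dict is not modified
    for bigram, bi_freq in bigrams.items():
        for word in bigram.split(" "):
            uni_freq = result.get(word)
            if uni_freq is None:
                continue  # word absent, or already replaced by an earlier bi-gram
            if uni_freq == bi_freq:
                # Same frequency: every occurrence of the word is inside
                # this bi-gram, so the bi-gram replaces it.
                del result[word]
            elif uni_freq > bi_freq:
                # Keep only the occurrences not covered by the bi-gram.
                result[word] = uni_freq - bi_freq
    # The bi-grams themselves sit alongside the surviving uni-grams.
    result.update(bigrams)
    return result


uni = {"machine": 5, "learning": 3, "model": 4}
bi = {"machine learning": 3}
print(merge_ngram_freqs(uni, bi))
# {'machine': 2, 'model': 4, 'machine learning': 3}
```

Because a word can belong to several bi-grams, the result depends on the order in which bi-grams are processed; iterating over them sorted by frequency is one reasonable choice.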

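if __name__ == '__main__':

    corpus = load_corpus()
    term_freqs = sents_chunks(corpus, PATTERN)
    word_check = word_count(corpus)  # assumed: dict of word -> count

    # sorted_candidates is not defined in the snippet; this assumes term_freqs
    # maps each candidate phrase to its frequency, sorted most frequent first.
    sorted_candidates = sorted(term_freqs.items(), key=lambda kv: kv[1], reverse=True)

    with open('Ngrams+.txt', 'w') as f:
        new_cands = []
        key1 = []
        for c in sorted_candidates:
            # c is a (phrase, frequency) pair, e.g. ('machine learning', 3)
            newc = '%.5f\t%s' % (c[1], c[0])
            word = c[0].split(" ")
            for indv_word in word:
                # Exact lookup; `indv_word in key` was a substring test
                # that also matched longer words such as 'machines'.
                if indv_word in word_check:
                    print(indv_word, word_check[indv_word])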
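Testing `indv_word in key` on strings is a substring check, so a word like "machine" also matches "machines". Since the outputs of `load_corpus`, `sents_chunks` and `word_count` are not shown, the sketch below assumes the corpus is available as whitespace-tokenised sentences and counts uni-grams and bi-grams directly with `collections.Counter`. The helper name `count_ngrams` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_ngrams(sentences: Iterable[List[str]]) -> Tuple[Counter, Counter]:
    """Count uni-grams and bi-grams in tokenised sentences (hypothetical helper).

    Bi-gram keys are the two words joined by a single space, matching the
    "w1 w2" strings that c[0].split(" ") expects in the question's code.
    """
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # zip pairs each token with its right neighbour: (t0, t1), (t1, t2), ...
        bigrams.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return unigrams, bigrams


sents = [["machine", "learning", "is", "fun"], ["machine", "learning", "models"]]
uni, bi = count_ngrams(sents)
for phrase, freq in bi.most_common():
    # Exact per-word lookups; a Counter returns 0 for unseen words.
    print(phrase, freq, [(w, uni[w]) for w in phrase.split(" ")])
```

Bi-grams are counted within each sentence only, so no pair is formed across a sentence boundary.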

How do I compare the frequencies of unigrams and bigrams?
