Trigram program

Time: 2017-06-19 09:33:42

Tags: python python-3.x

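I am trying to write a Python program that reads examens.txt and prints the frequency of every trigram that occurs more than 3 times. Upper and lower case should be treated the same, special characters should be ignored, and the output should be sorted by frequency.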

My teacher told me I only need to change two lines! But I have gone Python-blind. The code looks correct to me, yet it does not work.

with open("examen.txt") as f:
    data = f.read()
    text = data.replace("\xad", "")

words = []
for word in data.lower().split():
    word = word.strip("‚‘!,.:«»-()'_#-–„“*?")
    if word != "":
        if not word[-1].isalnum():
            print(repr(word))
        words.append(word)

trigrams = {}
for i in range(len(words)):
    word = words[i]
    nextword = words[i + 1]
    nextnextword = words[i + 2]
    key = (word, nextword, nextnextword)
    trigrams[key] = trigrams.get(key, 0) + 1

l = list(trigrams.items())
l.sort(key=lambda x: (x[1], x[0]))
l.reverse()
for key, count in trigrams:
    if count < 3:
        break
    word = key[0]
    nextword = key[1]
    nextnextword = key[2]
    print(word, nextword, nextnextword, count)

1 answer:

Answer 0 (score: 0)

You go too far into the word list when you build the trigrams, and the last loop does not print from the right data structure, l.

I would change only two lines:

words
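As a minimal sketch of the two changes described above (not necessarily the exact code from the answer): stop the index loop two items before the end of words so that words[i + 2] always exists, and iterate over the sorted list l instead of the trigrams dict in the last loop. The function names count_trigrams and frequent_trigrams, the minimum parameter, and the sample data are assumptions added here so the snippet runs on its own. The cutoff mirrors the `count < 3` check in the question's code, which keeps counts of 3 or more.

```python
def count_trigrams(words):
    """Count every run of three consecutive words (hypothetical helper)."""
    trigrams = {}
    # Change 1: stop two short of the end, so words[i + 2] never
    # raises IndexError.
    for i in range(len(words) - 2):
        key = (words[i], words[i + 1], words[i + 2])
        trigrams[key] = trigrams.get(key, 0) + 1
    return trigrams


def frequent_trigrams(trigrams, minimum=3):
    """Return (trigram, count) pairs, most frequent first (hypothetical helper)."""
    l = list(trigrams.items())
    l.sort(key=lambda x: (x[1], x[0]))
    l.reverse()
    result = []
    # Change 2: loop over the sorted list l. Looping over the dict
    # yields only the keys, in insertion order rather than sorted.
    for key, count in l:
        if count < minimum:
            break
        result.append((key, count))
    return result


# Made-up sample data standing in for the cleaned word list.
sample = "a b c a b c a b c d".split()
for key, count in frequent_trigrams(count_trigrams(sample)):
    print(*key, count)
```

Separately, the question's code stores the cleaned text in text but then splits data. Whether that is intended is not clear from the post.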