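I am trying to write a Python program that reads examens.txt and outputs the frequency of every trigram that occurs more than 3 times. Upper and lower case and special characters should be ignored, and the output should be sorted by frequency.
My teacher told me I only need to change two lines, but I have gone Python-blind. To me the code looks correct, but it does not work.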
with open("examen.txt") as f:
    data = f.read()
text = data.replace("\xad", "")
words = []
for word in data.lower().split():
    word = word.strip("‚‘!,.:«»-()'_#-–„“*?")
    if word != "":
        if not word[-1].isalnum():
            print(repr(word))
        words.append(word)
trigrams = {}
for i in range(len(words)):
    word = words[i]
    nextword = words[i + 1]
    nextnextword = words[i + 2]
    key = (word, nextword, nextnextword)
    trigrams[key] = trigrams.get(key, 0) + 1
l = list(trigrams.items())
l.sort(key=lambda x: (x[1], x[0]))
l.reverse()
for key, count in trigrams:
    if count < 3:
        break
    word = key[0]
    nextword = key[1]
    nextnextword = key[2]
    print(word, nextword, nextnextword, count)
Answer 0 (score: 0)
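You iterate too far into words when you build the trigrams, and in the last loop you do not print the right data structure, which should be l.
I would change only two lines: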
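A minimal sketch of how those two changes might look, wrapped in a hypothetical function count_trigrams (not part of the question) so it can be run without examen.txt. The loop bound len(words) - 2 keeps words[i + 2] inside the list, and the final loop walks the sorted list l instead of the dict trigrams, whose iteration yields only keys. As a further assumption about the intent, it splits the cleaned text rather than the raw data. The threshold follows the question's code (count < 3 stops the loop), so trigrams seen at least 3 times are kept.

```python
def count_trigrams(text, min_count=3):
    """Return (trigram, count) pairs with count >= min_count, most frequent first."""
    # Assumption: the soft-hyphen-free text is what should be split.
    text = text.replace("\xad", "")
    words = []
    for word in text.lower().split():
        word = word.strip("‚‘!,.:«»-()'_#-–„“*?")
        if word != "":
            words.append(word)

    trigrams = {}
    # Change 1: stop two words before the end so words[i + 2] always exists.
    for i in range(len(words) - 2):
        key = (words[i], words[i + 1], words[i + 2])
        trigrams[key] = trigrams.get(key, 0) + 1

    l = list(trigrams.items())
    l.sort(key=lambda x: (x[1], x[0]))
    l.reverse()

    result = []
    # Change 2: iterate over the sorted list l, not the dict trigrams.
    for key, count in l:
        if count < min_count:
            break
        result.append((key, count))
    return result


# Hypothetical sample text standing in for the contents of examen.txt.
sample = "The cat sat. The cat sat! the cat sat, and the cat sat"
for (word, nextword, nextnextword), count in count_trigrams(sample):
    print(word, nextword, nextnextword, count)
```

In the original script the same two edits would be range(len(words) - 2) in the trigram loop and for key, count in l in the printing loop.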