Question

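I'm new to this forum and to Python, and I have a problem with one of my programs; more specifically, I could use some help getting it started. I want my program to look at how often a word appears in a text, and if a word appears more than once within the next 30 words, I want to mark that word. An example: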

Hello my name is Mark and *Hello* I like *Mark*

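I have managed to open the text file, convert it to lowercase, strip out things like "." and "\n", and read and print it, but I'm stuck on what to do next. Should I use functions in a class, or is there another way? Some help with the code would be much appreciated. Thanks in advance.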

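def open_file(file2, mode):
    """Checks if file exists, if it does, it opens and reads the textfile."""
    try:
        # "with" closes the file again once it has been read
        with open(file2, mode) as file:
            # Replace newlines with spaces (not '') so the last word of one
            # line is not glued to the first word of the next line
            file1 = file.read().lower().replace('.', '').replace('\n', ' ')
        print(file1)
    except IOError:
        print("Could not find the file", file2, "\n")
    else:
        # split() with no argument also drops empty strings caused by repeated spaces
        split = file1.split()
        return split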

# Main
Filename = input("Enter the file name: ") + ".txt"

words = open_file(Filename, "r")

Answer 1

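You should split the lines to extract the words (e.g. with split(" ")), then put the words in a dictionary, using each word as the key and a counter as the value, to count the occurrences: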

counts = {}  # avoid the name "dict", which shadows the built-in type
...
if my_word in counts:
    counts[my_word] += 1
else:
    counts[my_word] = 1
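The standard library's collections.Counter does the same bookkeeping and handles the missing-key case for you. A minimal sketch, assuming the words are already in a list; the name words and the sample sentence are just for illustration:

```python
from collections import Counter

# Hypothetical input: the lowercase word list produced by split()
words = "hello my name is mark and hello i like mark".split()

# Counter maps each word to the number of times it occurs
counts = Counter(words)

print(counts["hello"])        # 2
print(counts["name"])         # 1
print(counts.most_common(2))  # [('hello', 2), ('mark', 2)]
```

Note that this counts over the whole text, not within a 30-word window.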

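Then you have to highlight the words that have a high count in the dictionary (for example, you can add <i> before them and </i> after them, which puts them in italics if you read the file as HTML).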
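A sketch of that highlighting step, assuming "high count" means "occurs at least twice"; the function name italicize_frequent and the min_count threshold are hypothetical choices, not part of the answer:

```python
from collections import Counter

def italicize_frequent(words, min_count=2):
    """Wrap every word occurring at least min_count times in <i>...</i>.

    min_count is a hypothetical threshold; pick whatever "high" means for you.
    """
    counts = Counter(words)
    marked = []
    for word in words:
        if counts[word] >= min_count:
            marked.append("<i>{}</i>".format(word))
        else:
            marked.append(word)
    return " ".join(marked)

words = "hello my name is mark and hello i like mark".split()
print(italicize_frequent(words))
# <i>hello</i> my name is <i>mark</i> and <i>hello</i> i like <i>mark</i>
```

Because the counts are global, every occurrence is marked, including the first one, and the 30-word limit from the question is not applied.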

Answer 2

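Let's assume you have already read the file, tokenized it (removed the punctuation and broken it into words), and normalized the words (all lowercase, possibly with diacritics removed). Let's also assume the tokens are held in a list named tokens.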
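One way to produce such a tokens list is sketched below. The helper name tokenize is hypothetical, and it assumes a word is a run of word characters (letters, digits, underscore) or apostrophes, so everything else, including punctuation and newlines, acts as a separator:

```python
import re
import unicodedata

def tokenize(text):
    """Lowercase text, strip diacritics, and return the list of words."""
    # Decompose accented characters (e.g. "é" -> "e" + combining accent) ...
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # ... then drop the combining marks
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep runs of word characters or apostrophes
    return re.findall(r"[\w']+", stripped)

tokens = tokenize("Hello, my name is Mark.\nAnd hello: I like Mark and café!")
print(tokens)
# ['hello', 'my', 'name', 'is', 'mark', 'and', 'hello', 'i', 'like', 'mark', 'and', 'cafe']
```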

Now, to determine whether the same token has already appeared among the last 30 tokens, you can use a sliding window.

window = []
for token in tokens:
    if token in window:
        print('repeated token {}'.format(token))
        #process accordingly
    window.append(token)
    window = window[-30:] #trim to last 30 entries
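A variant of the same window using collections.deque with maxlen, which drops the oldest entry automatically instead of rebuilding the list on every step. Membership tests on a deque are still linear in the window size, which is fine for 30 entries. The function name find_repeats and its (index, token) return format are hypothetical:

```python
from collections import deque

def find_repeats(tokens, window_size=30):
    """Return (index, token) pairs for tokens seen within the previous window_size tokens."""
    window = deque(maxlen=window_size)  # oldest entry falls out automatically
    repeats = []
    for i, token in enumerate(tokens):
        if token in window:
            repeats.append((i, token))
        window.append(token)
    return repeats

tokens = "hello my name is mark and hello i like mark".split()
print(find_repeats(tokens))  # [(6, 'hello'), (9, 'mark')]
```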

Let's further say you want to put ^ around the repeated words:

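window = []
for token in tokens:
    # end='' keeps all words on one line; print() would otherwise add a newline after each
    if token in window:
        print('^{}^ '.format(token), end='')
    else:
        print('{} '.format(token), end='')
    window.append(token)
    window = window[-30:] #trim to last 30 entries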

Then redirect stdout into a file.
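From a shell, python yourscript.py > marked.txt does the redirection. Alternatively, print() accepts a file argument, so the script can write the file itself. A sketch, assuming a hypothetical output file marked.txt in the system temp directory and sample tokens:

```python
import os
import tempfile

# Hypothetical sample data and output location
tokens = "hello my name is mark and hello i like mark".split()
out_path = os.path.join(tempfile.gettempdir(), "marked.txt")

window = []
with open(out_path, "w") as out:
    for token in tokens:
        if token in window:
            print('^{}^ '.format(token), end='', file=out)
        else:
            print('{} '.format(token), end='', file=out)
        window.append(token)
        window = window[-30:]  # trim to last 30 entries

with open(out_path) as f:
    print(f.read())  # hello my name is mark and ^hello^ i like ^mark^
```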

Or build a new list:

window = []
tokens2 = []
for token in tokens:
    if token in window:
        tokens2.append('^{}^'.format(token))
    else:
        tokens2.append(token)
    window.append(token)
    window = window[-30:] #trim to last 30 entries
print(tokens2)
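To turn the new list back into text, join it with spaces. A sketch that wraps the loop above in a function; the name mark_repeats and the window_size and marker parameters are hypothetical, added so the window length and marker character can be changed:

```python
def mark_repeats(tokens, window_size=30, marker="^"):
    """Return the text with repeated tokens wrapped in marker characters.

    A token counts as repeated if it also occurs among the previous
    window_size tokens.
    """
    window = []
    marked = []
    for token in tokens:
        if token in window:
            marked.append("{0}{1}{0}".format(marker, token))
        else:
            marked.append(token)
        window.append(token)
        window = window[-window_size:]  # trim to the last window_size entries
    return " ".join(marked)

tokens = "hello my name is mark and hello i like mark".split()
print(mark_repeats(tokens))
# hello my name is mark and ^hello^ i like ^mark^
print(mark_repeats(tokens, marker="*"))
# hello my name is mark and *hello* i like *mark*
```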

Marking occurrences of words in a text file
