Question

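I need a program that counts the top 5 most common first words of the lines in a file, excluding any line where the first word is followed by "DM" or "RT".

I don't have any code so far because I'm completely lost.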

f = open("C:/Users/Joe Simpleton/Desktop/talking.txt", "r")
?????

Answer 1

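Read the text one line at a time. For each line, split it into words with a regular expression, which returns a list of words. If there are at least two words, test the second word to make sure it is not in your exclusion list. Then use a Counter() to keep track of the counts. Store each word in lowercase so that uppercase and lowercase versions of the same word are not counted separately: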

from collections import Counter
import re

word_counts = Counter()

with open('talking.txt') as f_input:
    for line in f_input:
        words = re.findall(r'\w+', line)

        # Skip blank lines and lines whose second word is DM or RT;
        # count only the first word of each remaining line
        if words and (len(words) == 1 or words[1] not in ['DM', 'RT']):
            word_counts[words[0].lower()] += 1

print(word_counts.most_common(5))

Counter() has a handy most_common() method that returns the most frequent values.
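
To illustrate what most_common() returns, here is a minimal sketch. The word list is made-up sample data, not taken from the question's file:

```python
from collections import Counter

# Hypothetical sample data, for illustration only
sample_words = ["hello", "hi", "hello", "hey", "hi", "hello"]
sample_counts = Counter(sample_words)

# most_common(n) returns the n most frequent items as (item, count) pairs,
# ordered from most to least common
top_two = sample_counts.most_common(2)
print(top_two)
```

Each entry in the result is a tuple, so the word and its count can be unpacked in a for loop.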

Answer 2

Untested, but this should do roughly the same thing:

from collections import Counter

count = Counter()
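with open("path") as f:
    for line in f:
        # split() with no argument splits on any whitespace and drops the
        # trailing newline, so a line ending in "RT" is matched correctly
        parts = line.split()
        if parts[1] not in ["DM", "RT"]:
            count[parts[0]] += 1

print(count.most_common(5))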

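You should also add a check that parts has at least 2 elements; otherwise parts[1] raises an IndexError on blank or one-word lines.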
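
One way to add that check is sketched below. The function name count_first_words, the excluded parameter, and the sample lines are all hypothetical, and the sketch assumes that one-word lines should still be counted, as in Answer 1:

```python
from collections import Counter

def count_first_words(lines, excluded=("DM", "RT")):
    """Count the first word of each line, skipping excluded second words."""
    counts = Counter()
    for line in lines:
        parts = line.split()  # splits on any whitespace, drops the newline
        if not parts:
            continue  # blank line: nothing to count
        if len(parts) >= 2 and parts[1] in excluded:
            continue  # first word is followed by DM or RT
        counts[parts[0]] += 1
    return counts

# Hypothetical sample input standing in for the lines of the file
sample_lines = [
    "alice RT something\n",
    "bob hello there\n",
    "bob\n",
    "\n",
    "carol DM\n",
    "alice says hi\n",
]
result = count_first_words(sample_lines)
print(result.most_common(5))
```

Because the function accepts any iterable of strings, an open file object can be passed in directly in place of sample_lines.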

Count of first words in a txt file
