Question

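How can I find out how many keywords from one file also appear in another file? I have a file containing a list of words, and I am trying to figure out whether those words occur in another file.

I have a file of keywords (keywords.txt), and I am trying to find out whether another file (tweets.txt), which contains sentences, contains any of the keywords.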

def main():
    done = False
    while not done:
        try:
            keywords = input("Enter the filename titled keywords: ")
            with open(keywords, "r") as words:
                done = True
        except IOError:
            print("Error: file not found.")

total = 0
try:
    tweets_name = input("Enter the file Name titled tweets: ")
    with open(tweets_name, 'r') as tweets_file:
        tweets = tweets_file.readlines()  # keep the lines for use after the file closes
except IOError:
    print("Error: file not found.")

def sentiment_of_msg(msg_words_counter):
    summary = 0
    for line in tweets:
        # quantity: the number of keywords in the sentence of the file
        if happy_dict in line:
            summary += 10 * quantity
        elif veryUnhappy_dict in line:
            summary += 1 * quantity
        elif neutral_dict in line:
            summary += 5 * quantity
    return summary

Answer 1

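This looks like homework, so the best I can do is give you an outline of a solution.

If you can afford to load the files into memory:

Load keywords.txt, read its lines, split them into tokens, and build a set from them. You now have a data structure that supports fast membership queries (i.e. you can ask "if token in set" and get an answer in constant time on average).
Load the tweets file the same way as the keywords and read its contents line by line (or in whatever way they are formatted). You will probably need some preprocessing (stripping whitespace, replacing unnecessary characters, removing invalid words, commas, etc.). For each line, split it to get the words of that tweet, and check whether any of those words is in the keyword set.

The pseudocode would look something like this: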

keywords_set = set()
with open("keywords.txt") as file:
    for line in file:
        for word in line.split():
            keywords_set.add(word)

with open("tweets.txt") as file:
    for line in file:
        line = preprocess(line)  # function with your custom logic
        for item in line.split():
            if item in keywords_set:
                do_stuff()  # function with your custom logic
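
One possible way to fill in the two placeholders, as a minimal sketch rather than the intended solution: here preprocess is assumed to lowercase the line and strip ASCII punctuation, and count_keywords is a hypothetical helper that returns how many words of a line are in the keyword set. Both the cleaning rule and the helper name are assumptions, so adjust them to whatever "the same word" means for your data.

```python
import string

# Translation table that deletes ASCII punctuation (an assumed cleaning rule).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(line):
    """Lowercase a line and remove punctuation."""
    return line.lower().translate(_PUNCT_TABLE)


def count_keywords(line, keywords_set):
    """Return how many words in the line are in keywords_set."""
    return sum(1 for word in preprocess(line).split() if word in keywords_set)


keywords_set = {"happy", "great", "sad"}
print(count_keywords("So happy today, what a GREAT day!", keywords_set))  # 2
```

The returned count is the kind of per-sentence quantity the question multiplies by a weight.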

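If you want the frequency of each keyword, build a dictionary of the form {key: key_frequency}. Alternatively, look at collections.Counter and think about how it could solve your problem.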
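
A rough sketch of the Counter idea, assuming the tweets are available as an iterable of lines and that splitting on whitespace is good enough. keyword_frequencies is a hypothetical helper name, and the sample data is made up for illustration.

```python
from collections import Counter


def keyword_frequencies(lines, keywords_set):
    """Count how often each keyword occurs across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # Only words that are in the keyword set are counted.
        counts.update(word for word in line.split() if word in keywords_set)
    return counts


sample_tweets = ["happy happy day", "sad day", "a happy end"]
freq = keyword_frequencies(sample_tweets, {"happy", "sad"})
print(freq["happy"], freq["sad"])  # 3 1
```

A Counter returns 0 for keys it has never seen, so looking up a keyword that never occurred does not raise an error.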

If you cannot load the tweets file into memory, consider a lazy solution that reads the large file with a generator.
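
A minimal sketch of the lazy approach, under the assumption that whitespace splitting is sufficient. A file object already yields one line at a time when iterated, so a generator built on top of it never holds the whole file in memory. tweets_with_keywords is a hypothetical name, and io.StringIO stands in for a real open(...) call so the example runs without any files.

```python
import io


def tweets_with_keywords(lines, keywords_set):
    """Lazily yield (line, match_count) for lines containing a keyword."""
    for line in lines:  # iterating a file object reads one line at a time
        matches = sum(1 for word in line.split() if word in keywords_set)
        if matches:
            yield line.rstrip("\n"), matches


# io.StringIO stands in for open("tweets.txt"); both iterate line by line.
fake_file = io.StringIO("good morning\nhappy happy joy\nso sad\n")
for tweet, n in tweets_with_keywords(fake_file, {"happy", "sad"}):
    print(n, tweet)
```

With a real file, the same function can be called as tweets_with_keywords(file, keywords_set) inside a with open(...) block.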
