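How can I check how many keywords from one file also appear in another file?
I have a file containing a list of words, and I am trying to figure out whether those words appear in another file.
I have a file of keywords (keywords.txt), and I am trying to find out whether another file (tweets.txt), which contains sentences, includes any of those keywords.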
    def main():
        done = False
        while not done:
            try:
                keywords = input("Enter the filename titled keywords: ")
                with open(keywords, "r") as words:
                    done = True
            except IOError:
                print("Error: file not found.")

        total = 0
        try:
            tweets = input("Enter the file Name titled tweets: ")
            with open(tweets, 'r') as tweets:
        except IOError:
            print("Error: file not found.")

    def sentiment_of_msg(msg_words_counter):
        summary = 0
        for line in tweets:
            if happy_dict in line:
                summary += 10 * **The number of keywords in the sentence of the file**
            elif veryUnhappy_dict in line:
                summary += 1 * quantity
            elif neutral_dict in line:
                summary += 5 * quantity
        return summary
Answer 0 (score: 1)
I have a feeling this is homework, so the best I can do is give you an outline of a solution.
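If you can afford to load the files into memory, store the keywords in a set. You can then test each token with
    if token in set
and get the answer in constant time on average. The pseudocode looks like this: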
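    # Build a set of keywords so each membership test is O(1) on average.
    keywords_set = set()
    with open(keywords) as file:
        for line in file:
            for word in line.split():
                keywords_set.add(word)

    # Scan the tweets line by line and look every token up in the set.
    with open(tweets) as file:
        for line in file:
            line = preprocess(line)  # function with your custom logic; assumed to return the cleaned line
            for item in line.split():
                if item in keywords_set:
                    do_stuff()  # function with your custom logic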
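The outline above leaves preprocess and do_stuff open. As a minimal sketch, assuming that preprocessing means lowercasing and stripping ASCII punctuation and that the goal is the number of keyword hits per tweet, it might look like the following. The names preprocess, load_keywords, and keyword_hits_per_line are hypothetical, and the sample lists stand in for the two files.

```python
import string

# Translation table that deletes ASCII punctuation (an assumed cleaning rule).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(line):
    """Hypothetical normalization: lowercase and drop ASCII punctuation."""
    return line.lower().translate(_PUNCT_TABLE)


def load_keywords(lines):
    """Collect every whitespace-separated word into a set."""
    return {word for line in lines for word in preprocess(line).split()}


def keyword_hits_per_line(tweet_lines, keywords_set):
    """Return, for each tweet line, how many of its tokens are keywords."""
    return [
        sum(1 for token in preprocess(line).split() if token in keywords_set)
        for line in tweet_lines
    ]


# In-memory stand-ins for keywords.txt and tweets.txt.
keywords_set = load_keywords(["happy glad", "sad"])
hits = keyword_hits_per_line(
    ["So HAPPY, so glad!", "Nothing here.", "sad sad day"],
    keywords_set,
)
print(hits)
```

Because an open file iterates over its lines, both functions should also accept file objects in place of the sample lists.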
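If you want the frequency of each keyword, build a dictionary of the form {key: key_frequency}. Alternatively, look at Counter in the collections module and think about how it could solve your problem.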
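One possible way to use collections.Counter for this is sketched below. The function name keyword_frequencies and the sample data are assumptions made for illustration; tokens are simply lowercased and split on whitespace.

```python
from collections import Counter


def keyword_frequencies(tweet_lines, keywords_set):
    """Count how often each keyword occurs across all tweet lines."""
    counts = Counter()
    for line in tweet_lines:
        # update() accepts any iterable and increments the count of each item.
        counts.update(
            token for token in line.lower().split() if token in keywords_set
        )
    return counts


# Sample data standing in for the real files.
freq = keyword_frequencies(
    ["happy happy day", "sad but happy"],
    {"happy", "sad"},
)
print(freq.most_common())
```

A Counter returns 0 for keys it has never seen, so keywords that never occur can be looked up without a KeyError.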
If you cannot load the tweets file into memory, consider a lazy solution that uses a generator to read big files.
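A lazy variant could be sketched with a generator that yields one token at a time, so only the current line is held in memory. The helper names iter_tokens and count_hits are hypothetical, and the temporary file exists only to keep the example self-contained.

```python
import os
import tempfile


def iter_tokens(path):
    """Yield tokens one at a time, reading the file line by line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # file objects are themselves lazy line iterators
            yield from line.split()


def count_hits(path, keywords_set):
    """Count tokens in the file that are keywords, without loading it whole."""
    return sum(1 for token in iter_tokens(path) if token in keywords_set)


# Demo file standing in for a large tweets.txt.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("happy day\nsad night\nplain line\n")
    demo_path = tmp.name

total_hits = count_hits(demo_path, {"happy", "sad"})
os.remove(demo_path)
print(total_hits)
```

The keyword set still lives in memory here; only the tweets are streamed, which matches the case where the tweets file is the large one.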