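This code prints the matching string once for every occurrence in the file being searched (so if a string appears repeatedly, I end up with a huge list). I only want to know whether the strings in my list match, not how many times they match. I do want to know which strings matched, so a true/false solution won't work. But I only want each one listed once, if it matches. I don't really understand what the pattern = '|'.join(keywords) part is doing. I took it from someone else's code to get my file-to-file matching working, but I don't know whether I need it. Any help would be greatly appreciated.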
    # declares the files used
    filenames = ['//Katie/Users/kitka/Documents/appreport.txt',
                 '//Dallin/Users/dallin/Documents/appreport.txt',
                 '//Aidan/Users/aidan/Documents/appreport.txt']
    # parses each file
    for filename in filenames:
        # imports the necessary libraries
        import os, time, re, smtplib
        from stat import *  # ST_SIZE etc
        # finds the time the file was last modified and error checks
        try:
            st = os.stat(filename)
        except IOError:
            print("failed to get information about", filename)
        else:
            # creates a list of words to search for
            keywords = ['LoL', 'javaw']
            pattern = '|'.join(keywords)
            # searches the file for the strings in the list, sorts them and returns results
            results = []
            with open(filename, 'r') as f:
                for line in f:
                    matches = re.findall(pattern, line)
                    if matches:
                        results.append((line, len(matches)))
            results = sorted(results)
            # appends results to the archive file
            with open("GameReport.txt", "a") as f:
                for line in results:
                    f.write(filename + '\n')
                    f.write(time.asctime(time.localtime(st[ST_MTIME])) + '\n')
                    f.write(str(line) + '\n')
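Answer 0 (score: 0)
Untested, but this should work. Note that it only tracks which words were found, not which words were found in which files. I couldn't work out whether that is what you want.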
    import fileinput
    filenames = [...]
    keywords = ['LoL', 'javaw']
    # a set is like a list but with no duplicates, so even if a keyword
    # is found multiple times, it will only appear once in the set
    found = set()
    # iterate over the lines of all the files
    for line in fileinput.input(files=filenames):
        for keyword in keywords:
            if keyword in line:
                found.add(keyword)
    print(found)
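On the `pattern = '|'.join(keywords)` line from the question: `str.join` glues the keywords together with `|` between them, and `|` means "or" in a regular expression, so the resulting pattern matches any one of the keywords. Below is a minimal sketch of the same set idea built on that pattern. The sample lines are made up for illustration, and the `re.escape` call is an addition that is not in the question's code (it only matters if a keyword ever contains regex metacharacters).

```python
import re

keywords = ['LoL', 'javaw']

# '|'.join(...) builds a single alternation pattern: 'LoL|javaw'
pattern = '|'.join(re.escape(k) for k in keywords)

# hypothetical sample lines standing in for the contents of a report file
sample_lines = [
    'LoL.exe started',
    'javaw.exe started, javaw.exe stopped',
    'notepad.exe started',
    'LoL.exe stopped',
]

found = set()
for line in sample_lines:
    # findall returns every match on the line; the set discards repeats
    found.update(re.findall(pattern, line))

print(pattern)
print(sorted(found))
```

So the pattern is not strictly needed: the plain `keyword in line` test does the same job for fixed strings, and the regex version mainly saves the inner loop over keywords.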
Edit
If you want to keep track of which keywords are present in which files, then I suggest keeping a set of (filename, keyword) tuples:
    filenames = [...]
    keywords = ['LoL', 'javaw']
    found = set()
    for filename in filenames:
        with open(filename, 'rt') as f:
            for line in f:
                for keyword in keywords:
                    if keyword in line:
                        found.add((filename, keyword))
    for filename, keyword in found:
        print('Found the word "{}" in the file "{}"'.format(keyword, filename))
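If it is more convenient to have the results grouped per file (for example, to write one block per file into the report), a dict of sets is one possible variation on the tuple approach. This is only a sketch: the helper name `keywords_by_file` is hypothetical, the `OSError` handling is an assumption about how unreadable files should be treated, and the demo runs against a temporary file rather than the real network paths.

```python
import os
import tempfile
from collections import defaultdict


def keywords_by_file(filenames, keywords):
    """Map each readable file to the set of keywords found in it."""
    found = defaultdict(set)
    for filename in filenames:
        try:
            with open(filename, 'rt') as f:
                for line in f:
                    for keyword in keywords:
                        if keyword in line:
                            found[filename].add(keyword)
        except OSError:
            # skip files that cannot be opened or read
            print('failed to read', filename)
    return dict(found)


# demo on a throwaway file that mentions 'LoL' twice and 'javaw' once
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'appreport.txt')
    with open(path, 'w') as f:
        f.write('LoL.exe\njavaw.exe\nLoL.exe\n')
    result = keywords_by_file([path], ['LoL', 'javaw'])
    for filename, words in result.items():
        print(filename, sorted(words))
```

Files with no matching keyword simply do not appear in the returned dict, so each keyword is listed at most once per file.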