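I've run into a problem. I'm trying to write a program that combs through a config file for certain search terms and prints "is in place" if they match, or "is not in place" if they don't. Here's what I have so far: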
import os
import sys
import fnmatch
import re
check = ["test1", "test2", "test3"]
for f in filter(os.path.isfile, sys.argv[1:]): ##open doc arg
    for line in open(f).readlines(): ##loop for reading line by line
        if re.match(check[0], line): ##match at beginning for check
            print(check[0], "is in place") ##print if match == true
        elif re.search(check[0], line): ##if not check search (full file)
            print(check[0], "is not in place") ##print if true
    for line in open(f).readlines():
        if re.match(check[1], line):
            print(check[1], "is in place")
        elif ((re.search(check[1], line)) == None):
            print(check[1], "is not in place")
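So the problem is that if I add an else statement, it prints for every single line (all 1500 of them), because the loop runs line by line. Is there a way to search the whole document instead of going line by line?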
Answer 0 (score: 1)
Yes, you can do this with read(). But note that if your file is large, loading the whole file into memory at once may not be a good idea.
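Also, you are looping over the same file several times. Avoid that by iterating over the file only once and checking all the values in the check list on each line. Also, try to avoid regular expressions when a plain substring test is enough, since they can be slower. Something like this would also work: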
for line in open(f).readlines():
    for check_value in check:
        if check_value in line:
            print("{} is in place.".format(check_value))
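The loop above prints once for every matching line and never reports missing terms. A minimal sketch of the single-pass idea that reports each term exactly once, assuming plain substring matching is enough (no regex): it collects the terms seen into a set during one pass, then prints a verdict per term. The function name check_terms and the sample file contents are hypothetical.

```python
import os
import tempfile


def check_terms(path, terms):
    """Return the set of terms that appear anywhere in the file at path."""
    found = set()
    with open(path) as infile:
        for line in infile:  # single pass over the file
            for term in terms:
                if term in line:  # plain substring test, no regex
                    found.add(term)
            if len(found) == len(terms):  # every term seen, stop reading early
                break
    return found


if __name__ == "__main__":
    check = ["test1", "test2", "test3"]
    # Write a small made-up config to a temporary file for the demo
    with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as tmp:
        tmp.write("test1 = on\nother = 1\ntest3 = off\n")
    found = check_terms(tmp.name, check)
    for term in check:
        print(term, "is in place" if term in found else "is not in place")
    os.remove(tmp.name)
```

Against the sample file this reports test1 and test3 as in place and test2 as not in place. In real use you would call check_terms on each path from sys.argv[1:] instead of the temporary file.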
Answer 1 (score: 1)
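Use the else clause of the for loop together with a break statement. Also note that you can simply iterate over the file itself; there is no need to read all the lines explicitly. (I also added with to make sure the file gets closed.)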
with open(f) as infile:
    for line in infile:
        if re.match(check[0], line):
            print(check[0], "is in place")
            break  # stop after finding one match
    else:  # we got to the end of the file without a match
        print(check[0], "is not in place")
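To see how the for/else clause behaves in isolation, here is a small sketch that uses a list of strings in place of an open file (purely so it runs without a config file; iterating a real file object works the same way). The else block runs only when the loop finishes without hitting break. The function name report is hypothetical.

```python
import re


def report(term, lines):
    """Return the status message for term using a for/else loop."""
    for line in lines:
        if re.match(term, line):
            message = term + " is in place"
            break  # a match was found, so the else clause is skipped
    else:
        # Only reached when the loop ran to completion without a break
        message = term + " is not in place"
    return message


sample = ["test1 = on\n", "other = 1\n"]
print(report("test1", sample))  # test1 is in place
print(report("test2", sample))  # test2 is not in place
```

Note that an empty input also falls through to the else clause, so an empty file correctly reports "is not in place".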
You can even write it as one of those popular generator expressions:
with open(f) as infile:
    if any(re.match(check[0], line) for line in infile):
        print(check[0], "is in place")
    else:
        print(check[0], "is not in place")
Since the messages being printed are so similar, you can condense it even further:
with open(f) as infile:
    print(check[0], "is" if any(re.match(check[0], line) for line in infile) else "is not", "in place")
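One thing to watch if you extend this to every entry in check: infile is an iterator, so once any() (or a for loop) has consumed it, a second pass yields nothing. A minimal sketch that keeps the line-start re.match test and reuses one open file by rewinding with infile.seek(0) before each term; the temporary file and its contents are made up for the demo.

```python
import os
import re
import tempfile

check = ["test1", "test2", "test3"]

# Hypothetical sample config written to a temporary file for the demo
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as tmp:
    tmp.write("test1 = on\ntest3 = off\n")

results = {}
with open(tmp.name) as infile:
    for term in check:
        infile.seek(0)  # rewind; otherwise the iterator is already exhausted
        results[term] = any(re.match(term, line) for line in infile)
        print(term, "is" if results[term] else "is not", "in place")

os.remove(tmp.name)
```

Without the seek(0) call, every term after the first would be checked against whatever remained of the file after the previous any() stopped.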
Answer 2 (score: 0)
To read the whole file, you can use read() instead of readlines().
with open(f) as fil:
    lines = fil.read()
If what you're looking for in the file is just a plain string, you don't need re at all:
if check[0] in lines:
    print(check[0], "is in place")
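A plain in test on the whole text finds a term anywhere, even mid-line, whereas the question's re.match only matched at the start of each line. If you want to keep that line-start behaviour while still reading the file in one go, one option is re.search with a ^ anchor and the re.MULTILINE flag, which makes ^ match at the start of every line instead of only at the start of the string. A small sketch, assuming the terms are literal strings (hence re.escape); the sample text is invented:

```python
import re

# Made-up config text: test2 only appears inside a comment, not at a line start
text = "# test2 is commented out here\ntest1 = on\ntest3 = off\n"

found = {}
for term in ["test1", "test2", "test3"]:
    # ^ with MULTILINE: the term must begin a line, like re.match on each line
    pattern = "^" + re.escape(term)
    found[term] = re.search(pattern, text, re.MULTILINE) is not None
    print(term, "is in place" if found[term] else "is not in place")
```

Here "test2" in text is True, but the anchored search reports test2 as not in place because it never starts a line.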
Answer 3 (score: 0)
I guess you could read the file into a string and use a simple if x in ..., i.e.:
with open("text_contains.txt") as f:
    text = f.read().lower()  # remove .lower() for case-sensitive matching (terms below must be lowercase)
for x in ["test1", "test2", "test3"]:
    if x in text:
        print("{} is in place".format(x))
    else:
        print("{} is not in place".format(x))
Answer 4 (score: 0)
If you really do need to read the file line by line (I assume you need the line where the term appears), then:
import os
import sys
import re

searchTerms = ["test1", "test2", "test3"]
occurrences = {}

# Initialise an occurrences list for each term:
for term in searchTerms:
    occurrences[term] = []

# Read line by line and check if any of the terms is present in that specific
# line. If it is, save the occurrence (without its trailing newline).
for f in filter(os.path.isfile, sys.argv[1:]):
    for line in open(f).readlines():
        for term in searchTerms:
            if re.match(term, line):
                occurrences[term].append(line.rstrip("\n"))

# For each term, print all the lines with occurrences, if any, or 'not found'
# otherwise:
for term in searchTerms:
    if occurrences[term]:
        print("'%s' found in lines: %s" % (term, ", ".join(occurrences[term])))
    else:
        print("'%s' not found" % term)
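If "the line where it appears" means the line number rather than the line's text, a variant of the loop above can record positions from enumerate instead. A minimal sketch, assuming 1-based line numbers and the same line-start re.match test; a list of made-up lines stands in for the file so it runs on its own.

```python
import re

search_terms = ["test1", "test2", "test3"]
# Stand-in for the lines of a config file (made-up content)
lines = ["test1 = on\n", "other = 1\n", "test3 = off\n", "test1 = again\n"]

occurrences = {term: [] for term in search_terms}
for lineno, line in enumerate(lines, start=1):
    for term in search_terms:
        if re.match(term, line):
            occurrences[term].append(lineno)

for term in search_terms:
    if occurrences[term]:
        numbers = ", ".join(str(n) for n in occurrences[term])
        print("'%s' found on line(s): %s" % (term, numbers))
    else:
        print("'%s' not found" % term)
```

With a real file you would write for lineno, line in enumerate(infile, start=1) inside a with open(...) block.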
However, if you only need to check whether the term exists at all, regardless of which line it's on, just read the whole file at once with read:
for f in filter(os.path.isfile, sys.argv[1:]):
    with open(f) as file:
        text = file.read()
    for term in searchTerms:
        # re.search scans the whole text; re.match would only test the start of the file
        if re.search(term, text):
            print("'%s' found" % term)
        else:
            print("'%s' not found" % term)
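The difference between re.match and re.search matters once the whole file is a single string: re.match only tries the very beginning of the string, while re.search scans all of it. A quick check with an invented two-line text:

```python
import re

text = "test1 = on\ntest2 = off\n"

print(re.match("test2", text))               # None: test2 is not at position 0
print(re.search("test2", text) is not None)  # True: found further into the text
print(re.match("test1", text) is not None)   # True: test1 starts the string
```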