Question

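I've run into a problem: I'm trying to write a program that combs a configuration file for certain search terms and, if they match, prints "it is in place", and otherwise prints "it is not in place". Here is what I have so far: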

import os
import sys
import fnmatch
import re

check = ["test1", "test2", "test3"]

for f in filter(os.path.isfile, sys.argv[1:]): ##open doc arg
    for line in open(f).readlines(): ##loop for reading line by line
        if re.match(check[0], line): ##match at beginning for check
            print(check[0], "is in place") ##print if match == true
        elif re.search(check[0], line): ##if not check search (full file)
            print(check[0], "is not in place") ##print if true
    for line in open(f).readlines():
        if re.match(check[1], line):
            print(check[1], "is in place")
        elif ((re.search(check[1], line)) == None):
            print(check[1], "is not in place")

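So the problem is that if I add an else statement that prints, it prints for every line (all 1,500 of them), because the loop runs line by line. Is there a way to search the whole document instead of searching line by line?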

Answer 1

Yes, you can do this with read(). Note, though, that if your file is large, loading the whole thing into memory at once may not be a good idea.

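Also, you loop over the same file several times. Try to avoid that by iterating over the file only once and checking every value in the check list during that single pass. And avoid regular expressions where a plain string test will do, since they can be slow. Something like this would work as well: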

with open(f) as infile:
    for line in infile:
        for check_value in check:
            if check_value in line:
                print("{} is in place.".format(check_value))
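The single-pass idea can also cover the "not in place" case by remembering which terms are still missing. This is a minimal sketch, not the answerer's code: `report_terms` and the sample lines are hypothetical names made up for illustration, and it assumes plain substring matching is enough.

```python
def report_terms(lines, terms):
    """Scan the lines once and return {term: True/False}.

    Hypothetical helper for illustration; `lines` can be an open file
    object or any iterable of strings.
    """
    remaining = set(terms)  # terms that have not been seen yet
    for line in lines:
        # Keep only the terms that this line does not contain.
        remaining = {term for term in remaining if term not in line}
        if not remaining:  # every term was found, so stop reading early
            break
    return {term: term not in remaining for term in terms}


# Invented sample data standing in for the lines of a config file.
sample_lines = ["alpha test1\n", "beta\n", "test3 gamma\n"]
for term, found in report_terms(sample_lines, ["test1", "test2", "test3"]).items():
    print(term, "is in place" if found else "is not in place")
```

Because each term is reported once after the scan, the output stays at one line per term no matter how long the file is.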

Answer 2

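Use the else clause of the for loop together with a break statement. Also note that you can simply iterate over the file itself; there is no need to read all the lines explicitly. (I also added with to make sure the file gets closed.)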

with open(f) as infile:
    for line in infile:
        if re.match(check[0], line):
            print(check[0], "is in place")
            break     # stop after finding one match
    else:             # we got to the end of the file without a match
        print(check[0], "is not in place")

You could even write it as one of those popular generator expressions:

with open(f) as infile:
    if any(re.match(check[0], line) for line in infile):
        print(check[0], "is in place")
    else:
        print(check[0], "is not in place")

Since the messages being printed are so similar, you could condense it even further:

with open(f) as infile:
    print(check[0], "is" if any(re.match(check[0], line) for line in infile) else "is not", "in place")
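The snippets in this answer use re.match, which only succeeds when the pattern matches at the very start of the string. The following sketch contrasts it with re.search and a plain `in` test; the sample line and the term `a.b` are invented for illustration.

```python
import re

line = "    option test1 = yes"  # invented sample line, indented on purpose

matched_at_start = re.match("test1", line) is not None   # only at position 0
found_anywhere = re.search("test1", line) is not None    # anywhere in the line
plain_substring = "test1" in line                        # no regex involved

print(matched_at_start, found_anywhere, plain_substring)

# A term containing regex metacharacters needs re.escape to match literally.
term = "a.b"
loose = re.search(term, "axb") is not None               # '.' matches any char
literal = re.search(re.escape(term), "axb") is not None  # literal "a.b" only
print(loose, literal)
```

If the terms should count as "in place" only at the start of a line, re.match is the right call; if they may appear anywhere, re.search or `in` is probably what you want.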

Answer 3

To read the whole file, you can use read() instead of readlines().

with open(f) as fil:
    lines = fil.read()

If what you are looking for in the file is just a plain string, you don't need re:

if check[0] in lines:
    print(check[0], "is in place")

Answer 4

I guess you could read the file into a string and use a simple if x in ..., i.e.:

with open("text_contains.txt") as f:
    text = f.read().lower()  # remove .lower() for case-sensitive matching
for x in ["test1", "test2", "test3"]:
    if x in text:
        print("{} is in place".format(x))
    else:
        print("{} is not in place".format(x))
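One caveat with lowercasing the text: the search terms have to be lowercased too, or a term such as "Test2" can never match. Here is a hedged sketch of that idea; `terms_in_text` and the sample string are hypothetical and stand in for the file contents returned by f.read().

```python
def terms_in_text(text, terms, ignore_case=True):
    """Return {term: bool} using plain substring tests.

    Hypothetical helper for illustration. With ignore_case=True both the
    text and each term are lowercased before comparing.
    """
    haystack = text.lower() if ignore_case else text
    return {
        term: (term.lower() if ignore_case else term) in haystack
        for term in terms
    }


sample = "Test1 is set\nTEST2 is set\n"  # invented stand-in for f.read()
for term, found in terms_in_text(sample, ["test1", "Test2", "test3"]).items():
    print("{} is {}in place".format(term, "" if found else "not "))
```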

Answer 5

If you really need to read the file line by line (I assume you need the lines where the terms occur), then:

import os
import sys
import fnmatch
import re

searchTerms = ["test1", "test2", "test3"]
occurrences = {}

# Initialise occurrences list for each term:

for term in searchTerms:
    occurrences[term] = []

# Read line by line and check if any of the terms is present in that specific
# line. If it is, save the occurrence.

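for f in filter(os.path.isfile, sys.argv[1:]):
    with open(f) as infile:
        for line in infile:
            for term in searchTerms:
                # re.search finds the term anywhere in the line;
                # re.match would only match at the start of the line.
                if re.search(term, line):
                    occurrences[term].append(line.rstrip("\n"))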

# For each term, print all the lines with occurrences, if any, or 'not found'
# otherwise:

for term in searchTerms:
    if len(occurrences[term]) > 0:
        print("'%s' found in lines: %s" % (term, ", ".join(occurrences[term])))
    else:
        print("'%s' not found" % term)
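If the line numbers are more useful than the line text, enumerate can supply them during the same pass. This is a sketch under assumptions: `find_line_numbers` and the sample lines are hypothetical, and it uses re.search so that a term counts wherever it appears in a line.

```python
import re


def find_line_numbers(lines, terms):
    """Map each term to the 1-based numbers of the lines that contain it.

    Hypothetical helper for illustration; `lines` can be an open file.
    """
    hits = {term: [] for term in terms}
    for number, line in enumerate(lines, start=1):
        for term in terms:
            if re.search(term, line):
                hits[term].append(number)
    return hits


# Invented sample data standing in for the lines of a config file.
sample = ["test1 = on\n", "# nothing here\n", "x = test1, test3\n"]
for term, numbers in find_line_numbers(sample, ["test1", "test2", "test3"]).items():
    if numbers:
        print("'%s' found on lines: %s" % (term, ", ".join(map(str, numbers))))
    else:
        print("'%s' not found" % term)
```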

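However, if you only need to check whether a term is present, regardless of which line it is on, just use read() to read the whole file at once: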

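for f in filter(os.path.isfile, sys.argv[1:]):
    with open(f) as infile:
        text = infile.read()

    for term in searchTerms:
        # re.search scans the whole text; re.match would only
        # succeed if the term sat at the very start of the file.
        if re.search(term, text):
            print("'%s' found" % term)
        else:
            print("'%s' not found" % term)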

Search for a string in a txt file / else print that it is not there
