Question

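I wrote the following code to find the lines in infile that match any of the keywords in a keywords file. The problem is that I only want the lines of infile that contain all of the keywords. It seems harder than I expected, but I'm a beginner, so I suppose I'm just missing something obvious. Regular expressions don't seem to have a straightforward 'and' operator, though.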

import re
infile = open('path/#input.txt', 'r')
outfile = open('path/#output.txt', 'w')

# Read a textfile containing keywords to find
# (and strip the newline character '\n')
keywords = [line.strip() for line in open('path/#keywords.txt')]

# Compile keywords into a regex pattern 
pattern = re.compile('|'.join(keywords))

# See which lines in the infile match any of the keywords
# and write those lines to the outfile
for line in infile:
    if pattern.search(line):
        outfile.write(line)
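To see why this writes out lines that contain *any* keyword, it helps to look at the pattern that `'|'.join(keywords)` builds. A minimal sketch, with a hypothetical keyword list standing in for the contents of `keywords.txt`:

```python
import re

# Hypothetical keywords; in the question they come from path/#keywords.txt
keywords = ["apple", "banana"]

# '|'.join produces "apple|banana": the | is regex alternation, i.e. OR
pattern = re.compile("|".join(keywords))
print(pattern.pattern)

# A line containing only ONE of the keywords still matches
print(bool(pattern.search("I like apple pie")))

# A blank line in the keywords file becomes an empty alternative
# ("apple|banana|"), which matches the empty string, so every line matches
loose = re.compile("|".join(keywords + [""]))
print(bool(loose.search("nothing relevant here")))
```

So the compiled pattern expresses "keyword 1 OR keyword 2 OR ...", which is exactly the behaviour described above.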

Answer 1

Regular expressions aren't meant to be used like that. Use all() instead:

infile = open('path/#input.txt', 'r')
outfile = open('path/#output.txt', 'w')

keywords = [line.strip() for line in open('path/#keywords.txt')]

for line in infile:
    if all(k in line for k in keywords):
        outfile.write(line)
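A slightly more self-contained sketch of the same all() approach, assuming you are free to wrap it in functions. `lines_with_all` and `filter_file` are hypothetical names, and the `with` blocks make sure every file, including the keywords file, gets closed:

```python
def lines_with_all(lines, keywords):
    """Yield only the lines that contain every keyword (plain substring test)."""
    for line in lines:
        # all() stops at the first keyword that is missing from the line
        if all(k in line for k in keywords):
            yield line


def filter_file(in_path, out_path, keywords_path):
    # 'with' closes each file when its block ends
    with open(keywords_path) as kf:
        # strip newlines and skip blank lines in the keywords file
        keywords = [k.strip() for k in kf if k.strip()]
    with open(in_path) as infile, open(out_path, "w") as outfile:
        outfile.writelines(lines_with_all(infile, keywords))


if __name__ == "__main__":
    sample = ["red apple and banana\n", "just an apple\n", "banana only\n"]
    print(list(lines_with_all(sample, ["apple", "banana"])))
```

Because `lines_with_all` works on any iterable of strings, it can be tried on a plain list before pointing it at real files.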

Answer 2

Regular expressions are not a Swiss Army knife that can solve every problem. They are not a good fit for this one:

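There is no straightforward way to do the kind of conjunction (AND) you are looking for with a single regexp operation.
Regexps shouldn't be used for plain-text searches, because "plain-text" keywords can contain characters that trigger different behavior in a regexp (e.g. . or $).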
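The second point is easy to check. A minimal sketch with a made-up line and keyword, comparing a regex search with the `in` operator and with `re.escape` from the standard `re` module:

```python
import re

line = "total cost: 5 USD"
keyword = "$"  # a "plain-text" keyword that happens to be a regex metacharacter

# As a regex, "$" means "end of string", so it matches every line
print(bool(re.search(keyword, line)))

# A plain substring test looks for the literal "$" character
print(keyword in line)

# re.escape turns metacharacters into literals if you really need a regex
print(bool(re.search(re.escape(keyword), line)))
```

The first check reports a match even though the line contains no dollar sign; the other two correctly report none.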

Try this instead, using one for loop inside another to check every keyword against each line:

keywords = ...  # the keyword list, built as in the question

for line in infile:
    # iterate through ALL the keywords
    found_all = True
    for kw in keywords:
        # if ANY keyword is not found, found_all = False
        if kw not in line:
            found_all = False
            break  # no need to check the remaining keywords

    if found_all:
        outfile.write(line)

Update: @Stefano Sanfilippo's solution is a more concise version of the same thing. :)
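To confirm that the two answers really do the same thing, here is a sketch that wraps each check in a hypothetical helper (`has_all_loop` and `has_all_builtin` are illustrative names, not from either answer) and compares them on a few made-up lines. Both stop at the first missing keyword: the loop version here via `break`, and all() because it short-circuits.

```python
def has_all_loop(line, keywords):
    # nested-loop version from this answer, with an early break
    found_all = True
    for kw in keywords:
        if kw not in line:
            found_all = False
            break
    return found_all


def has_all_builtin(line, keywords):
    # the all() version from the other answer
    return all(kw in line for kw in keywords)


keywords = ["apple", "banana"]
samples = ["apple and banana", "apple only", "banana only", ""]

for s in samples:
    # both helpers should agree on every sample line
    assert has_all_loop(s, keywords) == has_all_builtin(s, keywords)
    print(repr(s), has_all_builtin(s, keywords))
```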

Matching lines with an AND operator in Python
