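I wrote the following code to find the lines of infile that match any of the keywords in a keyword file. The problem is that I only want the lines of infile that contain all of the keywords. This seems harder than I expected, but I am a beginner, so I guess I am just missing something obvious. Regular expressions don't seem to have a straightforward 'and' operator, though.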
import re
infile = open('path/#input.txt', 'r')
outfile = open('path/#output.txt', 'w')
# Read a textfile containing keywords to find
# (and strip the newline character '\n')
keywords = [line.strip() for line in open('path/#keywords.txt')]
# Compile keywords into a regex pattern
pattern = re.compile('|'.join(keywords))
# See which lines in the infile match any of the keywords
# and write those lines to the outfile
for line in infile:
    if pattern.search(line):
        outfile.write(line)
Answer 0 (score: 6)
Regular expressions are not meant to be used like that. Instead, you should use all():
infile = open('path/#input.txt', 'r')
outfile = open('path/#output.txt', 'w')
keywords = [line.strip() for line in open('path/#keywords.txt')]
for line in infile:
    if all(k in line for k in keywords):
        outfile.write(line)
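The same all() check can be tried without touching any files. The following is only a minimal sketch: the helper name lines_with_all_keywords and the in-memory sample data are made up for illustration and are not part of the original answer.

```python
import io

def lines_with_all_keywords(lines, keywords):
    """Yield each line that contains every keyword as a plain substring."""
    for line in lines:
        if all(kw in line for kw in keywords):
            yield line

# Hypothetical in-memory stand-ins for the input file and the keyword file
infile = io.StringIO("red green blue\nred blue\ngreen only\n")
keywords = ["red", "blue"]

result = list(lines_with_all_keywords(infile, keywords))
```

Note that `in` does substring matching, so the keyword "red" would also be found inside a word like "redder"; matching whole words only would need a different test.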
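Answer 1 (score: 3)
Regular expressions are not a Swiss Army knife that solves every problem, and they are not a good fit for this one: among other things, any keyword containing characters with special meaning in a regex (such as . or $) would have to be escaped. Try this instead, using one for loop inside another to go through all the keywords for each line: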
keywords = ...
for line in infile:
    # iterate through ALL the keywords
    found_all = True
    for kw in keywords:
        # if ANY keyword is not found, found_all = False
        if kw not in line:
            found_all = False
    if found_all:
        outfile.write(line)
Update: @Stefano Sanfilippo's solution is a more concise version of the same thing. :)
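For comparison, here is a rough sketch of what a regex-only "and" could look like. It is not taken from either answer; the function name build_all_keywords_pattern and the sample data are invented for illustration. It chains one lookahead per keyword and passes each keyword through re.escape so that characters such as . or $ are treated literally:

```python
import re

def build_all_keywords_pattern(keywords):
    """Compile a pattern that matches only if every keyword occurs in the line."""
    # One (?=.*keyword) lookahead per keyword; all of them must succeed.
    # re.escape turns characters like . or $ into literals.
    lookaheads = ''.join('(?=.*' + re.escape(kw) + ')' for kw in keywords)
    return re.compile(lookaheads)

# Hypothetical sample data with regex-special characters in the keywords
pattern = build_all_keywords_pattern(['a.b', '$5'])
lines = ['price $5 for a.b\n', 'price $5 for axb\n', 'a.b only\n']

matching = [line for line in lines if pattern.search(line)]
```

This works, but it is harder to read than the plain `in` test and gains nothing when the keywords are literal strings, which is why the loop or all() version is the simpler choice here.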