Question

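I'm trying to parse a log file for certain events. Because the log file can be quite large, I need to filter out the lines that are of no interest to our application. My idea was to create a list with the 4 or 5 strings I'm looking for, then loop over another list that holds the lines I kept from the log file.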

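The log file is a proxy log, used to see where requests come from. The first reduction is easy: look for "GET /" in each line and store only the lines that contain it.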

lines = []
with open('logfile', 'r') as f:
    for line in f:
        if "GET /" in line:
            lines.append(line)

The list 'lines' then needs to be reduced to the lines that contain one of several strings in the URL:

l1 = ['/Treintickets/aankopen', '/booking/Tickets', '/Acheter/Billets', ...]

I tried a list comprehension, but this doesn't work:

result = [l for l in lines if l1 in l]

Is there a way to make this work without looping over the big list of lines once for every member of 'l1'?

Answer 1

You can use the built-in function any:

result = [line for line in lines if any(substring in line for substring in l1)]
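
As a minimal sketch of how this behaves, the example below runs the same comprehension on made-up sample data (the contents of `lines` are assumptions for illustration, not real proxy output, and the elided entries of `l1` are left out). The attempt `l1 in l` from the question fails because the left operand of `in` on a string must itself be a string; `any` instead tests each substring in turn and stops at the first match.

```python
# Hypothetical sample data; real proxy log lines will look different.
lines = [
    "10.0.0.1 GET /Treintickets/aankopen HTTP/1.1\n",
    "10.0.0.2 GET /index.html HTTP/1.1\n",
    "10.0.0.3 GET /booking/Tickets?id=7 HTTP/1.1\n",
]
l1 = ['/Treintickets/aankopen', '/booking/Tickets', '/Acheter/Billets']

# Keep a line if at least one substring occurs in it.
# any() short-circuits, so it stops checking after the first hit.
result = [line for line in lines if any(substring in line for substring in l1)]
```

Here `result` keeps the first and third sample lines, and the big list is traversed only once.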

Alternatively, you could consider using a regular expression.
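
A regular-expression version might look like the following sketch. The names `pattern` and `filter_lines` are hypothetical, chosen only for illustration. It assumes the search strings should be matched literally, so each one is passed through `re.escape` before being joined into a single alternation.

```python
import re

l1 = ['/Treintickets/aankopen', '/booking/Tickets', '/Acheter/Billets']

# Build one pattern such as "a|b|c"; re.escape neutralizes any
# characters that would otherwise have a special regex meaning.
pattern = re.compile('|'.join(re.escape(s) for s in l1))


def filter_lines(lines):
    """Return the lines in which any of the substrings occurs."""
    # search() scans the whole line, unlike match(), which only
    # checks at the start of the string.
    return [line for line in lines if pattern.search(line)]
```

Compiling the pattern once up front means each line is scanned against a single pattern rather than tested against every substring separately.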

Answer 2

Wim's answer is great and identifies the right way to write the comprehension.

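However, if the input text file is very large, I would suggest using a generator expression instead of a comprehension! It yields matching lines one at a time, so Python never has to hold the whole file in memory.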

with open('logfile', 'r') as fin:
    # Lazily yield only the lines that contain one of the substrings in l1
    generator = (line for line in fin if any(substr in line for substr in l1))
    for res in generator:
        pass  # Handle result found
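
The same idea can be wrapped in a generator function so the file handling and both filters live in one place. This is only a sketch: `matching_lines` and its parameters are hypothetical names, and combining the "GET /" check from the question with the substring check is an assumption about the intended workflow.

```python
def matching_lines(path, substrings):
    """Yield lines from the file at `path` that contain "GET /"
    and at least one of `substrings`."""
    with open(path, 'r') as fin:
        for line in fin:  # reads the file one line at a time
            if "GET /" in line and any(s in line for s in substrings):
                yield line
```

It could then be consumed with `for line in matching_lines('logfile', l1): ...`, and the file stays open only while the loop is running.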

Check if elements of a list exist in elements of another list in Python
