Question

I have a large text file (60 MB) that looks like this:

:VPN ()
:add_adtr_rule (true)
:additional_products ()
:addr_type_indication (IPv4)
:certificates ()
:color (black)
:comments ()
:connectra (false)
:connectra_settings ()
:cp_products_installed (false)
:data_source (not-installed)
:data_source_settings ()
:edges ()
:enforce_gtp_rate_limit (false)
:firewall (not-installed)
:floodgate (not-installed)
:gtp_rate_limit (2048)
:interfaces ()
:ipaddr (10.19.45.18)

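For each instance where :add_adtr_rule is true (there are also thousands of ':add_adtr_rule (false)' entries), I need the value of ipaddr, so in this instance I need 10.19.45.18. How can I extract this information with a regular expression?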

I tried the following code, which returns an empty list:

import re

with open("objects_5_0_C-Mod.txt", "r") as f:
    text = f.read()

ip=re.findall(r':add_adtr_rule [\(]true[\)]\s+.*\s+.*\s+.*\s+.*\s+:ipaddr\s+[\(](.*)[\)]', text)
print(ip)

Answer 1

The following regex should do it:

(?s)(?:add_adtr_rule\s\(true\)).*?:ipaddr\s\((.*?)\)

See the regex demo / explanation.

Python (demo):

import re

s = """:VPN () :add_adtr_rule (true) :additional_products () :addr_type_indication (IPv4) :certificates () :color (black) :comments () :connectra (false) :connectra_settings () :cp_products_installed (false) :data_source (not-installed) :data_source_settings () :edges () :enforce_gtp_rate_limit (false) :firewall (not-installed) :floodgate (not-installed) :gtp_rate_limit (2048) :interfaces () :ipaddr (10.19.45.18)"""
r = r"(?s)(?:add_adtr_rule\s\(true\)).*?:ipaddr\s\((.*?)\)"
ip = re.findall(r, s)
print(ip)
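
For what it's worth, here is a small sketch of why the attempt in the question probably comes back empty. Without re.DOTALL, `.` stops at a newline, so each `\s+.*` pair can consume at most one line. Four pairs cover at most four lines, but the sample has sixteen lines between :add_adtr_rule and :ipaddr. The sample below is shortened, and the first record with the address 10.0.0.1 is made up purely to show that '(false)' records are skipped.

```python
import re

# Shortened sample in the question's layout. The first record (10.0.0.1)
# is hypothetical and only shows that '(false)' records are ignored.
sample = """\
:VPN ()
:add_adtr_rule (false)
:color (black)
:ipaddr (10.0.0.1)
:VPN ()
:add_adtr_rule (true)
:additional_products ()
:addr_type_indication (IPv4)
:certificates ()
:color (black)
:comments ()
:connectra (false)
:ipaddr (10.19.45.18)
"""

# Pattern from the question: allows at most four lines between the fields.
attempt = r':add_adtr_rule [\(]true[\)]\s+.*\s+.*\s+.*\s+.*\s+:ipaddr\s+[\(](.*)[\)]'

# Pattern from this answer: (?s) lets '.' cross newlines, '.*?' stays lazy.
fixed = r"(?s)(?:add_adtr_rule\s\(true\)).*?:ipaddr\s\((.*?)\)"

attempt_result = re.findall(attempt, sample)
fixed_result = re.findall(fixed, sample)

print(attempt_result)  # six lines sit between the two fields here, so no match
print(fixed_result)
```

The lazy `.*?` matters here: a greedy `.*` with (?s) would run on to the last :ipaddr in the text rather than the nearest one.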

Answer 2

You might want to add anchors to speed things up. Consider the following example with MULTILINE and VERBOSE enabled:

^:add_adtr_rule\ \(true\)   # start of line, followed by :add_ ...
[\s\S]+?                    # everything else afterwards, lazily          
^:ipaddr\ \((?P<ip>[^)]+)\) # start of line, ip and group "ip" between ()

See a demo on regex101.com.

With your given code, this comes down to:

import re

rx = re.compile(r'''
        ^:add_adtr_rule\ \(true\)
        [\s\S]+?
        ^:ipaddr\ \((?P<ip>[^)]+)\) 
        ''', re.MULTILINE | re.VERBOSE)

with open("objects_5_0_C-Mod.txt", "r") as f:
    text = f.read()

ips = [match.group('ip') for match in rx.finditer(text)]
print(ips)
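
If holding the whole 60 MB file in memory is a concern, a line-by-line scan is another option. This is a different technique from the single regex above, not something either answer proposed. It is a minimal sketch, assuming each attribute sits on its own line and that :add_adtr_rule comes before :ipaddr within an object. The helper name iter_ips and the 10.0.0.1 record are made up for illustration. The leading `\s*` is there in case the real file indents its lines, which a bare `^:` anchor would not match.

```python
import io
import re

# Assumption: one attribute per line, possibly indented.
RULE_TRUE = re.compile(r'^\s*:add_adtr_rule\s+\(true\)')
IPADDR = re.compile(r'^\s*:ipaddr\s+\(([^)]+)\)')

def iter_ips(lines):
    """Yield the first :ipaddr value seen after each ':add_adtr_rule (true)' line."""
    armed = False  # True once a '(true)' rule was seen and no ipaddr yet
    for line in lines:
        if RULE_TRUE.match(line):
            armed = True
        elif armed:
            m = IPADDR.match(line)
            if m:
                yield m.group(1)
                armed = False

# Hypothetical two-record sample; only the second record has the rule set to true.
sample = """\
:VPN ()
:add_adtr_rule (false)
:ipaddr (10.0.0.1)
:VPN ()
:add_adtr_rule (true)
:color (black)
:interfaces ()
:ipaddr (10.19.45.18)
"""

streamed_ips = list(iter_ips(io.StringIO(sample)))
print(streamed_ips)
```

For the real file you would pass the open file object instead of the StringIO, e.g. `with open("objects_5_0_C-Mod.txt") as f: ips = list(iter_ips(f))`, since iterating a file yields one line at a time.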
