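I am trying to parse the /etc/network/interfaces configuration file on Ubuntu, so I need to split a string into a list of strings, where each string begins with one of a given set of keywords.
According to the manual:
The file consists of zero or more "iface", "mapping", "auto", "allow-" and "source" stanzas.
So if the file contains: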
auto lo eth0
allow-hotplug eth1
iface eth0-home inet static
address 192.168.1.1
netmask 255.255.255.0
I would like to get the list:
['auto lo eth0', 'allow-hotplug eth1', 'iface eth0-home inet static\n address ...']
Right now my function looks like this:
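import re

def get_sections(text):
    # Start offset of every keyword occurrence, walked from last to first
    start_indexes = [s.start() for s in re.finditer('auto|iface|source|mapping|allow-', text)]
    start_indexes.reverse()
    end_idx = len(text)  # the last section runs to the end of the text
    res = []
    for i in start_indexes:
        res.append(text[i:end_idx].strip())
        end_idx = i
    res.reverse()
    return res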
But this is not very elegant...
Answer 0 (score: 3)
You can do this in a single regular expression:
>>> reobj = re.compile(r"(?:auto|allow-|iface)(?:(?!(?:auto|allow-|iface)).)*(?<!\s)", re.DOTALL)
>>> result = reobj.findall(subject)
>>> result
['auto lo eth0', 'allow-hotplug eth1', 'iface eth0-home inet static\n address 192.168.1.1\n netmask 255.255.255.0']
Explanation:
(?:auto|allow-|iface) # Match one of the search terms
(?: # Try to match...
(?! # (as long as we're not at the start of
(?:auto|allow-|iface) # the next search term):
) #
. # any character.
)* # Do this any number of times.
(?<!\s) # Assert that the match doesn't end in whitespace
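The annotated pattern above can also be compiled as written by using re.VERBOSE, which lets the comments live inside the regex itself. This is only a sketch: the name SECTION_RE and the sample text in subject are illustrative, and like the answer's pattern it covers only three of the keywords.

```python
import re

# Same pattern as in the answer, laid out with re.VERBOSE so that
# whitespace and "#" comments inside the pattern are ignored.
SECTION_RE = re.compile(
    r"""
    (?:auto|allow-|iface)        # match one of the search terms
    (?:                          # then repeatedly match...
        (?!auto|allow-|iface)    #   unless the next search term starts here
        .                        #   any character, newlines included (DOTALL)
    )*
    (?<!\s)                      # the match must not end in whitespace
    """,
    re.DOTALL | re.VERBOSE,
)

# Sample input modelled on the file contents shown in the question.
subject = (
    "auto lo eth0\n"
    "allow-hotplug eth1\n"
    "iface eth0-home inet static\n"
    " address 192.168.1.1\n"
    " netmask 255.255.255.0\n"
)
sections = SECTION_RE.findall(subject)
```

Note that the keywords are matched anywhere in the text, so an interface name containing "auto" or "iface" would start a new section.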
Of course, you can also map the result into a list of tuples, as requested in the comments:
>>> reobj = re.compile(r"(auto|allow-|iface)\s*((?:(?!(?:auto|allow-|iface)).)*)(?<!\s)", re.DOTALL)
>>> result = [tuple(match.groups()) for match in reobj.finditer(subject)]
>>> result
[('auto', 'lo eth0'), ('allow-', 'hotplug eth1'), ('iface', 'eth0-home inet static\n address 192.168.1.1\n netmask 255.255.255.0')]
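Answer 1 (score: 2)
You were very close to a clean solution when you computed the start indices. With those in hand, you can add one line to extract the required slices: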
indices = [s.start() for s in re.finditer(
    'auto|iface|source|mapping|allow-', text)]
# Pair each start with the next start (or the end of the text) and slice
answer = [text[i:j] for i, j in zip(indices, indices[1:] + [len(text)])]
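The index-slicing idea can be wrapped into a small self-contained function. The sketch below makes one assumption that is not in the answer: a keyword only starts a new stanza when it appears at the beginning of a line (after optional indentation), so that an interface name such as "auto0" does not split a stanza. The names KEYWORD_RE and split_stanzas are hypothetical.

```python
import re

# Assumption: a stanza keyword only counts at the start of a line,
# optionally preceded by spaces or tabs.
KEYWORD_RE = re.compile(
    r"^[ \t]*(?:auto|iface|source|mapping|allow-)", re.MULTILINE
)

def split_stanzas(text):
    """Split text into stanzas, each beginning with one of the keywords."""
    starts = [m.start() for m in KEYWORD_RE.finditer(text)]
    # Each stanza ends where the next one starts; the last ends with the text.
    ends = starts[1:] + [len(text)]
    return [text[i:j].strip() for i, j in zip(starts, ends)]

example = (
    "auto lo eth0\n"
    "allow-hotplug eth1\n"
    "iface eth0-home inet static\n"
    " address 192.168.1.1\n"
    " netmask 255.255.255.0\n"
)
stanzas = split_stanzas(example)
```

Stripping each slice removes the trailing newline between stanzas. Whether indented option lines can ever begin with one of these keywords depends on the file, so the line-start rule should be treated as a heuristic.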