I have a text file that looks like this:
127.0.0.1
  159.187.32.13, 3:00:15, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.0/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
  159.187.32.20, 2:20:11, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.59/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
      Vlan4056
  198.140.45.77, 2:36:15, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.88/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
127.0.0.2
  188.125.45.13, 3:00:15, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 199.150.45.0/27 [20/0] via 195.32.1.2
    Outgoing interface list:
      Vlan4054
      Vlan4056
  221.125.45.77, 2:20:11, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 199.150.45.10/27 [20/0] via 195.32.1.2
    Outgoing interface list:
      Vlan4054
      Vlan4056
I'm trying to turn this into a dictionary so the data can be parsed, and I'm currently attempting that with regular expressions:
import copy
import re

content = []
content_dict = {}
# Raw strings so that "\d" and "\S" reach the regex engine unchanged
group_ip = re.compile(r"^(\d+\.\d+\.\d+\.\d+$)")
ip_subnet = re.compile(r"^(\d+\.\d+\.\d+\.\d+\/+\d+)")
two_space_start = re.compile(r"^( {2})\S")
four_space_start = re.compile(r"^( {4})\S")
six_space_start = re.compile(r"^( {6})\S")
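If the regex route is kept, named groups can pull every field out of a line with a single match instead of splitting and indexing afterwards. A minimal sketch, assuming the 2/4-space indentation of the sample; `SOURCE_RE` and `RPF_RE` are hypothetical names, not part of the code above:

```python
import re

# Hypothetical patterns: one regex per data-bearing line type, with named
# groups so each field is read by name. Assumes the 2/4-space indentation.
SOURCE_RE = re.compile(
    r"^ {2}(?P<source>\d+(?:\.\d+){3}), "
    r"(?P<uptime>\d+:\d{2}:\d{2}), flags: (?P<flags>\S+)$"
)
RPF_RE = re.compile(
    r"^ {4}RPF route: \[\w+\] (?P<rpf_ip>\d+(?:\.\d+){3}/\d+) "
    r"\[\d+/\d+\] via (?P<via>\d+(?:\.\d+){3})$"
)

source_match = SOURCE_RE.match("  159.187.32.13, 3:00:15, flags: S")
rpf_match = RPF_RE.match("    RPF route: [U] 151.177.45.0/27 [20/0] via 190.150.1.2")

print(source_match.groupdict())  # {'source': '159.187.32.13', 'uptime': '3:00:15', 'flags': 'S'}
print(rpf_match.groupdict())     # {'rpf_ip': '151.177.45.0/27', 'via': '190.150.1.2'}
```

`groupdict()` returns the fields keyed by name, so the result is less tied to field positions than indexing into `line.split(",")`.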
I plan to apply the regular expressions to each line and build a dictionary like this:
if group_ip.match(line):
    content_dict["group"] = line.strip()
elif two_space_start.match(line) and "RP" in line:
    line = line.split(",")
    content_dict["source"] = line[0].strip()
    content_dict["uptime"] = line[1].strip()
    content_dict["rp"] = line[2].split(" ")[-1]
    content_dict["source_flags"] = line[-1].split(":")[-1].strip()
    content.append(copy.copy(content_dict))
But I've realized this won't scale: each IP group (127.0.0.1, 127.0.0.2) has a variable number of subgroups, and I end up overwriting them. What I'm trying to get is:
"127.0.0.1": {
    "159.187.32.13": {
        "uptime": "3:00:15",
        "flags": "S",
        "rpf_ip": "151.177.45.0/27",
        "via": "190.150.1.2",
        "outgoing_interface": ["Vlan4054"]
    },
    "159.187.32.20": {
        "uptime": "2:20:11",
        "flags": "S",
        "rpf_ip": "151.177.45.59/27",
        "via": "190.150.1.2",
        "outgoing_interface": ["Vlan4054", "Vlan4056"]
    }
}
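The two-, four- and six-space regexes in the question are effectively measuring nesting depth. A minimal sketch that computes the depth directly, assuming the 2/4/6-space layout of the sample; the `LEVELS` table and `line_level` helper are hypothetical:

```python
# Hypothetical mapping from leading-space count to nesting level, assuming the
# 2/4/6-space layout of the sample file.
LEVELS = {0: "group", 2: "source", 4: "detail", 6: "interface"}


def line_level(line: str) -> str:
    """Classify a line by how many leading spaces it has."""
    stripped = line.rstrip("\n")
    if not stripped.strip():
        return "blank"
    indent = len(stripped) - len(stripped.lstrip(" "))
    return LEVELS.get(indent, "unknown")


for sample in [
    "127.0.0.1",
    "  159.187.32.13, 3:00:15, flags: S",
    "    Outgoing interface list:",
    "      Vlan4054",
]:
    print(f"{line_level(sample):<9} <- {sample.strip()}")
```

Each "source" line starts a new entry under the most recent "group", and each "interface" line belongs to the most recent "source". So only the last group and source seen need to be remembered, instead of one flat dict that keeps getting overwritten.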
Is it possible to get this data structure out of the text, with regular expressions or some other way?
Answer (score: 1)
Since the input is fairly easy to tokenize, regular expressions are probably overkill. For your purposes you can use str.startswith, str.isdigit and str.split:
from pprint import pprint

content = {}
with open('file.txt', 'r') as f:
    for line in f:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines; line[0] would raise IndexError
        if line[0].isdigit():
            # Group line, e.g. "127.0.0.1"
            group = line
            content[group] = {}
        elif line.startswith('  ') and line[2].isdigit():
            # Source line, e.g. "  159.187.32.13, 3:00:15, flags: S"
            ip, uptime, flags = line.lstrip().split(', ')
            _, flags = flags.split()
            content[group][ip] = {'uptime': uptime, 'flags': flags, 'outgoing_interface': []}
        elif line.startswith('    RPF route:'):
            _, _, _, rpf_ip, _, _, via = line.split()
            content[group][ip]['rpf_ip'] = rpf_ip
            content[group][ip]['via'] = via
        elif line.startswith('      '):
            # Six-space lines are outgoing interfaces, e.g. "      Vlan4054"
            content[group][ip]['outgoing_interface'].append(line.lstrip())

pprint(content)
This outputs (with your sample input):
{'127.0.0.1': {'159.187.32.13': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054'],
                                 'rpf_ip': '151.177.45.0/27',
                                 'uptime': '3:00:15',
                                 'via': '190.150.1.2'},
               '159.187.32.20': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054', 'Vlan4056'],
                                 'rpf_ip': '151.177.45.59/27',
                                 'uptime': '2:20:11',
                                 'via': '190.150.1.2'},
               '198.140.45.77': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054'],
                                 'rpf_ip': '151.177.45.88/27',
                                 'uptime': '2:36:15',
                                 'via': '190.150.1.2'}},
 '127.0.0.2': {'188.125.45.13': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054', 'Vlan4056'],
                                 'rpf_ip': '199.150.45.0/27',
                                 'uptime': '3:00:15',
                                 'via': '195.32.1.2'},
               '221.125.45.77': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054', 'Vlan4056'],
                                 'rpf_ip': '199.150.45.10/27',
                                 'uptime': '2:20:11',
                                 'via': '195.32.1.2'}}}
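For comparison with the regex approach from the question, here is a sketch of the same parser driven by regular expressions. It assumes Python 3.8+ (for `:=`) and the 2/4/6-space indentation, and it reads from a string instead of `file.txt` so it can be tried directly. `parse` and the pattern names are hypothetical. Lines that match no pattern are ignored, and a source line that appears before any group line would raise a `KeyError`.

```python
import re
from pprint import pprint

# Hypothetical patterns, one per line type, assuming 0/2/4/6-space indentation.
GROUP_RE = re.compile(r"^(?P<group>\d+(?:\.\d+){3})$")
SOURCE_RE = re.compile(
    r"^ {2}(?P<source>\d+(?:\.\d+){3}), (?P<uptime>[\d:]+), flags: (?P<flags>\S+)$"
)
RPF_RE = re.compile(r"^ {4}RPF route: \S+ (?P<rpf_ip>\S+) \S+ via (?P<via>\S+)$")
IFACE_RE = re.compile(r"^ {6}(?P<iface>\S+)$")


def parse(text: str) -> dict:
    content = {}
    group = source = None  # most recent group and source seen so far
    for line in text.splitlines():
        line = line.rstrip()
        if m := GROUP_RE.match(line):
            group = m["group"]
            content[group] = {}
        elif m := SOURCE_RE.match(line):
            source = m["source"]
            content[group][source] = {
                "uptime": m["uptime"],
                "flags": m["flags"],
                "outgoing_interface": [],
            }
        elif m := RPF_RE.match(line):
            content[group][source].update(rpf_ip=m["rpf_ip"], via=m["via"])
        elif m := IFACE_RE.match(line):
            content[group][source]["outgoing_interface"].append(m["iface"])
        # "Incoming interface: ..." and "Outgoing interface list:" match nothing.
    return content


sample = """127.0.0.1
  159.187.32.13, 3:00:15, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.0/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
"""
pprint(parse(sample))
```

Because `group` and `source` always point at the most recent entries, each source lands under its own group rather than overwriting one shared dict, which was the problem described in the question.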