I have a string:
mystring = "Foo: Bar (Titi) Foo-age: 50 Airplanes: 12:1 12:3 12:4 12:5 [...] Next Hop: LAX Origine ID: 49 Hop List 2 4 9 0 3 [...]"
Is there a way to split this string using a set of patterns, for example:
pattern = {"Foo", "Foo-age", "Airplanes", "Next Hop", "Origine ID", "Hop List"}
and then
mylist = somefunction(mystring , pattern)
print mylist
--> {"Foo":"Bar (Titi)","Foo-age" : 50, "Airplanes": ["12:1","12:3",...], ...}
Is this possible in Python?
[Edit]
Some sample data: a 5-column CSV file with "," as the delimiter
col-1,col-2,Path: 9876 (IGP) Local-Pref: 310000 MED: 0 Communities: 1234:6 1234:95 1234:101 1234:202 1234:500 1234:903 1234:3369 1234:8000 1234:8002 1234:16925 9876:19827 Next Hop: x.x.127.151 Originator ID: x.x.155.144 Cluster List: 0.0.29.99 0.0.29.97 0.0.26.245 0.0.2.179 ,col-4,col-5
col-1,col-2,Path: 9876 (IGP) Local-Pref: 310000 MED: 0 Communities: 1234:3 1234:95 1234:101 1234:202 1234:13705 9876:19941 Next Hop: x.x.127.61 Originator ID: x.x.137.37 Cluster List: 0.0.29.99 0.0.29.97 0.0.1.195 ,col-4,col-5
col-1,col-2,Path: 9876 (IGP) Local-Pref: 310000 MED: 0 Communities: 1234:2 1234:95 1234:101 Next Hop: x.x.127.149 Originator ID: x.x.137.29 Cluster List: 0.0.29.99 0.0.29.98 0.0.2.240 ,col-4,col-5
col-1,col-2,Path: 9876 (IGP) Local-Pref: 310000 MED: 0 Communities: 1234:6 1234:95 1234:101 1234:202 1234:500 1234:903 1234:3369 1234:8000 1234:8002 1234:16924 9876:19827 Next Hop: x.x.127.151 Originator ID: x.x.155.144 Cluster List: 0.0.29.99 0.0.29.97 0.0.26.245 0.0.2.179 ,col-4,col-5
Answer 0 (score: 2)
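I think this can be done in two steps. First, look for anything that looks like a field name (Foo-Bar:) and insert a "special" marker character (e.g. @) before each match. Second, look for the pattern marker field-name : data and populate the data dictionary: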
text = """
col-1,col-2,Path: 9876 (IGP) Local-Pref: 310000 MED: 0 Communities: 1234:6 1234:95 1234:101 1234:202 1234:500 1234:903 1234:3369 1234:8000 1234:8002 1234:16925 9876:19827 Next Hop: x.x.127.151 Originator ID: x.x.155.144 Cluster List: 0.0.29.99 0.0.29.97 0.0.26.245 0.0.2.179 ,col-4,col-5
"""
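import pprint
import re
# Step 1: put a marker (@) in front of anything that looks like a field name.
text = re.sub(r'([A-Z][A-Za-z -]+:)', r'@\1', text)
# Step 2: collect "@name: value" pairs; a value ends at the next marker or comma.
data = {}
for m in re.finditer(r'@(.+?):([^,@]+)', text):
    data[m.group(1)] = m.group(2).strip()
pprint.pprint(data)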
Result:
{'Cluster List': '0.0.29.99 0.0.29.97 0.0.26.245 0.0.2.179',
 'Communities': '1234:6 1234:95 1234:101 1234:202 1234:500 1234:903 1234:3369 1234:8000 1234:8002 1234:16925 9876:19827',
 'Local-Pref': '310000',
 'MED': '0',
 'Next Hop': 'x.x.127.151',
 'Originator ID': 'x.x.155.144',
 'Path': '9876 (IGP)'}
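If the field names are known in advance, as in the question's pattern set, the marker step can be skipped by building the regular expression from the names themselves. The following is only a minimal sketch under that assumption; split_by_fields, fields, and row are hypothetical names introduced for illustration, and the row is a shortened version of the sample data.

```python
import re

def split_by_fields(text, field_names):
    """Split text into a dict keyed by the known field names (hypothetical helper)."""
    # Longest names first, so a longer name is tried before any shorter
    # name that happens to be its prefix.
    names = sorted(field_names, key=len, reverse=True)
    # Matches any known field name immediately followed by a colon.
    key_re = re.compile(r'(%s):' % '|'.join(re.escape(n) for n in names))
    matches = list(key_re.finditer(text))
    result = {}
    for i, m in enumerate(matches):
        # A value runs from the end of its key to the start of the next key.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        result[m.group(1)] = text[m.end():end].strip()
    return result

fields = {"Path", "Local-Pref", "MED", "Communities",
          "Next Hop", "Originator ID", "Cluster List"}
row = ("Path: 9876 (IGP) Local-Pref: 310000 MED: 0 "
       "Communities: 1234:6 1234:95 Next Hop: x.x.127.151 "
       "Originator ID: x.x.155.144 Cluster List: 0.0.29.99 0.0.29.97")
parsed = split_by_fields(row, fields)
print(parsed["Path"])         # 9876 (IGP)
print(parsed["Communities"])  # 1234:6 1234:95
```

Because only the listed names count as keys, values such as 1234:6 that contain a colon are left alone. This sketch expects the third CSV column rather than the whole row, since it does not stop at commas.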
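Answer 1 (score: 1)
This one can be a bit tricky.
Please elaborate on your requirements; for now, this solution should either be enough or get you closer to what you want: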
mystring = "Foo: Bar (Titi) Foo-age: 50 Airplanes: 12:1 12:3 12:4 12:5 Next Hop: LAX Origine ID: 49 Hop List: 2 4 9 0 3"
pattern = {"Foo", "Foo-age", "Airplanes", "Next Hop", "Origine ID", "Hop List"}
to_list = {'Airplanes', 'Hop List'}
def obtain_data(mystring, pattern, to_list):
    result = {}
    prev_pattern = None
    prev_pos = 0
    # Order the patterns by where "name:" first appears in the string.
    ordered_pattern = sorted(list(pattern), key=lambda x: mystring.find(x + ':'))
    for p in ordered_pattern:
        npos = mystring.find(p + ':', prev_pos)
        if prev_pattern is not None:
            # The previous value lies between its "name:" and the current pattern.
            to_add = mystring[prev_pos+len(prev_pattern)+1 : npos].strip()
            if prev_pattern in to_list:
                to_add = to_add.split()
            result[prev_pattern] = to_add
        prev_pos = npos
        prev_pattern = p
    # The last value runs to the end of the string.
    to_add = mystring[prev_pos+len(prev_pattern)+1 : len(mystring)].strip()
    if prev_pattern in to_list:
        to_add = to_add.split()
    result[prev_pattern] = to_add
    return result
obtain_data(mystring, pattern, to_list)
This returns:
{'Foo-age': '50', 'Hop List': ['2', '4', '9', '0', '3'], 'Airplanes': ['12:1', '12:3', '12:4', '12:5'], 'Next Hop': 'LAX', 'Foo': 'Bar (Titi)', 'Origine ID': '49'}
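I first sort the patterns into a list, ordered by where they appear in the string.
I am assuming that a ':' always follows a pattern. If that is not the case, this may prove too hard to do (consider that one pattern may be a prefix of another, like the Foo and Foo-age you showed).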
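To illustrate the prefix problem, here is a small sketch with a made-up string in which Foo-age comes before Foo. The variable names bare_pos and colon_pos are introduced only for this example.

```python
mystring = "Foo-age: 50 Foo: Bar (Titi)"

# Without the colon, "Foo" is found inside "Foo-age" at index 0.
bare_pos = mystring.find("Foo")

# With the colon, only the real "Foo:" field matches.
colon_pos = mystring.find("Foo:")

print(bare_pos, colon_pos)  # 0 12
```

Searching for the name plus ':' is what keeps the sort order and the slicing correct in this case.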
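Another thing: if you want some of the values to be lists, you have to pass the patterns whose values should be converted to lists as the third argument. If you don't always need that, you can pass an empty set or drop the parameter altogether; I only did it this way because your example shows it.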
Final edit: if some patterns may not appear in your string, you just need to filter the sorted list with
ordered_pattern = filter(lambda x: mystring.find(x + ':') != -1, ordered_pattern)
before the for loop.
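Putting the filter and an optional third argument together might look roughly like the sketch below. It is not the function above but a hypothetical variant (obtain_data_safe, sample, and parsed are names made up here), and it still assumes each pattern is followed by ':' and appears at most once.

```python
def obtain_data_safe(mystring, pattern, to_list=()):
    """Hypothetical variant of obtain_data that skips absent patterns."""
    # Keep only patterns that occur (followed by ':'), ordered by position.
    found = sorted(
        (p for p in pattern if mystring.find(p + ':') != -1),
        key=lambda p: mystring.find(p + ':'),
    )
    result = {}
    for i, p in enumerate(found):
        start = mystring.find(p + ':') + len(p) + 1
        # The value ends where the next found pattern begins,
        # or at the end of the string for the last one.
        if i + 1 < len(found):
            end = mystring.find(found[i + 1] + ':')
        else:
            end = len(mystring)
        value = mystring[start:end].strip()
        result[p] = value.split() if p in to_list else value
    return result

sample = "Foo: Bar (Titi) Airplanes: 12:1 12:3 Next Hop: LAX"
parsed = obtain_data_safe(
    sample, {"Foo", "Foo-age", "Airplanes", "Next Hop"}, {"Airplanes"}
)
print(parsed)
```

Here Foo-age is missing from the sample string, so it is simply left out of the result instead of breaking the slicing.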
Hope this works for you :))