Question

I have a string:

mystring = "Foo: Bar (Titi) Foo-age: 50 Airplanes: 12:1 12:3 12:4 12:5 [...] Next Hop: LAX Origine ID: 49 Hop List 2 4 9 0 3 [...]"

Is there a way to split this string using a pattern, for example:

pattern = {"Foo", "Foo-age", "Airplanes", "Next Hop", "Origine ID", "Hop List"}

and then

mylist = somefunction(mystring, pattern)
print(mylist)
--> {"Foo":"Bar (Titi)","Foo-age" : 50, "Airplanes": ["12:1","12:3",...], ...}

Is this possible in Python?

[Edit]

Some sample data: a 5-column CSV file with "," as the delimiter

col-1,col-2,Path: 9876 (IGP) Local-Pref: 310000   MED: 0 Communities: 1234:6 1234:95 1234:101 1234:202 1234:500 1234:903 1234:3369 1234:8000 1234:8002 1234:16925 9876:19827 Next Hop: x.x.127.151   Originator ID: x.x.155.144 Cluster List: 0.0.29.99 0.0.29.97 0.0.26.245 0.0.2.179 ,col-4,col-5

col-1,col-2,Path: 9876 (IGP) Local-Pref: 310000   MED: 0 Communities: 1234:3 1234:95 1234:101 1234:202 1234:13705 9876:19941 Next Hop: x.x.127.61   Originator ID: x.x.137.37 Cluster List: 0.0.29.99 0.0.29.97 0.0.1.195 ,col-4,col-5

col-1,col-2,Path: 9876 (IGP) Local-Pref: 310000   MED: 0 Communities: 1234:2 1234:95 1234:101 Next Hop: x.x.127.149   Originator ID: x.x.137.29 Cluster List: 0.0.29.99 0.0.29.98 0.0.2.240 ,col-4,col-5

col-1,col-2,Path: 9876 (IGP) Local-Pref: 310000   MED: 0 Communities: 1234:6 1234:95 1234:101 1234:202 1234:500 1234:903 1234:3369 1234:8000 1234:8002 1234:16924 9876:19827 Next Hop: x.x.127.151   Originator ID: x.x.155.144 Cluster List: 0.0.29.99 0.0.29.97 0.0.26.245 0.0.2.179 ,col-4,col-5

Answer 1

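I think this can be done in two steps. First, find everything that looks like a field name (Foo-Bar:) and insert a "special" marker character (e.g. @) in front of each match. Second, look for the pattern marker field-name : data and fill a data dictionary: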

text = """
col-1,col-2,Path: 9876 (IGP) Local-Pref: 310000   MED: 0 Communities: 1234:6 1234:95 1234:101 1234:202 1234:500 1234:903 1234:3369 1234:8000 1234:8002 1234:16925 9876:19827 Next Hop: x.x.127.151   Originator ID: x.x.155.144 Cluster List: 0.0.29.99 0.0.29.97 0.0.26.245 0.0.2.179 ,col-4,col-5
"""

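import re
import pprint

# Step 1: put an "@" marker in front of anything that looks like a field name ("Foo-Bar:").
text = re.sub(r'([A-Z][A-Za-z -]+:)', r'@\1', text)

# Step 2: read "@name: value" pairs; a value runs until the next "," or "@".
data = {}
for m in re.finditer(r'@(.+?):([^,@]+)', text):
    data[m.group(1)] = m.group(2).strip()

pprint.pprint(data)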

Result:

 {'Cluster List': '0.0.29.99 0.0.29.97 0.0.26.245 0.0.2.179',
  'Communities': '1234:6 1234:95 1234:101 1234:202 1234:500 1234:903 1234:3369 1234:8000 1234:8002 1234:16925 9876:19827',
  'Local-Pref': '310000',
  'MED': '0',
  'Next Hop': 'x.x.127.151',
  'Originator ID': 'x.x.155.144',
  'Path': '9876 (IGP)'}
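Since the sample data is a 5-column CSV file, one way to apply the two steps to every line is to let Python's csv module split the columns first and run the regexes only on the third column. A minimal sketch, assuming the key/value text is always in the third column and contains no commas, and that blank lines may separate records; parse_fields, parse_csv and the file name data.csv are hypothetical:

```python
import csv
import re

# Step 1 pattern: something that looks like a field name, e.g. "Next Hop:".
FIELD_NAME = re.compile(r'([A-Z][A-Za-z -]+:)')
# Step 2 pattern: "@name: value", where the value stops at the next "," or "@".
FIELD_VALUE = re.compile(r'@(.+?):([^,@]+)')

def parse_fields(text):
    """Apply the marker-then-match steps from the answer above to one string."""
    marked = FIELD_NAME.sub(r'@\1', text)
    return {m.group(1): m.group(2).strip() for m in FIELD_VALUE.finditer(marked)}

def parse_csv(path):
    """Yield one dict per non-empty CSV row, parsed from the third column."""
    with open(path, newline='') as f:
        for row in csv.reader(f):
            if len(row) < 3:  # csv.reader yields [] for blank lines; skip them
                continue
            yield parse_fields(row[2])
```

Usage: `for record in parse_csv('data.csv'): print(record['Next Hop'])`. Letting csv.reader handle the column split keeps the regexes from having to know about the surrounding columns.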

Answer 2

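This might be a bit tricky.

Please give more details, but for now this solution should be enough, or at least get you closer to what you expect: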

mystring = "Foo: Bar (Titi) Foo-age: 50 Airplanes: 12:1 12:3 12:4 12:5 Next Hop: LAX Origine ID: 49 Hop List: 2 4 9 0 3"
pattern = {"Foo", "Foo-age", "Airplanes", "Next Hop", "Origine ID", "Hop List"}
to_list = {'Airplanes', 'Hop List'}
def obtain_data(mystring, pattern, to_list):
    """Split mystring at each 'pattern:' and map every pattern to its value."""
    result = {}
    prev_pattern = None
    prev_pos = 0
    # Order the patterns by where "pattern:" first appears in the string.
    ordered_pattern = sorted(pattern, key=lambda x: mystring.find(x + ':'))
    for p in ordered_pattern:
        npos = mystring.find(p + ':', prev_pos)
        if prev_pattern is not None:
            # The previous value runs from just after "prev_pattern:" up to "p:".
            to_add = mystring[prev_pos + len(prev_pattern) + 1 : npos].strip()
            if prev_pattern in to_list:
                to_add = to_add.split()  # whitespace-separated values become a list
            result[prev_pattern] = to_add
        prev_pos = npos
        prev_pattern = p
    # The last value runs to the end of the string.
    to_add = mystring[prev_pos + len(prev_pattern) + 1 :].strip()
    if prev_pattern in to_list:
        to_add = to_add.split()
    result[prev_pattern] = to_add
    return result

obtain_data(mystring, pattern, to_list)

This returns:

{'Foo-age': '50', 'Hop List': ['2', '4', '9', '0', '3'], 'Airplanes': ['12:1', '12:3', '12:4', '12:5'], 'Next Hop': 'LAX', 'Foo': 'Bar (Titi)', 'Origine ID': '49'}

I first sort the patterns into a list according to the position at which they appear in the string.

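I am assuming that a pattern is always followed by ':'. If that is not the case, this may prove too hard to do (since one pattern can be a prefix of another, like Foo and Foo-age in your example).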
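A different way around the prefix problem, under the same assumption that every name is followed by ':', is to build one regular expression from the pattern set and let re.split do the cutting. Because the ':' is part of the regex, at the position of "Foo-age:" the regex can only match "Foo-age:", never a bare "Foo", and with one capturing group re.split returns the matched names interleaved with the text between them. This is an alternative sketch, not the answer's code; split_by_patterns is a hypothetical name:

```python
import re

def split_by_patterns(mystring, pattern, to_list=frozenset()):
    """Split mystring at every 'name:' whose name is in pattern (hypothetical helper)."""
    if not pattern:
        return {}
    # One capturing group of escaped names, followed by the required ':'.
    regex = '(' + '|'.join(re.escape(name) for name in pattern) + '):'
    # re.split keeps captured groups: [before, name1, value1, name2, value2, ...]
    parts = re.split(regex, mystring)
    result = {}
    for name, value in zip(parts[1::2], parts[2::2]):
        value = value.strip()
        # Names listed in to_list get their value split on whitespace.
        result[name] = value.split() if name in to_list else value
    return result

mystring = "Foo: Bar (Titi) Foo-age: 50 Airplanes: 12:1 12:3 12:4 12:5 Next Hop: LAX Origine ID: 49 Hop List: 2 4 9 0 3"
pattern = {"Foo", "Foo-age", "Airplanes", "Next Hop", "Origine ID", "Hop List"}
print(split_by_patterns(mystring, pattern, {"Airplanes", "Hop List"}))
```

Names that never occur in the string are simply absent from the result, so this variant needs no separate filtering step.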

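One more thing: if you want some of the values to become lists, you have to pass the patterns whose values should be converted to lists as the third argument. If you won't always need this, you can pass an empty set or drop the parameter entirely; I only included it because your example shows it.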
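The question's expected output also shows "Foo-age" as the number 50, whereas obtain_data returns every non-list value as a string. If that matters, one option is a small post-processing pass over the result. A sketch; to_number is a hypothetical helper that only converts strings made entirely of decimal digits and leaves everything else, including lists, unchanged:

```python
def to_number(value):
    """Hypothetical helper: int for digit-only strings, otherwise value unchanged."""
    if isinstance(value, str) and value.isdecimal():
        return int(value)
    return value

# Example input shaped like obtain_data's output (values are strings or lists).
result = {'Foo': 'Bar (Titi)', 'Foo-age': '50', 'Hop List': ['2', '4', '9', '0', '3']}
converted = {key: to_number(value) for key, value in result.items()}
print(converted)  # {'Foo': 'Bar (Titi)', 'Foo-age': 50, 'Hop List': ['2', '4', '9', '0', '3']}
```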

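Final edit: if some patterns may not appear in your string, just filter the sorted list with

ordered_pattern = [p for p in ordered_pattern if p + ':' in mystring]

before the for loop (searching for p + ':' keeps the filter consistent with the sort key).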
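To see why the filter is needed: str.find returns -1 for a pattern that does not occur, so the sort places that pattern first and the slicing in obtain_data would then go wrong. A short demonstration with a made-up string, where "Hop List" is deliberately missing:

```python
mystring = "Foo: Bar (Titi) Foo-age: 50"
pattern = {"Foo", "Foo-age", "Hop List"}  # "Hop List" does not occur in mystring

# str.find returns -1 for the missing pattern, so sorting puts it first.
ordered_pattern = sorted(pattern, key=lambda x: mystring.find(x + ':'))
print(ordered_pattern)  # ['Hop List', 'Foo', 'Foo-age']

# Keep only the patterns that actually occur, using the same ':' suffix as the sort key.
ordered_pattern = [p for p in ordered_pattern if p + ':' in mystring]
print(ordered_pattern)  # ['Foo', 'Foo-age']
```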

Hope this works for you :)

Is there a way to 'explode' a string using 'patterns'?
