Python - splitting a string with re.split

Date: 2018-03-25 14:46:11

Tags: python regex list split

I have been looking for a solution for hours. I have a variable that I want to split into a nested list.

import re

points = """M445,346c28.8,0,56,11.2,76.4,31.6C541.8,398,553,425.2,553,454s-11.2,56-31.6,76.4C501,550.8,473.8,562,445,562
        s-56-11.2-76.4-31.6C348.2,510,337,482.8,337,454s11.2-56,31.6-76.4S416.2,346,445,346 M445,345c-60.2,0-109,48.8-109,109
        s48.8,109,109,109s109-48.8,109-109S505.2,345,445,345L445,345z"""

newPoints = re.split(r'[A-Za-z-]', points)

This is a multi-line variable that contains the x and y positions of points from an SVG file.

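The pattern is that a new item starts at each letter. I would like to arrange the data in something like the structure below. I have tried several variations of the code above. One of the main problems is that it keeps removing my delimiters. :)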

[ 
  [ 
     [command],
     [x of p1, y of p1],
     [x of p2, y of p2], 
     [x of p3, y of p3] 
  ] 
]

[ 
[ [M],[445,346] ],
[ [c],[28.8,0],[56,11.2],[76.4,31.6] ]
]

Any pointers are very welcome!
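
On the delimiter problem: `re.split` discards whatever the pattern matches unless the pattern contains a capturing group, in which case the matched text is returned in the result list too. The pattern `[A-Za-z-]` above also treats `-` as a delimiter, so minus signs are consumed along with the letters. A minimal sketch on a shortened path string (the names `sample`, `without_group` and `with_group` are illustrative, not from the post):

```python
import re

sample = "M445,346c28.8,0,56,11.2"

# Without a capturing group the command letters are discarded.
without_group = re.split(r'[A-Za-z]', sample)

# With a capturing group the matched letters are kept as separate items.
with_group = re.split(r'([A-Za-z])', sample)

print(without_group)  # ['', '445,346', '28.8,0,56,11.2']
print(with_group)     # ['', 'M', '445,346', 'c', '28.8,0,56,11.2']
```

The leading empty string appears because the input starts with a delimiter; it can be filtered out afterwards.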

1 answer:

Answer 0: (score: 2)

You can find the letters and the numbers with re.findall, then group them:

import re
import itertools
points ="""M445,346c28.8,0,56,11.2,76.4,31.6C541.8,398,553,425.2,553,454s-11.2,56-31.6,76.4C501,550.8,473.8,562,445,562
    s-56-11.2-76.4-31.6C348.2,510,337,482.8,337,454s11.2-56,31.6-76.4S416.2,346,445,346 M445,345c-60.2,0-109,48.8-109,109
    s48.8,109,109,109s109-48.8,109-109S505.2,345,445,345L445,345z"""
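# Find runs of letters and runs of digits/dots, then group consecutive tokens of the same kind.
new_points = [list(b) for a, b in itertools.groupby(
    filter(None, re.findall(r'[a-zA-Z]+|[\d.]+', points)),
    key=lambda x: re.findall(r'[a-zA-Z]+', x))]
# Pair each letter group with the number group after it; whole numbers become int, the rest float.
final_data = [[new_points[i], [int(c) if re.findall(r'^\d+$', c) else float(c) for c in new_points[i+1]]]
              for i in range(0, len(new_points)-1, 2)]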

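Output (the number pattern does not match `-`, so negative values lose their sign, and the final `z` is dropped because no numbers follow it):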

[[['M'], [445, 346]], [['c'], [28.8, 0, 56, 11.2, 76.4, 31.6]], [['C'], [541.8, 398, 553, 425.2, 553, 454]], [['s'], [11.2, 56, 31.6, 76.4]], [['C'], [501, 550.8, 473.8, 562, 445, 562]], [['s'], [56, 11.2, 76.4, 31.6]], [['C'], [348.2, 510, 337, 482.8, 337, 454]], [['s'], [11.2, 56, 31.6, 76.4]], [['S'], [416.2, 346, 445, 346]], [['M'], [445, 345]], [['c'], [60.2, 0, 109, 48.8, 109, 109]], [['s'], [48.8, 109, 109, 109]], [['s'], [109, 48.8, 109, 109]], [['S'], [505.2, 345, 445, 345]], [['L'], [445, 345]]]
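
If the signs and the `[x, y]` pairing from the question matter, one possible variant is to tokenize with a pattern that allows an optional leading minus and then pair up the numbers after each command. This is only a sketch and not the answer's method: the names `TOKEN`, `to_number` and `parse_path` are hypothetical, exponent notation is not handled, and it assumes every command carries an even number of coordinates (true for the M, C, c, S, s, L and z commands in this path, but not for H or V).

```python
import re

# Assumed token pattern: one command letter, or a number with an optional minus sign.
TOKEN = re.compile(r'[A-Za-z]|-?(?:\d+\.?\d*|\.\d+)')

def to_number(text):
    """Return an int when the token has no decimal point, else a float."""
    return float(text) if '.' in text else int(text)

def parse_path(path):
    """Group tokens as [[command], [x1, y1], [x2, y2], ...] per command."""
    result = []
    for token in TOKEN.findall(path):
        if token.isalpha():
            result.append([[token]])  # start a new command entry
        elif not result:
            raise ValueError("number found before any command letter")
        else:
            entry = result[-1]
            # Complete the last pair if it still lacks y, otherwise open a new pair.
            if len(entry) > 1 and len(entry[-1]) == 1:
                entry[-1].append(to_number(token))
            else:
                entry.append([to_number(token)])
    return result

parsed = parse_path("M445,346c28.8,0,56,11.2,76.4,31.6s-11.2,56-31.6,76.4z")
print(parsed)
# [[['M'], [445, 346]], [['c'], [28.8, 0], [56, 11.2], [76.4, 31.6]],
#  [['s'], [-11.2, 56], [-31.6, 76.4]], [['z']]]
```

Unlike the output above, this keeps `-11.2` and `-31.6` negative and retains the closing `z` as a command with no coordinates.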