Question

Consider the following string: "MULTILINESTRING((10 10,10 40),(40 40,30 30,40 20,30 10))". I want to convert it to: [[10,10],[10,40],[40,40],[30,30],[40,20],[30,10]].

My solution
I used split() and replace() to format it. I ended up with some messy code that is probably not the most efficient: my_str.split('((')[1].split('))')[1]...etc

Because I am doing this on a huge dataset, I am looking for an efficient approach.

Answer 1

You can use re:

import re
s = 'MULTILINESTRING((10 10,10 40),(40 40,30 30,40 20,30 10))'
final_result = list(filter(None, [list(map(int, i.split())) for i in re.findall(r'[\d\s]+', s)]))

Output:

[[10, 10], [10, 40], [40, 40], [30, 30], [40, 20], [30, 10]]
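To make the one-liner easier to follow, here is a minimal sketch that breaks it into its three steps. The variable names chunks, pairs, and result are illustrative only and are not part of the original answer.

```python
import re

s = 'MULTILINESTRING((10 10,10 40),(40 40,30 30,40 20,30 10))'

# Step 1: grab every run made up of digits and whitespace.
chunks = re.findall(r'[\d\s]+', s)
# chunks is now ['10 10', '10 40', '40 40', '30 30', '40 20', '30 10']

# Step 2: split each run on whitespace and convert the pieces to int.
pairs = [list(map(int, chunk.split())) for chunk in chunks]

# Step 3: drop empty lists. These appear when a run contains only
# whitespace, e.g. the space in 'MULTILINESTRING ((10 10))'.
result = [pair for pair in pairs if pair]
print(result)
```

The last step plays the same role as filter(None, ...) in the one-liner: a whitespace-only match splits into an empty list, which is falsy and gets discarded.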

Answer 2

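If you are after clean code that doesn't do too much, I'd suggest a two-step process involving the re module:

Split the string on commas with str.split
For each chunk, use re.findall

For performance, I recommend precompiling the regex pattern with re.compile, since it will be called repeatedly inside a loop.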

>>> import re
>>> p = re.compile(r'\d+(?:\.\d+)?')
>>> [list(map(int, p.findall(x))) for x in mstring.split(',')]
[[10, 10], [10, 40], [40, 40], [30, 30], [40, 20], [30, 10]]

Note that mstring is your string data.
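Since the question mentions a large dataset, the same two steps could be wrapped in a small helper so the pattern is compiled once and reused for every string. This is only a sketch: the names NUMBER, parse_coords, and rows are hypothetical, and the second sample string is made up for illustration.

```python
import re

# Compiled once and reused for every input string.
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def parse_coords(mstring):
    """Return a flat list of [x, y] integer pairs from one input string."""
    # Step 1: split on commas so each chunk holds one coordinate pair.
    # Step 2: pull the numbers out of each chunk and convert them to int.
    return [list(map(int, NUMBER.findall(chunk))) for chunk in mstring.split(',')]

# Hypothetical sample data; the second row is invented for this example.
rows = [
    'MULTILINESTRING((10 10,10 40),(40 40,30 30,40 20,30 10))',
    'MULTILINESTRING((1 2,3 4))',
]
parsed = [parse_coords(row) for row in rows]
print(parsed[0])
```

Like the snippet above, this assumes every comma-separated chunk contains exactly one integer coordinate pair.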

Details

\d+    # match one or more digits
(?:    # specify non-capturing group
\.     # literal period/decimal
\d+    # one or more digits after the period
)?     # optional

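Semantically, this regex matches integers or floating-point numbers (Ajax1234's solution currently handles integers only, and is guaranteed to complete the search in fewer cycles).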
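To illustrate the difference, here is a small sketch with decimal coordinates. The sample string is made up, and float replaces int in the conversion, because int('10.5') would raise ValueError.

```python
import re

p = re.compile(r'\d+(?:\.\d+)?')

# Made-up sample with decimal coordinates (not from the question).
s = 'MULTILINESTRING((10.5 10,10 40.25),(40 40,30 30))'

# Same two-step approach, converting with float instead of int.
coords = [list(map(float, p.findall(x))) for x in s.split(',')]
print(coords)

# For comparison, the character class [\d\s]+ treats '.' as a separator,
# so a decimal value is broken apart rather than kept whole.
broken = re.findall(r'[\d\s]+', '10.5 10')
print(broken)

# Neither pattern captures a leading minus sign.
unsigned = p.findall('-10.5')
print(unsigned)
```

If negative coordinates can occur in the data, the pattern would need an optional sign; that case is not covered by either answer as written.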

Extracting coordinates from a string
