Regular expressions in Django

Date: 2015-01-08 16:01:00

Tags: python regex django

I'm getting output from a query like this:

[ (14577692L, 'POINT(-122.106035882 37.397386475)'), (14577692L, 'POINT(-122.106035882 37.397386475)'), (14577692L, 'POINT(-122.106035882 37.397386475)') ]

I want to extract each POINT value on its own and get the lat and long values from it with a regular expression, for example:
import re

_RE = re.compile('\(\([\d\-\., ]*\)\)')
for i in cursor.fetchall():
    for p in _RE.findall(i[1]):
        # I want the latitude and longitude values from POINT(-122.106035882 37.397386475)
        pass

My regex is wrong. Can someone help me correct it:

_RE = re.compile('\(\([\d\-\., ]*\)\)')
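Not part of the original question: a quick check, assuming Python 3, of why this pattern finds nothing. It requires two literal opening parentheses (( and two closing )), but POINT(...) contains only one of each. Dropping the extra parenthesis on each side and adding a capture group (written here as a raw string) makes it match the coordinate pair.

```python
import re

text = 'POINT(-122.106035882 37.397386475)'

# The original pattern demands "((" ... "))", which never occurs in the text.
broken = re.compile(r'\(\([\d\-\., ]*\)\)')
print(broken.findall(text))   # → []

# One literal paren on each side, with a group around the coordinates.
fixed = re.compile(r'\(([\d\-\. ]*)\)')
print(fixed.findall(text))    # → ['-122.106035882 37.397386475']
```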

3 answers:

Answer 0 (score: 5)

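This doesn't need a regular expression. Since the POINT() format is fixed, you can simply slice out the part of the string that contains the coordinates and split it on the space: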

resultset = [
    (14577692, 'POINT(-122.106035882 37.397386475)'),
    (14577692, 'POINT(-122.106035882 37.397386475)'),
    (14577692, 'POINT(-122.106035882 37.397386475)')
]

for row in resultset:
    # Drop 'POINT(' and ')', leaving '-122.106035882 37.397386475'.
    coordinatestring = row[1][6:-1]
    # WKT lists x (longitude) first, then y (latitude).
    lon, lat = (float(x) for x in coordinatestring.split(' '))
    print(lon, lat)

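The slice [6:-1] drops the first 6 characters and the last character of the original string, which are POINT( and ) respectively. That leaves two space-separated numbers, which are easy to handle as shown above.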
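The same slicing idea can be wrapped in a small helper. This is a sketch assuming Python 3 and well-formed 2D WKT POINT(x y) strings; the name parse_point is hypothetical. It checks the prefix and suffix first, so a value that is not a POINT raises ValueError instead of being sliced silently. Because WKT puts x before y, it returns (longitude, latitude).

```python
PREFIX = 'POINT('


def parse_point(wkt):
    """Return (lon, lat) floats from a string like 'POINT(-122.1 37.4)'.

    Hypothetical helper: assumes well-formed 2D WKT points only.
    """
    if not (wkt.startswith(PREFIX) and wkt.endswith(')')):
        raise ValueError('not a POINT: %r' % wkt)
    # Slice off 'POINT(' (6 characters) and the trailing ')', then split.
    lon_str, lat_str = wkt[len(PREFIX):-1].split()
    return float(lon_str), float(lat_str)


resultset = [
    (14577692, 'POINT(-122.106035882 37.397386475)'),
    (14577692, 'POINT(-122.106035882 37.397386475)'),
]

for row_id, point in resultset:
    lon, lat = parse_point(point)
    print(row_id, lon, lat)
```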

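If you really must use a regular expression, use a raw string so you don't have to escape every backslash twice, and use two capture groups so you can tell the first coordinate from the second: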

>>> import re
>>> _RE = re.compile(r'POINT\(([-\d\.]+)\s([-\d\.]+)\)')
>>> _RE.groups
2
>>> _RE.search('POINT(-122.106035882 37.397386475)').groups()
('-122.106035882', '37.397386475')

That said, even this regex is overkill. Since you know the POINT() format is fixed, you can match just the numbers and ignore the letters and parentheses:

>>> _RE = re.compile(r'([-\d\.]+)\s([-\d\.]+)')
>>> _RE.search('POINT(-122.106035882 37.397386475)').groups()
('-122.106035882', '37.397386475')
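One trade-off of the looser pattern, shown in a quick Python 3 check: since it no longer anchors on POINT(, it will pull two numbers out of any text that contains them, so it is only safe when every value is known to be a POINT. The LINESTRING value here is a made-up example for illustration.

```python
import re

loose = re.compile(r'([-\d\.]+)\s([-\d\.]+)')
strict = re.compile(r'POINT\(([-\d\.]+)\s([-\d\.]+)\)')

not_a_point = 'LINESTRING(1.5 2.5, 3 4)'
print(loose.search(not_a_point).groups())  # → ('1.5', '2.5')
print(strict.search(not_a_point))          # → None
```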

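At this point the pattern is so simple that it hints you may not need a regular expression at all, as I showed at the start. It's never a bad idea to question whether re is really necessary and to consider simpler alternatives.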

Answer 1 (score: 2)

A more explicit version:

import re

# Two capture groups: x (longitude) and y (latitude), in WKT order.
p = re.compile(r"POINT\(([-\d\.]+)\s([-\d\.]+)\)")

data = [
    (14577692, 'POINT(-122.106035882 37.397386475)'),
    (14577692, 'POINT(-122.106035882 37.397386475)'),
    (14577692, 'POINT(-122.106035882 37.397386475)')
]

for record in data:
    lon, lat = p.search(record[1]).groups()
    print(lon, lat)

Output:

-122.106035882 37.397386475
-122.106035882 37.397386475
-122.106035882 37.397386475
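p.search() returns None when a value doesn't match, and calling .groups() on None raises AttributeError. A sketch (Python 3, with a hypothetical non-POINT row added for illustration) that skips such rows instead of crashing:

```python
import re

p = re.compile(r"POINT\(([-\d\.]+)\s([-\d\.]+)\)")

data = [
    (14577692, 'POINT(-122.106035882 37.397386475)'),
    (14577693, 'EMPTY'),  # hypothetical row that is not a POINT
]

coords = []
for record_id, wkt in data:
    m = p.search(wkt)
    if m is None:
        # Skip (or log) values that are not POINT(x y) strings.
        continue
    lon, lat = (float(v) for v in m.groups())
    coords.append((record_id, lon, lat))

print(coords)
```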

You can also get a dictionary keyed by named groups:

p = re.compile(r"POINT\((?P<lon>[-\d\.]+)\s(?P<lat>[-\d\.]+)\)")
...
for record in data:
    coordinates = p.match(record[1]).groupdict()
    print(coordinates)

Output:

{'lon': '-122.106035882', 'lat': '37.397386475'}
{'lon': '-122.106035882', 'lat': '37.397386475'}
{'lon': '-122.106035882', 'lat': '37.397386475'}
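The groupdict() values are still strings. If numbers are needed, a dict comprehension can convert them; this sketch assumes Python 3 and named groups lon and lat in WKT's x-then-y order.

```python
import re

p = re.compile(r"POINT\((?P<lon>[-\d\.]+)\s(?P<lat>[-\d\.]+)\)")

m = p.match('POINT(-122.106035882 37.397386475)')
# Convert every captured string to a float, keeping the group names as keys.
coordinates = {name: float(value) for name, value in m.groupdict().items()}
print(coordinates)
```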

Answer 2 (score: 0)

POINT\((-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\)

Try this. See the demo:

https://regex101.com/r/sH8aR8/32

import re

p = re.compile(r'POINT\((-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\)', re.IGNORECASE | re.DOTALL)
test_str = "[ (14577692L, 'POINT(-122.106035882 37.397386475)'), (14577692L, 'POINT(-122.106035882 37.397386475)'), (14577692L, 'POINT(-122.106035882 37.397386475)') ]"

# findall returns one (x, y) tuple of strings per POINT in the string.
print(p.findall(test_str))
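With two groups, findall returns a list of (x, y) string tuples, one per POINT. A sketch (Python 3 assumed) that turns them into float pairs, using a shortened copy of the test string:

```python
import re

p = re.compile(r'POINT\((-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\)')
test_str = ("[ (14577692L, 'POINT(-122.106035882 37.397386475)'), "
            "(14577692L, 'POINT(-122.106035882 37.397386475)') ]")

# Each match is a tuple of the two captured strings: (x, y) = (lon, lat).
points = [(float(x), float(y)) for x, y in p.findall(test_str)]
print(points)
```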