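I'm new to Python and stuck on a small problem I can't solve. I'm trying to extract two coordinate pairs from a string, and I'm stuck because the string has no single common separator such as a comma.
My strings look like this: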
&BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&
&BBOX=156298.828125%2C6576689.453125%2C156328.125%2C6576718.75
&BBOX=156328.125,6576806.640625%2C156357.421875%2C6576835.9375
&BBOX=156328.125,6576748.046875,156357.421875,6576777.34375& ?BBOX=156328%2C125%2C6576777%2C34375%2C156357%2C421875%2C6576806%2C640625&
&BBOX=156269.53125%2C6576689.453125%2C156298.828125%2C6576718.75&
&BBOX=156298.828125%2C6576718.75%2C156328.125%2C6576748.046875
?BBOX=156386.71875%2C6576806.640625%2C156416.015625%2C6576835.9375&
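Each string starts with "BBOX=", followed by four coordinates: x_min, y_min, x_max, and y_max. I use "BBOX=" to locate the coordinates inside a longer string.
x_min and x_max should have 6 digits before any decimal point, and y_min and y_max should have 7.
The values can be floats or integers.
My idea was to split each coordinate into the part before the "." and the part after it, but I really wonder whether that is the way to go.
My current regular expression is: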
rexp_bbox = r"(^.+BBOX=(?P<bbox_xmin_before>\d.*?)[.,%&\s](?P<bbox_xmin_after>.*?)[.,%2C&\s](?P<bbox_ymin_before>\d.*?)[.,%&\s](?P<bbox_ymin_after>.*?)[.,%&\s](?P<bbox_xmax_before>\d.*?)[.,%&\s](?P<bbox_xmax_after>.*?)[.,%&\s](?P<bbox_ymax_before>\d.*?)[.,%&\s](?P<bbox_ymax_after>.*?)[.,%&\s])"
How would you construct a regular expression to extract the two coordinate pairs?
Answer 0 (score: 1)
The pattern "(?:.*BBOX=)(\d{6}(?:\.?[\d]*))(?:%2C|,)(\d{7}(?:\.?[\d]*))(?:%2C|,)(\d{6}(?:\.?[\d]*))(?:%2C|,)(\d{7}(?:\.?[\d]*))" works and extracts the coordinates as 4 groups: group 1 = min_x, group 2 = min_y, group 3 = max_x, group 4 = max_y.
The following code shows the pattern in action:
import re

orig_coords = [
    '&BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&',
    '&BBOX=156298.828125%2C6576689.453125%2C156328.125%2C6576718.75',
    '&BBOX=156328.125,6576806.640625%2C156357.421875%2C6576835.9375',
    '&BBOX=156328.125,6576748.046875,156357.421875,6576777.34375&',
    '?BBOX=156328%2C125%2C6576777%2C34375%2C156357%2C421875%2C6576806%2C640625&',
    '&BBOX=156269.53125%2C6576689.453125%2C156298.828125%2C6576718.75&',
    '&BBOX=156298.828125%2C6576718.75%2C156328.125%2C6576748.046875',
    '?BBOX=156386.71875%2C6576806.640625%2C156416.015625%2C6576835.9375&'
]

# Build the pattern from reusable pieces; raw strings keep the backslashes intact.
bbox_start = r"(?:.*BBOX=)"
separator = r"(?:%2C|,)"          # URL-encoded comma or a literal comma
coord_6 = r"(\d{6}(?:\.?[\d]*))"  # 6 digits, then an optional "." and further digits
coord_7 = r"(\d{7}(?:\.?[\d]*))"  # 7 digits, then an optional "." and further digits
regex_str = bbox_start + coord_6 + separator + coord_7 + separator + coord_6 + separator + coord_7
reg = re.compile(regex_str)

for c in orig_coords:
    r = reg.match(c)
    if r:
        print('Coordinates for {}'.format(c))
        print('x_min: {} x_max: {}'.format(r.group(1), r.group(3)))
        print('y_min: {} y_max: {}'.format(r.group(2), r.group(4)))
    else:
        print('No match for {}'.format(c))
Output:
Coordinates for &BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&
x_min: 151406.25 x_max: 151875
y_min: 6579062.5 y_max: 6579531.25
Coordinates for &BBOX=156298.828125%2C6576689.453125%2C156328.125%2C6576718.75
x_min: 156298.828125 x_max: 156328.125
y_min: 6576689.453125 y_max: 6576718.75
Coordinates for &BBOX=156328.125,6576806.640625%2C156357.421875%2C6576835.9375
x_min: 156328.125 x_max: 156357.421875
y_min: 6576806.640625 y_max: 6576835.9375
Coordinates for &BBOX=156328.125,6576748.046875,156357.421875,6576777.34375&
x_min: 156328.125 x_max: 156357.421875
y_min: 6576748.046875 y_max: 6576777.34375
No match for ?BBOX=156328%2C125%2C6576777%2C34375%2C156357%2C421875%2C6576806%2C640625&
Coordinates for &BBOX=156269.53125%2C6576689.453125%2C156298.828125%2C6576718.75&
x_min: 156269.53125 x_max: 156298.828125
y_min: 6576689.453125 y_max: 6576718.75
Coordinates for &BBOX=156298.828125%2C6576718.75%2C156328.125%2C6576748.046875
x_min: 156298.828125 x_max: 156328.125
y_min: 6576718.75 y_max: 6576748.046875
Coordinates for ?BBOX=156386.71875%2C6576806.640625%2C156416.015625%2C6576835.9375&
x_min: 156386.71875 x_max: 156416.015625
y_min: 6576806.640625 y_max: 6576835.9375
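You can run the code yourself on repl.it.
The one string that does not match this pattern does not seem to follow the rules you posted in your question.
Answer 1 (score: 0)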
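If you need numbers rather than strings, the same idea can be wrapped in a small helper. This is only a sketch: the function name `parse_bbox` and the group names are my own, it uses `re.search` so nothing has to precede "BBOX=", and it tightens the fractional part to `(?:\.\d+)?`, which is stricter than the pattern above.

```python
import re

# Same structure as the pattern above, with illustrative group names.
SEP = r"(?:%2C|,)"
BBOX_RE = re.compile(
    r"BBOX="
    r"(?P<x_min>\d{6}(?:\.\d+)?)" + SEP +
    r"(?P<y_min>\d{7}(?:\.\d+)?)" + SEP +
    r"(?P<x_max>\d{6}(?:\.\d+)?)" + SEP +
    r"(?P<y_max>\d{7}(?:\.\d+)?)"
)


def parse_bbox(text):
    """Return (x_min, y_min, x_max, y_max) as floats, or None if nothing matches."""
    m = BBOX_RE.search(text)
    if m is None:
        return None
    return tuple(float(m.group(name)) for name in ("x_min", "y_min", "x_max", "y_max"))


print(parse_bbox("&BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&"))
print(parse_bbox("?BBOX=156328%2C125%2C6576777%2C34375%2C156357%2C421875%2C6576806%2C640625&"))
```

The second call returns None, because that string separates the integer and fractional parts with %2C and therefore does not fit the 6-digit/7-digit rule.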
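a = "&BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&"
# Take the text between "=" and the next "&", then split on the encoded comma.
# Splitting on '%' alone would leave a stray "2C" in front of each later value.
ans = a.split('=')[1].split('&')[0].split('%2C')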
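Splitting may be more useful here than a complicated regex, but that also depends on exactly what kind of strings you have.
Answer 2 (score: 0)
Something like this also seems to work; I'm not sure whether there is any runtime difference between this and Jim Wright's answer.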
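Since %2C is just the URL percent-encoding of a comma, one way to make the splitting approach cope with mixed separators is to decode first and split afterwards. A minimal sketch, assuming every string contains "BBOX=" exactly as in the question; the helper name `bbox_by_splitting` is made up for illustration.

```python
from urllib.parse import unquote


def bbox_by_splitting(query):
    """Split a '...BBOX=a,b,c,d&...' fragment into a list of four floats."""
    # Keep what follows "BBOX=" up to the next "&", if there is one.
    raw = query.split("BBOX=", 1)[1].split("&", 1)[0]
    # unquote turns "%2C" into ",", so mixed separators collapse into one.
    parts = unquote(raw).split(",")
    if len(parts) != 4:
        raise ValueError("expected 4 coordinates, got {}".format(len(parts)))
    return [float(p) for p in parts]


print(bbox_by_splitting("&BBOX=156328.125,6576806.640625%2C156357.421875%2C6576835.9375"))
```

A string without "BBOX=" raises an IndexError here, and the malformed example from the question (eight %2C-separated pieces) raises the ValueError, so both cases would need handling in real log parsing.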
import re

coords = ["&BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&",
          "&BBOX=156298.828125%2C6576689.453125%2C156328.125%2C6576718.75",
          "&BBOX=156328.125,6576806.640625%2C156357.421875%2C6576835.9375",
          "&BBOX=156328.125,6576748.046875,156357.421875,6576777.34375& ?BBOX=156328%2C125%2C6576777%2C34375%2C156357%2C421875%2C6576806%2C640625&",
          "&BBOX=156269.53125%2C6576689.453125%2C156298.828125%2C6576718.75&",
          "&BBOX=156298.828125%2C6576718.75%2C156328.125%2C6576748.046875",
          "?BBOX=156386.71875%2C6576806.640625%2C156416.015625%2C6576835.9375&"]

# Accept "&BBOX=" or "?BBOX=" and capture everything up to the next "&" or the end
r = re.compile(r"[&?]BBOX=(.+?)(?=&|$)")
x_coords = []


def split_coords(coords_string):
    # The separator can be "%2C" or ",", sometimes mixed within one string
    bbox = re.split(r"%2C|,", coords_string)
    x_min, x_max = bbox[0], bbox[2]
    return (x_min, x_max)


# If a match is found using the regex, split the coords and add the x_min and x_max coords to the x_coords list
for i in coords:
    match = r.search(i)  # only the first BBOX in each string is used
    if match:
        x_coords.append(split_coords(match.group(1)))
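On the runtime question, the quickest way to find out is to measure both approaches on your own data. The following is a rough sketch using `timeit`; the sample string, the function names, and the iteration count are arbitrary choices of mine, and the two functions only approximate the two answers.

```python
import re
import timeit

SAMPLE = "&BBOX=156298.828125%2C6576689.453125%2C156328.125%2C6576718.75&"

# One regex that captures all four coordinates at once.
FULL_RE = re.compile(
    r".*BBOX=(\d{6}\.?\d*)(?:%2C|,)(\d{7}\.?\d*)(?:%2C|,)(\d{6}\.?\d*)(?:%2C|,)(\d{7}\.?\d*)"
)
# A short regex to isolate the BBOX value, followed by a split on the separator.
FIELD_RE = re.compile(r"[&?]BBOX=(.+?)(?=&|$)")
SEP_RE = re.compile(r"%2C|,")


def with_full_regex(text):
    m = FULL_RE.match(text)
    return (m.group(1), m.group(3)) if m else None


def with_split(text):
    m = FIELD_RE.search(text)
    if not m:
        return None
    bbox = SEP_RE.split(m.group(1))
    return (bbox[0], bbox[2])


for fn in (with_full_regex, with_split):
    seconds = timeit.timeit(lambda: fn(SAMPLE), number=20000)
    print("{}: {:.3f}s".format(fn.__name__, seconds))
```

The timings depend on the machine and on how long the surrounding log lines are, so treat the numbers as a relative comparison only.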
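Answer 3 (score: 0)
Your comments were very helpful in getting me to think about this differently. I simply had not noticed that %2C is the common separator between the coordinates. I changed my regex to: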
rexp_bbox = r"(^.+BBOX=(?P<...>\d.*?)(%2C)(?P<...>\d.*?)(%2C)(?P<...>\d.*?)(%2C)(?P<...>\d.*?)(\s|&|\"))"
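It solves my problem, because I use the regex for log file parsing, where I count how often certain bounding boxes occur (the coordinates in my question are the corner coordinates of bounding boxes).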
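For anyone with a similar task, counting bounding boxes in a log could look roughly like the sketch below. The group names, the `count_bboxes` helper, and the log lines are all invented for illustration, and it assumes %2C is the only separator that occurs.

```python
import re
from collections import Counter

# Hypothetical group names; each coordinate is a digit followed by digits or dots.
BBOX_RE = re.compile(
    r"BBOX=(?P<x_min>\d[\d.]*)%2C(?P<y_min>\d[\d.]*)"
    r"%2C(?P<x_max>\d[\d.]*)%2C(?P<y_max>\d[\d.]*)(?=\s|&|\"|$)"
)


def count_bboxes(lines):
    """Count how often each (x_min, y_min, x_max, y_max) tuple occurs."""
    counts = Counter()
    for line in lines:
        m = BBOX_RE.search(line)
        if m:
            counts[m.group("x_min", "y_min", "x_max", "y_max")] += 1
    return counts


# Made-up log lines, not taken from a real server.
log_lines = [
    "GET /wms?BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25& 200",
    "GET /wms?BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25& 200",
    "GET /wms?BBOX=156269.53125%2C6576689.453125%2C156298.828125%2C6576718.75 200",
]

for bbox, n in count_bboxes(log_lines).most_common():
    print(n, bbox)
```

The coordinates are kept as strings here so that identical boxes compare equal without any float rounding.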