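This is my first post. I always come to this forum looking for answers to coding questions.
I've been trying to understand regular expressions in Python, but I'm finding them a bit hard.
My text looks like this: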
Name: Clash1
Distance: -1.341m
Image Location: Test 1_navis_files\cd000001.jpg
HardStatus: New
Clash Point: 3.884m, -2.474m, 2.659m
Date Created: 2016/6/2422:45:09
Item 1
GUID: 6efaec51-b699-4d5a-b947-505a69c31d52
Path: File ->Colisiones_v2015.dwfx ->Segment ->Pipes (1) ->Pipe Types (1) ->Default (1) ->Pipe Types [2463] ->Shell
Item Name: Pipe Types [2463]
Item Type: Shell
Item 2
GUID: 6efaec51-b699-4d5a-b947-505a69c31dea
Path: File ->Colisiones_v2015.dwfx ->Segment ->Walls (4) ->Basic Wall (4) ->Wall 1 (4) ->Basic Wall [2343] ->Shell
Item Name: Basic Wall [2343]
Item Type: Shell
------------------
Name: Clash2
Distance: -1.341m
Image Location: Test 1_navis_files\cd000002.jpg
HardStatus: New
Clash Point: 3.884m, 3.533m, 2.659m
Date Created: 2016/6/2422:45:09
Item 1
GUID: 6efaec51-b699-4d5a-b947-505a69c31d52
Path: File ->Colisiones_v2015.dwfx ->Segment ->Pipes (1) ->Pipe Types (1) ->Default (1) ->Pipe Types [2463] ->Shell
Item Name: Pipe Types [2463]
Item Type: Shell
Item 2
GUID: 6efaec51-b699-4d5a-b947-505a69c31de8
Path: File ->Colisiones_v2015.dwfx ->Segment ->Walls (4) ->Basic Wall (4) ->Wall 1 (4) ->Basic Wall [2341] ->Shell
Item Name: Basic Wall [2341]
Item Type: Shell
------------------
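What I need to do is build a list that, for each block of text (blocks are separated by the dashed lines), holds the following as a string: the clash name and the clash point.
For example: Clash 1 3.884, 3.533, 2.659
I'm very new to Python and really don't know much about regular expressions.
Could anyone give me some pointers on using regex to extract these values from the text?
I tried something like this: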
exp = r'(?<=Clash Point\s)(?<=Point\s)([0-9]*)'
match = re.findall(exp, html)
if match:
    OUT.append(match)
else:
    OUT = 'fail'
But I know I'm far from my goal.
Answer 0 (score: 1)
If you are looking for a regex solution, you could come up with:
^Name:\s* # look for Name:, followed by whitespaces
# at the beginning of a line
(?P<name>.+) # capture the rest of the line
# in a group called "name"
[\s\S]+? # anything afterwards lazily
^Clash\ Point:\s* # same construct as above
(?P<point>.+) # same as the other group
Translated into Python code, this would be:
import re
rx = re.compile(r"""
^Name:\s*
(?P<name>.+)
[\s\S]+?
^Clash\ Point:\s*
(?P<point>.+)""", re.VERBOSE|re.MULTILINE)
for match in rx.finditer(your_string_here):
    print(match.group('name'))
    print(match.group('point'))
This will output:
Clash1
3.884m, -2.474m, 2.659m
Clash2
3.884m, 3.533m, 2.659m
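If you want the list of strings described in the question (name followed by the coordinates without the unit), the same pattern can feed a small helper. This is only a sketch: `clash_strings` is a hypothetical name, the sample is an abbreviated copy of the question's text, and it assumes every coordinate ends in a single `m` unit.

```python
import re

# Abbreviated sample in the same shape as the question's text (assumption:
# the real input is one string with one "Name:" and one "Clash Point:" per block).
sample = """Name: Clash1
Distance: -1.341m
Clash Point: 3.884m, -2.474m, 2.659m
------------------
Name: Clash2
Distance: -1.341m
Clash Point: 3.884m, 3.533m, 2.659m
------------------
"""

rx = re.compile(r"""
    ^Name:\s*
    (?P<name>.+)
    [\s\S]+?
    ^Clash\ Point:\s*
    (?P<point>.+)""", re.VERBOSE | re.MULTILINE)


def clash_strings(text):
    """Return one 'name x, y, z' string per block (hypothetical helper)."""
    out = []
    for match in rx.finditer(text):
        # Split the point on commas and drop the trailing 'm' unit from each part.
        coords = [c.strip().rstrip('m') for c in match.group('point').split(',')]
        out.append('{} {}'.format(match.group('name').strip(), ', '.join(coords)))
    return out


print(clash_strings(sample))
```

If you need numbers rather than strings, you could wrap each coordinate in `float()` instead of joining them.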
Answer 1 (score: 0)
import re
lines = s.split('\n')
names = []
points = []
for line in lines:
    # raw strings keep \s and \w from being read as string escapes
    result = re.search(r'^Name:\s*(\w+)', line)
    if result:
        names.append(result.group(1))
    result = re.search(r'^Clash Point:\s*([-0-9m., ]+)', line)
    if result:
        points.append(result.group(1))
print(names)
print(points)
# for nicer output, pair each name with its point using zip()
for name, point in zip(names, points):
    print(name, point)
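One caveat with two separate lists: if some block has a name but no clash point, `zip()` will pair names with the wrong points. A possible variant, sketched here under the assumption that blocks are separated by lines made only of dashes, splits the text into blocks first and searches each block on its own. `parse_blocks` and the sample text (including the incomplete second block) are made up for illustration.

```python
import re

# Made-up sample: the second block deliberately has no "Clash Point:" line.
sample = """Name: Clash1
Clash Point: 3.884m, -2.474m, 2.659m
------------------
Name: Clash2
Distance: -1.341m
------------------
Name: Clash3
Clash Point: 3.884m, 3.533m, 2.659m
------------------
"""


def parse_blocks(text):
    """Return one (name, point) tuple per block; point is None if missing."""
    pairs = []
    # Split on lines that consist only of dashes (three or more).
    for block in re.split(r'^-{3,}[ \t]*$', text, flags=re.MULTILINE):
        name = re.search(r'^Name:\s*(\w+)', block, re.MULTILINE)
        point = re.search(r'^Clash Point:\s*([-0-9m., ]+)', block, re.MULTILINE)
        if name:  # skip empty leftovers such as the text after the last separator
            pairs.append((name.group(1), point.group(1).strip() if point else None))
    return pairs


for name, point in parse_blocks(sample):
    print(name, point)
```

This keeps each name tied to the point found in the same block, at the cost of one extra split.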
You can find useful information about regular expressions at regexr.com. I also use it for quick testing and as a reference.