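I've been searching for hours, but I can't find the right regular expression to match a simple pattern. Given this text (the stdout listing the logical volumes in each volume group):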
rootvg:
hd5 boot 1 1 1 closed/syncd N/A
hd4 jfs 38 38 1 open/syncd /
datavg:
data01lv jfs 7 7 1 open/syncd /data1
data02lv jfs 7 7 1 open/syncd /data2
I would like my regex to return a result like this (for example, from regex.findall(text)):
[(u'rootvg', u'hd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\n'),(u'datavg', u'data01lv jfs 7 7 1 open/syncd /data1\ndata02lv jfs 7 7 1 open/syncd /data2')]
But the best I've managed is this pattern: ^(?P<vgname>\w+):\s(?P<lv>[\w\s\.\_\/-]+)+
The result with findall:
[(u'rootvg', u'hd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\ndatavg')]
Answer 0 (score: 4)
Try the following:
re.findall(r'^(\w+):(.*?)(?=^\w+:|\Z)', text, flags=re.DOTALL | re.MULTILINE)
Example:
>>> text = '''rootvg:
... hd5 boot 1 1 1 closed/syncd N/A
... hd4 jfs 38 38 1 open/syncd /
... datavg:
... data01lv jfs 7 7 1 open/syncd /data1
... data02lv jfs 7 7 1 open/syncd /data2'''
>>> re.findall(r'^(\w+):(.*?)(?=^\w+:|\Z)', text, flags=re.DOTALL | re.MULTILINE)
[('rootvg', '\nhd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\n'), ('datavg', '\ndata01lv jfs 7 7 1 open/syncd /data1\ndata02lv jfs 7 7 1 open/syncd /data2')]
The re.DOTALL flag lets . match newlines, and the re.MULTILINE flag lets ^ and $ match at the start and end of each line, rather than only at the start and end of the string.
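To see each flag in isolation, here is a minimal sketch on a made-up three-line string (the `sample` value and the variable names are illustrative only and are not part of the original question):

```python
import re

sample = "a:\nb\nc:"

# Without re.MULTILINE, ^ anchors only at the very start of the string.
start_only = re.findall(r'^\w+:', sample)
# With re.MULTILINE, ^ also anchors right after every newline.
each_line = re.findall(r'^\w+:', sample, flags=re.MULTILINE)

# Without re.DOTALL, . stops at the first newline.
one_line = re.match(r'a:.*', sample).group()
# With re.DOTALL, . crosses newlines and runs to the end of the string.
whole_text = re.match(r'a:.*', sample, flags=re.DOTALL).group()

print(start_only)        # ['a:']
print(each_line)         # ['a:', 'c:']
print(repr(one_line))    # 'a:'
print(repr(whole_text))  # 'a:\nb\nc:'
```

The pattern in this answer needs both flags: re.MULTILINE so that a label such as datavg: can be found in the middle of the text, and re.DOTALL so that (.*?) can span several data lines.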
Explanation:
^ # match at the start of a line
(\w+) # match one or more word characters (letters, digits, or underscore) and capture in group 1
: # match a literal ':'
(.*?) # match zero or more characters (including newlines, due to re.DOTALL), as few as possible, and capture in group 2
(?= # start lookahead (only match if following regex can match)
^\w+: # start of line followed by word characters then ':'
| # OR
\Z # end of the string
) # end lookahead
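The same breakdown can be kept inside the pattern itself with re.VERBOSE. The sketch below is a variation rather than the exact regex above: it adds an optional \n after the colon, so the captured body no longer starts with a newline and the output matches the format requested in the question. The names VG_BLOCK and result are made up for illustration.

```python
import re

text = '''rootvg:
hd5 boot 1 1 1 closed/syncd N/A
hd4 jfs 38 38 1 open/syncd /
datavg:
data01lv jfs 7 7 1 open/syncd /data1
data02lv jfs 7 7 1 open/syncd /data2'''

# re.VERBOSE ignores whitespace in the pattern and allows # comments.
VG_BLOCK = re.compile(r"""
    ^(\w+):        # group 1: label at the start of a line, followed by ':'
    \n?            # consume the newline after the label (variation on the original)
    (.*?)          # group 2: block body, as few characters as possible
    (?=^\w+:|\Z)   # stop before the next label or at the end of the string
""", re.DOTALL | re.MULTILINE | re.VERBOSE)

result = VG_BLOCK.findall(text)
for name, body in result:
    print(name, repr(body))
```

With the sample text, the first tuple is ('rootvg', 'hd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\n'), with no leading '\n' in the body.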
Alternatively, you can use re.split() with a simpler regex to get similar output; converting it to the format you need shouldn't be too hard:
>>> re.split(r'^(\w+):', text, flags=re.MULTILINE)
['', 'rootvg', '\nhd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\n', 'datavg', '\ndata01lv jfs 7 7 1 open/syncd /data1\ndata02lv jfs 7 7 1 open/syncd /data2']
Here is how you can convert it to the desired format:
>>> matches = re.split(r'^(\w+):', text, flags=re.MULTILINE)
>>> [(v, matches[i+1]) for i, v in enumerate(matches) if i % 2]
[('rootvg', '\nhd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\n'), ('datavg', '\ndata01lv jfs 7 7 1 open/syncd /data1\ndata02lv jfs 7 7 1 open/syncd /data2')]
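If a lookup by volume group name is more convenient than a list of tuples, the re.split() output can also be paired up with slices and zip(). This is only a sketch of one possible post-processing step; the dict layout and the names parts and groups are assumptions, not something the question asked for.

```python
import re

text = '''rootvg:
hd5 boot 1 1 1 closed/syncd N/A
hd4 jfs 38 38 1 open/syncd /
datavg:
data01lv jfs 7 7 1 open/syncd /data1
data02lv jfs 7 7 1 open/syncd /data2'''

parts = re.split(r'^(\w+):', text, flags=re.MULTILINE)

# parts[0] is whatever precedes the first label (empty here); after that,
# labels sit at odd indexes and their bodies at the following even indexes.
groups = {
    name: body.strip().splitlines()
    for name, body in zip(parts[1::2], parts[2::2])
}

for name, lines in groups.items():
    print(name, lines)
```

Here groups['rootvg'] is ['hd5 boot 1 1 1 closed/syncd N/A', 'hd4 jfs 38 38 1 open/syncd /'], so each logical volume line can be processed on its own.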
Answer 1 (score: 0)
#!/usr/bin/env python3
"""
Demo code for Stackoverflow question:
http://stackoverflow.com/questions/13958548/unable-to-find-the-correct-regex-in-python#13958634
"""
# Ported to Python 3: io.StringIO and print() replace the Python 2 forms.
import io

text = """
rootvg:
hd5 boot 1 1 1 closed/syncd N/A
hd4 jfs 38 38 1 open/syncd /
datavg:
data01lv jfs 7 7 1 open/syncd /data1
data02lv jfs 7 7 1 open/syncd /data2
"""


def gen_lines(text):
    """Yield non-blank lines in input."""
    for line in text:
        if line.strip():
            yield line


def gen_groups(text):
    """Yield (group name, list of split data lines) pairs."""
    group = None
    data = []
    for line in gen_lines(text):
        # We found a new group label
        if len(line.split()) == 1 and line.strip().endswith(':'):
            if group:
                yield group, data
            group = line.strip()[:-1]
            data = []
        # We found a data line
        elif group:
            data.append(line.split())
    # We're done with input; yield final group (for/else runs after the loop)
    else:
        if group:
            yield group, data


def main():
    # Mimics behavior of mock_file = open('input.txt')
    mock_file = io.StringIO(text)
    for group, data in gen_groups(mock_file):
        print(group)
        for d in data:
            print(d)


main()
Output:
rootvg
['hd5', 'boot', '1', '1', '1', 'closed/syncd', 'N/A']
['hd4', 'jfs', '38', '38', '1', 'open/syncd', '/']
datavg
['data01lv', 'jfs', '7', '7', '1', 'open/syncd', '/data1']
['data02lv', 'jfs', '7', '7', '1', 'open/syncd', '/data2']