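Unable to find the correct regex in Python

Asked: 2012-12-19 18:15:13

Tags: python regex

I have been searching for hours, but I cannot find the right regular expression to match a simple pattern. Given this text (the stdout of a command that lists the logical volumes in each volume group):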

rootvg:
hd5                 boot       1     1     1    closed/syncd  N/A
hd4                 jfs        38    38    1    open/syncd    /
datavg:
data01lv            jfs        7     7     1    open/syncd    /data1
data02lv            jfs        7     7     1    open/syncd    /data2

This is the kind of result I would like to get from my regex (e.g. from regex.findall(text)):

    [(u'rootvg', u'hd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\n'),(u'datavg', u'data01lv jfs 7 7 1 open/syncd /data1\ndata02lv jfs 7 7 1 open/syncd /data2')]

But the best I have managed is this pattern: ^(?P<vgname>\w+):\s(?P<lv>[\w\s\.\_\/-]+)+ which gives this result with findall:

[(u'rootvg', u'hd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\ndatavg')]

2 Answers:

Answer 0: (score: 4)

Try the following:

re.findall(r'^(\w+):(.*?)(?=^\w+:|\Z)', text, flags=re.DOTALL | re.MULTILINE)

Example:

>>> text = '''rootvg:
... hd5                 boot       1     1     1    closed/syncd  N/A
... hd4                 jfs        38    38    1    open/syncd    /
... datavg:
... data01lv            jfs        7     7     1    open/syncd    /data1
... data02lv            jfs        7     7     1    open/syncd    /data2'''
>>> re.findall(r'^(\w+):(.*?)(?=^\w+:|\Z)', text, flags=re.DOTALL | re.MULTILINE)
[('rootvg', '\nhd5                 boot       1     1     1    closed/syncd  N/A\nhd4                 jfs        38    38    1    open/syncd    /\n'), ('datavg', '\ndata01lv            jfs        7     7     1    open/syncd    /data1\ndata02lv            jfs        7     7     1    open/syncd    /data2')]

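The re.DOTALL flag lets . match newline characters, and the re.MULTILINE flag lets ^ and $ match at the start and end of each line respectively, rather than only at the start and end of the whole string.

Explanation: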
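To see what each flag changes on its own, here is a minimal sketch on a tiny made-up string (the `sample` text and the variable names are illustrative, not part of the original question):

```python
import re

sample = "a:\n1\nb:\n2"

# Without re.MULTILINE, ^ only matches at the very start of the string.
no_multiline = re.findall(r'^\w+:', sample)
# With re.MULTILINE, ^ also matches right after each newline.
with_multiline = re.findall(r'^\w+:', sample, flags=re.MULTILINE)

# Without re.DOTALL, . stops at the first newline.
no_dotall = re.findall(r'a:.*', sample)
# With re.DOTALL, . also consumes newlines, so .* runs to the end of the string.
with_dotall = re.findall(r'a:.*', sample, flags=re.DOTALL)

print(no_multiline)    # ['a:']
print(with_multiline)  # ['a:', 'b:']
print(no_dotall)       # ['a:']
print(with_dotall)     # ['a:\n1\nb:\n2']
```

The pattern in this answer needs both: re.MULTILINE so that ^ can find every header line, and re.DOTALL so that the lazy .*? can span several data lines.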

^            # match at the start of a line
(\w+)        # match one or more word characters (letters, digits, '_') and capture in group 1
:            # match a literal ':'
(.*?)        # match zero or more characters, as few as possible, and capture in group 2
(?=          # start lookahead (only match if following regex can match)
   ^\w+:       # start of line followed by word characters then ':'
   |           # OR
   \Z          # end of the string
)            # end lookahead
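If you want to keep this kind of annotation next to the pattern in real code, one option is re.VERBOSE, which ignores whitespace and # comments inside the pattern. The following is a sketch of the same regex written that way; the shortened `text` is only there to keep the example small.

```python
import re

pattern = re.compile(r'''
    ^            # start of a line (relies on re.MULTILINE)
    (\w+)        # group 1: the volume group name
    :            # a literal ':'
    (.*?)        # group 2: as few characters as possible (re.DOTALL lets it cross lines)
    (?=          # lookahead: stop just before ...
        ^\w+:    #   the next "name:" header line
        |        #   or
        \Z       #   the end of the string
    )
''', re.VERBOSE | re.DOTALL | re.MULTILINE)

# Shortened sample input, assumed for illustration only.
text = "rootvg:\nhd5 boot\nhd4 jfs\ndatavg:\ndata01lv jfs"

result = pattern.findall(text)
print(result)
# [('rootvg', '\nhd5 boot\nhd4 jfs\n'), ('datavg', '\ndata01lv jfs')]
```

Note that in verbose mode a literal space or # in the pattern would have to be escaped or put in a character class; this pattern contains neither.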

Alternatively, you can use re.split() with a simpler regex to get similar output; converting it to the format you need should not be too hard:

>>> re.split(r'^(\w+):', text, flags=re.MULTILINE)
['', 'rootvg', '\nhd5                 boot       1     1     1    closed/syncd  N/A\nhd4                 jfs        38    38    1    open/syncd    /\n', 'datavg', '\ndata01lv            jfs        7     7     1    open/syncd    /data1\ndata02lv            jfs        7     7     1    open/syncd    /data2']

Here is how you can convert it to the desired format:

>>> matches = re.split(r'^(\w+):', text, flags=re.MULTILINE)
>>> [(v, matches[i+1]) for i, v in enumerate(matches) if i % 2]
[('rootvg', '\nhd5                 boot       1     1     1    closed/syncd  N/A\nhd4                 jfs        38    38    1    open/syncd    /\n'), ('datavg', '\ndata01lv            jfs        7     7     1    open/syncd    /data1\ndata02lv            jfs        7     7     1    open/syncd    /data2')]
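The tuples above still carry the leading newline and the column padding, whereas the question asks for single-spaced lines. A possible post-processing step is sketched below; the `normalize` helper is a hypothetical name introduced here, and it also drops the trailing newline, so the result is close to, but not character-for-character identical with, the output shown in the question.

```python
import re

text = '''rootvg:
hd5                 boot       1     1     1    closed/syncd  N/A
hd4                 jfs        38    38    1    open/syncd    /
datavg:
data01lv            jfs        7     7     1    open/syncd    /data1
data02lv            jfs        7     7     1    open/syncd    /data2'''

parts = re.split(r'^(\w+):', text, flags=re.MULTILINE)

# parts[0] is whatever precedes the first header; after that,
# group names and their bodies alternate.
names, bodies = parts[1::2], parts[2::2]


def normalize(body):
    """Collapse runs of whitespace in each non-blank line to a single space."""
    return '\n'.join(
        ' '.join(line.split())
        for line in body.splitlines()
        if line.strip()
    )


result = [(name, normalize(body)) for name, body in zip(names, bodies)]
print(result)
```

Pairing the slices with zip does the same job as the enumerate-based comprehension, just without index arithmetic.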

Answer 1: (score: 0)

#!/usr/bin/env python

"""
    Demo code for Stackoverflow question:
    http://stackoverflow.com/questions/13958548/unable-to-find-the-correct-regex-in-python#13958634
"""

import StringIO

text = """
rootvg:
hd5                 boot       1     1     1    closed/syncd  N/A
hd4                 jfs        38    38    1    open/syncd    /
datavg:
data01lv            jfs        7     7     1    open/syncd    /data1
data02lv            jfs        7     7     1    open/syncd    /data2
"""


def gen_lines(text):    
    """ yield non-blank lines in input """
    for line in text:
        if line.strip():
            yield line

def gen_groups(text):
    group = None
    data = []
    for line in gen_lines(text):

        # We found a new group label
        if len(line.split()) == 1 and line.strip().endswith(':'):
            if group:
                yield group, data
            group = line.strip()[:-1]
            data = []

        # We found a data line
        elif group:
            data.append(line.split())

    # We're done with input; yield final group
    else:
        if group:
            yield group, data

def main():

    # Mimics behavior of mock_file = open('input.txt')
    mock_file = StringIO.StringIO(text)

    for group, data in gen_groups(mock_file):
        print group
        for d in data:
            print d

main() 

Output:

rootvg
['hd5', 'boot', '1', '1', '1', 'closed/syncd', 'N/A']
['hd4', 'jfs', '38', '38', '1', 'open/syncd', '/']
datavg
['data01lv', 'jfs', '7', '7', '1', 'open/syncd', '/data1']
['data02lv', 'jfs', '7', '7', '1', 'open/syncd', '/data2']
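The script above is written for Python 2 (the StringIO module and the print statement). For Python 3, a rough equivalent of the same line-by-line approach might look like the sketch below; it folds the blank-line filter into `gen_groups` and uses `io.StringIO` as the stand-in for an open file.

```python
import io

text = """
rootvg:
hd5                 boot       1     1     1    closed/syncd  N/A
hd4                 jfs        38    38    1    open/syncd    /
datavg:
data01lv            jfs        7     7     1    open/syncd    /data1
data02lv            jfs        7     7     1    open/syncd    /data2
"""


def gen_groups(lines):
    """Yield (group_name, rows) pairs; each row is a list of fields."""
    group, data = None, []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # A single token ending in ':' is treated as a new group label.
        if len(fields) == 1 and fields[0].endswith(':'):
            if group is not None:
                yield group, data
            group, data = fields[0][:-1], []
        elif group is not None:
            data.append(fields)
    # Input exhausted: emit the last group, if any.
    if group is not None:
        yield group, data


# io.StringIO mimics iterating over an open text file line by line.
groups = list(gen_groups(io.StringIO(text)))
for group, data in groups:
    print(group)
    for fields in data:
        print(fields)
```

It prints the same group names and field lists as the output shown above.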