python re - Regular expression to extract fields from a string

Time: 2015-11-17 16:18:30

Tags: python regex

I have a string like the one shown below:

\nThis lists the capacity of all currently mounted filesystems.\nCommand:\nFile-System\nMbytes\nUsed\nAvail %Used Mounted on\n/dev/vg00/lvol3\n21.0g\n312\n20.5g\n2% /\n/dev/vg00/lvol1\n2097\n511\n1573\n25% /stand\n/dev/vg00/lvol8\n41.9g\n7225\n34.4g\n17% /var\n/dev/vg00/lvol7\n21.0g\n13.9g\n6982\n67% /usr\n/dev/vgxyz/lvusr_xyz\n21.0g\n1558\n18.2g\n8% /usr/abc\n/dev/vg00/lvol6\n21.0g\n5472\n15.4g\n26% /tmp\n

I need to capture the fields as follows:

File-System Mbytes Used Avail %Used Mounted_on

/dev/vg00/lvol3 21.0g 312 20.5g 2% /

/dev/vg00/lvol1 2097 511 1573 25% /stand

Please help. I tried the following:

rx_sequence = re.compile(r"^.*?\n(/dev/.*?)\n(\d{1,}.*?)\n(.*?\d+)\n(\d{1,}.*?)\n(\d{1,}%)\s(.*?)", re.DOTALL)

for match in rx_sequence.finditer(str1):
    print(match.group(1))

/dev/vg00/lvol3

It prints only the first match.

1 Answer:

Answer 0: (score: 0)

You can split the data as follows:

data = """\nThis lists the capacity of all currently mounted filesystems.\nCommand:\nFile-System\nMbytes\nUsed\nAvail %Used Mounted on\n/dev/vg00/lvol3\n21.0g\n312\n20.5g\n2% /\n/dev/vg00/lvol1\n2097\n511\n1573\n25% /stand\n/dev/vg00/lvol8\n41.9g\n7225\n34.4g\n17% /var\n/dev/vg00/lvol7\n21.0g\n13.9g\n6982\n67% /usr\n/dev/vgxyz/lvusr_xyz\n21.0g\n1558\n18.2g\n8% /usr/abc\n/dev/vg00/lvol6\n21.0g\n5472\n15.4g\n26% /tmp\n"""

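# Skip the leading blank line and the six header lines.
lines = data.splitlines()[7:]
# Group the remaining lines five at a time, one tuple per filesystem.
entries = list(zip(*[iter(lines)] * 5))

for entry in entries:
    print(entry)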

This gives you a list of tuple entries, which prints as follows:

('/dev/vg00/lvol3', '21.0g', '312', '20.5g', '2% /')
('/dev/vg00/lvol1', '2097', '511', '1573', '25% /stand')
('/dev/vg00/lvol8', '41.9g', '7225', '34.4g', '17% /var')
('/dev/vg00/lvol7', '21.0g', '13.9g', '6982', '67% /usr')
('/dev/vgxyz/lvusr_xyz', '21.0g', '1558', '18.2g', '8% /usr/abc')
('/dev/vg00/lvol6', '21.0g', '5472', '15.4g', '26% /tmp')

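The code first splits the data into lines and skips the first seven (a leading blank line plus the six header lines). It then reads five lines at a time, creating one tuple per entry. A regular expression would be overkill here, since every field is already separated by a newline.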
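The question asks for six fields, but each tuple above has five, because the last line of every record holds both the percentage used and the mount point. A minimal sketch of one way to separate them, assuming the two values are always divided by whitespace as in the sample data. The helper name `split_entry` is hypothetical and not part of the original answer, and the two sample rows are copied from the output above.

```python
# Sample rows in the shape produced by the five-line grouping above.
entries = [
    ('/dev/vg00/lvol3', '21.0g', '312', '20.5g', '2% /'),
    ('/dev/vg00/lvol1', '2097', '511', '1573', '25% /stand'),
]


def split_entry(entry):
    """Split the combined '%Used Mounted on' field into two fields.

    Hypothetical helper: assumes the last element looks like '25% /stand'.
    """
    filesystem, mbytes, used, avail, last = entry
    # split(None, 1) splits once on whitespace: '25% /stand' -> ['25%', '/stand']
    pct_used, mounted_on = last.split(None, 1)
    return (filesystem, mbytes, used, avail, pct_used, mounted_on)


rows = [split_entry(entry) for entry in entries]
for row in rows:
    print(' '.join(row))
```

Splitting only once keeps the mount point in a single piece even if it were to contain further whitespace.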