Question

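I'm pulling my hair out over Python regexps.

I have a string that contains the multi-line output of an OS command.

One of the lines will contain a string like this: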

2015/04/13.16:26:07 156.0 GB of instance data copied, dev_iosecs 1887, dev_iorate 88.8 MB/s

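I want to parse "156.0 GB" into two match groups. This field could also contain TB, MB, KB, or possibly even just bytes, but for now I only want to focus on TB, MB, and KB; I'll handle the bytes-only case later if it ever comes up.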

    if self.type == "cpinstance":
        if re.search("of instance data copied", line):
            m = re.match("(?P<datasize>\d[.][\d]) (?P<units>TB|GB|MB|KB) of instance data copied", line)
            print m.group('datasize'), m.group('units')
            if m.group('units') == "GB":
                print "MATCH!!!!!"

I've tried dozens of permutations of the regex, and for the life of me I can't get m.group to work.

Traceback (most recent call last):
  File "./listInstances.py", line 187, in <module>
    tscript = OSBTranscript(image.jobid)
  File "/devel/REPO/PYLIB/osb.py", line 833, in __init__
    print m.group('datasize'), m.group('units')
AttributeError: 'NoneType' object has no attribute 'group'

I'm sure it's something stupid staring me right in the face, but it's eluding me at the moment. =P

Thanks for your help.

Answer 1

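match always starts matching at the beginning of the line, so it fails as soon as it sees the date and time portion. Try using search instead of match.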

import re

line = "2015/04/13.16:26:07 156.0 GB of instance data copied, dev_iosecs 1887, dev_iorate 88.8 MB/s"

if re.search("of instance data copied", line):
    m = re.search("(?P<datasize>\d[.][\d]) (?P<units>TB|GB|MB|KB) of instance data copied", line)
    print m.group('datasize'), m.group('units')
    if m.group('units') == "GB":
        print "MATCH!!!!!"

Result:

6.0 GB
MATCH!!!!!

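A good start, but it only matches a single digit before the decimal point. Try putting a star after the \d. (Or perhaps a plus, depending on whether you want to match numbers like ".5".)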

import re

line = "2015/04/13.16:26:07 156.0 GB of instance data copied, dev_iosecs 1887, dev_iorate 88.8 MB/s"

if re.search("of instance data copied", line):
    m = re.search("(?P<datasize>\d*[.][\d]) (?P<units>TB|GB|MB|KB) of instance data copied", line)
    print m.group('datasize'), m.group('units')
    if m.group('units') == "GB":
        print "MATCH!!!!!"

Result:

156.0 GB
MATCH!!!!!
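
To make the star-versus-plus choice concrete, here is a small sketch (written for Python 3, unlike the Python 2 snippets above). The sample strings and the `datasize` helper are made up for illustration. It also shows a limitation that, as far as I can tell, remains in the pattern above: `[\d]` matches exactly one digit after the decimal point, so a value like "156.25" would need `\d+` there as well.

```python
import re

# Hypothetical sample lines, modeled on the log line in the question.
samples = [
    "156.0 GB of instance data copied",
    ".5 GB of instance data copied",
    "156.25 GB of instance data copied",
]

# \d* allows zero digits before the dot, so ".5" matches.
star = re.compile(r"(?P<datasize>\d*[.]\d) (?P<units>TB|GB|MB|KB)")
# \d+ requires at least one digit before the dot, so ".5" does not match.
plus = re.compile(r"(?P<datasize>\d+[.]\d) (?P<units>TB|GB|MB|KB)")
# \d+ on both sides also accepts more than one digit after the dot.
multi = re.compile(r"(?P<datasize>\d+[.]\d+) (?P<units>TB|GB|MB|KB)")


def datasize(pattern, text):
    """Return the captured datasize, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group("datasize") if m else None


for text in samples:
    print(text, "->", datasize(star, text), datasize(plus, text), datasize(multi, text))
```

Checking for None before calling group() is what avoids the AttributeError from the question when a line does not match.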

Answer 2

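re.match() only matches at the beginning of the string; you need to use re.search(), which finds the first location where the regex pattern produces a match...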

>>> import re
>>> s = '2015/04/13.16:26:07 156.0 GB of instance data copied, dev_iosecs 1887, dev_iorate 88.8 MB/s'
>>> m = re.search(r'(?P<datasize>\d+(?:\.\d+)?) (?P<units>[TGMK]B)', s)
>>> print m.group('datasize'), m.group('units')

156.0 GB

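Note: the regex inside your <datasize> named group does not match as you expect. You need a quantifier to capture the whole number, so I modified the pattern to allow for that.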
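
For completeness, a minimal Python 3 sketch that puts match and search side by side on the same line and guards against a missing match. The `parse_size` helper and the `SIZE_RE` name are hypothetical, introduced only for this example; the pattern is the one from this answer.

```python
import re

# Pattern from this answer: an integer or decimal number, a space, then TB/GB/MB/KB.
SIZE_RE = re.compile(r"(?P<datasize>\d+(?:\.\d+)?) (?P<units>[TGMK]B)")


def parse_size(line):
    """Return (size, units) for the first size found in line, or None if there is none."""
    m = SIZE_RE.search(line)  # search scans the whole line, not just the start
    if m is None:
        return None
    return float(m.group("datasize")), m.group("units")


line = ("2015/04/13.16:26:07 156.0 GB of instance data copied, "
        "dev_iosecs 1887, dev_iorate 88.8 MB/s")

# match is anchored at position 0, where the timestamp sits, so it finds nothing.
print(SIZE_RE.match(line))
# search skips past the timestamp and returns the first size on the line.
print(parse_size(line))
# A line without any size yields None instead of raising AttributeError.
print(parse_size("no size on this line"))
```

Note that the line also contains "88.8 MB" near the end; search returns only the first hit, which is the one wanted here.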

Having trouble with re match groups
