How do I use the re module in Python and return capture groups inside an "if" statement?

Time: 2019-03-03 08:17:32

Tags: python regex

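Although I have used Perl for many years, I have always had a lot of trouble with anything beyond the basic use of regular expressions in the language. That has only gotten worse now that I am trying to learn Python ... and the use of the re module is even less clear to me.

I am trying to use re to check whether a substring matches within a string, and I am using capture groups to extract some information from the match. However, there are two things I cannot get working: calling re and assigning all of the returned values inside an "if" statement, and handling the case where there is no match object to call .groups() on (when nothing matches).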

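So below is an example of what I am trying to code, in both Perl and Python, along with their respective outputs.

I would appreciate any suggestions on how to approach the problem better in Python.

Perl code: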

use strict;
use warnings;

my ($idx, $dvalue);

while (my $rec = <DATA>) {
   chomp($rec);
   if ( ($idx, $dvalue) = ($rec =~ /^XA([0-9]+)=(.*?)!/) ) {
      printf("  Matched:\n");
      printf("    rec: >%s<\n", $rec);
      printf("    index = >%s<  value = >%s<\n", $idx, $dvalue);

   } elsif ( ($idx, $dvalue) = ($rec =~ /^PZ([0-9]+)=(.*?[^#])!/) ) {
      printf("  Matched:\n");
      printf("    rec: >%s<\n", $rec);
      printf("    index = >%s<  value = >%s<\n", $idx, $dvalue);

   } else {
      printf("\n  Unknown Record format, \\%s\\\n\n", $rec);

   }
}
close(DATA);

exit(0)      

__DATA__
DUD=ABC!QUEUE=D23!
XA32=7!P^=32!
PZ112=123^!PQ=ABC!

Perl output:

  Unknown Record format, \DUD=ABC!QUEUE=D23!\

  Matched:
    rec: >XA32=7!P^=32!<
    index = >32<  value = >7<
  Matched:
    rec: >PZ112=123^!PQ=ABC!<
    index = >112<  value = >123^<

Python code:

import re

string = 'XA32=7!P^=32!'

with open('data.dat', 'r') as fh:
   for rec in fh:
      orec = '    rec: >' + rec.rstrip('\n') + '<'
      print(orec)

      # always using 'string' at least lets this program run          
      (index, dvalue) = re.search(r'^XA([0-9]+)=(.*?[^#])!', string).groups()

      # The following works when there is a match... but fails with an error when
      # a match is NOT found, viz:-
      # ...    
      #     (index, dvalue) = re.search(r'^XA([0-9]+)=(.*?[^#])!', rec).groups()
      #
      #   Traceback (most recent call last):
      #     File "T:\tmp\a.py", line 13, in <module>
      #       (index, dvalue) = re.search(r'^XA([0-9]+)=(.*?[^#])!', rec).groups()
      #   AttributeError: 'NoneType' object has no attribute 'groups'
      #

      buf = '    index = >' + index + '<' + '  value = >' + dvalue + '<'     
      print(buf)

exit(0)      

Contents of data.dat:

DUD=ABC!QUEUE=D23!
XA32=7!P^=32!
PZ112=123^!PQ=ABC!

Python output:

    rec: >DUD=ABC!QUEUE=D23!<
    index = >32<  value = >7<
    rec: >XA32=7!P^=32!<
    index = >32<  value = >7<
    rec: >PZ112=123^!PQ=ABC!<
    index = >32<  value = >7<

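A further development: here is some more code that has helped me understand this better ... but I am still not sure when or how to use match.group() versus match.groups() ...

Python code: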

import re

rec = 'XA22=11^!S^=64!ABC=0,0!PX=0!SP=12B!'
print("rec = >{}<".format(rec))

# ----

index = 0 ; dvalue = 0 ; x = 0 
match = re.match(r'XA([0-9]+)=(.*?[^#])!(.*?)!', rec) 
if match:
   (index, dvalue, x) = match.groups()
   print("3 ():  index = >{}< value = >{}< x = >{}<".format(index, dvalue, x))

# ----

index = 0 ; dvalue = 0 ; x = 0 
match = re.match(r'XA([0-9]+)=(.*?[^#])!', rec) 
if match:
   (index, dvalue) = match.groups()
   print("2 ():  index = >{}< value = >{}< x = >{}<".format(index, dvalue, x))

# ----

index = 0 ; dvalue = 0 ; x = 0 
match = re.match(r'XA([0-9]+)=', rec) 
if match:
    #(index) = match.groups()  # Why doesn't this work like above examples!?
   (index, ) = match.groups()  # ...and yet this works!?
                               # Does match.groups ALWAYS returns a tuple!?
   #(index) = match.group(1)    # This also works; 0 = entire matched string?
   print("1 ():  index = >{}< value = >{}< x = >{}<".format(index, dvalue, x))

# ----

index = 0 ; dvalue = 0 ; x = 0 
match = re.search(r'S\^=([0-9]+)!', rec) 
if match:
   (index, ) = match.groups()  # Returns tuple(?!)
   print("1 ():  index = >{}< value = >{}< x = >{}<".format(index, dvalue, x))

Again, I would appreciate your thoughts on which is the "preferred" way, or whether there is another way to work with groups.

1 answer:

Answer 0: (score: 2)

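You need to check for a match first, and only then use the groups. That is:

  • Compile the regular expression (according to the documentation, this is optional for most cases nowadays)
  • Apply each regular expression to the string to produce a match object
    • match() only matches at the beginning of the string, i.e. as if with an implicit ^ anchor
    • search() matches anywhere in the string
  • Check whether the match object is valid
    • Extract the groups
    • Skip to the next loop iteration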
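
As a minimal sketch of the match()/search() distinction in the list above, using the sample record from the question: the variable names (xa_pattern, s_pattern and so on) are only illustrative, and compiling the patterns is a choice made here for readability, not a requirement.

```python
import re

rec = 'XA22=11^!S^=64!ABC=0,0!PX=0!SP=12B!'

# Compiling is optional; it mostly helps when a pattern is reused.
xa_pattern = re.compile(r'XA([0-9]+)=(.*?[^#])!')
s_pattern = re.compile(r'S\^=([0-9]+)!')

# match() only succeeds if the pattern matches at the start of the string.
at_start = xa_pattern.match(rec)
not_at_start = s_pattern.match(rec)   # None, because 'S^=' is not at position 0

# search() scans the string and matches at the first position that fits.
anywhere = s_pattern.search(rec)

# A failed match is None, so test the match object before asking for groups.
for label, m in (('XA match', at_start),
                 ('S^ match', not_at_start),
                 ('S^ search', anywhere)):
    if m is None:
        print("{}: no match".format(label))
    else:
        print("{}: groups = {}".format(label, m.groups()))
```

The AttributeError in the question comes from calling .groups() directly on the None that re.search() returns when nothing matches.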
# works with Python 2 and Python 3
import re

with open('dummy.txt', 'r') as fh:
    for rec in fh:
        orec = '    rec: >' + rec.rstrip('\n') + '<'
        print(orec)

        match = re.match(r'XA([0-9]+)=(.*?[^#])!', rec)
        if match:
            (index, dvalue) = match.groups()
            print("    index = >{}<  value = >{}<".format(index, dvalue))
            continue

        match = re.match(r'PZ([0-9]+)=(.*?[^#])!', rec)
        if match:
            (index, dvalue) = match.groups()
            print("    index = >{}<  value = >{}<".format(index, dvalue))
            continue

        print("    Unknown Record format")

Output:

$ python dummy.py
    rec: >DUD=ABC!QUEUE=D23!<
    Unknown Record format
    rec: >XA32=7!P^=32!<
    index = >32<  value = >7<
    rec: >PZ112=123^!PQ=ABC!<
    index = >112<  value = >123^<
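
On the group() versus groups() part of the question, the following sketch may help. It reuses the one-group pattern from the question, and the variable names are only illustrative. groups() returns a tuple of all capture groups even when there is just one, while group(n) returns a single group as a string, with group(0) being the entire matched text.

```python
import re

rec = 'XA22=11^!S^=64!ABC=0,0!PX=0!SP=12B!'

match = re.match(r'XA([0-9]+)=', rec)
if match:
    whole = match.group(0)       # the entire matched text
    index = match.group(1)       # the first capture group, as a string
    groups = match.groups()      # a tuple of all capture groups, even if only one

    (only,) = match.groups()     # the trailing comma unpacks a one-element tuple
    wrapped = (match.groups())   # parentheses alone do nothing: still the tuple

    print("whole = >{}< index = >{}<".format(whole, index))
    print("groups = {} only = >{}< wrapped = {}".format(groups, only, wrapped))
```

This is why (index) = match.groups() did not behave like the two-group examples: without a comma, the parentheses are plain grouping, so index ends up holding the whole tuple.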

But I wonder why you don't simplify both the Perl and the Python code to use just a single regular expression? For example:

match = re.match(r'(?:XA|PZ)([0-9]+)=(.*?[^#])!', rec)
if match:
    (index, dvalue) = match.groups()
    print("    index = >{}<  value = >{}<".format(index, dvalue))
else:
    print("    Unknown Record format")
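
If the two patterns need to stay separate, one way to get closer to the Perl if/elsif shape is an assignment expression. This is only a sketch and assumes Python 3.8 or later, where the := operator is available; parse_record, XA_RE and PZ_RE are hypothetical names, and the patterns are copied from the Perl code in the question.

```python
import re

# Patterns copied from the Perl example in the question.
XA_RE = re.compile(r'XA([0-9]+)=(.*?)!')
PZ_RE = re.compile(r'PZ([0-9]+)=(.*?[^#])!')

def parse_record(rec):
    """Return (index, dvalue) for a recognised record, or None otherwise."""
    # Requires Python 3.8+: ':=' assigns the match object and tests it at once.
    if (m := XA_RE.match(rec)):
        return m.groups()
    elif (m := PZ_RE.match(rec)):
        return m.groups()
    return None

records = ['DUD=ABC!QUEUE=D23!', 'XA32=7!P^=32!', 'PZ112=123^!PQ=ABC!']
results = [parse_record(rec) for rec in records]

for rec, result in zip(records, results):
    print("    rec: >{}<".format(rec))
    if result is None:
        print("    Unknown Record format")
    else:
        print("    index = >{}<  value = >{}<".format(*result))
```

On older Python versions the match-then-continue pattern shown earlier does the same job.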