Converting a regular expression from Perl to Python

Time: 2014-01-30 14:45:35

Tags: python regex perl migration

I want to rewrite a small Perl program in Python. I use it to process a text file as follows:

Input:

00000001;Root;;
00000002;  Documents;;
00000003;    oracle-advanced_plsql.zip;file;
00000004;  Public;;
00000005;  backup;;
00000006;    20110323-JM-F.7z.001;file;
00000007;    20110426-JM-F.7z.001;file;
00000008;    20110603-JM-F.7z.001;file;
00000009;    20110701-JM-F-via-summer_school;;
00000010;      20110701-JM-F-yyy.7z.001;file;

Desired output:

00000001;;Root;;
00000002;  ;Documents;;
00000003;    ;oracle-advanced_plsql.zip;file;
00000004;  ;Public;;
00000005;  ;backup;;
00000006;    ;20110323-JM-F.7z.001;file;
00000007;    ;20110426-JM-F.7z.001;file;
00000008;    ;20110603-JM-F.7z.001;file;
00000009;    ;20110701-JM-F-via-summer_school;;
00000010;      ;20110701-JM-F-yyy.7z.001;file;

Here is the working Perl code:

#!/usr/bin/perl -w
# filename: perl_regex.pl
while (<>) {
  s/^(.*?;.*?)(\w)/$1;$2/;
  print $_;
}

Invoke it from the command line: perl_regex.pl input.txt

Explanation of the Perl-style regular expression:

s/        # start search-and-replace regexp
  ^       # start at the beginning of this line
  (       # save the matched characters until ')' in $1
    .*?;  # go forward until finding the first semicolon
    .*?   # go forward until finding... (to be continued below)
  )
  (       # save the matched characters until ')' in $2
    \w    # ... the next word character (letter, digit, or underscore).
  )
/         # continue with the replace part
  $1;$2   # write all characters found above, but insert a ; before $2
/         # finish the search-and-replace regexp.

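Can anyone tell me how to get the same result in Python? In particular, I could not find an equivalent for the $1 and $2 variables.

2 answers:

Answer 0 (score: 2)

The Python counterpart of Perl's s/pattern/replace/ substitution is the function re.sub(pattern, replace, string), or re.compile(pattern).sub(replace, string). In your case, you would do: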

import re

_re_pattern = re.compile(r"^(.*?;.*?)(\w)")
result = _re_pattern.sub(r"\1;\2", line)

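Note that $1 becomes \1. As in Perl, you still need to iterate over your lines in whatever way suits you (open, inputfile, splitlines, ...).

Answer 1 (score: 1)

Python regular expressions are very similar to Perl's, except: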
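One way to handle the line iteration, as a minimal sketch: the standard-library fileinput module behaves much like Perl's while(<>) loop, reading the files named on the command line and falling back to standard input when none are given. The names PATTERN, convert_line and main are hypothetical and chosen only for this illustration.

```python
import fileinput
import re
import sys

# Same pattern as above, compiled once and reused for every line.
PATTERN = re.compile(r"^(.*?;.*?)(\w)")


def convert_line(line):
    """Insert a ';' before the first word character after the first ';'."""
    # count=1 mirrors Perl's s/// without the /g modifier.
    return PATTERN.sub(r"\1;\2", line, count=1)


def main(paths=None):
    # With paths=None, fileinput uses sys.argv[1:], or stdin if that is empty.
    with fileinput.input(files=paths) as lines:
        for line in lines:
            # Each line keeps its trailing newline, so write() is enough.
            sys.stdout.write(convert_line(line))
```

Calling main() at the bottom of a script and running it with a file name as argument should then behave like the Perl program; lines that contain no word character after the first semicolon pass through unchanged.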

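  • There are no regular expression literals in Python. A pattern is written as a string. I used an r'raw string literal' in the code below.
  • Backreferences are written \1, \2, ... or \g<1>, \g<2>, ...
  • ...

Use re.sub for the substitution.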

import re
import sys

for line in sys.stdin:  # explicitly iterate over standard input line by line
    # `line` still contains its trailing newline
    line = re.sub(r'^(.*?;.*?)(\w)', r'\1;\2', line)
    # print(line)  # would add a second newline after the one already in `line`
    sys.stdout.write(line)  # write the substituted line back out
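To make the two backreference spellings concrete, and to show the closest analogue of reading $1 and $2 after a match, here is a small sketch. The sample line is taken from the question's input; the variable names are illustrative only.

```python
import re

line = "00000003;    oracle-advanced_plsql.zip;file;\n"
pattern = r'^(.*?;.*?)(\w)'

# \1 and \g<1> are interchangeable in a replacement string.
numbered = re.sub(pattern, r'\1;\2', line)
explicit = re.sub(pattern, r'\g<1>;\g<2>', line)

# \g<1> is needed when a literal digit follows the reference:
# r'\10' would refer to group 10, r'\g<1>0' is group 1 followed by "0".
with_digit = re.sub(pattern, r'\g<1>0\2', line)

# Outside a replacement string, the groups live on the match object.
m = re.match(pattern, line)
if m:
    prefix, first_char = m.group(1), m.group(2)
    # Rebuild the line by hand: prefix, the new ';', then the rest.
    rebuilt = prefix + ';' + first_char + line[m.end():]
```

The manual rebuild via m.group() gives the same string as re.sub here; it is mostly useful when the replacement needs logic that a replacement template cannot express.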