How do I use a variable as a recursive regular expression in Perl?

Asked: 2016-04-19 18:09:33

Tags: regex perl recursion nested

I am writing a simple translator from John Tromp's Binary Lambda Calculus to De Bruijn notation lambda calculus, so that I can understand how the lambda files work in his 2012 "Most Functional" International Obfuscated C Code Contest winner.

Here is a sample of the language, primes.blc, before translation:

00010001100110010100011010000000010110000010010001010111110111101001000110100001110011010000000000101101110011100111111101111000000001111100110111000000101100000110110

I am having trouble with the nested regular expression in the commented-out line just before the part of Bruijn.pl that saves primes.txt:

#!/usr/bin/env perl
#use strict;
use warnings;
use IO::File;
use Cwd; my $originalCwd = getcwd()."/";
#primes.blc as argument for test conversion
#______________________________________________________________________open file
my ($name) = @ARGV;
$FILE = new IO::File;
$FILE->open("< ".$originalCwd."primes.blc") || die("Could not open file!");
#$FILE->open("< ".$name) || die("Could not open file!");
while (<$FILE>){ $field .= $_; }
$FILE->close;
#______________________________________________________________________Translate
$field =~ s/(00|01|(1+0))/$1 /gsm;
$field =~ s/00 /\\ /gsm;
$field =~ s/01 /(a /gsm;
$field =~ s/(1+)0 /length($1)." "/gsme;

$RecursParenthesesRegex = m/\(([^()]+|(??{$RecursParenthesesRegex}))*\)/;
#$field =~ 1 while s/(\(a){1}(([\s\\]+?(\d+|$RecursParenthesesRegex)){2})/\($2\)/sm;
#______________________________________________________________________save file
#$fh = new IO::File "> ".$name;
$fh = new IO::File "> ".$originalCwd."primes.txt";
if (defined $fh) { print $fh $field; $fh->close; }

What the translated file primes.txt should contain:

\ (\ (1 (1 ((\ (1 1) \ \ \ ((1 \ \ 1) (\ (((4 4) 1) (\ (1 1) \ (2 (1 1)))) \ \ \ \ ((1 3) (2 (6 4)))))) \ \ \ (4 (1 3))))) \ \ ((1 \ \ 2) 2))

Currently, with that line commented out, the script produces an almost readable format like this:

\ (a \ (a 1 (a 1 (a (a \ (a 1 1 \ \ \ (a (a 1 \ \ 1 (a \ (a (a (a 4 4 1 (a \ (a 1 1 \ (a 2 (a 1 1 \ \ \ \ (a (a 1 3 (a 2 (a 6 4 \ \ \ (a 4 (a 1 3 \ \ (a (a 1 \ \ 2 2 

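The commented-out line needs to find the innermost (a together with the two terms that follow it, each of which is either a number or a balanced pair of parentheses with everything inside it, then insert the trailing ) and remove the a, repeating this all the way out to the outermost application.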

2 answers:

Answer 0: (score: 2)

Although I don't understand your algorithm, this line is suspicious

$RecursParenthesesRegex = m/\(([^()]+|(??{$RecursParenthesesRegex}))*\)/

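This assigns to an undeclared variable the result of matching the pattern, which itself refers to that variable, against $_

use strict is meant to catch mistakes like this, but instead of fixing the errors you turned it off. That is unwise

I guess you are trying to define a recursive pattern, in which case you need to use qr// instead of m//, and use (?0) or (?R) inside the pattern

Let's call it $re instead, shall we? Like this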
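To illustrate what strict would have reported, here is a minimal sketch. The variable name $undeclared is a placeholder, and the string eval is used only so that the compile-time error can be printed instead of stopping the script.

```perl
use strict;
use warnings;

# String eval compiles the snippet at run time and inherits "use strict",
# so the failure lands in $@ instead of aborting the whole program.
my $ok    = eval q{ $undeclared = 1; 1 };
my $error = $@;

print defined $ok ? "compiled\n" : "strict rejected it: $error";
```

Declaring the variable with my inside the snippet should make the same eval succeed.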

my $re = qr/\(([^()]+|(?R))*\)/
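One caveat worth sketching: (?R) recurses into the whole pattern, so it behaves as expected when $re is matched on its own, but once $re is interpolated into a larger pattern the recursion covers that larger pattern too. A named group is one way to keep the recursion local. The group name paren and the sample strings below are illustrative assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Standalone use: (?R) recurses into the whole pattern, which here is
# exactly one balanced group.
my $re = qr/\((?:[^()]+|(?R))*\)/;
my $standalone = 'x (1 (2 3)) y' =~ $re ? $& : undef;
print "standalone: $standalone\n";

# Embedded use: a named group keeps the recursion inside the group, so the
# surrounding "(a <number>" prefix is not pulled into the recursion.
my $balanced = qr/(?<paren>\((?:[^()]+|(?&paren))*\))/;
my ($number, $group);
if ('(a 1 (2 (3 4))' =~ /\(a \s* (\d+) \s* $balanced/x) {
    ($number, $group) = ($1, $+{paren});
    print "number: $number, group: $group\n";
}
```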

Also, this line is odd

$field =~ 1 while s/(\(a){1}(([\s\\]+?(\d+|$RecursParenthesesRegex)){2})/\($2\)/sm

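The value of $field is matched against the regex pattern 1, and the result is discarded, for as long as the substitution keeps changing $_

Beyond that, without a description of the algorithm and how your code relates to it, I can't help you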
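What was presumably intended is the statement-modifier idiom, in which the substitution is bound to $field and repeated until it stops matching. A minimal sketch on a toy string follows. The pattern is an illustrative assumption and not the asker's regex, and a counter stands in for the usual constant 1 so that the number of passes is visible.

```perl
use strict;
use warnings;

my $field  = 'a((()))b(())';
my $passes = 0;

# Each pass deletes every innermost "()" pair. The loop ends when a pass
# changes nothing, because s/// then returns false.
$passes++ while $field =~ s/\(\)//g;

print "$field after $passes passes\n";
```

Written as 1 while $field =~ s/.../.../, the same loop simply discards the count.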

Answer 1: (score: 2)

You probably need a regex like this

 # (\(a)(([\s\\]*?(?:\d+|(?&RecursParens))){2})(?(DEFINE)(?<RecursParens>(?>\((?>(?>[^()]+)|(?:(?=.)(?&RecursParens)|))+\))))

 ( \(a )                       # (1)
 (                             # (2 start)
      (                             # (3 start)
           [\s\\]*? 
           (?:
                \d+ 
             |  
                (?&RecursParens) 
           )
      ){2}                          # (3 end)
 )                             # (2 end)

 (?(DEFINE)

      (?<RecursParens>              # (4 start)
           (?>
                \(
                (?>
                     (?> [^()]+ )
                  |  (?:
                          (?= . )
                          (?&RecursParens) 
                       |  
                     )
                )+
                \)
           )
      )                             # (4 end)
 )
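To show the (?(DEFINE)...) mechanism in isolation, here is a smaller sketch. The DEFINE block declares a named subpattern without matching anything itself, and (?&name) calls it, including from inside its own body. The group name group and the sample text are assumptions made for illustration.

```perl
use strict;
use warnings;

my $balanced = qr/
    (?&group)                                      # call the subpattern
    (?(DEFINE)
        (?<group> \( (?: [^()]++ | (?&group) )* \) )  # declared only, never matched in place
    )
/x;

# Collect every balanced group in the text. The unclosed "(" before
# "(4 4)" is skipped because no closing parenthesis ever balances it.
my $text = '(1 (2 3)) and ((4 4) 1';
my @found;
while ($text =~ /$balanced/g) {
    push @found, $&;
}

print "$_\n" for @found;
```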

Use the RecursParens regex in Perl code like this

use strict;
use warnings;
use feature qw{say};

my $field = "00010001100110010100011010000000010110000010010001010111110111101001000110100001110011010000000000101101110011100111111101111000000001111100110111000000101100000110110";

$field =~ s/(00|01|(1+0))/$1 /g;
$field =~ s/00 /\\ /g;
$field =~ s/01 /(a /g;
$field =~ s/(1+)0 /length($1)." "/ge;

1 while $field =~ s/(\(a)(([\s\\]*?(?:\d+|(?&RecursParens))){2})(?(DEFINE)(?<RecursParens>(?>\((?>(?>[^()]+)|(?:(?=.)(?&RecursParens)|))+\))))/\($2\)/g;

$field =~ s/\( /\(/g;

say $field;

This gives you output like this

\ (\ (1 (1 ((\ (1 1) \ \ \ ((1 \ \ 1) (\ (((4 4) 1) (\ (1 1) \ (2 (1 1)))) \ \ \ \ ((1 3) (2 (6 4)))))) \ \ \ (4 (1 3))))) \ \ ((1 \ \ 2) 2))

which can be formatted like this

 \ 
 (                             # (1 start)
      \ 
      (                             # (2 start)
           1 
           (                             # (3 start)
                1 
                (                             # (4 start)
                     (                             # (5 start)
                          \ 
                          ( 1 1 )                       # (6)
                          \ \ \ 
                          (                             # (7 start)
                               ( 1 \ \ 1 )                   # (8)
                               (                             # (9 start)
                                    \ 
                                    (                             # (10 start)
                                         (                             # (11 start)
                                              ( 4 4 )                       # (12)
                                              1
                                         )                             # (11 end)
                                         (                             # (13 start)
                                              \ 
                                              ( 1 1 )                       # (14)
                                              \ 
                                              (                             # (15 start)
                                                   2 
                                                   ( 1 1 )                       # (16)
                                              )                             # (15 end)
                                         )                             # (13 end)
                                    )                             # (10 end)
                                    \ \ \ \ 
                                    (                             # (17 start)
                                         ( 1 3 )                       # (18)
                                         (                             # (19 start)
                                              2 
                                              ( 6 4 )                       # (20)
                                         )                             # (19 end)
                                    )                             # (17 end)
                               )                             # (9 end)
                          )                             # (7 end)
                     )                             # (5 end)
                     \ \ \ 
                     (                             # (21 start)
                          4 
                          ( 1 3 )                       # (22)
                     )                             # (21 end)
                )                             # (4 end)
           )                             # (3 end)
      )                             # (2 end)
      \ \ 
      (                             # (23 start)
           ( 1 \ \ 2 )                   # (24)
           2
      )                             # (23 end)
 )                             # (1 end)
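As a cross-check on the regex approach, the same translation can also be sketched as a small recursive-descent parser, which avoids recursive regexes altogether. This is a different technique from the one used above, not the answer's method. The function names parse_term and blc_to_de_bruijn are hypothetical, and the sketch assumes the input holds exactly one term encoded as 00 M (abstraction), 01 M N (application), or a run of n ones followed by 0 (variable n).

```perl
use strict;
use warnings;

# Parse one term at the current match position of $$bits_ref and return its
# De Bruijn rendering. \G anchors at pos(), and /gc advances pos() on
# success while leaving it untouched on failure.
sub parse_term {
    my ($bits_ref) = @_;
    if ($$bits_ref =~ /\G00/gc) {              # 00 M: abstraction
        return '\\ ' . parse_term($bits_ref);
    }
    if ($$bits_ref =~ /\G01/gc) {              # 01 M N: application
        my $function = parse_term($bits_ref);
        my $argument = parse_term($bits_ref);
        return "($function $argument)";
    }
    if ($$bits_ref =~ /\G(1+)0/gc) {           # 1...10: variable, index = number of ones
        return length $1;
    }
    die 'Malformed term at offset ' . (pos($$bits_ref) // 0) . "\n";
}

# Strip whitespace and parse a single term from the start of the string.
sub blc_to_de_bruijn {
    my ($bits) = @_;
    $bits =~ s/\s+//g;
    return parse_term(\$bits);
}

print blc_to_de_bruijn('0010'), "\n";         # identity function
print blc_to_de_bruijn('000110110'), "\n";    # abstraction over an application
```

Any bits left over after the first complete term are ignored by this sketch.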