Question

I have a file in MoinMoin text format:

* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[  Abiword Wordprocessor]] (2010/10/27 20:17)
* [[  Sylpheed E-Mail]] (2010/03/30 21:49)
* [[   Kupfer]] (2010/05/16 20:18)

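The text between '[[' and ']]' is a short description of the entry. I need to extract the whole entry, not each individual word.

I found an answer to a similar question here: https://stackoverflow.com/a/2700749/819596 but couldn't understand the answer: "my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;"

Any approach that works will be accepted, but an explanation would help a lot, i.e. what (?0) or /xg does.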

Answer 1

The code might look like this:

use warnings; 
use strict;

my @subjects; # declaring a lexical variable to store all the subjects
my $pattern = qr/ 
  \[ \[    # matching two `[` signs
  \s*      # ... and, if any, whitespace after them
  ([^]]+) # starting from the first non-whitespace symbol, capture all the non-']' symbols
  ]]
/x;

# main processing loop:
while (<DATA>) { # reading the source file line by line
  if (/$pattern/) {      # if line is matched by our pattern
    push @subjects, $1;  # ... push the captured group of symbols into our array
  }
}
print $_, "\n" for @subjects; # print our array of subject line by line

__DATA__
* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[  Abiword Wordprocessor]] (2010/10/27 20:17)
* [[  Sylpheed E-Mail]] (2010/03/30 21:49)
* [[   Kupfer]] (2010/05/16 20:18)

As I see it, what you need can be described as follows: in each line of the file, try to find this sequence of symbols...

[[, an opening delimiter, 
then 0 or more whitespace symbols,
then all the symbols that make a subject (which should be saved),
then ]], a closing delimiter

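As you can see, this description translates quite naturally into a regex. The only thing that is perhaps not strictly needed is the /x regex modifier, which let me comment the pattern extensively.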
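To illustrate that /x is optional, here is a minimal sketch of the same pattern written compactly, without comments. The helper name subject_of is hypothetical and only used for illustration.

```perl
use strict;
use warnings;

# Same four steps as above, without /x:
# "[[", optional whitespace, captured non-']' characters, "]]".
my $compact = qr/\[\[\s*([^]]+)]]/;

# Hypothetical helper: return the subject found in one line, or undef.
sub subject_of {
    my ($line) = @_;
    return $line =~ $compact ? $1 : undef;
}

print subject_of('* [[  Sylpheed E-Mail]] (2010/03/30 21:49)'), "\n";
# prints "Sylpheed E-Mail"
```

The two forms match the same text; /x only changes how the pattern may be laid out in the source.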

Answer 2

If the text will never contain ], you can simply use the following, as previously suggested:

/\[\[ ( [^\]]* ) \]\]/x

The following allows ] in the contained text, but I recommend against incorporating it into a larger pattern:

/\[\[ ( .*? ) \]\]/x

The following allows ] in the contained text, and is the most robust solution:

/\[\[ ( (?:(?!\]\]).)* ) \]\]/x

For example,

if (my ($match) = $line =~ /\[\[ ( (?:(?!\]\]).)* ) \]\]/x) {
   print "$match\n";
}

or

my @matches = $file =~ /\[\[ ( (?:(?!\]\]).)* ) \]\]/xg;

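/x: Ignores whitespace in the pattern, so whitespace can be added to make the pattern readable without changing its meaning. Documented in perlre.
/g: Finds all matches. Documented in perlop.
(?0) was used to make the pattern recursive, because the linked answer has to handle arbitrarily nested curly braces.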
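A small sketch of how the three patterns above differ. The sample strings and the helper names all_matches and capture_before_X are made up for illustration.

```perl
use strict;
use warnings;

my $negated  = qr/\[\[ ( [^\]]* ) \]\]/x;
my $lazy     = qr/\[\[ ( .*? ) \]\]/x;
my $tempered = qr/\[\[ ( (?:(?!\]\]).)* ) \]\]/x;

# Hypothetical helper: all captures of $re in $text.
sub all_matches {
    my ($re, $text) = @_;
    my @m = $text =~ /$re/g;
    return @m;
}

# Hypothetical helper: embed $re in a larger pattern (followed by " X").
sub capture_before_X {
    my ($re, $text) = @_;
    return $text =~ /$re \s X/x ? $1 : undef;
}

my $sample = '[[foo]bar]] and [[baz]]';
print join('|', all_matches($negated,  $sample)), "\n";  # baz
print join('|', all_matches($lazy,     $sample)), "\n";  # foo]bar|baz
print join('|', all_matches($tempered, $sample)), "\n";  # foo]bar|baz

# Inside a larger pattern the lazy version can run past the first "]]":
my $two = '[[a]] [[b]] X';
print capture_before_X($lazy,     $two), "\n";  # a]] [[b
print capture_before_X($tempered, $two), "\n";  # b
```

The last two lines show why the lazy form is risky inside a larger pattern: .*? keeps expanding until the rest of the pattern can match, while the lookahead version can never cross a ]].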

Answer 3

\[\[(.*)]]

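\[ is a literal [, ] is a literal ], .* means any sequence of 0 or more characters, and whatever is enclosed in parentheses is a capturing group, so you can access it later in your script with $1 (or $2 .. $9, depending on how many groups you have).

Put it all together and you match two [, then everything up to the last occurrence of two consecutive ].

Update: On a second reading of your question I became unsure whether you need the content between [[ and ]] or the whole line; in the latter case, leave the parentheses out entirely and just test whether the pattern matches, with no need to capture.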
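Because .* is greedy, "the last occurrence" matters as soon as a line contains more than one bracketed entry. A minimal sketch with a made-up sample line and hypothetical helper names:

```perl
use strict;
use warnings;

# Greedy: captures up to the LAST "]]" on the line.
sub greedy_capture {
    my ($line) = @_;
    return $line =~ /\[\[(.*)]]/ ? $1 : undef;
}

# Non-greedy variant for comparison: stops at the FIRST "]]".
sub lazy_capture {
    my ($line) = @_;
    return $line =~ /\[\[(.*?)]]/ ? $1 : undef;
}

# No capture group: only test whether the line contains [[...]].
sub has_entry {
    my ($line) = @_;
    return $line =~ /\[\[.*]]/ ? 1 : 0;
}

my $line = '[[first]] and [[second]]';
print greedy_capture($line), "\n";  # first]] and [[second
print lazy_capture($line), "\n";    # first
print has_entry($line), "\n";       # 1
```

For the sample file in the question, where each line holds exactly one entry, the greedy and non-greedy forms give the same result.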

Answer 4

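The answer you found is for recursive pattern matching, which I think you don't need.

/x allows insignificant whitespace and comments in the regexp.
/g runs the regexp over the whole string. Without it, matching stops at the first match.
/xg is /x and /g combined.
(?0) runs the regexp itself again (recursion).

If I understand correctly, you need something like this: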

$text="* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[  Abiword Wordprocessor]] (2010/10/27 20:17)
* [[  Sylpheed E-Mail]] (2010/03/30 21:49)
* [[   Kupfer]] (2010/05/16 20:18)
";

@array=($text=~/\[\[([^\]]*)\]\]/g);
print join(",",@array);

# this prints "  Virtualbox Guest Additions,  Abiword Wordprocessor,  Sylpheed E-Mail,   Kupfer"
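The effect of /g can be checked directly by running the same pattern with and without it in list context. This is a sketch on a shortened two-line sample; the helper names are hypothetical.

```perl
use strict;
use warnings;

my $text = "* [[  Abiword Wordprocessor]] (2010/10/27 20:17)\n"
         . "* [[   Kupfer]] (2010/05/16 20:18)\n";

# Without /g: list context returns the captures of the first match only.
sub first_entry {
    my ($s) = @_;
    my @m = $s =~ /\[\[([^\]]*)\]\]/;
    return @m;
}

# With /g: list context returns the capture from every match.
sub all_entries {
    my ($s) = @_;
    my @m = $s =~ /\[\[([^\]]*)\]\]/g;
    return @m;
}

my @first = first_entry($text);
my @all   = all_entries($text);
print scalar(@first), " entry without /g\n";  # 1 entry without /g
print scalar(@all), " entries with /g\n";     # 2 entries with /g
```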

Answer 5

I would suggest using "extract_bracketed" or "extract_delimited" from the module Text::Balanced; see here: http://perldoc.perl.org/Text/Balanced.html
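A minimal sketch of how extract_bracketed might be applied to one line of the file. The helper name bracketed_entry, the prefix pattern, and the final clean-up substitution are assumptions for illustration, not part of the module.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: return the text inside [[...]] for one line, or undef.
sub bracketed_entry {
    my ($line) = @_;
    # '[]' selects square brackets as delimiters; the third argument is a
    # prefix pattern to skip before the opening bracket (here: any non-'[').
    my ($extracted) = extract_bracketed($line, '[]', '[^\[]*');
    return undef unless defined $extracted && length $extracted;
    # $extracted still carries the brackets, e.g. "[[   Kupfer]]";
    # strip them together with the surrounding whitespace.
    (my $inner = $extracted) =~ s/^\[\[\s*|\s*\]\]$//g;
    return $inner;
}

print bracketed_entry('* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)'), "\n";
```

Compared with a plain regex, this approach mainly pays off when the brackets can nest.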

Answer 6

perl -pe 's/.*\[\[(.*)\]\].*/\1/g' temp

Tested below:

> cat temp
        * [[  Virtualbox Guest Additions]] (2011/10/17 15:19)
        * [[  Abiword Wordprocessor]] (2010/10/27 20:17)
        * [[  Sylpheed E-Mail]] (2010/03/30 21:49)
        * [[   Kupfer]] (2010/05/16 20:18)
>
> perl -pe 's/.*\[\[(.*)\]\].*/\1/g' temp
  Virtualbox Guest Additions
  Abiword Wordprocessor
  Sylpheed E-Mail
   Kupfer
>

s/.*\[\[(.*)\]\].*/\1/g
.*\[\[ -> matches any characters up to [[
(.*)\]\] -> stores everything after "[[" up to "]]" in \1
.* -> matches the rest of the line.

Since our data is now in \1, we can simply use it to print to the console.
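The same substitution can be sketched as a small script function. The helper name entry_from_line is hypothetical, and the replacement uses $1, which is the usual spelling on the replacement side of s/// in Perl (\1 there triggers a warning under -w).

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a line to the text between [[ and ]].
# A line without [[...]] is returned unchanged, as with the one-liner.
sub entry_from_line {
    my ($line) = @_;
    chomp $line;
    (my $entry = $line) =~ s/.*\[\[(.*)\]\].*/$1/;
    return $entry;
}

print entry_from_line("        * [[   Kupfer]] (2010/05/16 20:18)\n"), "\n";
# prints "   Kupfer" (leading spaces are kept, as in the test output above)
```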

Answer 7

my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;

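The 'x' flag means that whitespace is ignored in the regex, to allow a more readable expression. The 'g' flag means that the result will be a list of all matches from left to right (match *g*lobally).

(?0) recurses into the whole pattern, which here is the same as the regex inside the first group of parentheses, because that group wraps the entire pattern. It makes the regex recursive, equivalent to a set of rules like: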

E := '{' ( NoBrace | E )* '}'
NoBrace := [^{}]*
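A short sketch of the recursive pattern on nested braces. The sample string and the helper name brace_groups are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every top-level balanced {...} group in $str.
sub brace_groups {
    my ($str) = @_;
    my @groups = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;
    return @groups;
}

print "$_\n" for brace_groups('a {b {c} d} e {f}');
# prints:
# {b {c} d}
# {f}
```

The inner {c} is consumed by the recursive call (?0), so it appears as part of the first group rather than as a separate match.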

Perl: How to extract strings between brackets
