Explanation of a Perl regular expression

Date: 2014-01-07 22:12:23

Tags: regex perl

I have this regular expression:

 s/<(?:[^>'"]|(['"]).?\1)*>//gs

I don't know exactly what it means.

2 answers:

Answer 0 (score: 1):

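The regular expression looks like it is meant to strip HTML tags from the input.

It matches text that starts with < and ends with >, and in between contains either characters that are neither > nor a quote, or quoted strings (which may themselves contain >). There seems to be a bug, though:

.? means the quotes may enclose only 0 or 1 characters; it was probably meant to be .*? (0 or more characters). In addition, to keep backtracking from letting the . match a quote in some odd cases, the (?: ... ) group would need to be made atomic, i.e. written as (?> ... ) (> instead of :).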
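The difference between the three variants can be seen in a minimal sketch. The helper names strip_original, strip_lazy and strip_atomic and the sample inputs are hypothetical and only serve as illustration; the first pattern is the one from the question, and the other two are the changes suggested above.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per variant of the pattern.

# The pattern exactly as posted: a quoted value may hold at most one character.
sub strip_original {
    my ($text) = @_;
    $text =~ s/<(?:[^>'"]|(['"]).?\1)*>//gs;
    return $text;
}

# With .*? instead of .? so a quoted value can have any length.
sub strip_lazy {
    my ($text) = @_;
    $text =~ s/<(?:[^>'"]|(['"]).*?\1)*>//gs;
    return $text;
}

# Same as strip_lazy, but with an atomic group (?>...) so that a quoted
# value, once matched, is never reopened by backtracking.
sub strip_atomic {
    my ($text) = @_;
    $text =~ s/<(?>[^>'"]|(['"]).*?\1)*>//gs;
    return $text;
}

print strip_original('<a href="x">link</a>'),  "\n";  # link
print strip_original('<a href="xy">link</a>'), "\n";  # <a href="xy">link
print strip_lazy('<a href="xy">link</a>'),     "\n";  # link
print strip_lazy('<img alt="a > b">text'),     "\n";  # text
print strip_lazy('<a "b"c"d>'),                "\n";  # (empty line)
print strip_atomic('<a "b"c"d>'),              "\n";  # <a "b"c"d>
```

In the last two calls the input has an unbalanced quote. The non-atomic version backtracks into the first quoted value and stretches it to "b"c", so the whole malformed tag is removed; the atomic version refuses to do that and leaves the text untouched. Which behavior is preferable depends on how malformed markup should be treated.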

Answer 1 (score: 0):

This tool can explain the details: http://rick.measham.id.au/paste/explain.pl?regex=%3C%28%3F%3A[^%3E%27%22]|%28[%27%22]%29.%3F\1%29*%3E

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^>'"]                   any character except: '>', ''', '"'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      ['"]                     any character of: ''', '"'
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    .?                       any character except \n (optional
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  >                        '>'

So it is trying to remove HTML tags, as the other answer also mentioned.
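The same breakdown can be kept next to the pattern itself by using Perl's /x modifier, which allows whitespace and comments inside a regex. This is only a sketch: the function name strip_tags_verbose is hypothetical, and the pattern is the one from the question, unchanged (including the .? discussed in the other answer).

```perl
use strict;
use warnings;

# Hypothetical helper: the question's substitution, spelled out with /x.
sub strip_tags_verbose {
    my ($text) = @_;
    $text =~ s{
        <               # a literal opening angle bracket
        (?:             # non-capturing group, repeated by the * below
            [^>'"]      # one character that is neither > nor a quote
          |             # or
            (['"])      # an opening quote, captured as group 1
            .?          # at most one character (any, because of /s)
            \1          # the same quote character again
        )*              # zero or more repetitions, greedy
        >               # a literal closing angle bracket
    }{}gsx;
    return $text;
}

print strip_tags_verbose('<b>bold</b> text'), "\n";   # bold text
```

The {} delimiters are used here so that the / characters in the comments do not end the pattern early.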