Question

I have this regular expression:

 s/<(?:[^>'"]|(['"]).?\1)*>//gs

I don't understand what exactly it means.

Answer 1

The regular expression looks like it is intended to remove HTML tags from the input.
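As a quick illustration of that reading, here is a minimal sketch that applies the substitution from the question to a made-up string. The sample HTML and the variable names are assumptions chosen for the example; the attribute value is deliberately a single character long.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the question.
my $html = '<p class="x">Hello, <b>world</b>!</p>';

# Copy the string, then run the substitution from the question on the copy.
(my $text = $html) =~ s/<(?:[^>'"]|(['"]).?\1)*>//gs;

print "$text\n";   # prints: Hello, world!
```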

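It matches text that starts with < and ends with >, containing in between either characters that are neither > nor a quote, or quoted strings (which may themselves contain >). But it appears to have a bug:

.? means a quoted string may contain only 0 or 1 characters; it was probably meant to be .*? (0 or more characters, non-greedy). And to keep backtracking from letting the . match a quote in some odd cases, the (?: ... ) group would need to be made possessive, i.e. atomic ((?> ... ), with > instead of :).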
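Both points can be checked with a small script. This is only a sketch: the sample strings and variable names are made up for illustration, and the (?> ... ) form is one way of writing the non-backtracking group suggested above.

```perl
use strict;
use warnings;

# Hypothetical input with quoted attribute values longer than one character.
my $input = '<a href="page.html" title="a > b">link</a>';

# Pattern as posted: .? allows at most one character between the quotes,
# so the opening tag is not matched and only </a> is removed.
(my $with_bug = $input) =~ s/<(?:[^>'"]|(['"]).?\1)*>//gs;

# With .*? the quoted strings can be any length, so both tags are removed.
(my $fixed = $input) =~ s/<(?:[^>'"]|(['"]).*?\1)*>//gs;

print "$with_bug\n";   # prints: <a href="page.html" title="a > b">link
print "$fixed\n";      # prints: link

# Hypothetical odd case: a stray, unbalanced single quote inside the tag.
my $odd = q{<a b="c" d' e> f " g>};

# Non-atomic group: after failing on the stray quote, the engine backtracks
# and lets .*? run past the closing " up to the later ", so the whole
# string is swallowed.
(my $plain = $odd) =~ s/<(?:[^>'"]|(['"]).*?\1)*>//gs;

# Atomic group: once "c" has been matched it cannot be re-extended,
# so there is no match and the string is left alone.
(my $atomic = $odd) =~ s/<(?>[^>'"]|(['"]).*?\1)*>//gs;

print "[$plain]\n";    # prints: []
print "[$atomic]\n";   # prints: [<a b="c" d' e> f " g>]
```

Whether leaving the malformed tag untouched is the better outcome depends on the input; the atomic version simply avoids matching more than one well-formed tag's worth of text.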

Answer 2

This tool can explain it in detail: http://rick.measham.id.au/paste/explain.pl?regex=%3C%28%3F%3A[^%3E%27%22]|%28[%27%22]%29.%3F\1%29*%3E

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^>'"]                   any character except: '>', ''', '"'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      ['"]                     any character of: ''', '"'
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    .?                       any character except \n (optional
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  >                        '>'

So it is trying to remove HTML tags, as the other answer also mentions.
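The same breakdown can be kept next to the pattern itself by using Perl's /x modifier, which allows whitespace and comments inside a regex. The following is a sketch of the pattern as posted (including its .?) written that way; the variable names and the sample string are assumptions for illustration.

```perl
use strict;
use warnings;

# The pattern from the question, spread out with /x so each node
# can carry a comment.
my $tag = qr{
    <               # a literal '<'
    (?:             # group without capturing, repeated zero or more times
        [^>'"]      #   any character except '>', single or double quote
      |             #  or
        (['"])      #   a quote character, captured as \1
        .?          #   at most one character of any kind
        \1          #   the same quote character again
    )*
    >               # a literal '>'
}xs;

# Hypothetical sample input.
my $html = q{<b>bold</b> and <i class='x'>italic</i>};

(my $text = $html) =~ s/$tag//g;

print "$text\n";   # prints: bold and italic
```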

Perl regular expression explained
