I have a regular expression like this:
s/<(?:[^>'"]|(['"]).?\1)*>//gs
I don't know what exactly it means.
Answer 0 (score: 1)
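The regular expression looks like it is meant to strip HTML tags from the input. It matches text that starts with < and ends with >, and that contains either characters other than > and quotes, or quoted strings (which may themselves contain >). But it appears to have a bug: .? means the quotes can enclose only 0 or 1 characters; it was probably meant to be .*? (0 or more characters). And to keep backtracking from letting the . match a quote in some odd cases, the (?: ... ) group should be made possessive (> instead of :, i.e. an atomic group).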
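The two points above can be checked with a small experiment. This is only a sketch: it uses Python's re module as a stand-in for the Perl substitution (re.DOTALL for /s, re.sub for /g), and the names AS_WRITTEN, LIKELY_INTENDED and strip_tags, as well as the sample inputs, are made up for illustration. The atomic-group fix is not exercised here, because Python only accepts (?>...) from version 3.11 on.

```python
import re

# Python stand-ins for the Perl s/...//gs: re.DOTALL plays the role of /s,
# and re.sub replaces every match, like /g.
AS_WRITTEN = re.compile(r"""<(?:[^>'"]|(['"]).?\1)*>""", re.DOTALL)
LIKELY_INTENDED = re.compile(r"""<(?:[^>'"]|(['"]).*?\1)*>""", re.DOTALL)


def strip_tags(pattern, text):
    """Remove every match of pattern from text (hypothetical helper)."""
    return pattern.sub("", text)


html = '<a title="a>b">text</a>'

# With .? the quoted value "a>b" is longer than one character, so the
# opening tag is not matched at all; only </a> is removed.
print(strip_tags(AS_WRITTEN, html))       # <a title="a>b">text

# With .*? the whole quoted value is consumed, including the > inside it.
print(strip_tags(LIKELY_INTENDED, html))  # text

# The backtracking case: the quote after c has no partner, so the engine
# backtracks and lets the lazy .*? run past the quote that follows b.
odd = '<a "b" c">'
print(LIKELY_INTENDED.match(odd).group(0))  # <a "b" c">
```

In the last case the "quoted string" that ends up being matched is "b" c", so the . has matched a quote character, which is the odd behaviour the possessive group is meant to rule out.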
Answer 1 (score: 0)
This tool can explain the details: http://rick.measham.id.au/paste/explain.pl?regex=%3C%28%3F%3A[^%3E%27%22]|%28[%27%22]%29.%3F\1%29*%3E
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^>'"]                   any character except: '>', ''', '"'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      ['"]                     any character of: ''', '"'
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    .?                       any character except \n (optional
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  >                        '>'
So it is trying to remove HTML tags, as was already mentioned.
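The same node-by-node breakdown can be written into the pattern itself. Below is a minimal sketch using Python's re.VERBOSE as a stand-in for Perl's /x flag; the name TAG and the sample string are illustrative only. One thing the tool's output does not reflect: because the original substitution carries the /s flag, the . there also matches a newline, which is why re.DOTALL is set here.

```python
import re

# The regex from the question, unchanged, with each node commented.
TAG = re.compile(r"""
    <             # a literal < opens the tag
    (?:           # non-capturing group, repeated 0 or more times (greedy)
        [^>'"]    #   any character except > and the two quote characters
      |           #   OR
        (['"])    #   an opening quote, captured as group 1
        .?        #   at most one character (any character, due to DOTALL)
        \1        #   the same quote character that group 1 captured
    )*
    >             # a literal > closes the tag
""", re.VERBOSE | re.DOTALL)

sample = "<b>bold</b> and <i x='1'>it</i>"
print(TAG.sub("", sample))  # bold and it
```

The sample only works because the quoted value '1' is a single character; longer attribute values run into the .? limitation.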