Question

Can someone explain what this regular expression means?

$html = preg_replace("# <(?![/a-z]) | (?<=\s)>(?![a-z]) #exi", "htmlentities('$0')", $html);

Someone posted it in "How to strip tags in a safer way than using strip_tags function?", but I can't understand it.

This is my first post on Stack Overflow, so please forgive me if I've made any mistakes.

Thanks!

Answer 1

#...#      the two # characters are the delimiters that start and end
           the REGEX (PHP accepts many other characters for this)
#exi       the e, x and i modifiers. See the PHP.net site for information
           about them
 
<          the < character
(?!...)    a negative lookahead. The REGEX matches only when the next
           character is NOT one of the characters listed inside it
[/a-z]     a character class that matches the / character or one of
           the letters a - z
|          OR
(?<=\s)    a positive lookbehind. The REGEX matches only when there is
           \s (whitespace) immediately before
>          the > character
(?![a-z])  a negative lookahead for the letters a - z
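To see the breakdown above in action, here is a minimal PHP sketch (assuming PHP 7.4+ for the arrow function) that runs the same pattern with only the x and i modifiers and lists the byte offset of every < or > it matches. The constant BARE_ANGLE, the function bareAngleOffsets() and the sample strings are hypothetical names chosen for illustration.

```php
<?php
// The question's pattern without /e: x ignores the spaces inside it,
// i makes [a-z] also match A-Z. (Hypothetical constant name.)
const BARE_ANGLE = '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi';

// Hypothetical helper: returns the byte offset of every '<' or '>'
// that the pattern treats as "not part of a tag".
function bareAngleOffsets(string $html): array
{
    preg_match_all(BARE_ANGLE, $html, $m, PREG_OFFSET_CAPTURE);
    // With PREG_OFFSET_CAPTURE, each entry of $m[0] is [matchedText, offset].
    return array_map(fn(array $hit): int => $hit[1], $m[0]);
}

// Offset 5 is the '<' in "1 < 3", offset 19 is the '>' in "4 > 2";
// the '<' and '>' belonging to <b> and </b> are not matched.
print_r(bareAngleOffsets('<b>1 < 3</b> and 4 > 2'));
```

The lookaheads and the lookbehind only inspect neighbouring characters without consuming them, so each match is exactly one < or > character.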

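Basically, it matches every < and > character that is not being used as part of a tag. For example, <foo and </foo will not match, and neither will foo>. But the < in 1 < 3 will match. That match is passed to the htmlentities function and becomes 1 &lt; 3. Now you can safely use strip_tags to remove only the real tags.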
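Note that the /e modifier used in the question was deprecated in PHP 5.5 and removed in PHP 7.0, so the original line no longer runs on current PHP. A minimal sketch of the same idea, assuming PHP 7.4+, swaps in preg_replace_callback() with an arrow function; escapeBareAngles() is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: same pattern as the question, but instead of /e
// (which eval()-ed the replacement string), a callback builds each replacement.
function escapeBareAngles(string $html): string
{
    return preg_replace_callback(
        '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi',
        // $m[0] is the whole match: a single '<' or '>'.
        fn(array $m): string => htmlentities($m[0]),
        $html
    );
}

$escaped = escapeBareAngles('<p>1 < 3</p>');
echo $escaped, PHP_EOL;             // <p>1 &lt; 3</p>
echo strip_tags($escaped), PHP_EOL; // 1 &lt; 3
```

Here $m[0] plays the role that '$0' played in the /e replacement string, but the matched text is passed as data instead of being spliced into PHP code and evaluated.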

Answer 2

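It looks to me like it is trying to work out what is not an HTML tag based on whether the character following a < or > is a number (more precisely, whether it is not a letter).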

This means it would catch the < in:

<span>This is <5 ml.</span>

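and replace it with the equivalent HTML entity, which lets you safely use strip_tags without changing the meaning of the string (as described in the question you linked).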
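A small sketch of this answer's example, assuming PHP 7.4+ and a hypothetical escapeBareAngles() helper built on preg_replace_callback() (used instead of the /e modifier, which PHP 7.0 removed). It compares strip_tags() on the raw string with strip_tags() after escaping.

```php
<?php
// Hypothetical helper: turns each "bare" '<' or '>' into its HTML entity.
function escapeBareAngles(string $html): string
{
    return preg_replace_callback(
        '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi',
        fn(array $m): string => htmlentities($m[0]),
        $html
    );
}

$input = '<span>This is <5 ml.</span>';

// Without escaping, strip_tags() treats "<5 ml.</span>" as part of a tag,
// so the text "5 ml." is lost along with the markup.
$naive = strip_tags($input);

// With escaping, "<5" becomes "&lt;5" first, so only the real tags are removed.
$safe = strip_tags(escapeBareAngles($input));

var_dump($naive, $safe);
```

If plain text is needed afterwards, html_entity_decode() can turn the surviving &lt; back into <.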

Answer 3

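Looks for a < that is not followed by / or a-z

OR

a > that is preceded by whitespace and not followed by a-z

and then replaces it with htmlentities('$0'), where $0 is your entire match!

The i option makes the match case-insensitive

The e option evaluates the replacement string as PHP code

The x option ignores literal whitespace in the pattern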
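A quick way to check what x and i do is to run the question's pattern with and without them. This is a sketch assuming PHP 7+; the e modifier is left out because PHP 7.0 removed it, and the test strings are made up for illustration.

```php
<?php
// The question's pattern body, with the modifiers appended separately below.
$spaced = '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #';

// x: the spaces in the pattern are ignored, so "1<3" matches.
// Without x the spaces are literal, so a '<' would need a space on each side.
$withX    = preg_match($spaced . 'x', '1<3');    // 1
$withoutX = preg_match($spaced, '1<3');          // 0

// i: [a-z] also covers A-Z, so the '<' of "<DIV>" counts as a tag and is skipped.
// Without i, the uppercase D falls outside [a-z] and the '<' is matched.
$withI    = preg_match($spaced . 'xi', '<DIV>'); // 0
$withoutI = preg_match($spaced . 'x', '<DIV>');  // 1

var_dump($withX, $withoutX, $withI, $withoutI);
```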
