How to escape a <code> tag with PHP htmlentities, even if the tag has attributes

Date: 2016-08-30 04:13:58

Tags: php regex escaping preg-replace xss

Someone posted a fantastic solution here: "How can I escape all code within <code></code> tags to allow people to post code?"

The problem is that this only works if the tag is exactly <code></code>. It breaks with <code id="lol"></code>, for example, since the opening tag contains an attribute. How can I account for this, so that strings inside the code tag are strictly escaped whether or not the tag has any attributes?

I apologize if there is an obvious solution to this. Regexes give me nightmares.

Edit

As I explained in the question initially, the post that is supposedly a duplicate does not account for the <code> tag with something like a class or any other attributes.

1 Answer:

Answer 0 (score: 1)

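Despite my comment above, I'll still make an effort to give you a regex. However, I strongly recommend not using a regex for this, and using an HTML parser instead.

Your regex should look something like this: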

<\s*code(.*?)>(.+?)<\s*\/code\s*>

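Breaking it down a bit,

\s* matches zero or more whitespace characters.

code matches the literal string "code".

(.*?) is a capturing group with a lazy match of zero or more characters. It matches everything (if anything) up to the end of the opening tag, which is where any attributes end up.

(.+?) is a capturing group with a lazy match of one or more characters. This assumes your <code> tags are never completely empty, since there must be at least one character between them for the pattern to match.

Finally, <\s*\/code\s*> matches the closing tag, which may contain whitespace. Note that the slash (/) character is escaped, as it must be whenever / is also the pattern delimiter, which is the usual convention with PHP's preg functions.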
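As a rough illustration, the pattern above could be wired into preg_replace_callback along these lines. This is only a sketch under a few assumptions: the function name escape_code_blocks is made up for the example, the s modifier is added so that . also matches newlines in multi-line code, and the closing tag is normalized to </code>. It inherits the usual regex limitations (for instance, an attribute value containing > or a tag such as <codec> would confuse it), which is why an HTML parser remains the safer choice.

```php
<?php
// Hypothetical helper: escapes everything between <code ...> and </code>,
// keeping whatever attributes the opening tag carries.
function escape_code_blocks(string $html): string
{
    // Pattern from the answer, wrapped in / delimiters.
    // The `s` modifier is an addition so that `.` also matches newlines.
    $pattern = '/<\s*code(.*?)>(.+?)<\s*\/code\s*>/s';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // $m[1]: attributes of the opening tag (may be empty)
            // $m[2]: the contents between the tags
            $escaped = htmlentities($m[2], ENT_QUOTES, 'UTF-8');
            return '<code' . $m[1] . '>' . $escaped . '</code>';
        },
        $html
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $html;
}

$input = '<p>Hi</p><code id="lol"><b>bold</b></code>';
echo escape_code_blocks($input), PHP_EOL;
// prints: <p>Hi</p><code id="lol">&lt;b&gt;bold&lt;/b&gt;</code>
```

Because (.+?) requires at least one character, an empty <code></code> pair is left untouched, which is harmless since there is nothing to escape.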