In order to make my inputs safe, I'm using htmlspecialchars in PHP:
$input = $_POST['field'];
$result = htmlspecialchars($input);
This works, but then I realized that in some inputs I need to allow some basic markup like <b> and <i>, copyright symbols and other basic things for the user. So I started doing this:
$result = $_POST['ftext'];
$presanitize = htmlspecialchars($result);
$newftext = str_replace(array("&lt;i&gt;", "&lt;b&gt;", "&lt;/i&gt;", "&lt;/b&gt;", "&amp;copy;", "&quot;", "&lt;a&gt;", "&lt;/a&gt;"),
array("<i>", "<b>", "</i>", "</b>", "&copy;", '"', "<a>", "</a>"), $presanitize);
Now we come to my main problem: how do I allow things like <a> and <img>, where the tag is not a fixed string and I don't know what will come inside it?
I can replace <i>, because it's always only <i>, but if I replace <a>, it won't work, as I'll have lots of stuff inside of it (<a href="http://link.com">Text</a>).
What should I do? Thanks in advance.
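Answer 0 (score: 4)
The simple answer is: you don't. That is part of the reason why many popular forum systems use some kind of markup of their own rather than plain HTML. Otherwise people can, and will, find a way to do nasty things: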
<img src="http://example.com/random-pic.jpg" onload="location.href='http://some.nasty.page/exploit';"/>
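Couldn't you just strip the event attributes? Sure, but will you keep up with everything browsers support, and with all their quirks? Can you really outsmart everyone?
If you still want to do this, look for a well-documented, well-tested and widely used library or script that provides this functionality. PHP essentially has this built in, but it is really bare-bones. Some keywords to search for are "php html sanitizer" or similar.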
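To illustrate what "bare-bones" can mean in practice, here is a minimal sketch. It assumes the built-in being referred to is strip_tags(); the original link is not preserved, so that is a guess. The sample input is made up for illustration.

```php
<?php
// Assumption: the built-in mentioned above is strip_tags().
$input = '<b>Hi</b> <img src="pic.jpg" onload="evil()"> <script>alert(1)</script>';

// Keep only <b> and <img>; all other tags are removed.
$clean = strip_tags($input, '<b><img>');

// The <script> tags are gone, but the onload attribute on the
// allowed <img> tag is left untouched.
echo $clean, PHP_EOL;
```

strip_tags() filters by tag name only, so attributes on allowed tags pass through unchanged. That is why it is not enough on its own for <a> or <img>.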
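Personally, I'd recommend that you only support Markdown or BBCode syntax (again: there are many snippets and libraries you can use). Don't reinvent the wheel unless you really have to.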
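As a rough sketch of the BBCode idea, and not a replacement for a maintained library: escape everything first, then turn a very small set of known tags into HTML. The function name render_bbcode and the supported tags ([b], [i], [url=...]) are choices made for this example only.

```php
<?php
// Hypothetical helper: render a tiny BBCode subset as HTML.
function render_bbcode(string $text): string
{
    // Escape everything first, so no raw HTML from the user survives.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Simple paired tags, one pass per tag so [b][i]x[/i][/b] also works.
    foreach (['b', 'i'] as $tag) {
        $html = preg_replace(
            '#\[' . $tag . '\](.*?)\[/' . $tag . '\]#is',
            '<' . $tag . '>$1</' . $tag . '>',
            $html
        );
    }

    // Links: [url=http://...]text[/url], restricted to http and https.
    // Quotes are already escaped, so the URL cannot break out of href="".
    return preg_replace(
        '#\[url=(https?://[^\s\]]+)\](.*?)\[/url\]#is',
        '<a href="$1" rel="nofollow">$2</a>',
        $html
    );
}

echo render_bbcode('[b]Hello[/b] <script>x</script> [url=http://link.com]Text[/url]'), PHP_EOL;
```

The point of the design is that the user never writes HTML at all: the only tags in the output are the ones this function generates itself, so there is no attribute list to police.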
Answer 1 (score: 1)
Use preg_replace() for the <a> and <img> tags:
// $input is assumed to be the htmlspecialchars() output
$new = preg_replace('/&lt;(img|a)(.*?)&gt;/i', '<$1$2>', $input);
Note that this is completely untested, but it should give you a hint about how to approach the problem.
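A pattern that restores everything after the tag name would also restore attributes such as onload. A somewhat stricter variant of the same idea is sketched below: it only restores tags that have exactly one attribute (href or src) holding an http(s) URL. The function name allow_links_and_images is hypothetical, and this is an illustration under those assumptions, not a vetted sanitizer.

```php
<?php
// Hypothetical helper: escape everything, then restore strictly shaped
// <a> and <img> tags only.
function allow_links_and_images(string $input): string
{
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // An http(s) URL as it looks after escaping: no whitespace, and "&"
    // only as "&amp;", so it cannot contain an escaped quote.
    $url = '(https?://(?:[^\s&]|&amp;)+)';

    // <a href="URL">text</a> with no other attributes.
    $escaped = preg_replace(
        '#&lt;a href=&quot;' . $url . '&quot;&gt;(.*?)&lt;/a&gt;#is',
        '<a href="$1">$2</a>',
        $escaped
    );

    // <img src="URL"> or <img src="URL"/> with no other attributes.
    return preg_replace(
        '#&lt;img src=&quot;' . $url . '&quot;\s*/?&gt;#i',
        '<img src="$1" alt="">',
        $escaped
    );
}

echo allow_links_and_images('<a href="http://link.com">Text</a>'), PHP_EOL;
echo allow_links_and_images('<img src="http://example.com/random-pic.jpg" onload="evil()"/>'), PHP_EOL;
```

Anything that does not match the exact shape, such as the onload example, simply stays escaped and is displayed as text. A dedicated sanitizer library is still the safer route for anything beyond this.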