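How can I protect against the following DOM-based XSS attack?
Specifically, is there a protect() function that would make the code below safe? If not, is there another solution? For example: giving the div an id and then assigning an onclick handler to that element afterwards.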
<?php
function protect()
{
// For non-DOM XSS attacks, hex-encoding all non-alphanumeric characters
// with ASCII values less than 256 works (ie: \xHH)
// But is it possible to augment this function to protect against
// the below DOM based XSS attack?
}
?>
<body>
<div id="mydiv"></div>
<script type="text/javascript">
var xss = "<?php echo protect($_GET["xss"]) ?>";
$("#mydiv").html("<div onclick='myfunc(\""+xss+"\")'></div>")
</script>
</body>
I am hoping for an answer other than "avoid using innerHTML" or "regex the xss variable down to [a-zA-Z0-9]"... i.e., is there a more general solution?
Thanks
Answer 0: (score: 2)
Expanding on Vineet's reply, here is a set of test cases:
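Answer 1: (score: 1)
I have been playing with PHP's DOMDocument and related classes in order to write an HTML parser that can handle this kind of thing. It is at an early stage of development and not ready for real use, but my early experiments suggest the idea has some promise.
Basically, you load the markup into a DOMDocument and then walk the tree. For each node in the tree, you check the node type against a list of allowed node types. If the node type is not on the list, you remove it from the tree.
You can use a similar approach to find all SCRIPT tags in a piece of markup and remove them. If you can strip every embedded script out of the markup you are given, DOM-based XSS becomes ineffective.
Here is the code I am using, along with a test case that processes the StackOverflow home page. As I said, it is far from production-quality code and is nothing more than a proof of concept. Still, I hope you find it useful.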
<?php
class HtmlClean
{
private $whiteList = array (
'#cdata-section', '#comment', '#text', 'a', 'abbr', 'acronym', 'address', 'b',
'big', 'blockquote', 'body', 'br', 'caption', 'cite', 'code', 'col', 'colgroup',
'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em', 'fieldset', 'h1', 'h2', 'h3', 'h4',
'h5', 'h6', 'head', 'hr', 'html', 'i', 'img', 'ins', 'kbd', 'li', 'link', 'meta',
'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strike', 'strong', 'style', 'sub',
'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'title', 'tr', 'tt', 'ul',
'var'
);
private $attrWhiteList = array (
'class', 'id', 'title'
);
private $dom = NULL;
/**
* Get current tag whitelist
* @return array
*/
public function getWhiteListTags ()
{
$this -> whiteList = array_values ($this -> whiteList);
return ($this -> whiteList);
}
/**
* Add tag to the whitelist
* @param string $tagName
*/
public function addWhiteListTag ($tagName)
{
$tagName = strtolower (trim ($tagName));
if (!in_array ($tagName, $this -> whiteList))
{
$this -> whiteList [] = $tagName;
}
}
/**
* Remove a tag from the whitelist
* @param string $tagName
*/
public function removeWhiteListTag ($tagName)
{
// array_search returns 0 for the first entry, so compare against false explicitly
if (($index = array_search ($tagName, $this -> whiteList)) !== false)
{
unset ($this -> whiteList [$index]);
}
}
/**
* Load document markup into the class for cleaning
* @param string $html The markup to clean
* @return bool
*/
public function loadHTML ($html)
{
if (!$this -> dom)
{
$this -> dom = new DOMDocument();
}
$this -> dom -> preserveWhiteSpace = false;
$this -> dom -> formatOutput = true;
return $this -> dom -> loadHTML ($html);
}
public function outputHtml ()
{
$ret = '';
if ($this -> dom)
{
$ret = $this -> dom -> saveXML ();
}
return ($ret);
}
private function cleanAttrs (DOMNode $elem)
{
$attrs = $elem -> attributes;
$index = $attrs -> length;
while (--$index >= 0)
{
$attrName = strtolower ($attrs -> item ($index) -> name);
if (!in_array ($attrName, $this -> attrWhiteList))
{
$elem -> removeAttribute ($attrName);
}
}
}
/**
* Recursively remove elements from the DOM that aren't whitelisted
* @param DOMNode $elem
* @return array List of elements removed from the DOM
* @throws Exception If removal of a node fails, an exception is thrown
*/
private function cleanNodes (DOMNode $elem)
{
$removed = array ();
if (in_array (strtolower ($elem -> nodeName), $this -> whiteList))
{
// Remove non-whitelisted attributes
if ($elem -> hasAttributes ())
{
$this -> cleanAttrs ($elem);
}
/*
* Iterate over the element's children. The reason we go backwards is because
* going forwards will cause indexes to change when elements get removed
*/
if ($elem -> hasChildNodes ())
{
$children = $elem -> childNodes;
$index = $children -> length;
while (--$index >= 0)
{
$removed = array_merge ($removed, $this -> cleanNodes ($children -> item ($index)));
}
}
}
else
{
// The element is not on the whitelist, so remove it
if ($elem -> parentNode -> removeChild ($elem))
{
$removed [] = $elem;
}
else
{
throw new Exception ('Failed to remove node from DOM');
}
}
return ($removed);
}
/**
* Perform the cleaning of the document
*/
public function clean ()
{
$removed = $this -> cleanNodes ($this -> dom -> getElementsByTagName ('html') -> item (0));
return ($removed);
}
}
$test = file_get_contents ('http://www.stackoverflow.com/');
// Windows-style linebreaks really foul up the works. There's probably a better fix for this
$test = str_replace (chr (13), '', $test);
$cleaner = new HtmlClean ();
$cleaner -> loadHTML ($test);
echo ('<h1>Before</h1><pre>' . htmlspecialchars ($cleaner -> outputHtml ()) . '</pre>');
$start = microtime (true);
$removed = $cleaner -> clean ();
$cleanTime = microtime (true) - $start;
echo ('<h1>Removed tag list</h1>');
foreach ($removed as $elem)
{
var_dump ($elem -> nodeName);
}
echo ('<h1>After</h1><pre>' . htmlspecialchars ($cleaner -> outputHtml ()) . '</pre>');
// benchmark
var_dump ($cleanTime);
?>
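Answer 2: (score: 0)
I am not a PHP expert, but if you want to prevent XSS attacks against the code sample as presented, in its current form and with minimal changes, you can use the PHP edition of OWASP ESAPI. Specifically, use the JavaScript codec class from ESAPI to protect the contents of the xss variable, since it appears in a JavaScript context.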
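To make the idea of a JavaScript-context encoder concrete, here is a minimal sketch in JavaScript. It is not ESAPI's code: the function name escapeForJsString is hypothetical, and the rule it applies (leave [a-zA-Z0-9] alone, write everything else as \xHH or \uHHHH) is an assumption based on the hex-encoding scheme described in the question's comments.

```javascript
// Hypothetical helper (not part of ESAPI): encode a value so it can be placed
// inside a quoted JavaScript string literal without ending that literal.
function escapeForJsString(value) {
  const input = String(value);
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i); // UTF-16 code unit, 0..0xFFFF
    const isAlphanumeric =
      (code >= 48 && code <= 57) ||  // 0-9
      (code >= 65 && code <= 90) ||  // A-Z
      (code >= 97 && code <= 122);   // a-z
    if (isAlphanumeric) {
      out += input[i];
    } else if (code < 0x100) {
      // Two-digit hex escape for code units below 256
      out += "\\x" + code.toString(16).toUpperCase().padStart(2, "0");
    } else {
      // Four-digit unicode escape for everything else
      out += "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
    }
  }
  return out;
}

// Usage: a payload that tries to close the string and call alert()
console.log(escapeForJsString('");alert(1)//'));
// prints \x22\x29\x3Balert\x281\x29\x2F\x2F
```

Note that this only covers the string literal assigned to xss. Once the browser has parsed that literal, the variable holds the original characters again, so concatenating it into the markup passed to .html() places it in a second, HTML-attribute context that needs its own encoding.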