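Yahoo HTML5 Context Parser - Cross-Site Scripting (XSS)

Date: 2015-09-25 06:36:21

Tags: html5 security html-parsing xss

I am experimenting with Yahoo's HTML5 context parser (Yahoo context-parser), which is meant to help identify potential XSS vulnerabilities.

To try it out, I ran the ./bin/context-dump tool against the text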

<form><input name=q value="%(query)s"> </form> 

(taken from ArticleXSSInAttributes)

which produces:

HTML-State { statesSize: 51 } +0ms
HTML-State { ch: 0, state: 1, symbol: 0 } +2ms
HTML-State { ch: f [0x66], state: 8, symbol: 11 } +1ms
HTML-State { ch: o [0x6f], state: 10, symbol: 11 } +0ms
HTML-State { ch: r [0x72], state: 10, symbol: 11 } +0ms
HTML-State { ch: m [0x6d], state: 10, symbol: 11 } +0ms
HTML-State { ch: > [0x3e], state: 10, symbol: 9 } +0ms
HTML-State { ch: [0x20], state: 1, symbol: 0 } +0ms
HTML-State { ch: [0x20], state: 1, symbol: 0 } +0ms
HTML-State { ch: [0x20], state: 1, symbol: 0 } +0ms
HTML-State { ch: < [0x3c], state: 1, symbol: 7 } +0ms
HTML-State { ch: i [0x69], state: 8, symbol: 11 } +0ms
HTML-State { ch: n [0x6e], state: 10, symbol: 11 } +0ms
HTML-State { ch: p [0x70], state: 10, symbol: 11 } +1ms
HTML-State { ch: u [0x75], state: 10, symbol: 11 } +0ms
HTML-State { ch: t [0x74], state: 10, symbol: 11 } +0ms
HTML-State { ch: [0x20], state: 10, symbol: 0 } +0ms
HTML-State { ch: n [0x6e], state: 34, symbol: 11 } +0ms
HTML-State { ch: a [0x61], state: 35, symbol: 11 } +0ms
HTML-State { ch: m [0x6d], state: 35, symbol: 11 } +0ms
HTML-State { ch: e [0x65], state: 35, symbol: 11 } +0ms
HTML-State { ch: = [0x3d], state: 35, symbol: 8 } +0ms
HTML-State { ch: q [0x71], state: 37, symbol: 11 } +0ms
HTML-State { ch: [0x20], state: 40, symbol: 0 } +0ms
HTML-State { ch: v [0x76], state: 34, symbol: 11 } +0ms
HTML-State { ch: a [0x61], state: 35, symbol: 11 } +0ms
HTML-State { ch: l [0x6c], state: 35, symbol: 11 } +0ms
HTML-State { ch: u [0x75], state: 35, symbol: 11 } +0ms
HTML-State { ch: e [0x65], state: 35, symbol: 11 } +0ms
HTML-State { ch: = [0x3d], state: 35, symbol: 8 } +0ms
HTML-State { ch: " [0x22], state: 37, symbol: 2 } +1ms
HTML-State { ch: % [0x25], state: 38, symbol: 12 } +0ms
HTML-State { ch: ( [0x28], state: 38, symbol: 12 } +1ms
HTML-State { ch: q [0x71], state: 38, symbol: 11 } +0ms
HTML-State { ch: u [0x75], state: 38, symbol: 11 } +0ms
HTML-State { ch: e [0x65], state: 38, symbol: 11 } +0ms
HTML-State { ch: r [0x72], state: 38, symbol: 11 } +0ms
HTML-State { ch: y [0x79], state: 38, symbol: 11 } +0ms
HTML-State { ch: ) [0x29], state: 38, symbol: 12 } +0ms
HTML-State { ch: s [0x73], state: 38, symbol: 11 } +0ms
HTML-State { ch: " [0x22], state: 38, symbol: 2 } +0ms
HTML-State { ch: > [0x3e], state: 42, symbol: 9 } +0ms
HTML-State { ch: [0x20], state: 1, symbol: 0 } +0ms
HTML-State { ch: < [0x3c], state: 1, symbol: 7 } +0ms
HTML-State { ch: / [0x2f], state: 8, symbol: 6 } +0ms
HTML-State { ch: f [0x66], state: 9, symbol: 11 } +0ms
HTML-State { ch: o [0x6f], state: 10, symbol: 11 } +0ms
HTML-State { ch: r [0x72], state: 10, symbol: 11 } +0ms
HTML-State { ch: m [0x6d], state: 10, symbol: 11 } +0ms
HTML-State { ch: > [0x3e], state: 10, symbol: 9 } +0ms
HTML-State { ch: [0xa], state: 1, symbol: 0 } +0ms
HTML-State { undefined - char in html without state } +0ms

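How does this output help me identify potential XSS problems? In other words, how does Context Parser help?

1 Answer:

Answer 0 (score: 2)

It tells you the syntactic context of each character in the HTML page.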

You can look up each state in the constants file. For example, 10 means that a tag name is being parsed, which in your example is the name of the <form> or <input> tag.
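As a rough illustration of that lookup, the sketch below parses one line of the dump and attaches a readable state name. The `STATE_NAMES` table and the `parseDumpLine` helper are hypothetical and not part of context-parser; the table is partial and assumes the state numbers follow the HTML5 tokenizer section numbering, so check it against the project's constants file before relying on it.

```javascript
// Partial state table. ASSUMPTION: the numbers follow the HTML5 tokenizer
// section numbering; verify every entry against the project's constants file.
const STATE_NAMES = {
  1: "data",
  8: "tag open",
  9: "end tag open",
  10: "tag name",
  34: "before attribute name",
  35: "attribute name",
  37: "before attribute value",
  38: "attribute value (double-quoted)",
  40: "attribute value (unquoted)",
  42: "after attribute value (quoted)",
};

// Parse one "HTML-State { ch: ..., state: N, symbol: M }" line of the dump.
// Returns null for lines that do not carry a state (for example statesSize).
function parseDumpLine(line) {
  const match = /\{ ch: (.*?), state: (\d+), symbol: (\d+) \}/.exec(line);
  if (match === null) {
    return null;
  }
  const state = Number(match[2]);
  return {
    ch: match[1],
    state: state,
    symbol: Number(match[3]),
    stateName: STATE_NAMES[state] || "unknown (see constants file)",
  };
}

const placeholderStart = parseDumpLine(
  "HTML-State { ch: % [0x25], state: 38, symbol: 12 } +0ms"
);
console.log(placeholderStart.stateName);
```

Read this way, the dump shows that every character of the %(query)s placeholder is reported with state 38, which under the assumption above means it sits inside a double-quoted attribute value.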

Knowing the context in which content is output tells the developer which encoding to use.

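For example, when outputting user data into HTML, you would HTML encode it. This means that certain characters, such as the less-than sign, are replaced by HTML entities (< becomes &lt;).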
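A minimal sketch of HTML encoding, assuming a hand-written helper named `htmlEncode` (hypothetical, for illustration only; a maintained library is the better choice in production):

```javascript
// Replace the characters that are significant in HTML text and in quoted
// attribute values with their entity equivalents.
function htmlEncode(text) {
  const entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// A value that tries to break out of value="..." in the question's form.
console.log(htmlEncode('" onmouseover="alert(1)'));
```

Because the double quote is encoded, the value can no longer close the attribute it was placed in.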

In a JavaScript context you would use hex entity encoding, so < becomes \x3c.
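One way this could look, assuming a hypothetical helper named `jsHexEscape` that escapes every character other than ASCII letters and digits. Using \uXXXX for code units above 0xFF is a choice made for this sketch, not something the answer specifies:

```javascript
// Escape everything except ASCII letters and digits so the result is safe
// inside a JavaScript string literal. The regex has no "u" flag, so it works
// on UTF-16 code units and surrogate halves are escaped individually.
function jsHexEscape(text) {
  return String(text).replace(/[^A-Za-z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 256
      ? "\\x" + code.toString(16).padStart(2, "0")
      : "\\u" + code.toString(16).padStart(4, "0");
  });
}

console.log(jsHexEscape("</script>"));
```

Escaping by allow-list rather than by a list of known-bad characters means quotes, backslashes, and line terminators are all covered without enumerating them.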

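For all practical purposes, I am not sure how useful a context parser is in everyday use. Once you are aware of the different contexts, which type of encoding to use should be fairly obvious. The main pitfall when learning this on your own is probably a JavaScript context nested inside HTML: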

<a href="javascript:void();" onclick="//this is parsed by HTML parser and then the JavaScript parser" />

Whereas inside a <script> block, only the JavaScript parser is involved:

<script>
  // The HTML parser doesn't run past here
</script>

However, once you are aware of this, the benefit of a context parser is minimal.

So even though it can help with server-side contexts, it will not help with DOM manipulation or with preventing DOM-based XSS:

<a href="javascript:void()" onclick="document.getElementById('foo').innerHTML = '(whatever is here should be HTML encoded, then hex entity encoded, then HTML encoded again)'" />

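(Try that in Context Parser.) (The final HTML encoding should not produce any changes, because the characters produced by hex entity encoding, \x and the hex digits, do not need to be HTML encoded. The final context is still HTML, though.)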
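The three layers can be sketched as follows. All three function names are hypothetical helpers written for this illustration, and the approach assumes the JavaScript escaping step escapes every non-alphanumeric character, which is what makes the last step a no-op:

```javascript
// Layer helpers (illustrative, not taken from any library).
function htmlEncode(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

function jsHexEscape(text) {
  return String(text).replace(/[^A-Za-z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 256
      ? "\\x" + code.toString(16).padStart(2, "0")
      : "\\u" + code.toString(16).padStart(4, "0");
  });
}

// Encode a value destined for: onclick="...innerHTML = '<value>'"
// Contexts are applied innermost first: innerHTML, JS string, HTML attribute.
function encodeForInnerHtmlInHandler(text) {
  const forInnerHtml = htmlEncode(text);          // consumed by innerHTML
  const forJsString = jsHexEscape(forInnerHtml);  // consumed by the JS parser
  const forAttribute = htmlEncode(forJsString);   // consumed by the HTML parser
  return forAttribute;
}

const payload = "<img src=x onerror=alert(1)>";
const encoded = encodeForInnerHtmlInHandler(payload);
console.log(encoded);
// The last layer changes nothing, as the answer predicts.
console.log(encoded === jsHexEscape(htmlEncode(payload)));
```

The order mirrors how the browser unwraps the value: the HTML parser decodes the attribute first, the JavaScript parser decodes the string literal next, and innerHTML parses the result as HTML last.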