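I'm working with a couple of URL parsers in Perl (URI and Mojo::URL) in the context of HTML::Restrict. The problem I'm trying to solve is that in some cases I want to be able to strip URLs. For example, when filtering HTML I might want to allow relative URLs but disallow JavaScript.
I've run into the following issue: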
#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );

use Mojo::URL ();

# Baseline: a plain javascript: URL is parsed as expected.
my $js_url = 'javascript:alert(1);';
my $mojo   = Mojo::URL->new($js_url);
say 'scheme: ' . $mojo->scheme . " in $js_url";

# Prefix the same URL with a character reference to a control character.
for my $i ( 1 .. 8, 14 .. 31 ) {
    my $bad_url = "&#$i;" . $js_url;
    my $mojo    = Mojo::URL->new($bad_url);
    say $mojo->scheme
        ? 'scheme is ' . $mojo->scheme
        : 'no scheme found in ' . $bad_url;
}
This produces the following output:
scheme: javascript in javascript:alert(1);
no scheme found in &#1;javascript:alert(1);
no scheme found in &#2;javascript:alert(1);
no scheme found in &#3;javascript:alert(1);
no scheme found in &#4;javascript:alert(1);
no scheme found in &#5;javascript:alert(1);
no scheme found in &#6;javascript:alert(1);
no scheme found in &#7;javascript:alert(1);
no scheme found in &#8;javascript:alert(1);
no scheme found in &#14;javascript:alert(1);
no scheme found in &#15;javascript:alert(1);
no scheme found in &#16;javascript:alert(1);
no scheme found in &#17;javascript:alert(1);
no scheme found in &#18;javascript:alert(1);
no scheme found in &#19;javascript:alert(1);
no scheme found in &#20;javascript:alert(1);
no scheme found in &#21;javascript:alert(1);
no scheme found in &#22;javascript:alert(1);
no scheme found in &#23;javascript:alert(1);
no scheme found in &#24;javascript:alert(1);
no scheme found in &#25;javascript:alert(1);
no scheme found in &#26;javascript:alert(1);
no scheme found in &#27;javascript:alert(1);
no scheme found in &#28;javascript:alert(1);
no scheme found in &#29;javascript:alert(1);
no scheme found in &#30;javascript:alert(1);
no scheme found in &#31;javascript:alert(1);
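For the URLs above in which no scheme is found, I would assume that they are relative URLs. However, if I use those URLs in the href attribute of an anchor tag, clicking the link pops up a JavaScript alert box in Chrome, Firefox and Safari: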
<a href="&#1;javascript:alert(1);">1</a>
<a href="&#2;javascript:alert(1);">2</a>
<a href="&#3;javascript:alert(1);">3</a>
<a href="&#4;javascript:alert(1);">4</a>
<a href="&#5;javascript:alert(1);">5</a>
<a href="&#6;javascript:alert(1);">6</a>
<a href="&#7;javascript:alert(1);">7</a>
<a href="&#8;javascript:alert(1);">8</a>
<a href="&#14;javascript:alert(1);">14</a>
<a href="&#15;javascript:alert(1);">15</a>
<a href="&#16;javascript:alert(1);">16</a>
<a href="&#17;javascript:alert(1);">17</a>
<a href="&#18;javascript:alert(1);">18</a>
<a href="&#19;javascript:alert(1);">19</a>
<a href="&#20;javascript:alert(1);">20</a>
<a href="&#21;javascript:alert(1);">21</a>
<a href="&#22;javascript:alert(1);">22</a>
<a href="&#23;javascript:alert(1);">23</a>
<a href="&#24;javascript:alert(1);">24</a>
<a href="&#25;javascript:alert(1);">25</a>
<a href="&#26;javascript:alert(1);">26</a>
<a href="&#27;javascript:alert(1);">27</a>
<a href="&#28;javascript:alert(1);">28</a>
<a href="&#29;javascript:alert(1);">29</a>
<a href="&#30;javascript:alert(1);">30</a>
<a href="&#31;javascript:alert(1);">31</a>
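I used Mojo::URL in the example, but URI behaves the same way. As far as I can tell, in both cases the parser does not strip the non-printable control characters, so it fails to recognize that the URL contains JavaScript. Web browsers (helpfully?) recognize that the control characters are non-printable, ignore them, and so execute the JavaScript in the URL when the link is clicked.
What is going on here? Are both the parsers and the browsers behaving correctly? Is it up to me to strip non-printable control characters before parsing a URL?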
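For reference, here is a minimal sketch of the kind of pre-processing I have in mind. It assumes that browsers first decode the character references in the attribute value and then, as the WHATWG URL Standard describes, ignore leading and trailing C0 control characters and spaces and drop tabs and newlines anywhere in the string. The helper names clean_url and scheme_of are hypothetical, only numeric character references are handled (named entities are not), and scheme_of is a plain regex stand-in for the parser so that the sketch needs nothing beyond core Perl.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );

# Hypothetical helper: approximate what a browser does to an href value
# before it looks for a scheme. Only numeric character references are
# decoded; named entities would need something like HTML::Entities.
sub clean_url {
    my ($url) = @_;

    # Decode decimal (&#1;) and hexadecimal (&#x1;) references in one pass,
    # so that decoded text is never decoded a second time.
    $url =~ s/&#(?:x([0-9a-f]{1,6})|([0-9]{1,7}));/defined $1 ? chr hex $1 : chr $2/gie;

    # Drop leading and trailing C0 control characters and spaces.
    $url =~ s/\A[\x00-\x20]+//;
    $url =~ s/[\x00-\x20]+\z//;

    # Drop tabs and newlines anywhere in the string.
    $url =~ s/[\t\n\r]//g;

    return $url;
}

# Hypothetical helper: pull out a scheme using the RFC 3986 scheme grammar.
# Returns the lower-cased scheme, or undef when the URL has none.
sub scheme_of {
    my ($url) = @_;
    return $url =~ /\A([a-z][a-z0-9+.\-]*):/i ? lc $1 : undef;
}

for my $i ( 1 .. 8, 14 .. 31 ) {
    my $bad_url = "&#$i;javascript:alert(1);";
    my $scheme  = scheme_of( clean_url($bad_url) );
    say defined $scheme
        ? "scheme is $scheme"
        : "no scheme found in $bad_url";
}
```

With this in place every URL in the loop reports the javascript scheme. I would expect the cleaned string to behave the same way when passed to Mojo::URL->new or URI->new as in the script above, although I have only reasoned about that, not confirmed it against either module's internals.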