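I am trying to use the Java library Jsoup to sanitize text strings that may contain malicious content (XSS). I have to allow <a href="http://www.link.com">link</a> links, but for XSS reasons I do not want to allow javascript links.
The following test case fails because the javascript protocol is still allowed. Any ideas on how to solve this using Jsoup's built-in functions?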
@Test
public void test() {
    Whitelist tWhitelist = Whitelist.none();
    tWhitelist.addAttributes("a", "href");
    tWhitelist.removeProtocols("a", "href", "javascript");
    String tUnsafe = "<a href=\"javascript:alert(1)\">Link</a> is a link.";
    assertEquals("Link is a link.", Jsoup.clean(tUnsafe, tWhitelist));
}
org.junit.ComparisonFailure: expected:<[Link] is a link.> but was:<[<a href="javascript:alert(1)">Link</a>] is a link.>
Answer 0 (score: 1)
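This happens because addAttributes("a", "href") puts the a tag on the whitelist, and since no protocols were ever registered for href on that tag, removeProtocols has nothing to remove, so every URL scheme is still accepted. You can use the none whitelist directly, for example: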
Whitelist tWhitelist = Whitelist.none();
String tUnsafe = "<a href=\"javascript:alert(1)\">Link</a> is a link.";
assertEquals("Link is a link.", Jsoup.clean(tUnsafe, tWhitelist));
Or you can use the basic whitelist to keep other hrefs, for example:
Whitelist tWhitelist = Whitelist.basic();
tWhitelist.removeProtocols("a", "href", "javascript");
String tUnsafe = "<a href=\"javascript:alert(1)\">Link</a> is a link.<a href=\"http://www.google.com\" rel=\"nofollow\">google</a>";
assertEquals("<a rel=\"nofollow\">Link</a> is a link.<a href=\"http://www.google.com\" rel=\"nofollow\">google</a>", Jsoup.clean(tUnsafe, tWhitelist));
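Answer 1 (score: 0)
Found it myself... Registering the protocols below makes only those protocols valid for href, so links using the javascript protocol are removed.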
Whitelist whitelist = Whitelist.none();
whitelist
    .addTags("a")
    .addAttributes("a", "href")
    .addProtocols("a", "href", "http", "https", "mailto");
String safeText = Jsoup.clean(untrustedText, whitelist);
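To show why listing the accepted protocols works where removing javascript did not, here is a minimal sketch in plain Java of an allow-list check on the URL scheme. It is not Jsoup's actual implementation. The class HrefSchemeFilter and the method isAllowed are hypothetical names made up for this illustration, and the allowed set is assumed to mirror the protocols passed to addProtocols above. It assumes Java 9 or later for Set.of.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; this is NOT how Jsoup is implemented.
final class HrefSchemeFilter {
    // Assumed allow-list, mirroring the protocols given to addProtocols above.
    private static final Set<String> ALLOWED = Set.of("http", "https", "mailto");

    // Returns true only when href parses as a URI whose scheme is on the allow-list.
    static boolean isAllowed(String href) {
        if (href == null) {
            return false;
        }
        try {
            String scheme = new URI(href.trim()).getScheme();
            // Relative links have no scheme; this sketch rejects them.
            return scheme != null && ALLOWED.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Input that does not parse as a URI is treated as unsafe.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("http://www.link.com"));  // prints true
        System.out.println(isAllowed("javascript:alert(1)"));  // prints false
    }
}
```

Because the check asks "is this scheme on the list?" rather than "is this scheme javascript?", any scheme that was never listed is rejected by default, which is the same design choice the addProtocols call above makes.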