Jsoup.clean: image src with data:image/png is lost

Time: 2016-02-15 15:21:40

Tags: image jsoup src

The string "unsafe" comes from a contenteditable="true" div into which an image was pasted from the clipboard.

// The inner quotes need to be escaped in Java. The markup itself is valid HTML5.
String unsafe = "<img src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAABaklEQVQokZWSXYuCQBSG+xdCa5nRRZgzUlmDNlmCxSCjZg4Y9OkwMxf9Sn/aXrRERrC779U5L+fhfHBa9T/Vek3m8/kztm3bcRzXdTebjed50+m0AWy3W4RQXdcAgCcAIcyyTEpZlqVlWQ1ASnm9XpMksW3bsiwAAISQECKEuN/vlNL1et0ALpeLUkoIked5EAS+7xNCqqpSSlFKwzB838FxnCiKzuezEOJ4PJZlebvdOOdxHGOMPyytaRqEEGPMGOOcSymVUoyx5+gfrjQYDB5zn04nznlVVYyxyWRimuYHoN/vD4fD5XKZpmlRFGma7na7w+GQ5zlCaDQavQOu64ZhmCRJlmUYYwDAeDwOw7Aoiv1+v1gsHMdpAIQQSiml1Pd9wzC63a5hGL1eDyEURVEQBJ1OpwFQSuM49jxP13Vd1x+maZpfL2oAq9VqNpu1221N097O8lpdv/3Sj9X6YDaA1p/V6PBr6UPfrxpWT8DSD68AAAAASUVORK5CYII=\" alt=\"\">";


org.jsoup.safety.Whitelist whitelist = Whitelist.relaxed();   

whitelist.addEnforcedAttribute("a", "rel", "nofollow"); 

String safe = Jsoup.clean(unsafe, whitelist);

// and safe becomes: "<img alt="">"
// the entire src is lost!?

Note: adding random surrounding HTML has no effect. The src is lost either way.

2 Answers:

Answer 0: (score: 4)

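The basic problem here is that a quick look at the documentation for relaxed() (http://jsoup.org/apidocs/org/jsoup/safety/Whitelist.html#relaxed) suggests that only tags are allowed, not attributes. I did not look at the source code, but another question claims that some attributes are allowed as well: "How to make a Jsoup whitelist to accept certain attribute content". And src is already allowed for images.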

The problem that made my src disappear is that

preserveRelativeLinks

is set to false for relaxed(), hidden somewhere in the jsoup code: https://github.com/jhy/jsoup/issues/333

-> It should be set to true:

System.out.println(Jsoup.clean("<img src='imgFile.png' />","http://www.somedomain.com", Whitelist.relaxed().preserveRelativeLinks(true)));

Answer 1: (score: 0)

This is how to allow basic text to contain inline images (such as src="data:image/png;base64,..."):

String safe = Jsoup.clean(unsafe, Whitelist.basic()
        .addTags("img")
        .addAttributes("img", "height", "src", "width")
        .addProtocols("img", "src", "http", "https", "data"));
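To see why adding "data" to the protocol list matters, here is a minimal sketch of a protocol-prefix check in plain Java. It is a simplified model, not jsoup's actual code: the class ProtocolCheckSketch and the helper hasAllowedProtocol are hypothetical names made up for illustration. The assumption is that an attribute with a protocol restriction is kept only when its value starts with one of the allowed protocols followed by a colon, and that relaxed() allows only http and https for img src.

```java
import java.util.List;
import java.util.Locale;

// Simplified, hypothetical model of a protocol allow-list check (not jsoup's real implementation).
class ProtocolCheckSketch {

    // Returns true when the attribute value starts with "<protocol>:" for any allowed protocol.
    static boolean hasAllowedProtocol(String attributeValue, List<String> allowedProtocols) {
        String lower = attributeValue.toLowerCase(Locale.ROOT);
        for (String protocol : allowedProtocols) {
            if (lower.startsWith(protocol.toLowerCase(Locale.ROOT) + ":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Shortened placeholder payload, not the image from the question.
        String src = "data:image/png;base64,iVBORw0KGgo=";

        // Only http and https allowed: the data URI fails the check.
        System.out.println(hasAllowedProtocol(src, List.of("http", "https")));

        // With "data" added, as in the answer above, the value passes.
        System.out.println(hasAllowedProtocol(src, List.of("http", "https", "data")));
    }
}
```

Under this model the first call prints false and the second prints true, which matches the observed behavior: the src attribute is dropped until "data" is an allowed protocol for img src.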