Jsoup.clean: image src with data:image/png is lost

Time: 2016-02-15 15:21:40

Tags: image jsoup src

The string "unsafe" comes from a contenteditable="true" div into which an image was pasted from the clipboard.

// Needs to be escaped in Java source. It is valid HTML5.
String unsafe = "<img src=\"data:image/png;base64,...\" alt=\"\">";


org.jsoup.safety.Whitelist whitelist = Whitelist.relaxed();   

whitelist.addEnforcedAttribute("a", "rel", "nofollow"); 

String safe = Jsoup.clean(unsafe, whitelist);

//and safe becomes: "<img alt="">"
//entire src lost !?

Note: random surrounding HTML has no effect. The src is lost regardless.

2 answers:

Answer 0 (score: 4)

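The basic problem here is that a quick look at relaxed() in http://jsoup.org/apidocs/org/jsoup/safety/Whitelist.html#relaxed suggests that only tags are whitelisted, not attributes. I have not looked at the source code, but it is claimed here that some attributes are included as well: How to make a Jsoup whitelist to accept certain attribute content. For img, src is already included too.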

The problem that made my src disappear was that

preserveRelativeLinks

is set to false in relaxed(), hidden somewhere in the JSoup code: https://github.com/jhy/jsoup/issues/333

-> It should be set to true:

System.out.println(Jsoup.clean("<img src='imgFile.png' />","http://www.somedomain.com", Whitelist.relaxed().preserveRelativeLinks(true)));
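Why the base URI argument matters here can be illustrated with a simplified model in plain Java. This is only a sketch of the idea, not jsoup's actual code: the class RelativeLinkSketch and its helpers resolve and cleanSrc are hypothetical names, and the allowed protocol list (http, https) is an assumption. The model resolves the src against the base URI, checks the resolved URL's protocol, and only then decides whether to emit the relative or the absolute value. The base URI carries a trailing slash because java.net.URI needs a base path to resolve against.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class RelativeLinkSketch {
    // Assumed protocol list for img src in this model.
    static final List<String> ALLOWED = Arrays.asList("http", "https");

    // Hypothetical helper: resolve src against baseUri.
    // Returns "" when the result is not an absolute URL.
    static String resolve(String baseUri, String src) {
        try {
            URI resolved = URI.create(baseUri).resolve(src);
            return resolved.isAbsolute() ? resolved.toString() : "";
        } catch (IllegalArgumentException e) {
            return ""; // malformed base or src
        }
    }

    // Returns the src value to emit, or null if the attribute is dropped.
    static String cleanSrc(String baseUri, String src, boolean preserveRelativeLinks) {
        String abs = resolve(baseUri, src);
        String lower = abs.toLowerCase(Locale.ROOT);
        for (String protocol : ALLOWED) {
            if (lower.startsWith(protocol + ":")) {
                // The protocol check passed; the flag only picks the output form.
                return preserveRelativeLinks ? src : abs;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String base = "http://www.somedomain.com/";
        // No base URI: the relative link cannot be resolved, so it is dropped.
        System.out.println(cleanSrc("", "imgFile.png", true));
        // With a base URI the link survives, relative or absolute depending on the flag.
        System.out.println(cleanSrc(base, "imgFile.png", true));
        System.out.println(cleanSrc(base, "imgFile.png", false));
    }
}
```

In this model, setting the flag alone is not enough: without a resolvable base URI the relative src is still removed, which matches the answer passing both a base URI and preserveRelativeLinks(true).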

Answer 1 (score: 0)

This is how to allow basic text to contain inline images (such as src="data:image/png;base64,..."):

String safe = Jsoup.clean(unsafe, Whitelist.basic()
.addTags("img")
.addAttributes("img", "height", "src", "width")
.addProtocols("img", "src", "http", "https", "data"));
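The effect of adding "data" to the protocol list can be shown with a small stand-alone sketch. It assumes that a protocol whitelist boils down to a case-insensitive prefix check of the attribute value against each allowed protocol followed by a colon. The class ProtocolCheckSketch and the helper hasAllowedProtocol are hypothetical names used for illustration, not part of jsoup.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ProtocolCheckSketch {
    // Hypothetical helper: true if value starts with one of the
    // allowed protocols followed by ':' (case-insensitive).
    static boolean hasAllowedProtocol(String value, List<String> protocols) {
        String lower = value.trim().toLowerCase(Locale.ROOT);
        for (String protocol : protocols) {
            if (lower.startsWith(protocol.toLowerCase(Locale.ROOT) + ":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Shortened placeholder payload, not a real image.
        String src = "data:image/png;base64,AAAA";

        List<String> httpOnly = Arrays.asList("http", "https");
        List<String> withData = Arrays.asList("http", "https", "data");

        // Without "data" in the list the src fails the check and would be removed.
        System.out.println(hasAllowedProtocol(src, httpOnly));
        // With "data" added, as in the answer, the src passes.
        System.out.println(hasAllowedProtocol(src, withData));
    }
}
```

Note that allowing the data protocol lets arbitrary embedded payloads through for that attribute, so it is worth restricting it to img src as the answer does.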