I am using Jsoup to clean HTML code. The test snippet is as follows:
String text = "<img src=my_img.png width=550 height=34 alt= style=-aw-left-pos:110pt; -aw-rel-hpos:page; -aw-rel-vpos:page; -aw-top-pos:100pt; -aw-wrap-type:none; margin-left:0pt; margin-top:0pt; position:absolute; z-index:0 />";
String str_1 = org.jsoup.Jsoup.clean(text, "", org.jsoup.safety.Whitelist.relaxed(),
        new org.jsoup.nodes.Document.OutputSettings().prettyPrint(false));
System.out.println(str_1);
and it produces this output:
<img width="550" height="34" alt="style=-aw-left-pos:110pt;">
which is far from what I expected.
Not only are many attributes missing, but the tag is also left unclosed after cleaning: there is no /> at the end.
This causes a problem later, when I try to generate a PDF from this HTML with OpenHtmlToPdf and get an error about malformed HTML.
How can I clean HTML like this and still get a properly closed tag in the result?
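To make the symptom concrete, here is a minimal sketch that uses only the JDK's XML parser (no jsoup, no OpenHtmlToPdf). It assumes the PDF step rejects input that is not well-formed XML, which matches the error described above but is not confirmed here. The class name `WellFormedCheck` and the helper `isWellFormed` are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    // Hypothetical helper: wraps the fragment in a root element and reports
    // whether the JDK XML parser accepts it as well-formed.
    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader("<div>" + fragment + "</div>")));
            return true;
        } catch (SAXException e) {
            // Thrown for markup that is not well-formed, e.g. an <img> with no end tag.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected for an in-memory string.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The cleaned output as reported in the question (HTML-style void tag).
        String htmlStyle = "<img width=\"550\" height=\"34\" alt=\"style=-aw-left-pos:110pt;\">";
        // The same tag written in the self-closing form the question asks for.
        String xmlStyle = "<img width=\"550\" height=\"34\" alt=\"style=-aw-left-pos:110pt;\" />";

        System.out.println(isWellFormed(htmlStyle)); // false
        System.out.println(isWellFormed(xmlStyle));  // true
    }
}
```

A check like this can be run on the cleaned string before handing it to the PDF step. On the jsoup side, `Document.OutputSettings` has a `syntax(Document.OutputSettings.Syntax.xml)` setting that may be worth trying alongside `prettyPrint(false)`, since as far as I know it switches serialization of void elements such as `img` to the `/>` form. It would not by itself bring back the dropped `src` and `style` attributes.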