Question

I am using Jsoup to clean HTML code. The test snippet looks like this:

String text = "<img src=my_img.png width=550 height=34 alt= style=-aw-left-pos:110pt; -aw-rel-hpos:page; -aw-rel-vpos:page; -aw-top-pos:100pt; -aw-wrap-type:none; margin-left:0pt; margin-top:0pt; position:absolute; z-index:0 />";
String str_1 = org.jsoup.Jsoup.clean(text, "", org.jsoup.safety.Whitelist.relaxed(),
   new org.jsoup.nodes.Document.OutputSettings().prettyPrint(false));
System.out.println(str_1);

and it produces this output:

<img width="550" height="34" alt="style=-aw-left-pos:110pt;">

which is far from what I expect.
Not only are many attributes missing, but the cleaned img tag is also left unclosed: there is no /> at the end.
This causes a problem later, when I try to generate a PDF from this HTML with OpenHtmlToPdf and get an error about malformed HTML.

How can I clean HTML like this and still get properly closed tags in the result?
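
To make the symptom concrete, here is a minimal sketch of why the unclosed img tag matters downstream. It assumes the PDF step parses its input as XML (XHTML), which is an assumption about OpenHtmlToPdf rather than something shown in the question. The class name `WellFormedCheck` and the helper `isWellFormed` are hypothetical and use only the JDK's built-in XML parser, not Jsoup.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class WellFormedCheck {

    // Returns true if the fragment parses as a well-formed XML document.
    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(fragment)));
            return true;
        } catch (SAXException e) {
            // Thrown for markup that is not well-formed, e.g. an unclosed element.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The output reported in the question: valid HTML, but not well-formed XML.
        String htmlStyle = "<img width=\"550\" height=\"34\" alt=\"style=-aw-left-pos:110pt;\">";
        // The same tag written in self-closing form.
        String xmlStyle = "<img width=\"550\" height=\"34\" alt=\"style=-aw-left-pos:110pt;\" />";

        System.out.println(isWellFormed(htmlStyle)); // prints "false"
        System.out.println(isWellFormed(xmlStyle));  // prints "true"
    }
}
```

If strict XML parsing is indeed the cause, one setting that may be worth trying is Jsoup's `Document.OutputSettings.syntax(Document.OutputSettings.Syntax.xml)` alongside `prettyPrint(false)`; this has not been verified here against the input above.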

Jsoup clean - img tag not closed

0 answers: