Jsoup Cleaner whitelist issue

Time: 2014-02-04 17:24:55

Tags: jsoup

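I'm not sure what I'm doing wrong. I'm trying to do a basic clean of an HTML file, but for some reason my relative links are replaced with rel="nofollow" when I use Whitelist.basic(). I also tried Whitelist.relaxed() with preserveRelativeLinks set to true, and that removes the href entirely.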

Here is my code:

    import java.io.File;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.safety.Cleaner;
    import org.jsoup.safety.Whitelist;
    import org.jsoup.select.Elements;

    // url and eleId are defined elsewhere (file path and CSS selector).
    File internalFile = new File(url);
    Document document = Jsoup.parse(internalFile, "UTF-8");

    // Get safe HTML by parsing the input and filtering it through a
    // whitelist of permitted tags and attributes.
    document = new Cleaner(Whitelist.relaxed()
            .addTags("div", "em", "h1", "h2", "a")
            // The first argument is the tag; the rest are its attributes,
            // so div and a each need their own call.
            .addAttributes("div", "class", "style", "name")
            .addAttributes("a", "href")
            .addProtocols("a", "href", "http", "https", "mailto")
            .preserveRelativeLinks(true))
            .clean(document);

    // Select the target element using the eleId selector.
    Element divContent = document.select(eleId).first();

    // Get all links inside the div tag.
    Elements links = divContent.select("a[href]");
    String exitUrl = "/exit?logout=true&uri=";

    // Loop through the links and prefix each href with the exit URL.
    for (Element link : links) {
        link.attr("href", exitUrl + link.attr("href"));
    }

    // Get the HTML inside the div tag.
    String parsedExternalContent = divContent.html();

Many thanks.
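
One possible cause of the missing href, offered as a sketch rather than a confirmed diagnosis: jsoup's documentation for preserveRelativeLinks notes that the input document needs an appropriate base URI so that each link's protocol can be confirmed, and Jsoup.parse has an overload that takes a base URI as a third argument. The standard-library example below does not use jsoup and is not its implementation. It only illustrates, with java.net.URI, why a relative href cannot be matched against http, https, or mailto when the base is a bare file path. The class name RelativeLinkCheck, the method resolvesToAllowedProtocol, and the example.com URLs are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;

class RelativeLinkCheck {
    // Protocols mirrored from the addProtocols call in the question.
    static final List<String> ALLOWED = Arrays.asList("http", "https", "mailto");

    /**
     * Resolves href against baseUri and reports whether the result
     * carries one of the allowed schemes. Illustration only: this is
     * an assumption about the kind of check a cleaner performs, not
     * jsoup's actual code.
     */
    static boolean resolvesToAllowedProtocol(String baseUri, String href) {
        try {
            URI resolved = new URI(baseUri).resolve(href);
            String scheme = resolved.getScheme();
            return scheme != null && ALLOWED.contains(scheme.toLowerCase());
        } catch (URISyntaxException | IllegalArgumentException e) {
            // Unparseable base or href: treat the link as not allowed.
            return false;
        }
    }

    public static void main(String[] args) {
        // With an http base, the relative link resolves to an http URL.
        System.out.println(resolvesToAllowedProtocol("http://example.com/docs/", "page.html")); // true
        // With a plain file path as base, the result has no scheme at all.
        System.out.println(resolvesToAllowedProtocol("/home/user/privacy.html", "page.html")); // false
        // Absolute links do not depend on the base.
        System.out.println(resolvesToAllowedProtocol("/home/user/privacy.html", "mailto:someone@example.com")); // true
    }
}
```

If this is indeed the cause, passing a base URI such as "http://example.com/" (an assumed placeholder) via Jsoup.parse(internalFile, "UTF-8", baseUri) may let the relative links survive cleaning. The rel="nofollow" seen with Whitelist.basic() is expected behaviour: jsoup documents that preset as enforcing rel="nofollow" on a elements.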

0 answers:

No answers
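
Separately from the cleaner behaviour, the loop in the question appends the raw href to "/exit?logout=true&uri=". If the original link contains its own ? or & characters, those would be read as part of the exit URL's query string. A minimal sketch of encoding the value first, using only the standard library, might look like the following. The class name ExitLinkBuilder, the method wrap, and the sample URL are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ExitLinkBuilder {
    // Prefix taken from the question's code.
    static final String EXIT_URL = "/exit?logout=true&uri=";

    /** Returns the exit URL with the original href percent-encoded as the uri parameter. */
    static String wrap(String href) {
        try {
            return EXIT_URL + URLEncoder.encode(href, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints /exit?logout=true&uri=http%3A%2F%2Fexample.com%2Fa%3Fx%3D1%26y%3D2
        System.out.println(wrap("http://example.com/a?x=1&y=2"));
    }
}
```

In the question's loop this would correspond to something like link.attr("href", ExitLinkBuilder.wrap(link.attr("href"))), assuming the exit endpoint decodes the uri parameter before redirecting.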