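I'm not sure what I'm doing wrong. I'm trying to do a basic cleanup of an HTML file, but my relative links are not handled the way I expect. With Whitelist.basic(), the links get rel="nofollow" added. With Whitelist.relaxed() and preserveRelativeLinks(true), the href attribute is removed instead.
Here is my code: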
import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist;
import org.jsoup.select.Elements;

File internalFile = new File(url);
Document document = Jsoup.parse(internalFile, "UTF-8");
/** Get safe HTML from HTML, by parsing HTML and filtering it through a white-list of permitted tags and attributes. */
document = new Cleaner(Whitelist.relaxed()
        .addTags("div", "em", "h1", "h2", "a")
        // addAttributes takes one tag followed by its attributes, so div and a need separate calls
        .addAttributes("div", "class", "style", "name")
        .addAttributes("a", "href")
        .addProtocols("a", "href", "http", "https", "mailto")
        .preserveRelativeLinks(true)
        //resolvesRelativeLinks
        )
        .clean(document);
/** Select the element matching the eleId selector */
Element divContent = document.select(eleId).first();
/** Get all links inside the div tag */
Elements links = divContent.select("a[href]");
String exitUrl = "/exit?logout=true&uri=";
/** Loop through the links and prefix each href with the exit URL */
for (Element link : links) {
    link.attr("href", exitUrl + link.attr("href"));
}
/** Return the HTML inside the div tag */
String parsedExternalContent = divContent.html();
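One possible explanation, offered as an assumption rather than something confirmed from the jsoup source: the cleaner may check the protocol of the link after resolving it against the document's base URI, and a document parsed from a File without an explicit base URI has no http or https base to resolve against. The sketch below does not use jsoup at all. It only illustrates that idea with java.net.URI, and the class name RelativeLinkCheck, the method hasAllowedProtocol, the file path, and the example.com base are all hypothetical.

```java
import java.net.URI;

public class RelativeLinkCheck {

    // Resolves href against baseUri and reports whether the resolved
    // URI has one of the allowed schemes. This is an illustration of a
    // protocol check on a resolved link, not jsoup's actual code.
    static boolean hasAllowedProtocol(String href, String baseUri, String... protocols) {
        URI resolved;
        try {
            resolved = URI.create(baseUri).resolve(href);
        } catch (IllegalArgumentException e) {
            return false; // base or href could not be parsed as a URI
        }
        String scheme = resolved.getScheme();
        if (scheme == null) {
            return false; // still relative after resolution: no protocol to match
        }
        for (String protocol : protocols) {
            if (protocol.equalsIgnoreCase(scheme)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Base is a plain file path (hypothetical), so the link stays relative.
        System.out.println(hasAllowedProtocol("/about", "/home/user/page.html", "http", "https", "mailto"));
        // Base is an absolute http URL (placeholder), so the link gains a scheme.
        System.out.println(hasAllowedProtocol("/about", "http://example.com/", "http", "https", "mailto"));
    }
}
```

If this is indeed the cause, one thing to try would be the three-argument overload, Jsoup.parse(internalFile, "UTF-8", "http://example.com/"), with the placeholder replaced by the site's real base URL, so that relative links can resolve to an allowed protocol while preserveRelativeLinks(true) keeps the original href value.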
Thanks a lot.