Modifying links with Jsoup

Time: 2014-02-03 14:14:09

Tags: spring jsoup

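I'm new to Spring and jsoup. I'm using jsoup to parse an HTML file, copy some text from inside a div tag, and display it on my page. Now I'm trying to modify the links and add exit.do so that the user is logged out of the server. I've tried many different approaches, but my links don't work :( Has anyone dealt with updating links like this before? Any help is appreciated.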

Here is my code.

Thanks very much.

Lola

modelMap = referenceData( request, modelMap);   
modelMap.put("externalUrl", externalUrlMap.get( request.getServletPath() ));
modelMap.put("elementId", elementIdMap.get( request.getServletPath() ));

/** Pass the url map to a string */
String url = (String) externalUrlMap.get( request.getServletPath() );

/** Pass the div map to a string */
String eleId = (String) elementIdMap.get( request.getServletPath() );

/** Retrieve and parse the document using Jsoup*/
//URL externalUrl = new URL(url);
//Document document = Jsoup.parse(externalUrl, 10000);
File internalFile = new File(url);
Document document = Jsoup.parse(internalFile, "UTF-8");

/** Clean the document to prevent XSS only include tags and style below */
//document = new Cleaner(Whitelist.basic().addTags("div", "em", "h1", "h2").addAttributes("div","class", "style")).clean(document); 

/** Select privactText tags from the id */
Element divContent = document.select(eleId).first(); 

/** Returned the text inside the div tag */     
String parsedExternalContent =  divContent.html();

/** Get all links inside div tag */
Elements links = divContent.select("a[href]");

String exitUrl = "/exit?logout=true&uri="; 

/** Loop through the links and if the links are relative path add the exit.do to the link */
for (Element link : links) {            
    if (!link.attr("href").toLowerCase().startsWith("http://"))    {

        String urltext = link.attr("href");
        String exitText = "/exit?logout=true&uri=";
        ...

    }
}               

modelMap.addAttribute("parsedExternalContent", parsedExternalContent);  

return new ModelAndView ("externalParserContent", modelMap);  

1 answer:

Answer 0 (score: 0)

When I needed to rewrite the links in an original string with "encoded" URLs, this is how I did it:

    Document doc = getHtmlDocumentFromString(htmlOnly);
    Elements links = doc.select("a[href]");
    /**
     * since we would want to track link index per click - iterate links in the old fashion way (Elements is a List<Element>)
     */
    for(int linkIndexTopToBottom = 0; linkIndexTopToBottom < links.size(); linkIndexTopToBottom++){
        try{
            Element link = links.get(linkIndexTopToBottom);
            if (!UriUtils.isValidUrl(link.attr("href")))
                continue;
...
            link.attr("href",<NEW URL>);
        }catch (MalformedURLException exception){
            log.debug("Provided URL was not valid: " + links.get(linkIndexTopToBottom).attr("abs:href") + ", skipping link re-write");
        }
    }
    return doc;

As you can see, you need to set the attribute like this:

link.attr("href", <NEW URL>);

Since that part is missing from your post, I'm not sure whether you are doing this.

Edit

Appending works exactly the same way: link.attr("href", link.attr("href") + "<what you need to append with>");
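For the exit link in the question, the new value could be built by a small helper before it is passed to link.attr. The sketch below is only an illustration: the class ExitHref and the method build are hypothetical names, and URL-encoding the original href (so that any ? or & in it survives as a single uri parameter) is an assumption that is not in the original post. It uses only the Java standard library (Java 10 or later for this URLEncoder overload).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ExitHref {

    /**
     * Hypothetical helper: returns the value to pass to link.attr("href", ...).
     * Absolute http/https links are returned unchanged; any other href is
     * appended to the exit prefix as a URL-encoded parameter value.
     */
    static String build(String exitPrefix, String href) {
        String lower = href.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return href;
        }
        // Encoding is an assumption: it keeps '?' and '&' in the original
        // href from being read as parameters of the exit URL itself.
        return exitPrefix + URLEncoder.encode(href, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample href is made up for illustration.
        System.out.println(build("/exit?logout=true&uri=", "/account/view.do?id=7&tab=2"));
    }
}
```

Inside the loop this would be used as link.attr("href", ExitHref.build(exitUrl, link.attr("href"))). Whether the exit handler expects an encoded uri value depends on the server side, which the post does not show.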

The bottom line is that you need to set the href attribute to the new value. Example from the jsoup cookbook.
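Applied to the code in the question, a minimal sketch might look like the following. It assumes jsoup is on the classpath. The class ExitLinkRewriter, the method rewriteLinks, and the sample HTML are hypothetical, and the https check is an added assumption. The exit prefix is the one from the question. Note that the markup is read with html() only after the loop has run. In the question, divContent.html() is called before the links are changed, so the string placed in the model cannot contain the new href values.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.Locale;

public class ExitLinkRewriter {

    // Exit prefix taken from the question.
    static final String EXIT_URL = "/exit?logout=true&uri=";

    /** Hypothetical helper: rewrites links inside the selected element and returns its inner HTML. */
    static String rewriteLinks(String html, String selector) {
        Document document = Jsoup.parse(html);
        Element divContent = document.select(selector).first();
        if (divContent == null) {
            return ""; // nothing matched the selector
        }
        for (Element link : divContent.select("a[href]")) {
            String href = link.attr("href");
            String lower = href.toLowerCase(Locale.ROOT);
            // Same check as in the question, extended to https (an assumption).
            if (!lower.startsWith("http://") && !lower.startsWith("https://")) {
                // Setting the attribute is what actually changes the link.
                link.attr("href", EXIT_URL + href);
            }
        }
        // Serialize only after the links have been modified.
        return divContent.html();
    }

    public static void main(String[] args) {
        String html = "<div id=\"privacyText\"><a href=\"/terms.html\">Terms</a> "
                + "<a href=\"http://example.com/\">External</a></div>";
        System.out.println(rewriteLinks(html, "#privacyText"));
    }
}
```

In the controller from the question, the equivalent change would be to move the String parsedExternalContent = divContent.html(); line below the for loop and to call link.attr("href", exitText + urltext) inside it.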