Jsoup - separate all URLs while downloading the text of a page

Date: 2015-04-23 05:13:55

Tags: java jsoup

How can I use jsoup to remove all the links while downloading a webpage?

I use the following code, which gives me the text of a webpage:

public static void Url(String urlTosearch) throws IOException {
    URL = urlTosearch;
    Document doc = Jsoup.connect(URL).get();
    String textOnly = Jsoup.parse(doc.toString()).text();
    Output ob = new Output();
    ob.Write(textOnly);
}

But is there any way to separate out all the links while downloading the text of a page?

2 answers:

Answer 0 (score: 1)

I would do something like this:

public static void Url(String urlTosearch) throws IOException {
    URL = urlTosearch;
    Document doc = Jsoup.connect(URL).get();

    // Select every <a> element that has an href attribute
    Elements links = doc.select("a[href]");
    for (Element link : links) { // Iterate over each link to get its URL
        String relHref = link.attr("href"); // href exactly as written in the page (may be relative)
        String absHref = link.attr("abs:href"); // href resolved against the document's base URI
        // Do whatever you want with the URLs here
    }
}
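To make the loop above easier to reuse, the URLs can be gathered into a list and returned. This is only a minimal sketch: the class name `LinkCollector`, the method `collectLinks`, and the sample HTML and base URI are made up for illustration, and the page is parsed from a string so the example runs without a network connection.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper, not part of jsoup or of the original answer.
class LinkCollector {

    // Parses the HTML and returns the absolute URL of every <a href>, in document order.
    static List<String> collectLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            // "abs:href" resolves the href against baseUri; it is empty if that fails
            String absHref = link.attr("abs:href");
            if (!absHref.isEmpty()) {
                urls.add(absHref);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample input, assumed for illustration only
        String html = "<p>See <a href=\"/docs\">the docs</a> and "
                + "<a href=\"http://other.test/x\">this page</a>.</p>";
        for (String url : collectLinks(html, "http://example.com/")) {
            System.out.println(url);
        }
    }
}
```

With a live page, the `Document` returned by `Jsoup.connect(URL).get()` already carries its base URI, so the same `select` and `attr("abs:href")` calls apply to it directly.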

Answer 1 (score: 0)

  

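"How can I use jsoup to remove all the links while downloading a webpage?"

You can select all a elements that have an href attribute and remove them from the Document object, which represents the DOM structure of your web page.

So your code can look like:

Document doc = Jsoup.connect(URL).get();
doc.select("a[href]").remove(); // remove all found <a href...> elements from the DOM
String textOnly = doc.text(); // generate text from the DOM without your links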
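Note that remove() drops each anchor together with its visible text. If the link labels should stay in the output and only the markup should go, jsoup's Elements.unwrap() may be a better fit. The sketch below contrasts the two; the class name `LinkStripper`, its method names, and the sample HTML are illustrative assumptions, and the input is parsed from a string instead of being downloaded.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper, not part of jsoup or of the original answer.
class LinkStripper {

    // Removes every <a href> element, including the text inside it.
    static String textWithoutLinks(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a[href]").remove();
        return doc.text();
    }

    // Removes only the <a> wrapper; the link's text stays in the document.
    static String textKeepingLinkLabels(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a[href]").unwrap();
        return doc.text();
    }

    public static void main(String[] args) {
        // Sample input, assumed for illustration only
        String html = "<p>Read <a href=\"http://example.com\">the docs</a> first.</p>";
        System.out.println(textWithoutLinks(html));
        System.out.println(textKeepingLinkLabels(html));
    }
}
```

Which variant is appropriate depends on whether the anchor text counts as part of the page text you want to keep.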