How can I use jsoup to remove all the links while downloading a webpage?
I use the following code, which gives me the text of a webpage:
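import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public static void Url(String urlTosearch) throws IOException {
    URL = urlTosearch; // URL is a String field of the enclosing class
    Document doc = Jsoup.connect(URL).get();
    String textOnly = Jsoup.parse(doc.toString()).text();
    Output ob = new Output(); // Output is the asker's own class
    ob.Write(textOnly);
}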
Is there any way to separate out all the links while downloading the text of a page?
Answer 0 (score: 1)
I would do something like this:
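import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public static void Url(String urlTosearch) throws IOException {
    URL = urlTosearch;
    Document doc = Jsoup.connect(URL).get();
    // Select every link in the page
    Elements links = doc.select("a[href]");
    for (Element link : links) { // Iterate over each link to get its URL
        String relHref = link.attr("href"); // URL as written in the page (may be relative)
        String absHref = link.attr("abs:href"); // URL resolved against the page's address
        // Do whatever you want with the URLs here
    }
}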
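A self-contained sketch of the same approach that gives you the links and the page text separately. It assumes a made-up HTML string and base URI (https://example.com/docs/) in place of a live Jsoup.connect(...) call so it runs offline; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkCollector {

    // Returns the absolute URL of every <a href> in the document.
    // "abs:href" resolves relative links against the document's base URI.
    public static List<String> collectLinks(Document doc) {
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            urls.add(link.attr("abs:href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical page; with a live site you would use Jsoup.connect(url).get()
        String html = "<p>See <a href='intro.html'>the intro</a> and "
                + "<a href='https://jsoup.org/'>jsoup</a>.</p>";
        Document doc = Jsoup.parse(html, "https://example.com/docs/");

        List<String> links = collectLinks(doc); // the links, kept apart
        String text = doc.text();               // the visible text of the page

        System.out.println(links);
        System.out.println(text);
    }
}
```

Here the relative intro.html comes back as https://example.com/docs/intro.html, which is why abs:href is usually more useful than the raw href once the links are stored away from the page.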
Answer 1 (score: 0)
"How can I use jsoup to remove all the links while downloading a webpage?"
You can select all a elements that have an href attribute and remove them from the Document object, which represents the DOM structure of your page. So your code could look like this:
Document doc = Jsoup.connect(URL).get();
doc.select("a[href]").remove(); // remove all matched <a href...> elements from the DOM
String textOnly = doc.text(); // generate text from the DOM, now without your links
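One side effect worth knowing: remove() deletes each whole <a> element, including its visible text, so the anchor words also disappear from doc.text(). If you only want to drop the links but keep their wording, jsoup's Elements.unwrap() replaces each matched element with its children. The sketch below compares the two on a made-up HTML string; the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StripLinks {

    // Removes every <a href> element together with the text inside it.
    public static String textWithoutLinks(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a[href]").remove();
        return doc.text();
    }

    // Removes only the <a href> tags; their inner text stays in the page.
    public static String textWithLinkTextKept(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a[href]").unwrap();
        return doc.text();
    }

    public static void main(String[] args) {
        String html = "<p>Read <a href='/faq'>the FAQ</a> first.</p>";
        System.out.println(textWithoutLinks(html));     // anchor text is gone
        System.out.println(textWithLinkTextKept(html)); // "Read the FAQ first."
    }
}
```

Which variant fits depends on whether the link wording is part of the sentence you want to keep.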