        for (int x = 0; x < 8000; x += 50) {
            // Jsoup.connect needs a full URL including the protocol
            Document doc = Jsoup.connect("http://localhost.com/" + x).get();
            Elements links = doc.select("a[href]");
            for (Element link : links) {
                String text = link.text();
                System.out.println(text);
            }
        }
    }
}
This produces output like the following, with empty lines in it:
Adrian Riven
HalfSugar No Ice
Yassuo
Amandadog
P1 Sloosh
Is there any way to remove the empty lines, so that the output looks like this:
Adrian Riven
HalfSugar No Ice
Yassuo
Amandadog
P1 Sloosh
I tried:
text.replace("\n", "");
text.replaceAll("\r?\n", "");
Edit: I changed the code as shown below, and it did not work for me. I have not tried the other suggestion.
Elements links = doc.select("a[href]");
for (Element link : links) {
    Document docs = Jsoup.parse(String.valueOf(links));
    docs.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
    String text = link.text() + link.text();
    System.out.println(text.replace("Show More", ""));
}
Sample HTML:
</td>
<td class="SummonerName Cell">
<a href="/summoner/userName=Cris" class="Link">Cris</a>
</td>
<td class="TierRank Cell">Challenger</td>
<td class="LP Cell">1,137 LP</td>
<td class="TeamName Cell">
Apex Gaming
</td>
<td class="RatioGraph Cell">
<div class="WinRatioGraph">
<div class="Graph">
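Answer 0 (score: 0)
Removing empty elements can be tricky, because some HTML tags are always empty, such as <br/> and <img>.
If you can decide which elements you are willing to remove, try the following: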
// Names of the elements to remove if empty
Set<String> removable = ....
// Parse the html into a jsoup document
Document source = Jsoup.parse(myHtml);
// Clean the html according to a whitelist
Document cleaned = new Cleaner(whitelist).clean(source);
// For each element in the cleaned document
for (Element el : cleaned.getAllElements()) {
    if (el.children().isEmpty() && !el.hasText()) {
        // Element is empty, check if should be removed
        if (removable.contains(el.tagName())) el.remove();
    }
}
Or change the OutputSettings:
final String html = ...;
OutputSettings settings = new OutputSettings();
settings.escapeMode(Entities.EscapeMode.xhtml);
String cleanHtml = Jsoup.clean(html, "", Whitelist.relaxed(), settings);
This can also be done on a document parsed by Jsoup:
Document doc = Jsoup.parse(...);
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
// ...
Answer 1 (score: 0)
This trick worked for me:
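// Jsoup.connect needs a full URL including the protocol
Document doc = Jsoup.connect("http://localhost.com").get();
Elements links = doc.select("a[href]");
for (Element link : links) {
    // Skip links with no visible text, so no empty line is printed
    if (!link.text().isEmpty())
        System.out.println(link.text());
}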
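A likely reason the replace attempts from the question had no effect: a link without visible text gives an empty string, and System.out.println still ends that empty string with a line break, so there is no "\n" inside the text to strip. In addition, Java strings are immutable, so text.replace(...) returns a new string that has to be assigned or printed. The sketch below reproduces this with plain strings and no jsoup. The class name BlankLineFilter, the helper nonBlank, and the sample list are hypothetical and only stand in for the values link.text() might return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlankLineFilter {

    // Hypothetical helper: keeps only entries that contain visible characters.
    static List<String> nonBlank(List<String> texts) {
        List<String> kept = new ArrayList<>();
        for (String text : texts) {
            // trim() also catches entries made up only of spaces or line breaks
            if (!text.trim().isEmpty()) {
                kept.add(text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed sample data standing in for the scraped link texts
        List<String> texts = Arrays.asList(
                "Adrian Riven", "", "HalfSugar No Ice", " ", "Yassuo");

        // replace() returns a new string; an empty string stays empty,
        // and println would still emit a line break for it
        String stripped = "".replace("\n", "");
        System.out.println("still empty after replace: " + stripped.isEmpty());

        // Printing only the non-blank entries avoids the empty lines
        for (String text : nonBlank(texts)) {
            System.out.println(text);
        }
    }
}
```

In the jsoup loop the same idea is the isEmpty() check shown above: filter before printing instead of trying to strip line breaks afterwards.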