I want to extract content from a web page with jsoup. The values are inside inner tags; how do I extract them?
For example:
<div id="tfm_skyscraper" class="top_right_skyscraper"></div>
<nav class="main group">
<section class="verticals world group" data-beacon="{"p"">
<ul class="verticals-ul">
<li class="front-page toplevel" data-beacon="{"">
<a class="toplevel-a" href="http://www.huffingtonpost.com" title="Home" tabindex="1" sl-processed="1">FRONT PAGE</a>*
</li>
</ul>
</section>
</nav>
I want to extract the text FRONT PAGE from the anchor tag (marked with * above). How can I do this?
Answer 0 (score: 0)
This selects the anchor by its class, toplevel-a:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws Exception {
        // Markup from the question; the inner double quotes in data-beacon are escaped so the string literals compile
        String html = "<div id=\"tfm_skyscraper\" class=\"top_right_skyscraper\"></div>" +
                "<nav class=\"main group\">" +
                "<section class=\"verticals world group\" data-beacon=\"{\"p\"\">" +
                "<ul class=\"verticals-ul\">" +
                "<li class=\"front-page toplevel\" data-beacon=\"{\"\">" +
                "<a class=\"toplevel-a\" href=\"http://www.huffingtonpost.com\" title=\"Home\" tabindex=\"1\" sl-processed=\"1\">FRONT PAGE</a>*" +
                "</li>" +
                "</ul>" +
                "</section>" +
                "</nav>";
        Document doc = Jsoup.parse(html);
        // CSS selector: every <a> element whose class list contains "toplevel-a"
        Elements els = doc.select("a.toplevel-a");
        for (Element el : els) {
            // text() returns the element's combined, whitespace-normalized text
            System.out.println(el.text()); // prints: FRONT PAGE
        }
    }
}
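If you also need the link's attributes (the "values" inside the tag) and not just its text, the same selector works together with `attr()`. This is a minimal sketch, assuming the markup has the shape shown in the question; the class `NavLinks` and its helpers `anchorText` and `anchorAttr` are hypothetical names for illustration. It uses `select(...).first()`, which returns `null` when nothing matches, instead of assuming a match exists.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class for this sketch; not part of jsoup
class NavLinks {
    // Returns the visible text of the first <a class="toplevel-a">, or null if there is none
    static String anchorText(String html) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select("a.toplevel-a").first();
        return link == null ? null : link.text();
    }

    // Returns an attribute (e.g. "href" or "title") of the same anchor, or null if there is no anchor;
    // jsoup's attr() returns "" when the attribute is absent
    static String anchorAttr(String html, String attributeName) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select("a.toplevel-a").first();
        return link == null ? null : link.attr(attributeName);
    }

    public static void main(String[] args) {
        String html = "<nav class=\"main group\"><ul class=\"verticals-ul\">"
                + "<li class=\"front-page toplevel\">"
                + "<a class=\"toplevel-a\" href=\"http://www.huffingtonpost.com\" title=\"Home\">FRONT PAGE</a>"
                + "</li></ul></nav>";
        System.out.println(anchorText(html));          // FRONT PAGE
        System.out.println(anchorAttr(html, "href"));  // http://www.huffingtonpost.com
        System.out.println(anchorAttr(html, "title")); // Home
    }
}
```

For a live page, `Jsoup.connect("http://www.huffingtonpost.com").get()` returns a `Document` that can be queried the same way, although the site's current markup may no longer match the question's. If `toplevel-a` is not unique on the page, a more specific selector such as `nav.main li.front-page > a` narrows the match.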