I have an HTML page like this:
<a name="robots"></a>
<div class="dnb">
  <div class="info_outer">
    <div class="info">
      <div class="name"><a href="/p/1/">TEXT1</a> <span class="t">TEXT2</span></div>
      <div class="role">SOMEROLE1</div>
    </div>
  </div>
</div>
<a name="humans"></a>
<div class="dnb">
  <div class="info_outer">
    <div class="info">
      <div class="name"><a href="/p/1/">TEXT3</a> <span class="t">TEXT4</span></div>
      <div class="role">SOMEROLE2</div>
    </div>
  </div>
</div>
<div class="dnb">
  <div class="info_outer">
    <div class="info">
      <div class="name"><a href="/p/1/">TEXT5</a> <span class="t">TEXT6</span></div>
      <div class="role">SOMEROLE3</div>
    </div>
  </div>
</div>
<div class="dnb">
  <div class="info_outer">
    <div class="info">
      <div class="name"><a href="/p/1/">TEXT7</a> <span class="t">TEXT8</span></div>
      <div class="role">SOMEROLE4</div>
    </div>
  </div>
</div>
I need to get the information (name and role) out of these divs, but only from the ones that belong to the "humans" anchor. Is this possible with JSoup?
Answer:
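Yes, it is possible. Read up on jsoup's CSS-like selector syntax. The approach is to find the a[name=humans] anchor, move to its next sibling element (the first div.dnb after the anchor) and read the name and role from inside that block.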
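The same selector syntax can also address every block after the anchor in a single query. Below is a minimal sketch using the general-sibling combinator `~`, which matches every `div.dnb` that follows `a[name=humans]` under the same parent, not only the adjacent one. The class and method names (`SiblingSelectorDemo`, `namesAfter`) are hypothetical, and the inline HTML is a trimmed-down version of the page in the question.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SiblingSelectorDemo {

    /** Returns the link text of every div.dnb that follows the named anchor. */
    static List<String> namesAfter(Document doc, String anchorName) {
        // "E ~ F" is the general-sibling combinator: every F that comes after E
        // and shares its parent.
        Elements blocks = doc.select("a[name=" + anchorName + "] ~ div.dnb");
        List<String> names = new ArrayList<>();
        for (Element block : blocks) {
            // Text of the <a> directly inside div.name, e.g. "TEXT3".
            names.add(block.select("div.name > a").text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Simplified markup: the info_outer/info wrappers are left out.
        String html = "<a name=\"robots\"></a>"
                + "<div class=\"dnb\"><div class=\"name\"><a href=\"/p/1/\">TEXT1</a></div></div>"
                + "<a name=\"humans\"></a>"
                + "<div class=\"dnb\"><div class=\"name\"><a href=\"/p/1/\">TEXT3</a></div></div>"
                + "<div class=\"dnb\"><div class=\"name\"><a href=\"/p/1/\">TEXT5</a></div></div>";
        System.out.println(namesAfter(Jsoup.parse(html), "humans")); // [TEXT3, TEXT5]
    }
}
```

This works here only because "humans" is the last anchor on the page. If another `<a name=...>` section followed it, `~` would also pick up that section's blocks, and walking the siblings one by one with a stop at the next anchor would be needed instead.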
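import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JSoupSelectors {
    public static void main(String[] args) throws IOException {
        // Parse the saved page; a null charset lets jsoup detect it (UTF-8 by default).
        File input = new File("WeAreTheRobots.xml");
        Document doc = Jsoup.parse(input, null);
        // Visit every <a name="humans"> anchor on the page.
        for (Element human : doc.select("a[name=humans]")) {
            // The anchor's next element sibling is the first div.dnb of the section.
            // Only this one block is read, not the div.dnb blocks after it.
            Element info = human.nextElementSibling().selectFirst("div.dnb>div.info_outer>div.info");
            // A leading ">" restricts the match to direct children of `info`;
            // ownText() returns only the element's own text, not its children's.
            String name = info.selectFirst(">div.name>span.t").ownText();
            System.out.println("Name = " + name);
            String role = info.selectFirst(">div.role").ownText();
            System.out.println("Role = " + role);
        }
    }
}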
Output:
Name = TEXT4
Role = SOMEROLE2
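The program above prints only the first entry of the "humans" section (TEXT4 / SOMEROLE2), while the question asks for every entry under that anchor. Below is a minimal sketch of one way to collect all of them. It assumes that each section consists of the element siblings between one `<a name=...>` anchor and the next. The names `HumansSection`, `Entry` and `collect` are hypothetical. Because the question does not say whether the "name" is the link text or the `span.t` text, the sketch keeps both.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HumansSection {

    /** One extracted entry; the field names are illustrative, not taken from the page. */
    static final class Entry {
        final String name;  // text of div.name > a, e.g. "TEXT3"
        final String title; // text of div.name > span.t, e.g. "TEXT4"
        final String role;  // text of div.role, e.g. "SOMEROLE2"

        Entry(String name, String title, String role) {
            this.name = name;
            this.title = title;
            this.role = role;
        }

        @Override
        public String toString() {
            return name + " | " + title + " | " + role;
        }
    }

    /**
     * Walks the element siblings after <a name="anchorName">, keeps every div.dnb,
     * and stops at the next <a name=...> anchor or at the end of the parent.
     */
    static List<Entry> collect(Document doc, String anchorName) {
        List<Entry> entries = new ArrayList<>();
        Element anchor = doc.selectFirst("a[name=" + anchorName + "]");
        if (anchor == null) {
            return entries; // no such section on the page
        }
        for (Element sib = anchor.nextElementSibling(); sib != null; sib = sib.nextElementSibling()) {
            if (sib.is("a[name]")) {
                break; // the next section starts here
            }
            if (!sib.is("div.dnb")) {
                continue; // skip anything that is not an entry block
            }
            // Elements.text() returns "" when nothing matches, so no null checks are needed.
            entries.add(new Entry(
                    sib.select("div.name > a").text(),
                    sib.select("div.name > span.t").text(),
                    sib.select("div.role").text()));
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Same input file as in the answer's program.
        Document doc = Jsoup.parse(new File("WeAreTheRobots.xml"), null);
        for (Entry e : collect(doc, "humans")) {
            System.out.println(e);
        }
    }
}
```

For the HTML in the question this should print three lines: "TEXT3 | TEXT4 | SOMEROLE2", "TEXT5 | TEXT6 | SOMEROLE3" and "TEXT7 | TEXT8 | SOMEROLE4". Stopping at the next `a[name]` keeps the lookup correct even if more sections are added after "humans".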