Question

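I am trying to parse job information from an HTML page using the Jsoup parser. I want to extract the details of every job posting, but I can't get the query right. I tried Tryjsoup.com to understand the query structure, but I couldn't work out how to select these tuples. Please also advise how to get at their inner structure.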

HTML code:

 <div itemscope itemtype="http://schema.org/JobPosting" type="tuple" id="131015000050" class="row  ">
<a count=1 href="some link">
<span itemprop=title><font class=hlite>Developer</font></span>
<span itemprop=hiringOrganization>Vm World</span>
</a>
</div>
<div class= "other details"><span itemprop=baseSalary><em></em>3000</span></div>

Expected output:

String Post = Developer

String Company = Vm World

String Salary = 3000

Answer 1

I think you just need to use Element.select("span") to get the span elements from the HTML.

Document doc = Jsoup.parse("<HTML code>");
Elements spans = doc.select("span");
for(Element span: spans) {
    System.out.println(span.text());
}

Output of the code above:

Developer
Vm World
3000

Code to separate the fields:

// first() returns null when nothing matches, so check for null on real pages
Element title = doc.select("span[itemprop=title]").first();
Element company = doc.select("span[itemprop=hiringOrganization]").first();
Element salary = doc.select("span[itemprop=baseSalary]").first();
System.out.println(title.text());   // Developer
System.out.println(company.text()); // Vm World
System.out.println(salary.text());  // 3000
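
Since the question asks for all postings, here is a rough sketch of how the selectors above might be applied once per posting instead of once per document. It assumes jsoup is on the classpath, that every posting is a div with type="tuple", and that the salary lives in the div immediately following the posting div, as in the snippet from the question. The class name JobPostingExtractor and the helpers textOf and extract are made up for illustration and are not part of jsoup.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JobPostingExtractor {

    // Hypothetical helper: text of the first match inside scope,
    // or "" when scope is null or nothing matches (avoids a NullPointerException).
    static String textOf(Element scope, String cssQuery) {
        if (scope == null) {
            return "";
        }
        Element match = scope.select(cssQuery).first();
        return match == null ? "" : match.text();
    }

    // Hypothetical helper: one {post, company, salary} array per posting div.
    static List<String[]> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String[]> rows = new ArrayList<>();
        for (Element posting : doc.select("div[type=tuple]")) {
            String post = textOf(posting, "span[itemprop=title]");
            String company = textOf(posting, "span[itemprop=hiringOrganization]");
            // Assumption: the salary span is in the sibling div right after the posting div.
            String salary = textOf(posting.nextElementSibling(), "span[itemprop=baseSalary]");
            rows.add(new String[] {post, company, salary});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Shortened version of the HTML from the question.
        String html =
            "<div itemscope itemtype=\"http://schema.org/JobPosting\" type=\"tuple\" class=\"row\">"
            + "<a count=1 href=\"some link\">"
            + "<span itemprop=title><font class=hlite>Developer</font></span>"
            + "<span itemprop=hiringOrganization>Vm World</span>"
            + "</a></div>"
            + "<div class=\"other details\"><span itemprop=baseSalary><em></em>3000</span></div>";

        for (String[] row : extract(html)) {
            System.out.println(row[0] + " | " + row[1] + " | " + row[2]);
        }
        // prints: Developer | Vm World | 3000
    }
}
```

If the real page nests the salary somewhere else, only the nextElementSibling() line should need to change. Fields that are missing come back as empty strings rather than throwing.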
