Trying to retrieve SPAN tags inside nested DIV tags using JSoup

Time: 2016-10-16 12:09:37

Tags: xml-parsing jsoup html-parsing

Hello, I am trying to extract the span tags inside nested DIV tags using JSoup. The code below is only a fragment of a larger block.



<div class="formitem formgroup horizontal">
  <div class="formitem formgroup horizontal">
    <div class="formitem formgroup vertical" style="width:325px">
      <div class="formitem formgroup horizontal">
        <div class="formitem formgroup vertical" style="width:325px;">
          <div class="formitem formgroup horizontal">
            <span class="formitem formfield">
                            <span class="value" style="font-weight:bold">47 Lower River St</span>
            </span>
            <span class="formitem formfield">
                            <span class="value" style="font-weight:bold">531</span>
            </span>
          </div>
        </div>
      </div>
      <div class="formitem formgroup horizontal">
        <span class="formitem formfield">
                    <span class="value" style="font-weight:bold">Toronto</span>
        </span>
        <span class="formliteral formitem" />
        <span class="formitem formfield">
                    <span class="value">Ontario</span>
        </span>
        <span class="formliteral formitem" />
        <span class="formitem formfield">
                    <span class="value">M5A0G1</span>
        </span>
      </div>
    </div>
    <div class="formitem formgroup vertical" style="width:150px;">
      <div class="formitem formgroup horizontal">
        <span class="formitem formfield">
                    <label>List:</label>
                    <span class="value" style="font-weight:bold">$279,900</span>
        </span>
        <span class="formitem formfield">
                    <label>For:</label>
                    <span class="value" style="font-weight:bold">Sale</span>
        </span>
      </div>
    </div>
  </div>
  <span class="formitem formfield">
        <span class="value">Toronto C08</span>
  </span>
  <span class="formliteral formitem" />
  <span class="formitem formfield">
        <span class="value">Moss Park</span>
  </span>
  <span class="formliteral formitem" />
  <span class="formitem formfield">
        <span class="value">Toronto</span>
  </span>
  <span class="formitem formfield">
        <span class="value">120-21-S</span>
  </span>
</div>

I am trying to extract the text in the last SPAN tags (Toronto C08, Moss Park, Toronto and 120-21-S):

<span class="formitem formfield">
    <span class="value">Toronto C08</span>
</span>
<span class="formliteral formitem" />
<span class="formitem formfield">
    <span class="value">Moss Park</span>
</span>
<span class="formliteral formitem" />
<span class="formitem formfield">
    <span class="value">Toronto</span>
</span>
<span class="formitem formfield">
    <span class="value">120-21-S</span>
</span>

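I have successfully parsed other parts of the document, but I can't seem to isolate these spans. The snippet comes from a larger page (full page). I may be using the wrong approach, but this is what I use to capture the spans inside the parent DIV (the result is shown at the top of the post).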

// Locate the outer container, then narrow down to the first horizontal form group.
Elements elements = doc.select("div[class=formitem legacyBorder formgroup vertical]");
Element zoneElement = elements.select("div[class=formitem formgroup vertical]")
        .select("[style=width:500px]")
        .select("div[class=formitem formgroup horizontal]")
        .first();

So now I have the first element, but I need the last 6 span tags at the end of the selected block. Thanks.

1 answer:

Answer 0: (score: 0)

Open your browser's developer tools (F12), choose the "Inspect element" tool, highlight the field you want (for example, TORONTO C08), and copy its CSS selector. For "TORONTO C08" it would be:

#C3627690 > div:nth-child(3) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > span:nth-child(2) > span:nth-child(1)

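Do the same for all the other elements. Once you have all the selectors, look them over carefully: they may share a common pattern (for example, only the third value differs), so you can iterate over them in a loop.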
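Long browser-generated selectors tend to break when the page layout shifts, so a structural filter may be an alternative worth trying. The sketch below is only an illustration under assumptions, not a tested solution for the full page: it assumes the wanted values are the `span.value` elements whose nearest enclosing `div` is the outer group itself, as in the snippet from the question. The class name `TrailingSpans`, the method `valuesOutsideNestedDivs`, and the shortened HTML string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TrailingSpans {

    // Hypothetical helper: returns the text of every span.value whose nearest
    // enclosing <div> is `container`, i.e. values not inside a nested form group.
    static List<String> valuesOutsideNestedDivs(Element container) {
        List<String> values = new ArrayList<>();
        for (Element value : container.select("span.value")) {
            Element ancestor = value.parent();
            // Walk up to the nearest <div>, skipping any wrapping <span> elements.
            while (ancestor != null && !ancestor.tagName().equals("div")) {
                ancestor = ancestor.parent();
            }
            if (ancestor == container) {
                values.add(value.text());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened version of the markup from the question (assumption: same shape).
        String html =
            "<div class=\"formitem formgroup horizontal\">"
          + "<div class=\"formitem formgroup horizontal\">"
          + "<span class=\"formitem formfield\"><span class=\"value\">Sale</span></span>"
          + "</div>"
          + "<span class=\"formitem formfield\"><span class=\"value\">Toronto C08</span></span>"
          + "<span class=\"formliteral formitem\" />"
          + "<span class=\"formitem formfield\"><span class=\"value\">Moss Park</span></span>"
          + "<span class=\"formliteral formitem\" />"
          + "<span class=\"formitem formfield\"><span class=\"value\">Toronto</span></span>"
          + "<span class=\"formitem formfield\"><span class=\"value\">120-21-S</span></span>"
          + "</div>";

        Document doc = Jsoup.parse(html);
        // The first match in document order is the outermost horizontal group.
        Element outer = doc.select("div.formgroup.horizontal").first();
        System.out.println(valuesOutsideNestedDivs(outer));
    }
}
```

In the question's code, `zoneElement` could be passed as `container` once it points at the outer group. The filter walks up to the nearest `div` rather than checking the direct parent, so it does not depend on how the parser treats the self-closing `<span ... />` tags.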