Reading anchor data in JSoup

Date: 2017-05-16 08:27:26

Tags: java jsoup

I am writing code that should get the td text and the anchor's link (href) from the same tr:

<tr>
  <td class='labelOptional_1'>TD1 text here</td>
  <td width='15%' class='label2'><div align='center'>&nbsp;</div></td>
  <td width='15%' class='label2'><div align='center'>&nbsp;</div></td>
  <td width='15%' class='label2'>
    <div align='center'> <a href='Relative_URL_1'>hrefURL 1 in anchor tag ||</a> </div>
  </td>
</tr>
<tr>
  <td class='labelOptional_1'>TD2 text here</td>
  <td width='15%' class='label2'><div align='center'>&nbsp;</div></td>
  <td width='15%' class='label2'><div align='center'>&nbsp;</div></td>
  <td width='15%' class='label2'>
    <div align='center'> <a href='Relative_URL_2'>hrefURL 2 in anchor tag ||</a> </div>
  </td>
</tr>
<tr>
  <td class='labelOptional_1'>TD3 here</td>
  <td width='15%' class='label2'><div align='center'>&nbsp;</div></td>
  <td width='15%' class='label2'><div align='center'>&nbsp;</div></td>
  <td width='15%' class='label2'>
    <div align='center'> <a href='Relative_URL_3'>hrefURL 3 in anchor tag ||</a> </div>
  </td>
</tr>

I want the output to be:

TD1 text here Relative_URL_1
TD2 text here Relative_URL_2

Current output:

TD1 text here Relative_URL_1
TD2 text here Relative_URL_2
TD3 text here Relative_URL_3

Here is the code:

org.jsoup.select.Elements trs = doc.select("tr:contains(text)");   //fetch table rows
for (Element tr : trs) {
    org.jsoup.select.Elements tds = tr.select("td:containsOwn(text)");
    for (Element td : tds) {
        sb.append(td.text());
        sb.append(',');
    }
    org.jsoup.select.Elements anchor = tr.select("a");
    for (Element aHref : anchor) {
        sb.append(aHref.attr("abs:href"));
        sb.append(',');
    }
    sb.append('\n');
}

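The code reads the TDs I need, but it also reads every anchor tag outside that tr/td wherever the contains condition (the word "text") is met. I want the code to read only the anchor tags that belong to that particular <tr>.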
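The markup above is only part of the page, so the following is an assumption rather than a diagnosis: if these rows sit inside a nested table (that is, inside a cell of an outer `<tr>`), then `tr:contains(text)` also matches the outer row, because `:contains` checks all descendant text, and `tr.select("a")` on that outer row returns every anchor in the nested table. The hypothetical `NestedRowDemo` class below reproduces that situation with made-up markup and counts the anchors each matched row sees.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class NestedRowDemo {
    // Hypothetical markup: three rows like the question's, placed in an inner
    // table that itself sits inside one cell of an outer row.
    public static final String HTML =
          "<table><tr><td>"
        + "<table>"
        + "<tr><td>TD1 text here</td><td><a href='Relative_URL_1'>hrefURL 1</a></td></tr>"
        + "<tr><td>TD2 text here</td><td><a href='Relative_URL_2'>hrefURL 2</a></td></tr>"
        + "<tr><td>TD3 here</td><td><a href='Relative_URL_3'>hrefURL 3</a></td></tr>"
        + "</table>"
        + "</td></tr></table>";

    /** For every row matched by tr:contains(text), in document order, count its anchors. */
    public static List<Integer> anchorsPerMatchedRow(String html) {
        Document doc = Jsoup.parse(html);
        List<Integer> counts = new ArrayList<>();
        for (Element tr : doc.select("tr:contains(text)")) {
            // select("a") searches all descendants, including nested tables
            counts.add(tr.select("a").size());
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(anchorsPerMatchedRow(HTML));
    }
}
```

For this sample it should print `[3, 1, 1]`: the outer row is matched first and sees all three links, including Relative_URL_3 from a row whose own cell does not contain "text".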

1 answer:

Answer 0 (score: 1)

Remove the second loop and modify the code as follows:

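import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

Document doc = Jsoup.parse(html, "", Parser.htmlParser());
Elements trs = doc.select("tr:contains(text)");   //fetch table rows

StringBuilder sb = new StringBuilder();
for (Element tr : trs) {
    Elements tds = tr.select("td:containsOwn(text)");

    for (Element td : tds) {
        // Elements.attr() returns the href of the first <a> in this row that has one
        String anchor = tr.select("a").attr("href");
        sb.append(td.text()).append(' ').append(anchor);
    }
    sb.append('\n');
}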
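A variant that avoids the row selector entirely, offered as a sketch rather than as the answer's method: start from each matching cell with `td:containsOwn(text)`, step up to that cell's own row with `td.parent()`, and take the first `a[href]` inside that row. An outer row wrapping a nested table can then never match, since its cell has no own text. `RowAnchors`, `extract` and `SAMPLE_HTML` are hypothetical names; the sample wraps the question's rows in `<table>` because jsoup's HTML parser ignores `<tr>`/`<td>` tags that appear outside a table.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RowAnchors {
    // The question's rows (widths dropped), wrapped in <table> so the parser keeps them.
    public static final String SAMPLE_HTML =
          "<table>"
        + "<tr><td class='labelOptional_1'>TD1 text here</td>"
        + "<td class='label2'><div align='center'>&nbsp;</div></td>"
        + "<td class='label2'><div align='center'><a href='Relative_URL_1'>hrefURL 1 in anchor tag ||</a></div></td></tr>"
        + "<tr><td class='labelOptional_1'>TD2 text here</td>"
        + "<td class='label2'><div align='center'>&nbsp;</div></td>"
        + "<td class='label2'><div align='center'><a href='Relative_URL_2'>hrefURL 2 in anchor tag ||</a></div></td></tr>"
        + "<tr><td class='labelOptional_1'>TD3 here</td>"
        + "<td class='label2'><div align='center'>&nbsp;</div></td>"
        + "<td class='label2'><div align='center'><a href='Relative_URL_3'>hrefURL 3 in anchor tag ||</a></div></td></tr>"
        + "</table>";

    /** One line per cell whose own text contains "text": cell text, a space, the row's first href. */
    public static String extract(String html) {
        Document doc = Jsoup.parse(html);
        StringBuilder sb = new StringBuilder();
        for (Element td : doc.select("td:containsOwn(text)")) {
            Element row = td.parent();                       // the <tr> that owns this cell
            Element link = row.select("a[href]").first();    // search only inside this row
            if (link == null) {
                continue;                                    // row has no link: skip it
            }
            sb.append(td.text()).append(' ').append(link.attr("href")).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(extract(SAMPLE_HTML));
    }
}
```

For the sample this should print the two lines the question asks for (TD1 and TD2 with their relative URLs). `attr("href")` is used instead of `abs:href` because the desired output shows relative URLs; with an empty base URI, as in `Jsoup.parse(html, "", ...)` above, `abs:href` cannot resolve a relative link and returns an empty string.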