Jsoup: how do I find the order of text among siblings?

Asked: 2013-06-30 18:03:40

Tags: jsoup

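I want the result to be apple:1, orange:2, pear:3. The dots (.....) stand for other tags whose number and names are unknown, but which are similar across the 3 columns. Can anyone help? Thanks.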

    <tr>
    <td> 
      <span>
         .....
          <h>apple</h>
         .....
      </span>  
     </td>
     <td> 
       <span>
             .....
              <h>orange</h>
             .....
          </span>
        </td>
        <td> 
          <span>
             .....
              <h>pear</h>
             .....
          </span>
        </td>
   </tr>

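1 answer:

Answer 0 (score: 0)

You can call getElementsByTag() on any element to get all of its descendant elements with the given tag name (element.getElementsByTag("h") returns every <h>). The index of each element in the returned list then gives you its order.

See the example code: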

import org.jsoup.Jsoup;
import org.jsoup.nodes.*;
import org.jsoup.select.*;
public class JsoupHtmlSiblingsOrder {
    public static void main(String[] args) {
        String html = "<html><body><span>HELLO!</span><table id=\"myTable\"><tbody>        " +
                "<tr><td> <span>                                                           " +
                "         .....                                                            " +
                "         <h>apple</h>                                                     " +
                "         .....                                                            " +
                "         </span>                                                          " +
                "</td><td><span>                                                           " +
                "         .....                                                            " +
                "         <h>orange</h>                                                    " +
                "         .....                                                            " +
                "         </span>                                                          " +
                "</td><td><span>                                                           " +
                "         .....                                                            " +
                "         <h>pear</h>                                                      " +
                "         .....                                                            " +
                "         </span>                                                          " +
                "</td></tr>                                                                " +
                "</tbody></table></body></html>                                            ";
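        Document doc = Jsoup.parse(html);
        // Restrict the search to the table so elements outside it are ignored.
        Element table = doc.getElementById("myTable");
        // All <h> descendants of the table, in the order they appear in the document.
        Elements hs = table.getElementsByTag("h");
        for (int i = 0; i < hs.size(); i++) {
            Element h = hs.get(i);
            // The list index is 0-based, so add 1 to get the position.
            System.out.println(h.text()+":"+(i+1));
        }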
    }
}

Output:

apple:1
orange:2
pear:3
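
The loop above numbers the <h> elements by their index in the result list, which matches the column number only when every <td> contains exactly one <h>. If a column might have no <h> at all, one possible variation is to walk the <td> cells and use each cell's position among its siblings instead. The following is only a sketch under that assumption: the class name ColumnPositionSketch, the helper columnPositions, and the sample HTML are made up for illustration and are not part of the original question or answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ColumnPositionSketch {

    // Hypothetical helper: maps the text of the first <h> in each <td>
    // to the 1-based position of that <td> among its sibling elements.
    static Map<String, Integer> columnPositions(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, Integer> positions = new LinkedHashMap<>();
        for (Element cell : doc.select("td")) {
            // first() returns null when the cell contains no <h>.
            Element h = cell.getElementsByTag("h").first();
            if (h != null) {
                // elementSiblingIndex() is 0-based, so add 1.
                positions.put(h.text(), cell.elementSiblingIndex() + 1);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Made-up sample: the middle column has no <h>.
        String html = "<table><tr>"
                + "<td><span><b>x</b><h>apple</h></span></td>"
                + "<td><span>no heading here</span></td>"
                + "<td><span><h>pear</h><i>y</i></span></td>"
                + "</tr></table>";
        columnPositions(html).forEach(
                (text, pos) -> System.out.println(text + ":" + pos));
    }
}
```

With this sample the program prints apple:1 and pear:3, so the empty middle column still counts towards the numbering. Note that elementSiblingIndex() counts every sibling element of the cell, so a <th> in the same row would also shift the positions.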