How to parse HTML with jsoup after the next sibling?

Time: 2016-03-25 15:55:59

Tags: java html parsing jsoup nextsibling

I have an HTML document and read a value from it like this:

Document doc = Jsoup.connect("text.html").get();

Element param = doc.select("span[class=param]").get(0);

Node node = param.nextSibling();

System.out.println(node.toString());

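By changing get(0) through get(3) I can get the values for Text0 to Text3, but I cannot get the values for Text4 and Text5. This is the code I use now: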

Document doc = Jsoup.connect("text.html").get();

Elements params = doc.select("span[class=param]");
for (int i = 0; i < 6; i++) {
    Element param = params.get(i);
    // nextSibling() returns whatever node comes directly after the span
    Node node = param.nextSibling();
    System.out.println(node.toString());
}

How do I get the values for Text4 and Text5? get(4) and get(5) currently return <br>, but I need "one, two, three".

The loop currently prints:

 23
 173
 54
 2
<br>
<br>

What I need is:

 23
 173
 54
 2
 one two three
 one two three

A crazy piece of code that does it:

Document doc = Jsoup.connect("text.html").get();

Elements params = doc.select("span[class=param]");
int i;
for (i = 0; i < 3; i++) {
    Element param = params.get(i);
    Node node = param.nextSibling();
    System.out.println(node.toString());
}

for (i = 4; i < 5; i++) {
    Element apar = params.get(i);
    Node apan = apar.nextSibling();
    System.out.println("apar: " + apan.nextSibling().toString());
    System.out.println("apar: " + apan.nextSibling().nextSibling().nextSibling().toString());
    System.out.println("apar: " + apan.nextSibling().nextSibling().nextSibling().nextSibling().nextSibling().toString());
    //System.out.println(apan.nextSibling().toString());
}

for (i = 5; i < 6; i++) {
    Element vih = params.get(i);
    Node vihn = vih.nextSibling();
    System.out.println("vih: " + vihn.nextSibling().toString());
    System.out.println("vih: " + vihn.nextSibling().nextSibling().nextSibling().toString());
    System.out.println("vih: " + vihn.nextSibling().nextSibling().nextSibling().nextSibling().nextSibling().toString());
    //System.out.println(apan.nextSibling().toString());
}

This crazy(?) code prints what I want.

1 answer:

Answer 0: (score: 0)

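Calling doc.select("span[class=param]") returns a list of elements (Elements), and you need to iterate over that list to process each <span>. In your code you grab only one of them, with Element param = doc.select("span[class=param]").get(0);

Loop over all of them instead: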
Document doc = Jsoup.connect("text.html").get();
Elements params = doc.select("span[class=param]");
for (Element element : params) {
    // Prints the text contained directly within each <span>...</span>
    System.out.println(element.ownText());
}

params = doc.select("td");
for (Element element : params) {
    // Prints the text of each <td>'s own child text nodes (not text nested in child elements)
    System.out.println(element.ownText());
    //System.out.println(element.text());
}

The code above will print:

Text0
Text1
Text2
Text3
Text4
Text5
23 173 54 2
one two three
one - two - three -
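
If the values need to stay grouped per label rather than flattened per <td>, one option is to walk the siblings that follow each span until the next label is reached, keeping only the text nodes. The following is a minimal sketch under assumptions, not the asker's actual page: the HTML string is a guess based on the output described in the question, and the class name SiblingText and the helper valuesAfter are hypothetical names made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class SiblingText {

    // Hypothetical helper: collects the non-blank text nodes that follow a
    // label span, stopping at the next span.param or at the end of the parent.
    static List<String> valuesAfter(Element label) {
        List<String> values = new ArrayList<>();
        for (Node n = label.nextSibling(); n != null; n = n.nextSibling()) {
            if (n instanceof Element && ((Element) n).hasClass("param")) {
                break; // reached the next label
            }
            if (n instanceof TextNode) {
                String text = ((TextNode) n).text().trim();
                if (!text.isEmpty()) {
                    values.add(text);
                }
            }
            // Other elements such as <br> are skipped.
        }
        return values;
    }

    public static void main(String[] args) {
        // Assumed markup, guessed from the question; the real page may differ.
        String html = "<div>"
                + "<span class=\"param\">Text0</span> 23<br>"
                + "<span class=\"param\">Text4</span><br> one<br> two<br> three<br>"
                + "</div>";
        Document doc = Jsoup.parse(html);
        for (Element label : doc.select("span.param")) {
            System.out.println(label.text() + ": " + String.join(", ", valuesAfter(label)));
        }
        // Prints:
        // Text0: 23
        // Text4: one, two, three
    }
}
```

Because the loop stops at the next span.param, it does not depend on how many <br> elements sit between a label and its values, which is what the chained nextSibling() calls in the question hard-code.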

This should be enough to get you where you need to go. Good luck!