Get the content after a node

Time: 2014-01-31 10:17:48

Tags: java jsoup

After fetching a web page, I have the following HTML block:

<td class="detail" id="ar-content-html">  
<div style="float:right; padding: 10px">  
</div>  
<p>&nbsp;</p>foo<div style="padding: 20px">bar</div> 
</td>  

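How can I get the content of this block that comes after the first div tag?

Desired part: <p>&nbsp;</p>foo<div style="padding: 20px">bar</div>

3 answers:

Answer 0: (score: 0)

Don't reinvent the wheel; just use the standard String methods: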

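static final String DIV = "<div style=\"float:right; padding: 10px\">";
static final String DIV_END = "</div>";

String html = .... ; // here is your html content
// Find the first div, then skip past its closing tag
int afterDiv = html.indexOf(DIV_END, html.indexOf(DIV)) + DIV_END.length();
// Keep everything up to the closing </td> of the cell
String part = html.substring(afterDiv, html.lastIndexOf("</td>")).trim();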
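A fuller illustration of the string-only approach, as a minimal sketch that can be run as-is. It assumes the first div contains no nested div (so the first `</div>` closes it) and that the cell ends with `</td>`. The class name `AfterFirstDiv` and the method `contentAfterFirstDiv` are hypothetical names chosen for this example, not something from the answer.

```java
class AfterFirstDiv {

    /**
     * Returns the markup between the first closing </div> and the last
     * closing </td>, trimmed, or an empty string if either marker is missing.
     * Assumption: the first <div> has no nested <div> inside it.
     */
    public static String contentAfterFirstDiv(String html) {
        String divEnd = "</div>";
        int divEndPos = html.indexOf(divEnd);
        if (divEndPos < 0) {
            return ""; // no div at all
        }
        int start = divEndPos + divEnd.length();
        int end = html.lastIndexOf("</td>");
        if (end < start) {
            return ""; // no closing </td> after the first div
        }
        return html.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String html = "<td class=\"detail\" id=\"ar-content-html\">\n"
                + "<div style=\"float:right; padding: 10px\">\n"
                + "</div>\n"
                + "<p>&nbsp;</p>foo<div style=\"padding: 20px\">bar</div>\n"
                + "</td>";
        // prints: <p>&nbsp;</p>foo<div style="padding: 20px">bar</div>
        System.out.println(contentAfterFirstDiv(html));
    }
}
```

Plain string searching like this is brittle if the markup changes (nested divs, different attribute order), which is why the other answers use an HTML parser instead.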

Answer 1: (score: 0)

This works:

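import java.util.List;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

Element td = doc.select("td.detail").first();

// The indices below assume there are no whitespace-only text nodes
// between the td's children, so the first div sits at index 0.
List<Node> tdChildren = td.childNodes();

String p   = tdChildren.get(1).toString(); // will select <p>&nbsp;</p>
String foo = tdChildren.get(2).toString(); // will select foo
String div = tdChildren.get(3).toString(); // will select <div style="padding: 20px">bar</div>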

System.out.println(p + foo + div);

Output:

<p>&nbsp;</p>foo<div style="padding: 20px">
 bar
</div>

Answer 2: (score: -1)

You can use jsoup. Here is a code example:

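import java.util.Iterator;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String[] args) {

    String demo = "<td class='detail' id='ar-content-html'><div style='float:right; padding: 10px'></div><p>&nbsp;</p>foo<div style='padding: 20px'>bar</div></td>";

    Document document = Jsoup.parse(demo);
    Element options = document.select("div").first(); // the first div

    Elements siblings = options.siblingElements();

    // subList(1, ...) assumes the list starts with the first div itself.
    // If your jsoup version leaves the element out of its own siblings,
    // iterate over the full list instead.
    List<Element> sibling = siblings.subList(1, siblings.size());

    Iterator<Element> sibIterator = sibling.iterator();

    while (sibIterator.hasNext()) {
        System.out.println(sibIterator.next().toString());
    }

}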

The output is as follows:

<p>&nbsp;</p>
<div style="padding: 20px">
 bar
</div>

To print foo as well, the following code also works:

    List<Node> siblings = options.siblingNodes().subList(1, options.siblingNodes().size());

    Iterator<Node> sibIterator = siblings.iterator();

    while (sibIterator.hasNext()) {
        System.out.println(sibIterator.next().toString());
    }

The output is:

<p>&nbsp;</p>
foo
<div style="padding: 20px">
 bar
</div>