After fetching a web page, I have the following HTML block:
<td class="detail" id="ar-content-html">
<div style="float:right; padding: 10px">
</div>
<p> </p>foo<div style="padding: 20px">bar</div>
</td>
How can I get the content of this block that comes after the first div element?
Desired part: <p> </p>foo<div style="padding: 20px">bar</div>
Answer 0: (score: 0)
Don't reinvent the wheel; just use the standard String methods:
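static final String DIV = "<div style=\"float:right; padding: 10px\">";
static final String DIV_CLOSE = "</div>";
String html = .... ; // your HTML content goes here
// Skip past the closing tag of the first div (assumes it has no nested divs).
int start = html.indexOf(DIV_CLOSE, html.indexOf(DIV)) + DIV_CLOSE.length();
// Keep everything up to the closing </td>.
String part = html.substring(start, html.indexOf("</td>", start)).trim();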
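As a hedged sketch of the same string-based idea, the lookup can be wrapped in a small helper that returns an empty string when the expected tags are missing. The names `AfterFirstDiv` and `contentAfterFirstDiv` are illustrative and not part of the original answer, and the code assumes the first div contains no nested div and that the block ends with `</td>`.

```java
final class AfterFirstDiv {
    private static final String DIV_CLOSE = "</div>";
    private static final String TD_CLOSE = "</td>";

    /**
     * Returns the markup between the first closing div tag and the
     * closing td tag, or "" if either tag cannot be found.
     * Assumption: the first div has no nested div elements.
     */
    static String contentAfterFirstDiv(String html) {
        int closeIdx = html.indexOf(DIV_CLOSE);
        if (closeIdx < 0) {
            return "";
        }
        int start = closeIdx + DIV_CLOSE.length();
        int end = html.indexOf(TD_CLOSE, start);
        if (end < 0) {
            return "";
        }
        // Trim the line breaks that surround the wanted part.
        return html.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String html = "<td class=\"detail\" id=\"ar-content-html\">\n"
                + "<div style=\"float:right; padding: 10px\">\n"
                + "</div>\n"
                + "<p> </p>foo<div style=\"padding: 20px\">bar</div>\n"
                + "</td>";
        // prints: <p> </p>foo<div style="padding: 20px">bar</div>
        System.out.println(contentAfterFirstDiv(html));
    }
}
```

Plain string searching like this is brittle against attribute or nesting changes, so for anything beyond a fixed, known snippet an HTML parser is usually the safer choice.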
Answer 1: (score: 0)
This works:
Element td = doc.select("td.detail").first();
List<Node> tdChildren = td.childNodes();
// Indices assume no whitespace-only text nodes between the tags; adjust them if the markup contains line breaks.
String p = tdChildren.get(1).toString(); // selects <p> </p>
String foo = tdChildren.get(2).toString(); // selects foo
String div = tdChildren.get(3).toString(); // selects <div style="padding: 20px">bar</div>
System.out.println(p + foo + div);
Output:
<p> </p>foo<div style="padding: 20px">
bar
</div>
Answer 2: (score: -1)
You can use jsoup. Here is sample code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public static void main(String[] args) {
    String demo = "<td class='detail' id='ar-content-html'><div style='float:right; padding: 10px'></div><p> </p>foo<div style='padding: 20px'>bar</div></td>";
    Document document = Jsoup.parse(demo);
    Element options = document.select("div").first();
    // Walk the element siblings that follow the first div.
    // (Current jsoup excludes the element itself from siblingElements(),
    // so slicing that list from index 1 would skip the <p>.)
    for (Element sib = options.nextElementSibling(); sib != null; sib = sib.nextElementSibling()) {
        System.out.println(sib);
    }
}
The output is as follows:
<p> </p>
<div style="padding: 20px">
bar
</div>
To print foo as well, iterate over the sibling nodes (which include text nodes) instead:
// Requires: import org.jsoup.nodes.Node;
// Walk every node (elements and text) that follows the first div.
for (Node sib = options.nextSibling(); sib != null; sib = sib.nextSibling()) {
    System.out.println(sib);
}
The output is:
<p> </p>
foo
<div style="padding: 20px">
bar
</div>