What I'm trying to do seems simple to me, but I'm struggling with it far more than I should be. I have a document containing the following:
<h2>First Heading</h2>
<table>
<div class="title">First Subheading One</div>
<div class="title">First Subheading Two</div>
<div class="title">First Subheading Three</div>
</table>
<h2>Second Heading</h2>
<table>
<div class="title">Second Subheading One</div>
<div class="title">Second Subheading Two</div>
<div class="title">Second Subheading Three</div>
</table>
<h2>Third Heading</h2>
<table>
<div class="title">Third Subheading One</div>
<div class="title">Third Subheading Two</div>
<div class="title">Third Subheading Three</div>
</table>
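Using doc.select("h2") gives me all the headings, as expected. Using doc.select("div.title") gives me all the subheadings, also as expected. What I want to do is iterate over the returned h2 elements and, within each one, iterate over the corresponding div.title elements. I've tried a lot of things, and I'm not at all new to coding (just new to jsoup), but I can't seem to figure out how to do this.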
Elements headings = httpDoc.select("h3");
for (Element heading : headings) {
    // something with heading.nextSibling here
}
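Is there something I can do (nextSibling, for example) that gives me a node from which I can do another select("div.title") and iterate over the results to get the subheadings?
Or am I going about this completely the wrong way? Sorry if this seems silly; I feel a bit dumb, because I've been coding for more years than I care to admit, but I've never dealt with the DOM (I've always been a Win32 guy).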
Answer 0 (score: 4)
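From what I understand of your question, you are trying to get the <h2> tags and then, for each <h2> heading, get the corresponding div.title content.
A few things to note:
Your code selects h3 instead of h2, and there is no h3 in your HTML.
A <table> should have a <tr> and a <td> (I think <td> is optional; check the W3 page). So when you parse your HTML snippet, jSoup just prunes/removes the malformed <table>.
With well-formed tables, the code at the end of this answer prints: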
The header is: First Heading
The div content is: First Subheading One
The div content is: First Subheading Two
The div content is: First Subheading Three
========== +_+ ===========
The header is: Second Heading
The div content is: Second Subheading One
The div content is: Second Subheading Two
The div content is: Second Subheading Three
========== +_+ ===========
The header is: Third Heading
The div content is: Third Subheading One
The div content is: Third Subheading Two
The div content is: Third Subheading Three
========== +_+ ===========
Code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JSoupTest
{
    public static void main(String[] args)
    {
        // Same content as in the question, but each table has a <tr> and a <td>
        String s = "<h2>First Heading</h2>";
        s += "<table><tr><td>";
        s += "<div class='title'>First Subheading One</div>";
        s += "<div class='title'>First Subheading Two</div>";
        s += "<div class='title'>First Subheading Three</div>";
        s += "</td></tr></table>";
        s += "<h2>Second Heading</h2>";
        s += "<table><tr><td>";
        s += "<div class='title'>Second Subheading One</div>";
        s += "<div class='title'>Second Subheading Two</div>";
        s += "<div class='title'>Second Subheading Three</div>";
        s += "</td></tr></table>";
        s += "<h2>Third Heading</h2>";
        s += "<table><tr><td>";
        s += "<div class='title'>Third Subheading One</div>";
        s += "<div class='title'>Third Subheading Two</div>";
        s += "<div class='title'>Third Subheading Three</div>";
        s += "</td></tr></table>";

        Document doc = Jsoup.parse(s);
        Elements h_2 = doc.select("h2");
        for (int i = 0; i < h_2.size(); i++)
        {
            Element e = h_2.get(i);
            System.out.println("The header is: " + e.ownText());

            // The element right after each <h2> is its <table>
            Element nextSib = e.nextElementSibling();
            if (nextSib != null)
            {
                // Search only inside that sibling for the subheadings
                Elements divs = nextSib.select("div.title");
                for (int j = 0; j < divs.size(); j++)
                {
                    Element d = divs.get(j);
                    System.out.println("The div content is: " + d.ownText());
                }
            }
            System.out.println("========== +_+ ===========");
        }
    }
}
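The code above assumes that the element immediately after each <h2> is the <table> holding its subheadings. If that can't be guaranteed, for example because the parser has moved the divs out of a malformed table, one alternative is to walk every following sibling until the next <h2>. The sketch below is only an illustration of that idea, not a definitive implementation: the class name HeadingGroups and the helper groupByHeading are made up for this example, and it assumes heading texts are unique, since they are used as map keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class HeadingGroups
{
    // Hypothetical helper: maps each <h2> text to the div.title texts found
    // between that <h2> and the next <h2> (or the end of the parent).
    // Assumes heading texts are unique; duplicates would overwrite each other.
    static Map<String, List<String>> groupByHeading(Document doc)
    {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Element h2 : doc.select("h2"))
        {
            List<String> titles = new ArrayList<>();
            // Walk forward through the siblings until the next <h2>.
            for (Element sib = h2.nextElementSibling();
                 sib != null && !sib.tagName().equals("h2");
                 sib = sib.nextElementSibling())
            {
                // select() can match the context element itself, so this covers
                // a bare div.title sibling as well as divs nested in a table.
                for (Element div : sib.select("div.title"))
                {
                    titles.add(div.text());
                }
            }
            groups.put(h2.text(), titles);
        }
        return groups;
    }

    public static void main(String[] args)
    {
        String html = "<h2>First Heading</h2>"
            + "<table><tr><td>"
            + "<div class='title'>First Subheading One</div>"
            + "<div class='title'>First Subheading Two</div>"
            + "</td></tr></table>"
            + "<h2>Second Heading</h2>"
            + "<div class='title'>Second Subheading One</div>";

        Map<String, List<String>> groups = groupByHeading(Jsoup.parse(html));
        for (Map.Entry<String, List<String>> entry : groups.entrySet())
        {
            System.out.println("The header is: " + entry.getKey());
            for (String title : entry.getValue())
            {
                System.out.println("The div content is: " + title);
            }
        }
    }
}
```

Returning a map keeps the grouping separate from the printing, so the same helper could feed whatever structure you actually need. If the same heading text can appear more than once, a list of pairs would be a safer return type.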