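I need to extract, from a website, the paragraph that precedes each table together with the table's contents.
I can easily extract the table data with jsoup, but I cannot extract the paragraph that appears before each table. I have tried the following:
1. doc.select("p"), but it returns extra values because some of the text in the table cells is also wrapped in <p> tags.
2. getElementsByTag, but with no luck.
Sample table: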
<p>
<a id="table heading" name="table name"></a>
<b>Sports equipments</b>
</p>
<table width="98%" cellpadding="0" border="1">
<tbody>
<tr valign="top" bgcolor="#ffffcc" align="left">
<th width="25%" scope="col">Company</th>
<th width="25%" scope="col">Product</th>
<th width="20%" scope="col">Availability</th>
<th width="55%" scope="col">Related Information</th>
<th width="20%" scope="col">
</tr>
<tr>
<td width="18%" valign="top" rowspan="2">
<div>
Nike
<br>
1-800-545-8800
<br>
<br>
<br>
</div>
</td>
<td width="10%" valign="top">
<div>sports kit</div>
</td>
<td width="15%" valign="top" rowspan="2">
<div>Available</div>
</td>
<td width="24%" valign="top" rowspan="2">
<div>Product is available and shipping.</div>
</td>
<td width="16%" valign="top" rowspan="2">Demand increase.</td>
<td width="12%" valign="top" rowspan="2">
<div>
<div>3/26/2014</div>
</td>
</tr>
</table>
I need to extract:
<b>Sports equipments</b>
along with the table contents.
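Answer 0 (score: 0)
You can narrow the selector to "p > b".
Since I don't have your complete HTML, it's hard to say whether this will work there, but it does work for your example: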
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final String html = ... // the html of your example
Document doc = Jsoup.parse(html);

// Selects b tags that are direct children of p tags.
for (Element element : doc.select("p > b")) {
    System.out.println(element);
}
Output:
<b>Sports equipments</b>
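Answer 1 (score: 0)
// connect() expects the page URL; for an HTML string use Jsoup.parse(html) instead
Document doc = Jsoup.connect(url).get();
Elements tables = doc.select("table");
for (Element table : tables) {
    // The element immediately before the table, or null if the table has no preceding sibling
    Element para = table.previousElementSibling();
    if (para != null) {
        System.out.println(para.text());
    }
}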
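
To get both the heading paragraph and the table contents in one pass, the sibling lookup can be combined with a loop over the rows. The following is only a minimal sketch: the class name TableHeadingExtractor, the method extract, and the "HEADING: " prefix are made up for illustration, and it assumes the heading <p> is the element directly before each <table>, as in the sample. If the real page puts other elements in between, the heading comes back empty.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableHeadingExtractor {

    // Hypothetical helper: for every table, emits one heading line followed by
    // one line per row, with the cell texts joined by " | ".
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element table : doc.select("table")) {
            // Assumption: the heading is the <p> immediately before the table.
            Element previous = table.previousElementSibling();
            String heading = (previous != null && "p".equals(previous.tagName()))
                    ? previous.text()
                    : "";
            lines.add("HEADING: " + heading);
            for (Element row : table.select("tr")) {
                List<String> cells = new ArrayList<>();
                // <p> tags nested inside cells are flattened into the cell text,
                // so they are never mistaken for headings.
                for (Element cell : row.select("th, td")) {
                    cells.add(cell.text());
                }
                lines.add(String.join(" | ", cells));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<p><b>Sports equipments</b></p>"
                + "<table><tr><th>Company</th><th>Product</th></tr>"
                + "<tr><td><p>Nike</p></td><td>sports kit</td></tr></table>";
        extract(html).forEach(System.out::println);
    }
}
```

Because the lookup starts from each table rather than from every <p> in the document, paragraphs inside table cells do not show up as extra headings, which was the problem with doc.select("p").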