I am writing some code and want to get the td text and the anchor's href from the same tr:
<tr>
<td class='labelOptional_1'>TD1 text here</td>
<td width='15%' class='label2'><div align='center'> </div></td>
<td width='15%' class='label2'><div align='center'> </div></td>
<td width='15%' class='label2'>
<div align='center'> <a href='Relative_URL_1'>hrefURL 1 in anchor tag ||</a> </div>
</td>
<tr>
<td class='labelOptional_1'>TD2 text here</td>
<td width='15%' class='label2'><div align='center'> </div></td>
<td width='15%' class='label2'><div align='center'> </div></td>
<td width='15%' class='label2'>
<div align='center'> <a href='Relative_URL_2'>hrefURL 2 in anchor tag ||</a> </div>
</td>
</tr>
<tr>
<td class='labelOptional_1'>TD3 here</td>
<td width='15%' class='label2'><div align='center'> </div></td>
<td width='15%' class='label2'><div align='center'> </div></td>
<td width='15%' class='label2'>
<div align='center'> <a href='Relative_URL_3'>hrefURL 3 in anchor tag ||</a> </div>
</td>
</tr>
Expected output:
TD1 text here Relative_URL_1
TD2 text here Relative_URL_2
Current output:
TD1 text here Relative_URL_1
TD2 text here Relative_URL_2
TD3 text here Relative_URL_3
Here is the code:
org.jsoup.select.Elements trs = doc.select("tr:contains(text)"); //fetch table rows
for (Element tr : trs)
{
    org.jsoup.select.Elements tds = tr.select("td:containsOwn(text)");
    for (Element td : tds) {
        sb.append(td.text());
        sb.append(',');
    }
    org.jsoup.select.Elements anchor = tr.select("a");
    for (Element aHref : anchor) {
        sb.append(aHref.attr("abs:href"));
        sb.append(',');
    }
    sb.append('\n');
}
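The code reads the required TD, but it also reads all anchor tags outside the tr/td in which the contains condition (a TD containing the word "text") is evaluated.
I want the code to read only the anchor tag that belongs to that particular <tr>.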
Answer 0 (score: 1)
Remove the second loop and change the code as follows:
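import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

// html is assumed to hold the markup from the question
Document doc = Jsoup.parse(html, "", Parser.htmlParser());
Elements trs = doc.select("tr:contains(text)"); // rows that contain the word "text"
StringBuilder sb = new StringBuilder();
for (Element tr : trs) {
    Elements tds = tr.select("td:containsOwn(text)"); // cells whose own text contains "text"
    for (Element td : tds) {
        // attr() on an Elements object returns the value from the first match, i.e. the first anchor in this row
        String anchor = tr.select("a").attr("href");
        sb.append(td.text()).append(' ').append(anchor);
    }
    sb.append('\n');
}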
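
The point of the change above is that the anchor is looked up relative to the current row rather than the whole document. The sketch below shows the same row-scoping idea using only the JDK's built-in XML parser and XPath instead of jsoup, so it runs without extra dependencies. It assumes the markup is well-formed XML and wrapped in a table element; the class name RowScopedAnchors and the helper extract are hypothetical names made up for this illustration, and the keyword match here is case-sensitive, unlike jsoup's :contains.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of jsoup or of the original answer.
final class RowScopedAnchors {

    // Returns one "label href" line for each row whose label cell contains the keyword.
    // Assumption: the input is well-formed XML (every tag closed, attributes quoted).
    static List<String> extract(String xml, String keyword) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xpath.evaluate("//tr", doc, XPathConstants.NODESET);

            List<String> lines = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                // Relative paths are evaluated against this row only,
                // so anchors from other rows can never leak in.
                String label = xpath.evaluate("td[@class='labelOptional_1']", row).trim();
                if (!label.contains(keyword)) {
                    continue; // skip rows whose label lacks the keyword
                }
                String href = xpath.evaluate(".//a/@href", row);
                lines.add(label + " " + href);
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query the markup", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<table>"
                + "<tr><td class='labelOptional_1'>TD1 text here</td>"
                + "<td class='label2'><div><a href='Relative_URL_1'>link 1</a></div></td></tr>"
                + "<tr><td class='labelOptional_1'>TD2 text here</td>"
                + "<td class='label2'><div><a href='Relative_URL_2'>link 2</a></div></td></tr>"
                + "<tr><td class='labelOptional_1'>TD3 here</td>"
                + "<td class='label2'><div><a href='Relative_URL_3'>link 3</a></div></td></tr>"
                + "</table>";
        extract(xml, "text").forEach(System.out::println);
    }
}
```

With the three sample rows in main, only the first two labels contain "text", so the third row and its anchor are skipped. In jsoup the equivalent of the relative XPath is calling select on the tr element instead of on the document.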