Jsoup: select all text after a closing tag, up to a specified tag

Date: 2013-02-27 22:43:52

Tags: html parsing text extract jsoup

I have this HTML repeated across many rows of a table:

.........
<tr class="greycellodd" align="right">
<td align="left">
<input type="checkbox" name="cashInvestment" value="100468057"/>
</td>
<td align="left">Cardcash 
</td>
<td class="nobr">26 Aug 10</td>
<td class="nobr"> 1.00 
</td>
<td class="nobr"> 1.00 
</td>
<td align="right">£</td>
<td class="nobr">0.00 </td>
<td class="nobr">0.00 </td>
<td class="nobr">
<span class="changeupsmall">1.00 </span>
</td>
</tr>
<tr class="greycellodd">
<td align="right"/>
<td class="nobr" colspan="8">VISA</td>
</tr>
<tr class="greycelleven" align="right">
<td align="left">
<input type="checkbox" name="cashInvestment" value="100480214"/>
</td>
<td align="left">Santander
</td>
<td class="nobr">24 Sep 11</td>
<td class="nobr"> 1.00 
.......

I need to extract everything between one checkbox tag and the next:

<input type="checkbox" name="cashInvestment" ../> 

Example:

Element 1:

Cardcash 
26 Aug 10
1.00 
1.00 
£
0.00
0.00
1.00
VISA

Element 2:

Santander
24 Sep 11
1.00 
.......

I tried:

 Elements Inve = mainFirst.select("input ~ *" );

 Elements Inve = doc.select("input"); // gives me nothing as there is no text to the input tag (it has no child). 

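I also need to get the value of each checkbox. I know how to do that, but it would be nice to do both at the same time if possible:

Elements mainTables = doc.select("table.maintable");
for (Element subTable : mainTables) {
  Elements borrowInve = subTable.select("input[type=checkbox][name=cashInvestment]");
  for (Element checkbox : borrowInve) {
    String attr = checkbox.attr("value"); // originally test.attr("value"), but test was never declared
  }
}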

Thanks

Edit: solved by checking the size:

    Elements td = tableRows.get(i).select("td");
    Elements cash = tableRows.get(i).getElementsByAttributeValue("name", attrValue); // check if checkbox is present
    int theSize = cash.size();

    if (theSize == 1) { // this row is not a comment
        Element cbox = td.select("input[type=checkbox]").first();
        String checkbox = cbox.attr("value");
    } else if (theSize == 0) { // this row contains a comment
        .............
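
For anyone landing here later, the size check above can be turned into a complete grouping pass. This is only a minimal sketch, not the exact code I ended up with: the class name `CashRows` and the method `groupByCheckbox` are made up for illustration, and it assumes that every row without a `cashInvestment` checkbox is a comment row belonging to the most recent checkbox row.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of jsoup.
class CashRows {

    // Maps each checkbox value to the non-empty cell texts that follow it,
    // up to (but not including) the next row that contains a checkbox.
    static Map<String, List<String>> groupByCheckbox(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, List<String>> groups = new LinkedHashMap<>();
        List<String> current = null;

        for (Element row : doc.select("tr")) {
            Element cbox = row.select("input[type=checkbox][name=cashInvestment]").first();
            if (cbox != null) {
                // A checkbox row starts a new group keyed by the checkbox value.
                current = new ArrayList<>();
                groups.put(cbox.attr("value"), current);
            }
            if (current == null) {
                continue; // rows before the first checkbox belong to no group
            }
            for (Element td : row.select("td")) {
                String text = td.text(); // whitespace-normalized text of the cell
                if (!text.isEmpty()) {
                    current.add(text);
                }
            }
        }
        return groups;
    }
}
```

Keeping the checkbox value as the map key gives the value and the row text in one pass, which is what I was originally after. A `LinkedHashMap` keeps the groups in document order.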

1 Answer:

Answer 0 (score: 1)

I've never done anything with Jsoup, but from a quick look at the docs, something like the following might work:

Elements Inve = doc.select(".maintable tr td:not(:has(input))");

That said, it would probably be easier if you could add a class to the elements you want to target.
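
As a rough sketch of how that selector behaves, the snippet below collects the text of every cell that does not contain an `input`. The class name `NonInputCells` and the method `cellTexts` are made up for illustration, and the `maintable` class is taken from the code in the question rather than from the HTML shown.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of jsoup.
class NonInputCells {

    // Returns the text of every <td> inside .maintable that has no <input>
    // descendant, in document order, skipping cells with no text.
    static List<String> cellTexts(String html) {
        List<String> texts = new ArrayList<>();
        for (Element td : Jsoup.parse(html).select(".maintable tr td:not(:has(input))")) {
            if (td.hasText()) {
                texts.add(td.text());
            }
        }
        return texts;
    }
}
```

Note that the result is one flat list in document order, so on its own it does not tell you which checkbox each cell belongs to. You would still need to walk the rows to split the cells into groups.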