Android: How to get the rows between two tags using jsoup

Asked: 2016-03-19 09:20:39

Tags: android jsoup

I have the following HTML:

<tr>
    <td colspan="6" class="sumheadtop"> Friday 18 March 2016</td>
</tr>
<tr>
    <td colspan="6" class="sumheadbot">&nbsp;PASSENGER ARRIVALS | DOMESTIC &amp; INTERNATIONAL | All Airlines | ALL OriginS</td>
</tr>
<tr class="schedulerow" style="height:2px"><td colspan="6"></td></tr>
<tr class="schedulerow" valign="top">
    <td class="airline"><img src="/webfids/images/3u.gif" width="100" height="24" vspace="0" alt="Sichuan Airlines"/></td>
    <td class="flight" nowrap>3U 8989</td>
    <td class="city">Chengdu</td>
    <td class="time">19:00</td>
    <td class="estimated">20:01</td>
    <td class="status"><div class="statusone">LANDED</div></td>
</tr>
<tr class="schedulerow" style="height:2px"><td colspan="6"></td></tr>
<tr class="schedulerowtwo" style="height:2px"><td colspan="6"></td></tr>
<tr class="schedulerowtwo" valign="top">
    <td class="airline"><img src="/webfids/images/q2.gif" width="100" height="24" vspace="0" alt="Maldivian"/></td>
    <td class="flight" nowrap>Q2 107</td>
    <td class="city">Gan</td>
    <td class="time">19:35</td>
    <td class="estimated">19:30</td>
    <td class="status"><div class="statusone">LANDED</div></td>
</tr>
<tr>
    <td colspan="6" class="sumheadtop"> Saturday 19 March 2016</td>
</tr>
<tr>
    <td colspan="6" class="sumheadbot">&nbsp;PASSENGER ARRIVALS | DOMESTIC &amp; INTERNATIONAL | All Airlines | ALL OriginS</td>
</tr>
<tr class="schedulerow" style="height:2px"><td colspan="6"></td></tr>
<tr class="schedulerow" valign="top">
    <td class="airline"><img src="/webfids/images/3u.gif" width="100" height="24" vspace="0" alt="Sichuan Airlines"/></td>
    <td class="flight" nowrap>3U 8989</td>
    <td class="city">Chengdu</td>
    <td class="time">19:00</td>
    <td class="estimated">20:01</td>
    <td class="status"><div class="statusone">LANDED</div></td>
</tr>
<tr class="schedulerow" style="height:2px"><td colspan="6"></td></tr>
<tr class="schedulerowtwo" style="height:2px"><td colspan="6"></td></tr>
<tr class="schedulerowtwo" valign="top">
    <td class="airline"><img src="/webfids/images/q2.gif" width="100" height="24" vspace="0" alt="Maldivian"/></td>
    <td class="flight" nowrap>Q2 107</td>
    <td class="city">Gan</td>
    <td class="time">19:35</td>
    <td class="estimated">19:30</td>
    <td class="status"><div class="statusone">LANDED</div></td>
</tr>

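I want to get the rows between the two cells with the class "sumheadtop".

How can I achieve this using Jsoup?

I tried the code below, but it gives me all the rows that follow the first "sumheadtop" cell: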

doc = Jsoup.parse(html);
date = doc.select("td[class=sumheadtop]");
siblings = date.first().parent().siblingElements();

1 answer:

Answer 0 (score: 0)

Try this:

// Imports needed by this snippet
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// The rows are wrapped in <table> so that jsoup keeps the <tr>/<td> elements when parsing.
String html = "<table><tr>\n"
        + " <td colspan=\"6\" class=\"sumheadtop\"> Friday 18 March 2016</td>\n"
        + "</tr>\n"
        + "<tr>\n"
        + " <td colspan=\"6\" class=\"sumheadbot\">&nbsp;PASSENGER ARRIVALS | DOMESTIC &amp; INTERNATIONAL | All Airlines | ALL OriginS</td>\n"
        + "</tr>\n"
        + "<tr class=\"schedulerow\" style=\"height:2px\"><td colspan=\"6\"></td></tr>\n"
        + "<tr class=\"schedulerow\" valign=\"top\">\n"
        + " <td class=\"airline\"><img src=\"/webfids/images/3u.gif\" width=\"100\" height=\"24\" vspace=\"0\" alt=\"Sichuan Airlines\"/></td>\n"
        + " <td class=\"flight\" nowrap>3U 8989</td>\n"
        + " <td class=\"city\">Chengdu</td>\n"
        + " <td class=\"time\">19:00</td>\n"
        + " <td class=\"estimated\">20:01</td>\n"
        + " <td class=\"status\"><div class=\"statusone\">LANDED</div></td>\n"
        + "</tr>\n"
        + "<tr class=\"schedulerow\" style=\"height:2px\"><td colspan=\"6\"></td></tr>\n"
        + "<tr class=\"schedulerowtwo\" style=\"height:2px\"><td colspan=\"6\"></td></tr>\n"
        + "<tr class=\"schedulerowtwo\" valign=\"top\">\n"
        + " <td class=\"airline\"><img src=\"/webfids/images/q2.gif\" width=\"100\" height=\"24\" vspace=\"0\" alt=\"Maldivian\"/></td>\n"
        + " <td class=\"flight\" nowrap>Q2 107</td>\n"
        + " <td class=\"city\">Gan</td>\n"
        + " <td class=\"time\">19:35</td>\n"
        + " <td class=\"estimated\">19:30</td>\n"
        + " <td class=\"status\"><div class=\"statusone\">LANDED</div></td>\n"
        + "</tr>\n"
        + "<tr>\n"
        + " <td colspan=\"6\" class=\"sumheadtop\"> Saturday 19 March 2016</td>\n"
        + "</tr>\n"
        + "<tr>\n"
        + " <td colspan=\"6\" class=\"sumheadbot\">&nbsp;PASSENGER ARRIVALS | DOMESTIC &amp; INTERNATIONAL | All Airlines | ALL OriginS</td>\n"
        + "</tr>\n"
        + "<tr class=\"schedulerow\" style=\"height:2px\"><td colspan=\"6\"></td></tr>\n"
        + "<tr class=\"schedulerow\" valign=\"top\">\n"
        + " <td class=\"airline\"><img src=\"/webfids/images/3u.gif\" width=\"100\" height=\"24\" vspace=\"0\" alt=\"Sichuan Airlines\"/></td>\n"
        + " <td class=\"flight\" nowrap>3U 8989</td>\n"
        + " <td class=\"city\">Chengdu</td>\n"
        + " <td class=\"time\">19:00</td>\n"
        + " <td class=\"estimated\">20:01</td>\n"
        + " <td class=\"status\"><div class=\"statusone\">LANDED</div></td>\n"
        + "</tr>\n"
        + "<tr class=\"schedulerow\" style=\"height:2px\"><td colspan=\"6\"></td></tr>\n"
        + "<tr class=\"schedulerowtwo\" style=\"height:2px\"><td colspan=\"6\"></td></tr>\n"
        + "<tr class=\"schedulerowtwo\" valign=\"top\">\n"
        + " <td class=\"airline\"><img src=\"/webfids/images/q2.gif\" width=\"100\" height=\"24\" vspace=\"0\" alt=\"Maldivian\"/></td>\n"
        + " <td class=\"flight\" nowrap>Q2 107</td>\n"
        + " <td class=\"city\">Gan</td>\n"
        + " <td class=\"time\">19:35</td>\n"
        + " <td class=\"estimated\">19:30</td>\n"
        + " <td class=\"status\"><div class=\"statusone\">LANDED</div></td>\n"
        + "</tr></table>";

Document doc = Jsoup.parse(html);
Element firstDateCell = doc.select("td.sumheadtop").first();

if (firstDateCell == null) {
    throw new RuntimeException("Unable to locate rows...");
}

System.out.println(firstDateCell.text());
for (Element aRow : firstDateCell.parent().siblingElements()) {
    if (!aRow.select("td.sumheadtop").isEmpty()) {
        System.out.println(aRow.text());
    } else {
        // Handle the row now...
        System.out.println(">> " + aRow.text());
    }
}

Output:

Friday 18 March 2016
>>  PASSENGER ARRIVALS | DOMESTIC & INTERNATIONAL | All Airlines | ALL OriginS
>> 
>> 3U 8989 Chengdu 19:00 20:01 LANDED
>> 
>> 
>> Q2 107 Gan 19:35 19:30 LANDED
Saturday 19 March 2016
>>  PASSENGER ARRIVALS | DOMESTIC & INTERNATIONAL | All Airlines | ALL OriginS
>> 
>> 3U 8989 Chengdu 19:00 20:01 LANDED
>> 
>> 
>> Q2 107 Gan 19:35 19:30 LANDED

Note, however, that the code above also prints the rows after Saturday 19 March 2016. You can add a break to stop at that header instead.
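
One way the break could look is sketched below. Instead of iterating over siblingElements(), which returns every sibling of the header row, this version walks forward with nextElementSibling() and stops at the next row that contains a td.sumheadtop cell. The class name RowsBetweenHeaders, the method name rowsOfFirstDay, and the shortened sample table are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RowsBetweenHeaders {

    /**
     * Collects the rows after the first "sumheadtop" header row,
     * up to but not including the next "sumheadtop" header row.
     */
    static List<Element> rowsOfFirstDay(Document doc) {
        List<Element> rows = new ArrayList<>();
        Element firstDateCell = doc.select("td.sumheadtop").first();
        if (firstDateCell == null) {
            return rows; // no date header in the document
        }
        // Start at the row right after the header row and walk forward.
        Element row = firstDateCell.parent().nextElementSibling();
        while (row != null) {
            if (!row.select("td.sumheadtop").isEmpty()) {
                break; // reached the next date header, stop here
            }
            rows.add(row);
            row = row.nextElementSibling();
        }
        return rows;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the table from the question.
        String html = "<table>"
                + "<tr><td class=\"sumheadtop\"> Friday 18 March 2016</td></tr>"
                + "<tr class=\"schedulerow\"><td class=\"flight\">3U 8989</td><td class=\"city\">Chengdu</td></tr>"
                + "<tr class=\"schedulerowtwo\"><td class=\"flight\">Q2 107</td><td class=\"city\">Gan</td></tr>"
                + "<tr><td class=\"sumheadtop\"> Saturday 19 March 2016</td></tr>"
                + "<tr class=\"schedulerow\"><td class=\"flight\">3U 8989</td><td class=\"city\">Chengdu</td></tr>"
                + "</table>";

        for (Element row : rowsOfFirstDay(Jsoup.parse(html))) {
            System.out.println(">> " + row.text());
        }
    }
}
```

With the full table from the question, the returned list would still contain the "sumheadbot" row and the empty 2px spacer rows, so you may want to filter those out (for example by skipping rows whose text() is empty) before using the result.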