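Getting href and title from a table using Jsoup

Time: 2015-08-17 20:54:41

Tags: android html parsing jsoup

I want to parse an HTML table with Jsoup, but I cannot get the data I need out of it. I want the href and title from each row of this table, but I am getting the entire contents of the table instead.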

<table class="FullWidth gv" cellspacing="0" rules="all" border="1" id="ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION" style="border-collapse:collapse;">
    <tr>
        <th scope="col">S#</th>
        <th scope="col">Code</th>
        <th scope="col">Registered Course Title</th>
        <th scope="col">Credits</th>
        <th scope="col">Offered Course Title</th>
        <th scope="col">Class</th>
        <th scope="col">Teacher</th>
        <th scope="col">Fee</th>
        <th scope="col">&nbsp;</th>
    </tr>
    <tr>
        <td class="Center">
                                1</td>
        <td class="NoWrap">GSC 220</td>
        <td class="Width33">Complex Variables &amp; Transforms</td>
        <td class="Center">3</td>
        <td class="Width33">Complex Variables &amp; Transforms</td>
        <td class="NoWrap">BCE-4 (A) MORNING</td>
        <td class="Width33">AMMAR AJMAL</td>
        <td>YES</td>
        <td>
            <a title="Complex Variables &amp; Transforms" class="a" href="Attendance.aspx?COID=21480" target="_blank">Attendance</a>
        </td>
    </tr>
    <tr class="Alternating">
        <td class="Center">
                                2</td>
        <td class="NoWrap">CSC 221</td>
        <td class="Width33">Data Structure and Algorithm</td>
        <td class="Center">3</td>
        <td class="Width33">Data Structure and Algorithm</td>
        <td class="NoWrap">BCE-4 (A) MORNING</td>
        <td class="Width33">ABU BAKAR</td>
        <td>YES</td>
        <td>
            <a title="Data Structure and Algorithm" class="a" href="Attendance.aspx?COID=21478" target="_blank">Attendance</a>
        </td>
    </tr>
    <tr>
        <td class="Center">
                                3</td>
        <td class="NoWrap">CSL 221</td>
        <td class="Width33">Data Structures and Algorithm Lab</td>
        <td class="Center">1</td>
        <td class="Width33">Data Structures and Algorithm Lab</td>
        <td class="NoWrap">BCE-4 (A) MORNING</td>
        <td class="Width33">ABU BAKAR</td>
        <td>YES</td>
        <td>
            <a title="Data Structures and Algorithm Lab" class="a" href="Attendance.aspx?COID=21479" target="_blank">Attendance</a>
        </td>
    </tr>
    <tr class="Alternating">
        <td class="Center">
                                4</td>
        <td class="NoWrap">CSC 220</td>
        <td class="Width33">Database Management System</td>
        <td class="Center">3</td>
        <td class="Width33">Database Management System</td>
        <td class="NoWrap">BCE-4 (A) MORNING</td>
        <td class="Width33">BUSHRA SABIR</td>
        <td>YES</td>
        <td>
            <a title="Database Management System" class="a" href="Attendance.aspx?COID=21481" target="_blank">Attendance</a>
        </td>
    </tr>
    <tr>
        <td class="Center">
                                5</td>
        <td class="NoWrap">CSL 220</td>
        <td class="Width33">Database Management System Lab</td>
        <td class="Center">1</td>
        <td class="Width33">Database Management System Lab</td>
        <td class="NoWrap">BCE-4 (A) MORNING</td>
        <td class="Width33">BUSHRA SABIR</td>
        <td>YES</td>
        <td>
            <a title="Database Management System Lab" class="a" href="Attendance.aspx?COID=21482" target="_blank">Attendance</a>
        </td>
    </tr>
    <tr class="Alternating">
        <td class="Center">
                                6</td>
        <td class="NoWrap">CSC 320</td>
        <td class="Width33">Operating System</td>
        <td class="Center">3</td>
        <td class="Width33">Operating System</td>
        <td class="NoWrap">BCE-4 (A) MORNING</td>
        <td class="Width33">BUSHRA SABIR</td>
        <td>YES</td>
        <td>
            <a title="Operating System" class="a" href="Attendance.aspx?COID=21474" target="_blank">Attendance</a>
        </td>
    </tr>
    <tr>
        <td class="Center">
                                7</td>
        <td class="NoWrap">CSL 320</td>
        <td class="Width33">Operating System Lab</td>
        <td class="Center">1</td>
        <td class="Width33">Operating System Lab</td>
        <td class="NoWrap">BCE-4 (A) MORNING</td>
        <td class="Width33">BUSHRA SABIR</td>
        <td>YES</td>
        <td>
            <a title="Operating System Lab" class="a" href="Attendance.aspx?COID=21475" target="_blank">Attendance</a>
        </td>
    </tr>
    <tr class="gvFooter">
        <td>&nbsp;</td>
        <td>&nbsp;</td>
        <td>&nbsp;</td>
        <td class="Center">15</td>
        <td>&nbsp;</td>
        <td>&nbsp;</td>
        <td>&nbsp;</td>
        <td>&nbsp;</td>
        <td>&nbsp;</td>
    </tr>
</table>

This is what I am trying:

Document doce = Jsoup.connect(urlofthewebsite)
        .cookies(hashMap)
        .get();

Element tableheader = doce.select("table[id=ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION}").first();

for (Element element : tableheader.children()) {
    System.out.println(element.text());
}

1 Answer:

Answer 0 (score: 0)

First of all, your example has a typo in

select("table[id=ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION}")

because you ended the attribute selector with } instead of ].

To avoid this kind of mistake, start using #identifier instead of [id=identifier] and .className instead of [class=className].
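As a rough illustration of the shorthand forms, here is a minimal sketch that counts matches for both spellings. The class name `SelectorShorthand`, the helper `count`, and the shortened id `courses` are made up for this example; the HTML is a cut-down stand-in for the table in the question. Note that the two class forms are not always interchangeable: `[class=gv]` compares against the whole attribute value, while `.gv` matches one class name among several.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorShorthand {
    // Cut-down stand-in for the table in the question; the id "courses" is hypothetical.
    static final String HTML =
        "<table class=\"FullWidth gv\" id=\"courses\">"
      + "<tr class=\"Alternating\"><td>CSC 221</td></tr>"
      + "</table>";

    // Returns how many elements the given CSS query matches in HTML.
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Attribute form and shorthand form select the same table by id.
        System.out.println(count("table[id=courses]"));
        System.out.println(count("table#courses"));

        // Same for a row that carries exactly one class.
        System.out.println(count("tr[class=Alternating]"));
        System.out.println(count("tr.Alternating"));

        // With several classes, only the shorthand matches a single class name.
        System.out.println(count("table.gv"));
        System.out.println(count("table[class=gv]"));
    }
}
```

The shorthand is shorter, has no closing bracket to mistype, and keeps working when an element has more than one class.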

Also, by calling

.select("table[id=ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION]")
.first();

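you are not getting the first row of the table (the header row) but the first table with this id, because tables with that id are what your selector is meant to find.
If you want the headers, simply select the th tags, for example: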
Element table = doce.select("table#ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION").first();
for(Element column : table.select("th")) {
    System.out.println(column.text());
}
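A likely reason the loop in the question printed everything at once (this is my reading of it, not something confirmed against the real page): jsoup's HTML parser wraps bare tr rows in an implicit tbody, so table.children() holds that single tbody, and its text() is the text of the whole table. The sketch below checks this on a tiny hypothetical table; the class name `TableChildren` and the id `t` are invented for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class TableChildren {
    // Tiny hypothetical table written without an explicit <tbody>, like the one in the question.
    static final String HTML =
        "<table id=\"t\">"
      + "<tr><th>Code</th></tr>"
      + "<tr><td>GSC 220</td></tr>"
      + "</table>";

    // Parses HTML and returns the table element.
    static Element table() {
        return Jsoup.parse(HTML).select("table#t").first();
    }

    public static void main(String[] args) {
        Element table = table();

        // The direct children of the table: the parser has added a <tbody> wrapper.
        System.out.println(table.children().size());
        System.out.println(table.children().first().tagName());

        // select() searches all descendants, so rows and header cells are still reachable.
        System.out.println(table.select("tr").size());
        System.out.println(table.select("th").size());
    }
}
```

This is why select("th") or select("tr") on the table is usually more convenient than walking children().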

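Now, based on

  "I want to get the href and title from each row of this table, but I am getting the whole data from the table."

you probably want to use something like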
for (Element link : table.select("a")) {
    System.out.println(link.attr("title") + " -> " + link.attr("href"));
    // link.attr("abs:href") returns the absolute URL instead of the relative one
}
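If you need the values row by row rather than just printed, the loop above could be extended along these lines. This is only a sketch under a few assumptions: the HTML is a trimmed copy of the table from the question, `BASE_URI` is a placeholder (with Jsoup.connect(...).get() the document already knows its own URL), and the names `AttendanceLinks` and `extract` are made up for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class AttendanceLinks {
    // Trimmed copy of the table from the question: header, two data rows, footer.
    static final String HTML =
        "<table id=\"ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION\">"
      + "<tr><th>Code</th><th>&nbsp;</th></tr>"
      + "<tr><td>GSC 220</td><td><a title=\"Complex Variables &amp; Transforms\" "
      + "href=\"Attendance.aspx?COID=21480\">Attendance</a></td></tr>"
      + "<tr class=\"Alternating\"><td>CSC 221</td><td><a title=\"Data Structure and Algorithm\" "
      + "href=\"Attendance.aspx?COID=21478\">Attendance</a></td></tr>"
      + "<tr class=\"gvFooter\"><td>&nbsp;</td><td>&nbsp;</td></tr>"
      + "</table>";

    // Placeholder base URL (an assumption); it is only needed to resolve abs:href.
    static final String BASE_URI = "http://example.com/student/";

    // Returns one {title, href, absolute href} triple per row that contains a link.
    static List<String[]> extract(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element table = doc.select("table#ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION").first();
        List<String[]> result = new ArrayList<>();
        if (table == null) {
            return result; // table not found, e.g. the page layout changed
        }
        for (Element row : table.select("tr")) {
            Element link = row.select("a[href]").first();
            if (link == null) {
                continue; // header and footer rows have no link
            }
            result.add(new String[] {
                link.attr("title"),
                link.attr("href"),
                link.attr("abs:href")
            });
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] entry : extract(HTML, BASE_URI)) {
            System.out.println(entry[0] + " -> " + entry[1] + " (" + entry[2] + ")");
        }
    }
}
```

Going through the rows first makes it easy to skip the header and footer, and to pick up other cells from the same row later if you need them.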