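I want to parse an HTML table with Jsoup, but I can't extract the data I need. I want to get the href and title from each row of this table, but instead I get the data of the entire table.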
<table class="FullWidth gv" cellspacing="0" rules="all" border="1" id="ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION" style="border-collapse:collapse;">
<tr>
<th scope="col">S#</th>
<th scope="col">Code</th>
<th scope="col">Registered Course Title</th>
<th scope="col">Credits</th>
<th scope="col">Offered Course Title</th>
<th scope="col">Class</th>
<th scope="col">Teacher</th>
<th scope="col">Fee</th>
<th scope="col"> </th>
</tr>
<tr>
<td class="Center">
1</td>
<td class="NoWrap">GSC 220</td>
<td class="Width33">Complex Variables & Transforms</td>
<td class="Center">3</td>
<td class="Width33">Complex Variables & Transforms</td>
<td class="NoWrap">BCE-4 (A) MORNING</td>
<td class="Width33">AMMAR AJMAL</td>
<td>YES</td>
<td>
<a title="Complex Variables & Transforms" class="a" href="Attendance.aspx?COID=21480" target="_blank">Attendance</a>
</td>
</tr>
<tr class="Alternating">
<td class="Center">
2</td>
<td class="NoWrap">CSC 221</td>
<td class="Width33">Data Structure and Algorithm</td>
<td class="Center">3</td>
<td class="Width33">Data Structure and Algorithm</td>
<td class="NoWrap">BCE-4 (A) MORNING</td>
<td class="Width33">ABU BAKAR</td>
<td>YES</td>
<td>
<a title="Data Structure and Algorithm" class="a" href="Attendance.aspx?COID=21478" target="_blank">Attendance</a>
</td>
</tr>
<tr>
<td class="Center">
3</td>
<td class="NoWrap">CSL 221</td>
<td class="Width33">Data Structures and Algorithm Lab</td>
<td class="Center">1</td>
<td class="Width33">Data Structures and Algorithm Lab</td>
<td class="NoWrap">BCE-4 (A) MORNING</td>
<td class="Width33">ABU BAKAR</td>
<td>YES</td>
<td>
<a title="Data Structures and Algorithm Lab" class="a" href="Attendance.aspx?COID=21479" target="_blank">Attendance</a>
</td>
</tr>
<tr class="Alternating">
<td class="Center">
4</td>
<td class="NoWrap">CSC 220</td>
<td class="Width33">Database Management System</td>
<td class="Center">3</td>
<td class="Width33">Database Management System</td>
<td class="NoWrap">BCE-4 (A) MORNING</td>
<td class="Width33">BUSHRA SABIR</td>
<td>YES</td>
<td>
<a title="Database Management System" class="a" href="Attendance.aspx?COID=21481" target="_blank">Attendance</a>
</td>
</tr>
<tr>
<td class="Center">
5</td>
<td class="NoWrap">CSL 220</td>
<td class="Width33">Database Management System Lab</td>
<td class="Center">1</td>
<td class="Width33">Database Management System Lab</td>
<td class="NoWrap">BCE-4 (A) MORNING</td>
<td class="Width33">BUSHRA SABIR</td>
<td>YES</td>
<td>
<a title="Database Management System Lab" class="a" href="Attendance.aspx?COID=21482" target="_blank">Attendance</a>
</td>
</tr>
<tr class="Alternating">
<td class="Center">
6</td>
<td class="NoWrap">CSC 320</td>
<td class="Width33">Operating System</td>
<td class="Center">3</td>
<td class="Width33">Operating System</td>
<td class="NoWrap">BCE-4 (A) MORNING</td>
<td class="Width33">BUSHRA SABIR</td>
<td>YES</td>
<td>
<a title="Operating System" class="a" href="Attendance.aspx?COID=21474" target="_blank">Attendance</a>
</td>
</tr>
<tr>
<td class="Center">
7</td>
<td class="NoWrap">CSL 320</td>
<td class="Width33">Operating System Lab</td>
<td class="Center">1</td>
<td class="Width33">Operating System Lab</td>
<td class="NoWrap">BCE-4 (A) MORNING</td>
<td class="Width33">BUSHRA SABIR</td>
<td>YES</td>
<td>
<a title="Operating System Lab" class="a" href="Attendance.aspx?COID=21475" target="_blank">Attendance</a>
</td>
</tr>
<tr class="gvFooter">
<td> </td>
<td> </td>
<td> </td>
<td class="Center">15</td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
</table>
This is what I'm trying:
Document doce = Jsoup.connect(urlofthewebsite)
        .cookies(hashMap)
        .get();
Element tableheader = doce.select("table[id=ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION}").first();
for (Element element : tableheader.children()) {
    System.out.println(element.text());
}
Answer:
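First, your example has a typo:
select("table[id=ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION}")
You closed the attribute selector with } instead of ].
To avoid this kind of mistake, use the shorthand #identifier instead of [id=identifier] for ids, and .className instead of [class=className] for classes. (Strictly speaking, .className matches any element whose class list contains className, while [class=className] requires the whole class attribute to equal className.)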
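A minimal sketch of the shorthand selectors, assuming jsoup is on the classpath. It parses a trimmed, illustrative copy of the question's table offline with Jsoup.parse (the class name SelectorShorthand is hypothetical) and counts matches for the id and class forms. The last two table lines show where .gv and [class=gv] differ on a table whose class attribute is "FullWidth gv".

```java
import org.jsoup.Jsoup;

public class SelectorShorthand {
    // Trimmed, illustrative copy of the question's table: only the id and class attributes matter here.
    static final String HTML =
            "<table class=\"FullWidth gv\" id=\"ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION\">"
            + "<tr><th>S#</th><th>Code</th></tr>"
            + "<tr class=\"Alternating\"><td class=\"Center\">2</td><td class=\"NoWrap\">CSC 221</td></tr>"
            + "</table>";

    // Parses the sample HTML and returns how many elements match the CSS selector.
    static int count(String css) {
        return Jsoup.parse(HTML).select(css).size();
    }

    public static void main(String[] args) {
        // #id is shorthand for [id=...]: both find the one table.
        System.out.println(count("table#ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION"));     // 1
        System.out.println(count("table[id=ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION]")); // 1
        // .gv checks whether "gv" is one of the element's classes...
        System.out.println(count("table.gv"));        // 1
        // ...whereas [class=gv] compares the whole attribute value "FullWidth gv".
        System.out.println(count("table[class=gv]")); // 0
        // The class shorthand works on rows too.
        System.out.println(count("tr.Alternating"));  // 1
    }
}
```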
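Also, by calling
.select("table[id=ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION]")
.first();
you do not get the first row of the table (the header row), but the first table with this id, because tables with that id are exactly what your selector looks for.
Iterating over that table's children() does not give you the rows either: the HTML parser wraps the rows in an implicit <tbody>, so the table's only child is that <tbody>, and its text() is the text of the entire table. That is why you get the whole data.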
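To see the implicit <tbody> for yourself, here is a small offline sketch, again assuming jsoup is on the classpath; the class ImplicitTbody and the two-row HTML excerpt are illustrative. It prints what the question's loop actually iterates over, then the text of each <tr>, which is the per-row view the question was after.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ImplicitTbody {
    // Illustrative two-row excerpt of the question's table, written without <tbody> like the original.
    static final String HTML =
            "<table id=\"ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION\">"
            + "<tr><th>S#</th><th>Code</th></tr>"
            + "<tr><td>1</td><td>GSC 220</td></tr>"
            + "</table>";

    // Parses the excerpt and returns the table element (null if the id does not match).
    static Element table() {
        return Jsoup.parse(HTML)
                .select("table#ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION")
                .first();
    }

    public static void main(String[] args) {
        Element table = table();

        // What the question's loop iterates over: the table's direct children.
        Elements children = table.children();
        System.out.println(children.size() + " " + children.first().tagName()); // 1 tbody
        // The single <tbody> child holds every row, so its text() is the whole table.
        System.out.println(children.first().text());

        // Selecting <tr> elements instead yields one element per row.
        for (Element row : table.select("tr")) {
            System.out.println(row.text());
        }
    }
}
```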
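If you want to find the headers, just select the th tags, for example:
Element table = doce.select("table#ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION").first();
// first() returns null if no table on the page matches the selector
for (Element column : table.select("th")) {
    System.out.println(column.text());
}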
Now, based on
I want to get the href and title from each row of this table, but I get the whole data from the table.
you probably want something like:
for (Element link : table.select("a")) {
    System.out.println(link.attr("title") + " -> " + link.attr("href"));
    // you can also use "abs:href" to get the absolute URL
}
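Building on that loop, here is a sketch that walks the table row by row, skips rows without a link (the header and footer), and resolves each href to an absolute URL. The base URI https://portal.example.edu/Student/ is hypothetical and only needed because the HTML is parsed offline; a Document returned by Jsoup.connect(...).get() already uses the fetched page URL as its base URI, so "abs:href" works there directly. The class CourseLinks, the method extract, and the trimmed three-column HTML are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CourseLinks {
    // Hypothetical base URI for this offline example only. A Document from
    // Jsoup.connect(url).get() already uses the fetched page URL as its base URI.
    static final String BASE_URI = "https://portal.example.edu/Student/";

    // Header, two data rows and the footer row from the question's table (columns trimmed).
    static final String HTML =
            "<table class=\"FullWidth gv\" id=\"ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION\">"
            + "<tr><th scope=\"col\">S#</th><th scope=\"col\">Code</th><th scope=\"col\"> </th></tr>"
            + "<tr><td class=\"Center\">1</td><td class=\"NoWrap\">GSC 220</td>"
            + "<td><a title=\"Complex Variables & Transforms\" class=\"a\" href=\"Attendance.aspx?COID=21480\" target=\"_blank\">Attendance</a></td></tr>"
            + "<tr class=\"Alternating\"><td class=\"Center\">2</td><td class=\"NoWrap\">CSC 221</td>"
            + "<td><a title=\"Data Structure and Algorithm\" class=\"a\" href=\"Attendance.aspx?COID=21478\" target=\"_blank\">Attendance</a></td></tr>"
            + "<tr class=\"gvFooter\"><td> </td><td> </td><td> </td></tr>"
            + "</table>";

    // Returns one "title -> absolute href" line per row that contains a link.
    static List<String> extract(Document doc) {
        List<String> result = new ArrayList<>();
        Element table = doc.select("table#ctl00_Body_STUDENT_SSS_ctrl0_COURSE_REGISTRATION").first();
        if (table == null) {
            return result; // no matching table on this page
        }
        for (Element row : table.select("tr")) {
            // The header and footer rows have no <a href>, so they are skipped.
            Element link = row.select("a[href]").first();
            if (link == null) {
                continue;
            }
            // "abs:href" resolves the relative href against the document's base URI.
            result.add(link.attr("title") + " -> " + link.attr("abs:href"));
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML, BASE_URI);
        for (String line : extract(doc)) {
            System.out.println(line);
        }
    }
}
```

Iterating over rows rather than over every a on the page keeps each link tied to its row, so other cells of the same row (for example the course code) can be read alongside it if needed.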