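I'm building an Android app that stores students' timetables, using the timetable shown on my university's online portal.
Please see the screenshot; the timetable is displayed in the following format:
I'm stuck because I can't work out a pattern for extracting the data from the site: none of the rows or columns has an id attribute. Please see the HTML below. Could someone suggest a good pattern? Keep in mind that I will be using only Java (Android) for this. All suggestions are welcome.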
<div class="portlet-body">
<div class="table-responsive">
<table class="table table-light">
<thead>
<tr>
<th> </th>
<th style="text-align: center; color: black">MON</th>
<th style="text-align: center; color: black">TUE</th>
<th style="text-align: center; color: black">WED</th>
<th style="text-align: center; color: black">THU</th>
<th style="text-align: center; color: black">FRI</th>
<th style="text-align: center; color: black">SAT</th>
<th style="text-align: center; color: black">SUN</th>
</tr>
</thead>
<tbody>
<tr>
<td class="label-success" style="color: #fff;">08:00 AM - 09:20 AM</td>
<td style="background-color:#ddd;color:black;text-align: center;border-style: solid;">
<div style="color:black;">Enterprise Application Development Lab(4)<br></div>
<div style="color:gray;">SYED ARSLAN SAEED<br></div>
<div style="color:black;"> [INST LAB-I, B-BLOCK]</div>
</td>
<td> </td>
<td style="background-color:#ddd;color:black;text-align: center;border-style: solid;">
<div style="color:black;">Linear Algebra(3)<br></div>
<div style="color:gray;">SHAHANA RIZVI<br></div>
<div style="color:black;"> [F5]</div>
</td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="label-success" style="color: #fff;">09:30 AM - 10:50 AM</td>
<td style="background-color:#ddd;color:black;text-align: center;border-style: solid;">
<div style="color:black;">Enterprise Application Development Lab(4)<br></div>
<div style="color:gray;">SYED ARSLAN SAEED<br></div>
<div style="color:black;"> [INST LAB-I, B-BLOCK]</div>
</td>
<td> </td>
<td style="background-color:#ddd;color:black;text-align: center;border-style: solid;">
<div style="color:black;">Enterprise Application Development(3)<br></div>
<div style="color:gray;">ASAD MAHMOOD<br></div>
<div style="color:black;"> [F4]</div>
</td>
<td style="background-color:#ddd;color:black;text-align: center;border-style: solid;">
<div style="color:black;">Enterprise Application Development(3)<br></div>
<div style="color:gray;">ASAD MAHMOOD<br></div>
<div style="color:black;"> [B9]</div>
</td>
<td style="background-color:#ddd;color:black;text-align: center;border-style: solid;">
<div style="color:black;">Linear Algebra(3)<br></div>
<div style="color:gray;">SHAHANA RIZVI<br></div>
<div style="color:black;"> [E5]</div>
</td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="label-success" style="color: #fff;">11:00 AM - 12:20 PM</td>
<td style="background-color:#ddd;color:black;text-align: center;border-style: solid;">
<div style="color:black;">Principles of Accounting-I(3)<br></div>
<div style="color:gray;">NOUSHEEN TARIQ BHUTTA<br></div>
<div style="color:black;"> [F6]</div>
</td>
<td> </td>
<td style="background-color:#ddd;color:black;text-align: center;border-style: solid;">
<div style="color:black;">Principles of Accounting-I(3)<br></div>
<div style="color:gray;">NOUSHEEN TARIQ BHUTTA<br></div>
<div style="color:black;"> [B8]</div>
</td>
<td> </td>
<td style="background-color:#ddd;color:black;text-align: center;border-style: solid;">
<div style="color:black;">Mobile Application Development(1)<br></div>
<div style="color:gray;">ANSAR JAVED<br></div>
<div style="color:black;"> [B2]</div>
</td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="label-success" style="color: #fff;">12:30 PM - 01:50 PM</td>
<td> </td>
<td style="background-color:#ddd;color:black;text-align: center;border-style: solid;">
<div style="color:black;">Mobile Application Development(1)<br></div>
<div style="color:gray;">ANSAR JAVED<br></div>
<div style="color:black;"> [E5]</div>
</td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="label-success" style="color: #fff;">02:00 PM - 03:20 PM</td>
<td style="background-color:#ddd;color:black;text-align: center;border-style: solid;">
<div style="color:black;">Artificial Intelligence(2)<br></div>
<div style="color:gray;">AAMER NADEEM<br></div>
<div style="color:black;"> [E4]</div>
</td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="label-success" style="color: #fff;">03:30 PM - 04:50 PM</td>
<td> </td>
<td> </td>
<td> </td>
<td style="background-color:#ddd;color:black;text-align: center;border-style: solid;">
<div style="color:black;">Artificial Intelligence(2)<br></div>
<div style="color:gray;">AAMER NADEEM<br></div>
<div style="color:black;"> [B5]</div>
</td>
<td> </td>
<td> </td>
<td> </td>
</tr>
</tbody>
</table>
</div>
</div>
Answer 0 (score: 0)
Use jsoup:
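import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Fetch and parse the page (on Android, do this off the main thread).
Document doc = Jsoup.connect(url).get();
Elements tableElements = doc.select("table");
Elements rows = tableElements.select("tr");
// Start from 1: row 0 is the header row, which holds <th> cells and no <td>s.
for (int i = 1; i < rows.size(); i++) {
    Elements cols = rows.get(i).select("td");
    // Column 0 is the time slot; columns 1 to 7 follow the header order MON..SUN.
    for (int j = 0; j < cols.size(); j++) {
        System.out.println(cols.get(j).text());
    }
}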
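Since the cells have no ids, position is the only pattern available: the column index gives the day, the first cell of the row gives the time slot, and the three divs inside a filled cell appear in the order course, teacher, room. Below is a minimal sketch of turning those pieces into an object. It uses only the standard library and expects the strings to have been extracted already. The class name `TimetableEntry`, its fields, and `fromCell` are hypothetical names of mine, not part of jsoup or of the portal, and I don't know what the number in parentheses after the course title means, so it is stored as a plain `number`.

```java
import java.util.Arrays;
import java.util.List;

/** One occupied timetable cell (hypothetical model class, built from already-extracted text). */
final class TimetableEntry {
    // Day order copied from the table's <thead>; list index 0 corresponds to column 1.
    static final List<String> DAYS =
            Arrays.asList("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN");

    final String day;
    final String start;
    final String end;
    final String course;
    final int number;     // the number in parentheses after the course title
    final String teacher;
    final String room;

    private TimetableEntry(String day, String start, String end,
                           String course, int number, String teacher, String room) {
        this.day = day;
        this.start = start;
        this.end = end;
        this.course = course;
        this.number = number;
        this.teacher = teacher;
        this.room = room;
    }

    /**
     * @param column      index of the td within its row (1 = MON ... 7 = SUN; 0 is the time slot)
     * @param slot        text of the row's first cell, e.g. "08:00 AM - 09:20 AM"
     * @param courseText  text of the first div, e.g. "Linear Algebra(3)"
     * @param teacherText text of the second div
     * @param roomText    text of the third div, e.g. " [F5]"
     */
    static TimetableEntry fromCell(int column, String slot,
                                   String courseText, String teacherText, String roomText) {
        if (column < 1 || column > DAYS.size()) {
            throw new IllegalArgumentException("column must be 1.." + DAYS.size());
        }
        // Split on the hyphen that has whitespace on both sides.
        String[] times = slot.trim().split("\\s+-\\s+");
        if (times.length != 2) {
            throw new IllegalArgumentException("unexpected time slot: " + slot);
        }
        // Use the last pair of parentheses, so hyphens in titles like "Accounting-I" are harmless.
        String c = courseText.trim();
        int open = c.lastIndexOf('(');
        int close = c.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("unexpected course text: " + courseText);
        }
        String name = c.substring(0, open).trim();
        int number = Integer.parseInt(c.substring(open + 1, close).trim());
        // Drop the surrounding square brackets from the room, if present.
        String room = roomText.trim();
        if (room.startsWith("[") && room.endsWith("]")) {
            room = room.substring(1, room.length() - 1).trim();
        }
        return new TimetableEntry(DAYS.get(column - 1), times[0], times[1],
                name, number, teacherText.trim(), room);
    }
}
```

Inside the inner loop above, you could skip `j == 0` (keep `cols.get(0).text()` as the slot), call `cols.get(j).select("div")`, and, when it returns three elements, pass their `text()` values to `fromCell(j, slot, ...)`. Empty cells contain no divs, so they drop out naturally.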