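I'm trying to write a program that reads a table from a specific website and extracts the data I need from it. I've read about jsoup and Elements and tried to apply what I read, but I'm still missing something. The HTML for my table is: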
<tr>
<td valign="top" width="980px;">
<!-- START WARRANTY RESULTS -->
<!-- START warrantyResultsDetails -->
<table class="ibm-data-table" summary="Warranty results" border="0" cellpadding="0" cellspacing="0">
<caption><b>Warranty information</b></caption>
<thead>
<tr>
<th scope="col">Type</th>
<th scope="col">Model</th>
<th scope="col">Serial number</th>
</tr>
</thead>
<tbody>
<tr>
<td>8205</td>
<td>E6C</td>
<td>06202ET</td>
</tr>
</tbody>
<thead>
<tr>
<th scope="col">Warranty status</th>
<th scope="col">Expiration date</th>
<th scope="col">Location</th>
</tr>
</thead>
<tbody>
<tr>
<td>
Out of warranty <img src="//1.www.s81c.com/i/v17/icons/_icons/ibm_icon_blue_close.png" alt="" align="middle" height="16" width="16">
</td>
<td>2015-12-26</td>
<td>ISRAEL</td>
</tr>
<tr>
<td colspan="3">
<b>Warranty description</b>
<br>
This product has a 3 year limited warranty and is entitled to CRU (customer replaceable unit) and On-site labor repair service for selected parts. On-site Service is available Monday - Friday, except holidays, with a next business day response objective. A service technician will be scheduled to arrive at the customer's location on the business day after remote problem determination.
</td>
</tr>
</tbody>
<thead>
<tr>
<th scope="col" colspan="3">Additional agreement</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="3">
<b>
This web site provides standard warranty or eServicePac information, please consult your local IBM representative or your reseller for other maintenance services or warranty information specific to your IBM Machine.
</b>
</td>
</tr>
</tbody>
</table>
<!-- END warrantyResultsDetails -->
<!-- END PARTS -->
</td>
</tr>
I tried using code I found on Stack Overflow, but I couldn't adapt it correctly:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Test {
public static void main(String[] args) throws Exception {
String url = "https://www-947.ibm.com/support/entry/portal/wlup?type=8205&serial=06202ET";
Document document = Jsoup.connect(url).get();
String question = document.select("#ibm-data-table").text();
System.out.println("Question: " + question);
Elements answerers = document.select("#answers .user-details a");
for (Element answerer : answerers) {
System.out.println("Answerer: " + answerer.text());
}
}
}
Here is another piece of code. It gives me all of the data, but I still want to get specific values from the table rather than everything:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class TableEg {
public static void main(String[] args) {
String html = "https://www-947.ibm.com/support/entry/portal/wlup?type=8205&serial=06202ET";
try {
Document doc = Jsoup.connect(html).get();
Elements tableElements = doc.select("table");
Elements tableHeaderEles = tableElements.select("thead tr th");
System.out.println("headers");
for (int i = 0; i < tableHeaderEles.size(); i++) {
System.out.println(tableHeaderEles.get(i).text());
}
System.out.println();
Elements tableRowElements = tableElements.select("tr");
for (int i = 0; i < tableRowElements.size(); i++) {
Element row = tableRowElements.get(i);
System.out.println("row");
Elements rowItems = row.select("td");
for (int j = 0; j < rowItems.size(); j++) {
System.out.println(rowItems.get(j).text());
}
System.out.println();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
Answer 0 (score: 0)
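Not really a Java developer, but here goes...
In your last loop you have 'rowItems', which is the list of td elements in the current row. What you need now is a reliable way to search that data for the piece you want. I'm guessing you have no control over these tables, so you can't simply set an ID on the td you're after and search by that ID.
If the format always stays the same, figure out how to pull values out of 'rowItems' by index. A given index should hold the same piece of data every time the page is returned. Hope that points you in the right direction!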
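A rough sketch of the index idea, assuming the table keeps the layout shown in the question. The selector `table.ibm-data-table` comes from the class attribute in the posted HTML. The class name `WarrantyCells`, the helper `cellText` and the trimmed `SAMPLE_HTML` string are made up for illustration, and the live page may not match them. The sample is parsed from a string so the code runs without a network connection. For the real page you would presumably swap `Jsoup.parse(SAMPLE_HTML)` for `Jsoup.connect(url).get()`, as in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class WarrantyCells {

    // Trimmed copy of the table from the question (an assumption about the
    // real page's markup), embedded so the sketch runs offline.
    static final String SAMPLE_HTML =
        "<table class=\"ibm-data-table\" summary=\"Warranty results\">"
      + "<thead><tr><th>Type</th><th>Model</th><th>Serial number</th></tr></thead>"
      + "<tbody><tr><td>8205</td><td>E6C</td><td>06202ET</td></tr></tbody>"
      + "<thead><tr><th>Warranty status</th><th>Expiration date</th><th>Location</th></tr></thead>"
      + "<tbody><tr><td>Out of warranty <img src=\"x.png\" alt=\"\"></td>"
      + "<td>2015-12-26</td><td>ISRAEL</td></tr></tbody>"
      + "</table>";

    // Hypothetical helper: returns the text of the td at (rowIndex, cellIndex),
    // counting only rows inside tbody, or null if that position does not exist.
    static String cellText(Document doc, int rowIndex, int cellIndex) {
        Elements rows = doc.select("table.ibm-data-table tbody tr");
        if (rowIndex < 0 || rowIndex >= rows.size()) {
            return null;
        }
        Elements cells = rows.get(rowIndex).select("td");
        if (cellIndex < 0 || cellIndex >= cells.size()) {
            return null;
        }
        return cells.get(cellIndex).text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        // Body row 0 holds type / model / serial number.
        System.out.println("Type: " + cellText(doc, 0, 0));
        System.out.println("Serial number: " + cellText(doc, 0, 2));
        // Body row 1 holds warranty status / expiration date / location.
        System.out.println("Warranty status: " + cellText(doc, 1, 0));
        System.out.println("Expiration date: " + cellText(doc, 1, 1));
    }
}
```

The null checks are there because fixed indices are fragile: if the site adds or reorders a row, a lookup returns null (or the wrong cell) rather than failing loudly. It's probably worth validating the values you get back.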