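I'm working on a tool and have reached the last step, but I've run into a small problem and would appreciate a hint. I have these 3 tables and I can only get data from the first 2. How can I reach the third table, Upgrade Warranty and Service Information?
Here is the table code: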
<body>
<div id="ibm-pcon">
<div id="ibm-content">
<div id="ibm-leadspace-head" class="ibm-alternate">
<div id="ibm-leadspace-body">
<br></br>
<script type="text/javascript">currentDate();</script>
<br></br>
<!--BEGIN OPTIONAL BREADCRUMBING--> <span style="font-size: small;"><a href="/pc/entitle/pg2/Service.wss/display/MachineHome">Machine Lookup</a> > <a href="/pc/entitle/pg2/Service.wss/mts/Lookup">Warranty Information</a> > </span>
<!--END OPTIONAL BREADCRUMBING-->
<br></br>
<h1>PEW | Warranty Information</h1>
</div>
</div>
<!-- CONTENT_BODY -->
<div id="ibm-content-body">
<div id="ibm-content-main">
<!-- LEADSPACE_BEGIN -->
<!-- This section can be used to test JavaScript and CSS before promoting the data to the template XML. -->
<table class="ibm-results-table" summary="output table" cellpadding="0" cellspacing="0" border="0"><tbody xmlns="http://www.w3.org/TR/xhtml1/">
<thead>
<tr>
<th scope="col" class="pg2OutputTableSectionTitle">Results of Machine Type/Serial Number Query</th>
</tr>
</thead>
<tr>
<td><table class="ibm-data-table ibm-alternating" summary="output table" cellpadding="0" cellspacing="0" border="0"><tbody>
<thead>
<tr>
<th scope="col" colspan="3" class="pg2TableSectionTitle">General Machine Information:</th>
</tr>
</thead>
<tr>
<td>
Type:
<span>1746</span>
</td><td>
Model:
<span>C4A</span>
</td><td>
Serial:
<span>13D06MK</span>
</td>
</tr>
<tr>
<td>
Status:
<span>Proof Of Purchase Rcvd</span>
</td><td>
Build Date:
<span> </span>
</td><td>
Build to Model:
<span> </span>
</td>
</tr>
<tr>
<td>
Geography:
<span>EMEA</span>
</td><td>
Country:
<span>GREECE</span>
</td><td>
Configuration Id:
<span> </span>
</td>
</tr>
<tr>
<td>
OES Order Number:
<span>2076804957</span>
</td><td>
Customer Number:
<span>108401</span>
</td><td>
Delivery Number:
<span>8519501492</span>
</td>
</tr>
<tr>
<td colspan="2">
Service Status:
<span>This machine is currently out of warranty.</span>
</td><td colspan="1">
UAR End Date:
<span>2012-08-02</span>
</td>
</tr>
</tbody></table></td>
</tr>
<tr>
<td><table class="ibm-data-table ibm-alternating" summary="output table" cellpadding="0" cellspacing="0" border="0"><tbody>
<thead>
<tr>
<th scope="col" colspan="3" class="pg2TableSectionTitle">Warranty and Service Information:</th>
</tr>
</thead>
<tr>
<th scope="col">Start Date</th><th scope="col">End Date</th><th scope="col">SDF</th>
</tr>
<tr>
<td>2012-07-04</td><td>2015-07-03</td><td>3XL</td>
</tr>
<tr>
<td colspan="3">
SDF Description:
<span>This product has a 3 year limited warranty and is entitled to CRU (customer replaceable unit) and On-site service. Tier 1 CRUs are customer responsibility, see announcement for details. On-site Service is available Monday - Friday, except holidays, with a next business day response objective.</span>
</td>
</tr>
</tbody></table></td>
</tr>
<tr>
<td><table class="ibm-data-table ibm-alternating" summary="output table" cellpadding="0" cellspacing="0" border="0"><tbody>
<thead>
<tr>
<th scope="col" colspan="3" class="pg2TableSectionTitle">Upgrade Warranty and Service Information:</th>
</tr>
</thead>
<tr>
<th scope="col">Start Date</th><th scope="col">End Date</th><th scope="col">SDF</th>
</tr>
<tr>
<td>2012-07-04</td><td>2015-07-03</td><td>SP4</td>
</tr>
<tr>
<td colspan="3">
SDF Description:
<span>This product has a three year limited warranty which includes a warranty upgrade. This product is entitled to parts and labor and includes on-site repair service. Service is available 7X24 with an 4 hour response objective.</span>
</td>
</tr>
</tbody></table></td>
</tr>
<tr>
<td><table class="ibm-data-table" cellpadding="0" cellspacing="0" border="0"><thead>
<tr>
<th scope="col" class="pg2MessageHead">Messages</th>
</tr>
</thead>
<tbody>
<tr>
<td class="pg2MessagePanel" align="left"> </td>
</tr>
</tbody></table></td>
</tr>
</tbody></table>
</div>
My working code is:
public void actionPerformed(ActionEvent e) {
try {
String getTextArea;
getTextArea = textArea.getText();
String[] arr = getTextArea.split("\\n");
String type = null;
String serial = null;
int line = 0;
for(String s : arr) {
line++;
if(s.isEmpty()) {
textArea_1.append("Empty Line" + '\n');
continue;
}
type = s.substring(0, 4);
serial = s.substring(5, 12);
String html = "bla bla bla" + type + serial;
Document doc = Jsoup.connect(html).get();
Elements tableElements = doc.select("table");
java.util.Iterator<Element> ite = tableElements.select("tr").iterator();
Elements tableElement = doc.select("tr");
java.util.Iterator<Element> ite1 = tableElement.select("table").iterator();
ite.next();
ite1.next();
String result,result1,result2;
result = ite.next().text();
result1 = ite1.next().text();
Scanner sr = new Scanner(result);
Scanner sr1 = new Scanner(result1);
// System.out.println(result);
// System.out.println(result1);
// result of first table
while(sr.hasNext()) {
result = result;
ite.next().text();
String lineOfType;
lineOfType = ite.next().text();
type = lineOfType.substring(6, 10);
String model;
model = lineOfType.substring(18, 21);
serial = lineOfType.substring(30, 37);
ite.next().text();
String country = ite.next().text();
country = country.substring(24, 31);
textArea_1.append(line + "-" + type + '\t' + model + '\t' + serial + " " + country + " ");
}
sr.close();
// result of second table
while(sr1.hasNext()) {
result1 = result1;
String startDate = result1.substring(58, 68);
String endDate = result1.substring(69, 79);
textArea_1.append(startDate + " " + endDate + " ");
break;
}
sr1.close();
// getting the elements for the 3rd table, but it is not working as expected: it returns the second table's data.
Elements tableElement2 = doc.select("tr");
java.util.Iterator<Element> ite2 = tableElement2.select("table").iterator();
ite2.next();
result2 = ite2.next().text();
Scanner sr2 = new Scanner(result2);
// this while shows the same result as the second while !
while(sr2.hasNext()) {
sr2.next();
result2 = result2;
System.out.println(result2);
String srvPkStart = result2.substring(58, 68);
if(srvPkStart.equals(result1.substring(58, 68))) {
srvPkStart = "Not found";
}
String srvPkEnd = result2.substring(69, 79);
if(srvPkEnd.equals(result1.substring(69, 79))) {
srvPkEnd = "";
}
System.out.println(srvPkStart + '\t' + srvPkEnd);
textArea_1.append("ServicePack Dates: " + srvPkStart + '\t' + srvPkEnd + '\n');
break;
}
} // end of for loop
} catch (Exception e2) {
e2.printStackTrace(); // report network/parsing errors instead of silently swallowing them
}
}
});
Answer 0 (score: 1)
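Let's switch to a different approach that makes these tables easier to get. I suggest using org.jsoup.nodes.Element.select() to fetch the tables one by one.
Check out the link on jsoup selector syntax to learn how to select elements.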
String html = "<body><div id=\"ibm-pcon\"><div id=\"ibm-content\"><div id=\"ibm-leadspace-head\" class=\"ibm-alternate\"><div id=\"ibm-leadspace-body\"><br></br><script type=\"text/javascript\">currentDate();</script><br></br><!--BEGIN OPTIONAL BREADCRUMBING--> <span style=\"font-size: small;\"><a href=\"/pc/entitle/pg2/Service.wss/display/MachineHome\">Machine Lookup</a> > <a href=\"/pc/entitle/pg2/Service.wss/mts/Lookup\">Warranty Information</a> > </span><!--END OPTIONAL BREADCRUMBING--><br></br><h1>PEW | Warranty Information</h1> </div></div><!-- CONTENT_BODY --><div id=\"ibm-content-body\"><div id=\"ibm-content-main\"><table class=\"ibm-results-table\" summary=\"output table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"><tbody xmlns=\"www.w3.org/TR/xhtml1/\"><thead> <tr><th scope=\"col\" class=\"pg2OutputTableSectionTitle\">Results of Machine Type/Serial Number Query</th> </tr></thead><tr> <td><table class=\"ibm-data-table ibm-alternating\" summary=\"output table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"> <tbody> <thead><tr> <th scope=\"col\" colspan=\"3\" class=\"pg2TableSectionTitle\">General Machine Information:</th></tr> </thead> <tr><td> Type: <span>1746</span></td><td> Model: <span>C4A</span></td><td> Serial: <span>13D06MK</span></td> </tr> <tr><td> Status: <span>Proof Of Purchase Rcvd</span></td><td> Build Date: <span> </span></td><td> Build to Model: <span> </span></td> </tr> <tr><td> Geography: <span>EMEA</span></td><td> Country: <span>GREECE</span></td><td> Configuration Id: <span> </span></td> </tr> <tr><td> OES Order Number: <span>2076804957</span></td><td> Customer Number: <span>108401</span></td><td> Delivery Number: <span>8519501492</span></td> </tr> <tr><td colspan=\"2\"> Service Status: <span>This machine is currently out of warranty.</span></td><td colspan=\"1\"> UAR End Date: <span>2012-08-02</span></td> </tr> </tbody></table> </td></tr><tr> <td><table class=\"ibm-data-table ibm-alternating\" summary=\"output table\" 
cellpadding=\"0\" cellspacing=\"0\" border=\"0\"> <tbody> <thead><tr> <th scope=\"col\" colspan=\"3\" class=\"pg2TableSectionTitle\">Warranty and Service Information:</th></tr> </thead> <tr><th scope=\"col\">Start Date</th><th scope=\"col\">End Date</th><th scope=\"col\">SDF</th> </tr> <tr><td>2012-07-04</td><td>2015-07-03</td><td>3XL</td> </tr> <tr><td colspan=\"3\"> SDF Description: <span>This product has a 3 year limited warranty and is entitled to CRU (customer replaceable unit) and On-site service. Tier 1 CRUs are customer responsibility, see announcement for details. On-site Service is available Monday - Friday, except holidays, with a next business day response objective.</span></td> </tr> </tbody></table> </td></tr><tr> <td><table class=\"ibm-data-table ibm-alternating\" summary=\"output table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"> <tbody> <thead><tr> <th scope=\"col\" colspan=\"3\" class=\"pg2TableSectionTitle\">Upgrade Warranty and Service Information:</th></tr> </thead> <tr><th scope=\"col\">Start Date</th><th scope=\"col\">End Date</th><th scope=\"col\">SDF</th> </tr> <tr><td>2012-07-04</td><td>2015-07-03</td><td>SP4</td> </tr> <tr><td colspan=\"3\"> SDF Description: <span>This product has a three year limited warranty which includes a warranty upgrade. This product is entitled to parts and labor and includes on-site repair service.Service is available 7X24 with an 4 hour response objective.</span></td> </tr> </tbody></table> </td></tr><tr> <td><table class=\"ibm-data-table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"> <thead><tr> <th scope=\"col\" class=\"pg2MessageHead\">Messages</th></tr> </thead> <tbody><tr> <td class=\"pg2MessagePanel\" align=\"left\"> </td></tr> </tbody></table> </td></tr></tbody> </table></div> </body>";
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
Elements tables = doc.select("table.ibm-data-table.ibm-alternating"); // Select the tables that carry both classes: ibm-data-table and ibm-alternating
System.out.println(tables.size()); // tables.size = 3
for (Element ele: tables) {
// Get table header
Elements thElements = ele.select("tr > th.pg2TableSectionTitle"); // Select the header cell with class pg2TableSectionTitle
if (thElements != null && thElements.size() > 0) {
String tableTitle = thElements.get(0).text();
System.out.println(tableTitle);
if (tableTitle.contains("General Machine Information:")) {
// Apply your logic accordingly for table #General Machine
}
// Check the upgrade title first: it contains "Warranty and Service Information:" as a substring
else if (tableTitle.contains("Upgrade Warranty and Service Information:")) {
// Apply your logic accordingly for table #Upgrade Warranty
}
else if (tableTitle.contains("Warranty and Service Information:")) {
// Apply your logic accordingly for table #Warranty and Service
}
}
}
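A note on why the third lookup in the question returns the second table: `ite2` is built with the same selector as `ite1` and advanced the same number of times, so it lands on the same element. Once the right table is selected by its title, the fixed offsets `substring(58, 68)` and `substring(69, 79)` would still be off, because the upgrade title is 8 characters longer ("Upgrade "). Below is a minimal sketch of pulling the dates out with a regular expression instead, using only `java.util.regex`. The names `WarrantyDates` and `extractDates` are hypothetical, and the sample string is an assumption about roughly what `Element.text()` returns for the upgrade table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WarrantyDates {
    // Matches ISO-style dates such as 2012-07-04.
    private static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    /** Returns every yyyy-MM-dd date found in the text, in order of appearance. */
    public static List<String> extractDates(String tableText) {
        List<String> dates = new ArrayList<>();
        Matcher m = DATE.matcher(tableText);
        while (m.find()) {
            dates.add(m.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        // Assumed input: roughly what Element.text() yields for the upgrade table.
        String upgradeText = "Upgrade Warranty and Service Information: "
                + "Start Date End Date SDF 2012-07-04 2015-07-03 SP4 SDF Description: ...";
        List<String> dates = extractDates(upgradeText);
        // Fall back to the same placeholders the question's code uses.
        String start = dates.size() > 0 ? dates.get(0) : "Not found";
        String end = dates.size() > 1 ? dates.get(1) : "";
        System.out.println("ServicePack Dates: " + start + "\t" + end);
    }
}
```

Inside the title check for the upgrade table, the same helper could be called as `extractDates(ele.text())`, so the result no longer depends on how long the table title is.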