Getting data from the third table in an HTML page

Time: 2016-06-10 02:09:12

Tags: java html-table jsoup element getelementsbytagname

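I'm working on a tool and have reached the last step, but I've run into a small problem; could you give me a hint? I have these 3 tables, and I can only get data from the first 2. How can I reach the third table, "Upgrade Warranty and Service Information"?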

Here is the table code:

<body>
		<div id="ibm-pcon">
			<div id="ibm-content">
				<div id="ibm-leadspace-head" class="ibm-alternate">
					<div id="ibm-leadspace-body">
						<br></br>
						<script type="text/javascript">currentDate();</script>
						<br></br>
						
							<!--BEGIN OPTIONAL BREADCRUMBING--> <span style="font-size: small;"><a href="/pc/entitle/pg2/Service.wss/display/MachineHome">Machine Lookup</a> &gt; <a href="/pc/entitle/pg2/Service.wss/mts/Lookup">Warranty Information</a> &gt; </span>
							<!--END OPTIONAL BREADCRUMBING--> 
						
						<br></br>
						<h1>PEW | Warranty Information</h1>				
					</div>
				</div>
				<!-- CONTENT_BODY -->
				<div id="ibm-content-body">
					<div id="ibm-content-main">
					<!-- LEADSPACE_BEGIN -->				
								
						
		<!-- This section can be used to test JavaScript and CSS before promoting the data to the template XML. -->
		<table class="ibm-results-table" summary="output table" cellpadding="0" cellspacing="0" border="0"><tbody xmlns="http://www.w3.org/TR/xhtml1/">
<thead>
<tr>
<th scope="col" class="pg2OutputTableSectionTitle">Results of Machine Type/Serial Number Query</th>
</tr>
</thead>
<tr>
<td><table class="ibm-data-table ibm-alternating" summary="output table" cellpadding="0" cellspacing="0" border="0"><tbody>
<thead>
<tr>
<th scope="col" colspan="3" class="pg2TableSectionTitle">General Machine Information:</th>
</tr>
</thead>
<tr>
<td>
                    Type:
                    <span>1746</span>
</td><td>
                    Model:
                    <span>C4A</span>
</td><td>
                    Serial:
                    <span>13D06MK</span>
</td>
</tr>
<tr>
<td>
                    Status:
                    <span>Proof Of Purchase Rcvd</span>
</td><td>
                        Build Date:
                        <span>&nbsp;</span>
</td><td>
                        Build to Model:
                        <span> </span>
</td>
</tr>
<tr>
<td>
                        Geography:
                        <span>EMEA</span>
</td><td>
                        Country:
                        <span>GREECE</span>
</td><td>
                        Configuration Id:
                        <span>&nbsp;</span>
</td>
</tr>
<tr>
<td>
                        OES Order Number:
                        <span>2076804957</span>
</td><td>
                        Customer Number:
                        <span>108401</span>
</td><td>
                        Delivery Number:
                        <span>8519501492</span>
</td>
</tr>
<tr>
<td colspan="2">
                                    Service Status:
                                    <span>This machine is currently out of warranty.</span>
</td><td colspan="1">
                                    UAR End Date:
                                    <span>2012-08-02</span>
</td>
</tr>
</tbody></table></td>
</tr>
<tr>
<td><table class="ibm-data-table ibm-alternating" summary="output table" cellpadding="0" cellspacing="0" border="0"><tbody>
<thead>
<tr>
<th scope="col" colspan="3" class="pg2TableSectionTitle">Warranty and Service Information:</th>
</tr>
</thead>
<tr>
<th scope="col">Start Date</th><th scope="col">End Date</th><th scope="col">SDF</th>
</tr>
<tr>
<td>2012-07-04</td><td>2015-07-03</td><td>3XL</td>
</tr>
<tr>
<td colspan="3">
                    SDF Description:
                    <span>This product has a 3 year limited warranty and is entitled to CRU (customer replaceable unit) and On-site service. Tier 1 CRUs are customer responsibility, see announcement for details. On-site Service is available Monday - Friday, except holidays, with a next business day response objective.</span>
</td>
</tr>
</tbody></table></td>
</tr>
<tr>
<td><table class="ibm-data-table ibm-alternating" summary="output table" cellpadding="0" cellspacing="0" border="0"><tbody>
<thead>
<tr>
<th scope="col" colspan="3" class="pg2TableSectionTitle">Upgrade Warranty and Service Information:</th>
</tr>
</thead>
<tr>
<th scope="col">Start Date</th><th scope="col">End Date</th><th scope="col">SDF</th>
</tr>
<tr>
<td>2012-07-04</td><td>2015-07-03</td><td>SP4</td>
</tr>
<tr>
<td colspan="3">
                    SDF Description:
                    <span>This product has a three year limited warranty which includes a warranty upgrade. This product is entitled to parts and labor and includes on-site repair service.  Service is available 7X24 with an 4 hour response objective.</span>
</td>
</tr>
</tbody></table></td>
</tr>
<tr>
<td><table class="ibm-data-table" cellpadding="0" cellspacing="0" border="0"><thead>
<tr>
<th scope="col" class="pg2MessageHead">Messages</th>
</tr>
</thead>
<tbody>
<tr>
<td class="pg2MessagePanel" align="left">&nbsp;</td>
</tr>
</tbody></table></td>
</tr>
</tbody></table>
		
					</div>

My working code is:

            public void actionPerformed(ActionEvent e) {                
                try {
                    String getTextArea;
                    getTextArea = textArea.getText();
                    String[] arr = getTextArea.split("\\n");
                    String type = null;
                    String serial = null;
                    int line = 0;
                    for(String s : arr) {

                        line++;
                        if(s.isEmpty()) {
                            textArea_1.append("Empty Line" + '\n');
                            continue;
                        }

                        type = s.substring(0, 4);
                        serial = s.substring(5, 12);
                        String html = "bla bla bla" + type + serial;

                         Document doc = Jsoup.connect(html).get();
                         Elements tableElements = doc.select("table");
                         java.util.Iterator<Element> ite = tableElements.select("tr").iterator();
                         Elements tableElement = doc.select("tr");
                         java.util.Iterator<Element> ite1 = tableElement.select("table").iterator();
                         ite.next();
                         ite1.next();

                         String result,result1,result2;
                         result = ite.next().text();
                         result1 = ite1.next().text();

                         Scanner sr = new Scanner(result);
                         Scanner sr1 = new Scanner(result1);

//                       System.out.println(result);
//                       System.out.println(result1);

                         // result of first table
                         while(sr.hasNext()) {
                             result = result;
                             ite.next().text();
                             String lineOfType;
                             lineOfType = ite.next().text();
                             type = lineOfType.substring(6, 10);
                             String model;
                             model = lineOfType.substring(18, 21);
                             serial = lineOfType.substring(30, 37);
                             ite.next().text();
                             String country = ite.next().text();
                             country = country.substring(24, 31);
                             textArea_1.append(line + "-" + type + '\t' + model + '\t' + serial + "    " + country + "    ");
                         }

                         sr.close();

                      // result of second table

                         while(sr1.hasNext()) {
                             result1 = result1;
                             String startDate = result1.substring(58, 68);
                             String endDate = result1.substring(69, 79);
                             textArea_1.append(startDate + "    " + endDate + "    ");
                             break;
                         }

                         sr1.close();

                      // Getting the elements for the 3rd table, but it does not work as expected: it returns the second table's data.

                         Elements tableElement2 = doc.select("tr");
                         java.util.Iterator<Element> ite2 = tableElement2.select("table").iterator();
                         ite2.next();
                         result2 = ite2.next().text();
                         Scanner sr2 = new Scanner(result2);


                      // this while shows the same result as the second while !
                         while(sr2.hasNext()) {
                             sr2.next();
                             result2 = result2;
                             System.out.println(result2);
                             String srvPkStart = result2.substring(58, 68);
                             if(srvPkStart.equals(result1.substring(58, 68))) {
                                 srvPkStart = "Not found";
                             }
                             String srvPkEnd = result2.substring(69, 79);
                             if(srvPkEnd.equals(result1.substring(69, 79))) {
                                 srvPkEnd = "";
                             }
                             System.out.println(srvPkStart + '\t' + srvPkEnd);
                             textArea_1.append("ServicePack Dates: " + srvPkStart + '\t' + srvPkEnd + '\n');
                             break;
                         }



                    } // end of for loop    
                } catch (Exception e2) {
                    // TODO: handle exception
                }
            } 
        });

1 answer:

Answer 0 (score: 1)

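Let's switch to another approach that makes these tables easier to reach. I suggest using org.jsoup.nodes.Element.select() to fetch the tables one by one.

Check out the link to learn how to use the jsoup selector syntax to get elements.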
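As a minimal sketch of the selector approach (not the asker's actual code), the snippet below picks the "Upgrade Warranty" table by its section title and collects its data cells. The class name `UpgradeTableDemo`, the helper `upgradeRow`, and the shortened `SAMPLE` markup are assumptions made for illustration; only the CSS classes and titles come from the page in the question.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class UpgradeTableDemo {

    // Hypothetical, trimmed-down stand-in for the page shown in the question.
    static final String SAMPLE =
          "<table class=\"ibm-results-table\"><tbody>"
        + "<tr><td><table class=\"ibm-data-table ibm-alternating\"><tbody>"
        + "<tr><th class=\"pg2TableSectionTitle\">Warranty and Service Information:</th></tr>"
        + "<tr><td>2012-07-04</td><td>2015-07-03</td><td>3XL</td></tr>"
        + "</tbody></table></td></tr>"
        + "<tr><td><table class=\"ibm-data-table ibm-alternating\"><tbody>"
        + "<tr><th class=\"pg2TableSectionTitle\">Upgrade Warranty and Service Information:</th></tr>"
        + "<tr><td>2012-07-04</td><td>2015-07-03</td><td>SP4</td></tr>"
        + "</tbody></table></td></tr>"
        + "</tbody></table>";

    // Returns the text of every <td> in the table whose title starts with "Upgrade Warranty".
    static List<String> upgradeRow(String html) {
        Document doc = Jsoup.parse(html);
        List<String> cells = new ArrayList<>();
        // Only the inner data tables carry both classes; the outer results table does not.
        for (Element table : doc.select("table.ibm-data-table.ibm-alternating")) {
            Element title = table.select("th.pg2TableSectionTitle").first();
            if (title != null && title.text().startsWith("Upgrade Warranty")) {
                for (Element td : table.select("td")) {
                    cells.add(td.text());
                }
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(upgradeRow(SAMPLE)); // [2012-07-04, 2015-07-03, SP4]
    }
}
```

On the full page the same table also has an "SDF Description" cell, so the list would contain a fourth entry there; the first two entries are still the start and end dates.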

    String html = "<body><div id=\"ibm-pcon\"><div id=\"ibm-content\"><div id=\"ibm-leadspace-head\" class=\"ibm-alternate\"><div id=\"ibm-leadspace-body\"><br></br><script type=\"text/javascript\">currentDate();</script><br></br><!--BEGIN OPTIONAL BREADCRUMBING--> <span style=\"font-size: small;\"><a href=\"/pc/entitle/pg2/Service.wss/display/MachineHome\">Machine Lookup</a> &gt; <a href=\"/pc/entitle/pg2/Service.wss/mts/Lookup\">Warranty Information</a> &gt; </span><!--END OPTIONAL BREADCRUMBING--><br></br><h1>PEW | Warranty Information</h1> </div></div><!-- CONTENT_BODY --><div id=\"ibm-content-body\"><div id=\"ibm-content-main\"><table class=\"ibm-results-table\" summary=\"output table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"><tbody xmlns=\"www.w3.org/TR/xhtml1/\"><thead> <tr><th scope=\"col\" class=\"pg2OutputTableSectionTitle\">Results of Machine Type/Serial Number Query</th> </tr></thead><tr> <td><table class=\"ibm-data-table ibm-alternating\" summary=\"output table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"> <tbody> <thead><tr> <th scope=\"col\" colspan=\"3\" class=\"pg2TableSectionTitle\">General Machine Information:</th></tr> </thead> <tr><td> Type: <span>1746</span></td><td> Model: <span>C4A</span></td><td> Serial: <span>13D06MK</span></td> </tr> <tr><td> Status: <span>Proof Of Purchase Rcvd</span></td><td> Build Date: <span>&nbsp;</span></td><td> Build to Model: <span> </span></td> </tr> <tr><td> Geography: <span>EMEA</span></td><td> Country: <span>GREECE</span></td><td> Configuration Id: <span>&nbsp;</span></td> </tr> <tr><td> OES Order Number: <span>2076804957</span></td><td> Customer Number: <span>108401</span></td><td> Delivery Number: <span>8519501492</span></td> </tr> <tr><td colspan=\"2\"> Service Status: <span>This machine is currently out of warranty.</span></td><td colspan=\"1\"> UAR End Date: <span>2012-08-02</span></td> </tr> </tbody></table> </td></tr><tr> <td><table class=\"ibm-data-table ibm-alternating\" 
summary=\"output table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"> <tbody> <thead><tr> <th scope=\"col\" colspan=\"3\" class=\"pg2TableSectionTitle\">Warranty and Service Information:</th></tr> </thead> <tr><th scope=\"col\">Start Date</th><th scope=\"col\">End Date</th><th scope=\"col\">SDF</th> </tr> <tr><td>2012-07-04</td><td>2015-07-03</td><td>3XL</td> </tr> <tr><td colspan=\"3\"> SDF Description: <span>This product has a 3 year limited warranty and is entitled to CRU (customer replaceable unit) and On-site service. Tier 1 CRUs are customer responsibility, see announcement for details. On-site Service is available Monday - Friday, except holidays, with a next business day response objective.</span></td> </tr> </tbody></table> </td></tr><tr> <td><table class=\"ibm-data-table ibm-alternating\" summary=\"output table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"> <tbody> <thead><tr> <th scope=\"col\" colspan=\"3\" class=\"pg2TableSectionTitle\">Upgrade Warranty and Service Information:</th></tr> </thead> <tr><th scope=\"col\">Start Date</th><th scope=\"col\">End Date</th><th scope=\"col\">SDF</th> </tr> <tr><td>2012-07-04</td><td>2015-07-03</td><td>SP4</td> </tr> <tr><td colspan=\"3\"> SDF Description: <span>This product has a three year limited warranty which includes a warranty upgrade. This product is entitled to parts and labor and includes on-site repair service.Service is available 7X24 with an 4 hour response objective.</span></td> </tr> </tbody></table> </td></tr><tr> <td><table class=\"ibm-data-table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"> <thead><tr> <th scope=\"col\" class=\"pg2MessageHead\">Messages</th></tr> </thead> <tbody><tr> <td class=\"pg2MessagePanel\" align=\"left\">&nbsp;</td></tr> </tbody></table> </td></tr></tbody> </table></div> </body>";
    Document doc = Jsoup.parse(html, "", Parser.xmlParser());
    Elements tables = doc.select("table.ibm-data-table.ibm-alternating"); // Get table which has classes = ibm-data-table, ibm-alternating

    System.out.println(tables.size()); // tables.size = 3

    for (Element ele: tables) {
        // Get table header
        Elements thElements = ele.select("tr > th.pg2TableSectionTitle"); // Header cells that have the class pg2TableSectionTitle

        if (thElements != null && thElements.size() > 0) {
            String tableTitle = thElements.get(0).text();
            System.out.println(tableTitle);

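            // Use startsWith rather than contains: "Upgrade Warranty and Service Information:" also
            // contains "Warranty and Service Information:", so a contains() chain never reaches the last branch.
            if (tableTitle.startsWith("General Machine Information:")) {
                // Apply your logic accordingly for table #General Machine
            }
            else if (tableTitle.startsWith("Warranty and Service Information:")) {
                // Apply your logic accordingly for table #Warranty and Service
            }
            else if (tableTitle.startsWith("Upgrade Warranty and Service Information:")) {
                // Apply your logic accordingly for table #Upgrade Warranty
            }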
        }
    }
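Once the right table or row is selected, the values can be read by label instead of by fixed character offsets such as `substring(6, 10)`, which break as soon as a value changes length. The following is only a sketch: `RowTextParser` and `field` are hypothetical names, and the input string assumes that `Element.text()` on the first data row yields `Type: 1746 Model: C4A Serial: 13D06MK`, as the offsets in the question suggest.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RowTextParser {

    // Returns the first whitespace-free token that follows "label:", or null if the label is absent.
    static String field(String rowText, String label) {
        Matcher m = Pattern.compile(Pattern.quote(label) + ":\\s*(\\S+)").matcher(rowText);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String row = "Type: 1746 Model: C4A Serial: 13D06MK";
        System.out.println(field(row, "Type"));   // 1746
        System.out.println(field(row, "Model"));  // C4A
        System.out.println(field(row, "Serial")); // 13D06MK
    }
}
```

This sketch captures only a single token, so multi-word values (for example the Status field) would be cut at the first space, and an empty value would pick up the next label instead. For those cells, reading the `<span>` inside each `<td>` with jsoup is likely the more robust route.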