Scraping a table with Selenium in Java

Time: 2019-02-19 06:26:03

Tags: java selenium web-scraping

I'm scraping an account's transactions from a transactions table that has the following format: [image: Table format]

If I know the number of rows, I can iterate over them and get the data I need by using a separate locator for each field.

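How can I scrape the whole table? I don't know in advance how many transactions there will be, so I need something that walks through all the rows and scrapes each transaction. I'm using Selenium with Java for the scraping.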

Here is the HTML of the transactions table:

<div id="txn-display"> 
<!-- Transactions start  -->
<!--#include virtual="mobile-statement.shtml" -->
    <table id="txn-display-table">
        <thead>
            <tr>
                <th>Date</th>
                <th colspan="2">Description</th>
                <th>Type</th>
                <th class="amount-cell">Amount Spent  (<em class="WebRupee">Rs.</em>)</th>                        
            </tr>
        </thead>
        <tbody>
            <tr class="gridEven">
                <td>12/02/2019</td>
                <td colspan="2" class="word-break">INTERGLOBE AVIATION LT .             IND</td>
                <td class="txn-type">Debit</td>
                <td class="amount-cell">320</td>                        
            </tr>
            <tr class="gridOdd">
                <td>27/01/2019</td>
                <td colspan="2" class="word-break">PETROL TRXN FEE RVRSL EXCLUDING TAX</td>
                <td class="txn-type">Credit</td>
                <td class="amount-cell">8.21</td>                       
            </tr>
            <tr class="gridEven">
                <td>27/01/2019</td>
                <td colspan="2" class="word-break">SHELL R K R ENTERPRISE BANGALORE     IND</td>
                <td class="txn-type">Debit</td>
                <td class="amount-cell">831.06</td>                     
            </tr>
        </tbody>
    </table>        
</div>

2 answers:

Answer 0 (score: 0)

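You haven't posted the HTML, so I'm using my own example, in which I iterate using the row and column counts. Please check it and let us know if you have any questions.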

package Testng_Pack;

import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;

public class table {

    WebDriver driver = null;

    @BeforeTest
    public void setup() throws Exception {
        System.setProperty("webdriver.gecko.driver", "D:\\Selenium Files\\geckodriver.exe");
        driver = new FirefoxDriver();
        driver.manage().window().maximize();
        driver.manage().timeouts().implicitlyWait(15, TimeUnit.SECONDS);
        driver.get("Pass the URL here");
    }

    @AfterTest
    public void tearDown() throws Exception {
        driver.quit();
    }

    @Test
    public void print_data() {

        // Get the number of rows in the table.
        int Row_count = driver.findElements(By.xpath("//*[@id='post-body-6522850981930750493']/div[1]/table/tbody/tr")).size();
        System.out.println("Number Of Rows = " + Row_count);

        // Get the number of columns in the table (taken from the first row).
        int Col_count = driver.findElements(By.xpath("//*[@id='post-body-6522850981930750493']/div[1]/table/tbody/tr[1]/td")).size();
        System.out.println("Number Of Columns = " + Col_count);

        // Split the XPath into three parts so the row and column numbers can be inserted.
        String first_part = "//*[@id='post-body-6522850981930750493']/div[1]/table/tbody/tr[";
        String second_part = "]/td[";
        String third_part = "]";

        // Loop over the rows (XPath positions start at 1).
        for (int i = 1; i <= Row_count; i++) {
            // Loop over the columns of the current row.
            for (int j = 1; j <= Col_count; j++) {
                // Build the final XPath of the cell at row i, column j.
                String final_xpath = first_part + i + second_part + j + third_part;
                // Retrieve the text of the located cell and print it.
                String Table_data = driver.findElement(By.xpath(final_xpath)).getText();
                System.out.print(Table_data + "  ");
            }
            System.out.println();
            System.out.println();
        }
    }
}
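The answer above splits the XPath into three strings so it can splice in the row and column numbers. The same idea can be kept in one small helper. This is a minimal sketch; the class name `CellXPath`, the method `cellXPath`, and the base path in `main` are hypothetical. XPath positions start at 1, which is why the loops begin at 1. In the question's table the Description cell has `colspan="2"` but is still a single `td`, so `td[4]` is the Amount cell.

```java
public class CellXPath {

    // Builds the XPath of one cell under a tbody. XPath positions are 1-based,
    // so tr[1]/td[1] is the first cell of the first body row.
    static String cellXPath(String tbodyXPath, int row, int col) {
        if (row < 1 || col < 1) {
            throw new IllegalArgumentException("XPath positions start at 1");
        }
        return tbodyXPath + "/tr[" + row + "]/td[" + col + "]";
    }

    public static void main(String[] args) {
        // Hypothetical base path, based on the table id in the question's HTML.
        String tbody = "//table[@id='txn-display-table']/tbody";
        // Amount cell of the second transaction row.
        System.out.println(cellXPath(tbody, 2, 4));
    }
}
```

The returned string can be passed to `By.xpath(...)` in place of `first_part + i + second_part + j + third_part`.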

Answer 1 (score: 0)

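The code below automatically counts the rows and columns in the table. It works for any table built from `tr` and `td` tags; you only need to pass the web table's XPath to it.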

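import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.testng.annotations.Test;

public class WebTableTest {

    // Assumed to be created in a @BeforeTest method, as in the previous answer.
    WebDriver driver;

    @Test
    public void testWebTable() {
        WebElement simpleTable = driver.findElement(By.xpath("//table[@id='txn-display-table']//tbody"));

        // Get all rows; no row count is hard-coded, so any number of transactions works.
        List<WebElement> rows = simpleTable.findElements(By.tagName("tr"));
        System.out.println("Number of rows = " + rows.size());

        // Print the data from each row, tab-separated.
        for (WebElement row : rows) {
            List<WebElement> cols = row.findElements(By.tagName("td"));
            for (WebElement col : cols) {
                System.out.print(col.getText() + "\t");
            }
            System.out.println();
        }
    }
}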
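Both answers only print each cell's text. If the transactions need to be kept, one option is to collect each row's `td` texts into a list (for example with `getText()` inside the row loop of the second answer) and convert them into a typed object. This is a minimal sketch under several assumptions: Java 16+ (for records), the four-cell layout of the posted HTML (Date, Description, Type, Amount), day/month/year dates, and that "Debit" means money out. `Transaction`, `parseRow` and `signedAmount` are hypothetical names, not part of Selenium.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class TransactionParser {

    // Hypothetical record for one row of #txn-display-table.
    record Transaction(LocalDate date, String description, String type, BigDecimal amount) {

        // Assumed sign convention: debits are negative, everything else positive.
        BigDecimal signedAmount() {
            return "Debit".equalsIgnoreCase(type) ? amount.negate() : amount;
        }
    }

    // The posted table shows dates as day/month/year, e.g. 27/01/2019.
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // Converts the four td texts of one row into a Transaction.
    // Runs of whitespace inside the description are collapsed to one space.
    static Transaction parseRow(List<String> cells) {
        if (cells.size() != 4) {
            throw new IllegalArgumentException("Expected 4 cells, got " + cells.size());
        }
        return new Transaction(
                LocalDate.parse(cells.get(0).trim(), DATE),
                cells.get(1).trim().replaceAll("\\s+", " "),
                cells.get(2).trim(),
                new BigDecimal(cells.get(3).trim().replace(",", "")));
    }

    public static void main(String[] args) {
        // Cell texts copied from the question's HTML.
        List<List<String>> rows = List.of(
                List.of("12/02/2019", "INTERGLOBE AVIATION LT .             IND", "Debit", "320"),
                List.of("27/01/2019", "PETROL TRXN FEE RVRSL EXCLUDING TAX", "Credit", "8.21"),
                List.of("27/01/2019", "SHELL R K R ENTERPRISE BANGALORE     IND", "Debit", "831.06"));

        BigDecimal net = BigDecimal.ZERO;
        for (List<String> row : rows) {
            Transaction t = parseRow(row);
            System.out.println(t);
            net = net.add(t.signedAmount());
        }
        System.out.println("Net = " + net); // prints "Net = -1142.85"
    }
}
```

`BigDecimal` is used instead of `double` so amounts such as 8.21 are summed exactly.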