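I'm trying to scrape and parse a website's dynamic content with Selenium. The site loads its content in response to scroll events in the page, so I trigger scrolling through Selenium until I reach the end of the page.
In the product phase I loop over the rows to read each product's details, and that works too... but once it gets past 280 iterations....
Here is my code: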
// Imports used by this method
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;

private void init() throws IOException {
    FirefoxProfile profile = new FirefoxProfile(); // Create Firefox profile
    profile.setPreference("javascript.enabled", true); // Allow JavaScript in the browser
    WebDriver htmDriver = new FirefoxDriver(profile); // Start Firefox with the profile
    htmDriver.get(urlTextField.getText()); // Open the URL from the URL text field
    htmDriver.manage().window().maximize(); // Maximize the browser window
    String count = htmDriver.findElement(By.cssSelector("#numbFound > #no-of-results-filter")).getText(); // Total product count for the category
    //System.out.println("Total Category Count : "+count);
    htmDriver.findElement(By.cssSelector(".list")).click(); // Switch to the list view
    htmDriver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS); // Implicit wait for element lookups (not a pause)
    int lCount = Integer.parseInt(count); // Used to calculate the number of scrolls
    for (int i = 1; i <= Math.ceil(lCount / 5.0); i++) { // 5.0 avoids integer division before ceil
        // Send Arrow Down key presses
        htmDriver.findElement(By.id("products-main4")).sendKeys(
                Keys.ARROW_DOWN);
        htmDriver.findElement(By.id("products-main4")).sendKeys(
                Keys.ARROW_DOWN);
        htmDriver.findElement(By.id("products-main4")).sendKeys(
                Keys.ARROW_DOWN);
        htmDriver.findElement(By.id("products-main4")).sendKeys(
                Keys.ARROW_DOWN);
        // Scroll down by the full document height via JavaScript
        ((JavascriptExecutor) htmDriver).executeScript(
                "window.scrollBy(0,document.body.scrollHeight)", "");
        htmDriver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS); // Resets the implicit wait; does not sleep
    }
    // Product phase
    int row2 = 0;
    List<WebElement> rdata = htmDriver.findElements(By.className("product_list_view_cont")); // One element per product row
    for (WebElement data : rdata) {
        String title = data.findElement(By.cssSelector(".product_list_view_heading")).getText(); // Product title
        System.out.println(title);
        // Check whether a price is present for this product
        boolean priceMissing = data.findElements(By.cssSelector(".product_list_view_price_outer span")).isEmpty();
        if (!priceMissing) {
            // Get the price of the product
            String price = data.findElement(By.cssSelector(".product_list_view_price_outer var[id^=selling-price-id-]")).getText().trim();
            System.out.println(price);
        } else {
            // Price not available
            System.out.println("No price");
        }
        String brand = data.findElement(By.cssSelector("ul.key-features li")).getText(); // Brand
        System.out.println(brand);
        String brandUrl = data.findElement(By.cssSelector(".product_list_view_info_cont a")).getAttribute("href"); // Brand URL
        System.out.println(brandUrl);
        String status = data.findElement(By.cssSelector(".product_list_view_buy-outer .lfloat")).getText(); // Availability status
        System.out.println(status);
    }
}
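A side note on the scroll loop: the number of scrolls is the product count divided by 5 and rounded up. In Java, dividing two `int` values truncates before `Math.ceil` ever sees the result, so the rounding has to happen on a floating-point value or with integer arithmetic. The sketch below only illustrates that arithmetic; the class name `ScrollSteps`, its method names, and the sample count of 283 are made up for the example, and treating 5 as "products loaded per scroll" is an assumption based on the divisor in the loop.

```java
// Sketch: rounding up the number of scroll steps (names and numbers are illustrative).
final class ScrollSteps {
    // int / int truncates first, so Math.ceil receives an already whole number.
    static int truncated(int totalProducts, int perScroll) {
        return (int) Math.ceil(totalProducts / perScroll);
    }

    // Pure integer round-up: a final partial batch still gets its own scroll.
    static int roundedUp(int totalProducts, int perScroll) {
        return (totalProducts + perScroll - 1) / perScroll;
    }

    public static void main(String[] args) {
        System.out.println(truncated(283, 5)); // prints 56
        System.out.println(roundedUp(283, 5)); // prints 57
    }
}
```

With 283 products the truncating form stops one scroll short, which would leave the last few rows unloaded.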
Selenium throws the following exception:
Feb 18, 2015 10:00:10 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.BindException) caught when processing request to {}->http://localhost:7055: Address already in use: connect
Feb 18, 2015 10:00:10 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:7055
Feb 18, 2015 10:00:10 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.BindException) caught when processing request to {}->http://localhost:7055: Address already in use: connect
Feb 18, 2015 10:00:10 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:7055
Feb 18, 2015 10:00:10 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.BindException) caught when processing request to {}->http://localhost:7055: Address already in use: connect
Feb 18, 2015 10:00:10 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:7055
Feb 18, 2015 10:00:12 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.BindException) caught when processing request to {}->http://localhost:7055: Address already in use: connect
Feb 18, 2015 10:00:12 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:7055
Feb 18, 2015 10:00:12 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.BindException) caught when processing request to {}->http://localhost:7055: Address already in use: connect
Feb 18, 2015 10:00:12 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:7055
Feb 18, 2015 10:00:12 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.BindException) caught when processing request to {}->http://localhost:7055: Address already in use: connect
Feb 18, 2015 10:00:12 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:7055
This repeats for every iteration after some time......
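One possible direction, offered as a sketch and not a confirmed fix: every `findElement`, `getText` and `getAttribute` call in the product loop is a separate HTTP request to the driver at localhost:7055, so a few hundred rows produce thousands of short-lived connections. If that is what runs the machine out of local ports (an assumption; the log alone does not prove it), collecting all rows in a single `executeScript` call would reduce the traffic to one request. In the code below, `BatchExtract`, `SCRIPT` and the `runScript` parameter are hypothetical names. `runScript` stands in for `((JavascriptExecutor) htmDriver).executeScript(...)` so the example runs without a browser, and the row used in `main` is made-up data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch: read every product row in one script call instead of many element lookups.
final class BatchExtract {
    // Runs in the browser and returns one array of plain objects, one per row.
    // Selectors are copied from the question; textContent may differ from getText().
    static final String SCRIPT =
        "return Array.prototype.map.call("
      + "document.getElementsByClassName('product_list_view_cont'), function (row) {"
      + "  var t = row.querySelector('.product_list_view_heading');"
      + "  var p = row.querySelector('.product_list_view_price_outer var[id^=selling-price-id-]');"
      + "  var a = row.querySelector('.product_list_view_info_cont a');"
      + "  return { title: t ? t.textContent.trim() : null,"
      + "           price: p ? p.textContent.trim() : null,"
      + "           url: a ? a.href : null };"
      + "});";

    // runScript is a stand-in for JavascriptExecutor.executeScript (hypothetical parameter).
    // WebDriver returns JS arrays as List and JS objects as Map, which is what is unpacked here.
    static List<Map<String, String>> extract(Function<String, Object> runScript) {
        Object result = runScript.apply(SCRIPT);
        List<Map<String, String>> rows = new ArrayList<>();
        if (!(result instanceof List)) {
            return rows; // Unexpected result shape: return no rows
        }
        for (Object item : (List<?>) result) {
            if (!(item instanceof Map)) {
                continue; // Skip anything that is not a row object
            }
            Map<?, ?> raw = (Map<?, ?>) item;
            Map<String, String> row = new LinkedHashMap<>();
            for (String key : new String[] {"title", "price", "url"}) {
                Object value = raw.get(key);
                row.put(key, value == null ? null : value.toString());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Fake browser result (made-up data) so the sketch runs without Selenium.
        Map<String, Object> fake = new HashMap<>();
        fake.put("title", "Sample product");
        fake.put("price", null);
        fake.put("url", "http://example.com/p/1");
        for (Map<String, String> row : extract(js -> Collections.singletonList(fake))) {
            String price = row.get("price") == null ? "No price" : row.get("price");
            System.out.println(row.get("title") + " | " + price + " | " + row.get("url"));
        }
    }
}
```

With a real driver the call would look like `BatchExtract.extract(js -> ((JavascriptExecutor) htmDriver).executeScript(js));`. The brand and status fields from the loop could be added to the script the same way.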