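Web scraping with Selenium: code randomly throws StaleElementReferenceException

Time: 2017-01-02 21:27:44

Tags: selenium selenium-webdriver web-scraping selenium-firefoxdriver

I am trying to scrape some items from AliExpress, but when the code reaches one of the items (which one is completely nondeterministic), the urlelement in the parseItems method randomly becomes stale and the method throws an exception.

Code: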

package com.ardilgulez.seleniumweb;

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.StaleElementReferenceException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.util.List;
import java.util.concurrent.TimeUnit;

public class App {

    private static WebDriver firefoxDriver = new FirefoxDriver();

    public static boolean parseItems(List<WebElement> items) throws StaleElementReferenceException {
        System.out.println(items.size());
        if(items.size() > 0){
            items.forEach((item) -> {
                WebElement urlelement = item.findElement(By.cssSelector(".detail>h3>a"));
                String href = urlelement.getAttribute("href");
                System.out.println(href);
                String title = urlelement.getAttribute("title");
                System.out.println(title);
            });
        }
        return true;
    }

    public static void main(String[] args) {
        firefoxDriver.get("https://www.aliexpress.com/");
        firefoxDriver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);

        WebElement questionElement = firefoxDriver.findElement(By.xpath("//input[@name='SearchText']"));
        questionElement.sendKeys("ESP8266");
        questionElement.submit();

        while (true) {
            try {
                (new WebDriverWait(firefoxDriver, 10))
                    .until((WebDriver webDriver) -> ((JavascriptExecutor) webDriver).executeScript("return document.readyState").equals("complete"));

                (new WebDriverWait(firefoxDriver, 10))
                    .until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//ul[@id='hs-list-items']")));

                (new WebDriverWait(firefoxDriver, 10))
                    .until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//div[@id='hs-below-list-items']")));

                System.out.println("WAIT1");

                (new WebDriverWait(firefoxDriver, 20))
                        .until((WebDriver webDriver) -> {
                            WebElement listItemsUL = (new WebDriverWait(webDriver, 10))
                                .until(ExpectedConditions.presenceOfElementLocated(By.xpath("//ul[@id='hs-list-items']")));

                            List<WebElement> items = listItemsUL.findElements(By.tagName("li"));
                            return parseItems(items);
                        });

                (new WebDriverWait(firefoxDriver, 20))
                        .until((WebDriver webDriver) -> {
                            WebElement belowListItemsDiv = (new WebDriverWait(webDriver, 10))
                                .until(ExpectedConditions.presenceOfElementLocated(By.xpath("//div[@id='hs-below-list-items']")));

                            WebElement belowListItemsUL = belowListItemsDiv.findElement(By.tagName("ul"));
                            List<WebElement> items = belowListItemsUL.findElements(By.tagName("li"));
                            return parseItems(items);
                        });

                System.out.println("WAIT2");

                WebElement nextElement = (new WebDriverWait(firefoxDriver, 10))
                    .until(ExpectedConditions.presenceOfElementLocated(By.xpath("//a[@class='page-next ui-pagination-next']")));

                System.out.println(nextElement.toString());
                System.out.println("CLICK CLICK");
                nextElement.click();

            } catch (Exception e) {
                e.printStackTrace();
                break;
            }
        }
    }
}

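Sometimes the exception is even thrown after the code has read the element's href but before it has read its title.

I don't know what is going on with my code. It works fine until it randomly decides not to, and I can't figure out why.

1 Answer:

Answer 0: (score: 1)

It looks like you are not waiting for the next page to be ready when you paginate, so the list may still contain elements from the previous page.

To make sure the previous page is no longer present, try waiting until some element of the list becomes stale after clicking the pagination button, like so: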

nextElement.click();
// someElementFromTheList: any WebElement from the current list, captured before the click
new WebDriverWait(firefoxDriver, 20).until(ExpectedConditions.stalenessOf(someElementFromTheList));
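
The question also notes that an element can go stale between the href read and the title read. As a complement to the staleness wait, and not as a replacement for it, one option is to re-run the whole lookup when that happens. The following is a minimal sketch of such a retry helper using only the Java standard library. The class name Retry and the method retryOn are hypothetical and are not part of Selenium.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of Selenium): re-runs an action when it throws a given exception type. */
final class Retry {
    private Retry() {}

    /**
     * Runs the action up to maxAttempts times.
     * Retries only when the thrown exception is an instance of retryable;
     * any other runtime exception is rethrown immediately.
     */
    static <T> T retryOn(Class<? extends RuntimeException> retryable,
                         int maxAttempts,
                         Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!retryable.isInstance(e)) {
                    throw e; // unrelated failure: do not hide it
                }
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed with the retryable exception
    }
}
```

In the question's code, the supplier would presumably wrap the element lookup together with both getAttribute calls, with StaleElementReferenceException.class passed as the retryable type, so that href and title are always read from the same freshly located element. Note that if the parent li element itself is stale, a lookup inside it will fail again on every attempt. In that case the supplier has to locate the item again starting from the driver.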