How can I capture every URL a website passes through with Selenium?

Asked: 2017-07-13 07:31:50

Tags: java selenium-webdriver

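I am currently automating a website whose URL keeps changing (an SSO-style site). We pass parameters in the query string. I want to capture every URL the site goes through on the way to a particular page. How can I achieve this with Selenium WebDriver?

I tried calling driver.getCurrentUrl() periodically, but it is not reliable.

Is there another way to do this?

Many thanks!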

1 answer:

Answer 0 (score: 0)

Try running the following:

    driver.get("http://www.telegraph.co.uk/");
    List<WebElement> links = driver.findElements(By.tagName("a"));
    List<String> externalUrls = new ArrayList<>();
    List<String> internalUrls = new ArrayList<>();

    System.out.println(links.size());

    // Iterate over every element; an index loop would have to run from 0 to size() - 1.
    for (WebElement link : links) {
        String url = link.getAttribute("href");
        System.out.println("Name: " + link.getText());
        System.out.println("url: " + url);
        System.out.println("----");
        if (url == null) {
            continue; // anchors without an href have nothing to record
        }
        if (url.startsWith("http://www.telegraph.co.uk/")) {
            if (!internalUrls.contains(url)) {
                internalUrls.add(url);
            }
        } else if (!externalUrls.contains(url)) {
            externalUrls.add(url);
        }
    }
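Note that the prefix check above only matches `http://` links, so a link to `https://www.telegraph.co.uk/...` would be counted as external. As a rough sketch, you could compare hosts instead. `LinkClassifier` and `isInternal` are names I made up for illustration; only `java.net.URI` comes from the JDK:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Selenium: decides whether an href
// belongs to the site by comparing hosts rather than string prefixes.
final class LinkClassifier {
    private LinkClassifier() {}

    /** Returns true when href has the same host as siteHost, ignoring scheme and case. */
    static boolean isInternal(String href, String siteHost) {
        if (href == null || href.isEmpty()) {
            return false; // anchor without a usable href
        }
        try {
            // getHost() is null for opaque URIs such as mailto: or javascript:
            String host = new URI(href).getHost();
            return host != null && host.equalsIgnoreCase(siteHost);
        } catch (URISyntaxException e) {
            return false; // malformed href: treat it as not internal
        }
    }
}
```

In the loop you would then call something like `LinkClassifier.isInternal(url, "www.telegraph.co.uk")` in place of `url.startsWith(...)`.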

If you want to collect all the links of the whole site, I would do something like this:

    import java.util.ArrayList;
    import java.util.List;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;

    public class GetAllLinksFromThePage {

        static final String MAIN_PAGE = "http://www.telegraph.co.uk/";
        static List<String> externalUrls = new ArrayList<>();
        static List<String> internalUrls = new ArrayList<>();

        public static void main(String[] args) {
            // MyChromeDriver is not a Selenium class; presumably a custom
            // helper that returns a configured ChromeDriver.
            MyChromeDriver myChromeDriver = new MyChromeDriver();
            WebDriver driver = myChromeDriver.initChromeDriver();

            checkForLinks(driver, MAIN_PAGE);

            System.out.println("finish");
            driver.quit();
        }

        public static void checkForLinks(WebDriver driver, String page) {
            driver.get(page);
            System.out.println("PAGE->" + page);
            // Copy the hrefs out first: the recursive driver.get() below
            // navigates away and makes this page's WebElements stale.
            List<String> urls = new ArrayList<>();
            for (WebElement we : driver.findElements(By.tagName("a"))) {
                String url = we.getAttribute("href");
                if (url != null) {
                    urls.add(url);
                }
            }
            for (String url : urls) {
                if (url.startsWith(MAIN_PAGE)) {
                    if (!internalUrls.contains(url)) {
                        internalUrls.add(url);
                        System.out.println(url + " has been added to internalUrls");
                        checkForLinks(driver, url);
                    }
                } else if (!externalUrls.contains(url)) {
                    externalUrls.add(url);
                    System.out.println(url + " has been added to externalUrls");
                }
            }
        }
    }
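On a large site the recursion above can get very deep, and nothing limits how many pages are visited. A minimal sketch of the same crawl with an explicit queue and a page limit might look like the following. `SiteCrawler`, `crawl`, `maxPages` and `linksOf` are hypothetical names of mine, and `linksOf` is just a function you supply that returns the hrefs found on a page:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper: breadth-first crawl that stays under a URL prefix.
final class SiteCrawler {
    private SiteCrawler() {}

    /**
     * Visits pages reachable from start whose URL begins with prefix,
     * stopping after maxPages pages. Returns the visited URLs in visit order.
     */
    static Set<String> crawl(String start, String prefix, int maxPages,
                             Function<String, List<String>> linksOf) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String page = queue.remove();
            if (!visited.add(page)) {
                continue; // already crawled via another link
            }
            for (String href : linksOf.apply(page)) {
                // skip missing hrefs, external links, and pages already seen
                if (href != null && href.startsWith(prefix) && !visited.contains(href)) {
                    queue.add(href);
                }
            }
        }
        return visited;
    }
}
```

With Selenium, `linksOf` could be a lambda that calls `driver.get(page)`, runs `driver.findElements(By.tagName("a"))` and collects each element's `getAttribute("href")` into a list of strings before returning.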

Hope that helps!