Unable to get news article links on the Google News page using Selenium

Time: 2018-05-22 10:46:19

Tags: java selenium selenium-webdriver

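The following script searches for the text "Liverpool" and then goes to the News page. It then writes all the links to a file and prints them to the console. The problem is that I am unable to get the links to the news articles on the Google News page. The script prints all the other links on the page, but not those.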

public static void main(String[] args) throws IOException {

        String URL = "https://www.google.com";
        WebDriver driver = new FirefoxDriver();
        driver.manage().window().maximize();
        driver.get(URL);
        WebElement searchBar = driver.findElement(By.xpath("//*[@id='lst-ib']"));
        searchBar.sendKeys("Liverpool");
        WebElement clickSearch = driver.findElement(By.name("btnK"));
        clickSearch.click();
        WebElement newsButton = driver.findElement(By.xpath("//*[@id='hdtb-msb-vis']/div[2]/a"));
        newsButton.click();
        java.util.List<WebElement> links = driver.findElements(By.tagName("a"));
        System.out.println(links.size());
        FileWriter file = new FileWriter("/Users/lekharaj/Desktop/LFC.txt");
        BufferedWriter b = new BufferedWriter(file);
        for(int i=0;i<links.size();i++){
            String text = links.get(i).getAttribute("href");
            System.out.println("\n"+text);
            b.write(text);
            b.newLine();
            b.flush();
        }
    }

2 Answers:

Answer 0 (score: 1)

To search for the text Liverpool on the Google home page and then print the links of the news results, you can use the following solution:

  • Code block:

    // Point Selenium at the local geckodriver binary (adjust the path for your machine).
    System.setProperty("webdriver.gecko.driver", "C:\\Utility\\BrowserDrivers\\geckodriver.exe");
    WebDriver driver = new FirefoxDriver();
    driver.navigate().to("https://www.google.com/");
    // The search box is the input named "q"; submit() submits the enclosing form.
    WebElement searchBox = driver.findElement(By.name("q"));
    searchBox.sendKeys("Liverpool");
    searchBox.submit();
    // Wait for the News tab to become clickable, then open it.
    new WebDriverWait(driver, 20).until(ExpectedConditions.elementToBeClickable(By.linkText("News"))).click();
    // Wait until the headline links are visible. The CSS classes come from Google's markup and may change.
    List<WebElement> my_list = new WebDriverWait(driver, 20).until(ExpectedConditions.visibilityOfAllElementsLocatedBy(By.cssSelector("h3.r.dO0Ag>a")));
    System.out.println("The list of href links are : ");
    for (WebElement element : my_list)
        System.out.println(element.getAttribute("href"));
    
  • Console output:

    The list of href links are : 
    http://www.espn.com/soccer/club/liverpool/364/blog/post/3506156/jurgen-klopp-stability-at-liverpool-the-envy-of-their-premier-league-rivals
    https://www.liverpoolfc.com/news/first-team/302415-liverpool-fc-songs-fans-europe
    http://www.espn.com/soccer/club/liverpool/364/blog/post/3489163/liverpools-andrew-robertson-hoping-to-mimic-predecessor-alan-kennedys-heroics-against-real-madrid
    https://www.liverpoolecho.co.uk/sport/football/transfer-news/christian-pulisic-refuses-comment-reports-14688647
    https://www.liverpoolecho.co.uk/sport/football/football-news/liverpool-legend-terry-mcdermotts-three-14688746
    http://www.skysports.com/football/news/11669/11381416/liverpool-transfer-rumours-gianluigi-donnarumma-daniel-ceballos-jamaal-lascelles-and-james-tarkowski
    http://www.skysports.com/more-sports/ufc/news/29876/11380754/ufc-how-liverpool-built-darren-till
    https://www.independent.co.uk/sport/football/transfers/liverpool-transfer-news-jurgen-klopp-shortlist-tarkowski-lascelles-premier-league-epl-a8361576.html
    https://www.belfasttelegraph.co.uk/sport/football/premier-league/liverpool/this-real-madrid-star-can-crush-liverpools-champions-league-dream-says-giggs-36932660.html
    http://kwese.espn.com/football/blog/transfer-talk/79/post/3506910/transfer-rater-neymar-to-real-madridjamaal-lascelles-to-liverpool
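
The question also wanted the links written to a file, while the code block above only prints them. A minimal sketch of that step follows, under some assumptions: `HrefFileWriter` and `writeHrefs` are hypothetical names that do not appear in the answer, the input is a plain list of strings such as the values returned by `getAttribute("href")`, and a temporary file stands in for the real target path. Null and empty values are skipped because an anchor without an `href` attribute yields `null`, which `BufferedWriter.write` does not accept.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the answer above.
final class HrefFileWriter {

    // Writes each non-empty href on its own line and returns how many were written.
    static int writeHrefs(List<String> hrefs, Path target) throws IOException {
        int written = 0;
        // try-with-resources flushes and closes the writer even if a write fails.
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String href : hrefs) {
                // Skip anchors that had no usable href value.
                if (href == null || href.isEmpty()) {
                    continue;
                }
                out.write(href);
                out.newLine();
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; in the Selenium code these would come from element.getAttribute("href").
        List<String> hrefs = Arrays.asList(
                "https://www.liverpoolfc.com/news/first-team/302415-liverpool-fc-songs-fans-europe",
                null,
                "");
        // Assumption: a temporary file is used here instead of a fixed desktop path.
        Path target = Files.createTempFile("LFC", ".txt");
        System.out.println(writeHrefs(hrefs, target) + " link(s) written to " + target);
    }
}
```

In the Selenium code, the list could be built inside the existing `for` loop by collecting each `element.getAttribute("href")` before calling `writeHrefs`.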
    

Answer 1 (score: 0)

You can use the following XPath to get only the article links.

//article/a

Applied to the code from the question:

public static void main(String[] args) throws IOException {

    String URL = "https://www.google.com";
    WebDriver driver = new FirefoxDriver();
    driver.manage().window().maximize();
    driver.get(URL);
    // Type the query and run the search.
    WebElement searchBar = driver.findElement(By.xpath("//*[@id='lst-ib']"));
    searchBar.sendKeys("Liverpool");
    WebElement clickSearch = driver.findElement(By.name("btnK"));
    clickSearch.click();
    // Switch to the News tab.
    WebElement newsButton = driver.findElement(By.xpath("//*[@id='hdtb-msb-vis']/div[2]/a"));
    newsButton.click();
    // Select only anchors that are direct children of <article> elements.
    java.util.List<WebElement> links = driver.findElements(By.xpath("//article/a"));
    System.out.println(links.size());
    // try-with-resources flushes and closes the writer when the loop finishes.
    try (BufferedWriter b = new BufferedWriter(new FileWriter("/Users/lekharaj/Desktop/LFC.txt"))) {
        for (WebElement link : links) {
            String text = link.getAttribute("href");
            if (text == null) {
                continue; // anchor without an href; b.write(null) would throw
            }
            System.out.println("\n" + text);
            b.write(text);
            b.newLine();
        }
    }
}
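
To see what `//article/a` actually selects, the expression can be evaluated against a small hand-written document with the XPath engine that ships with the JDK, without starting a browser. This is only a sketch: the class name `ArticleXPathDemo`, the method `articleHrefs`, and the sample markup are made up for illustration and are not taken from Google's pages. The expression matches `a` elements that are direct children of an `article` element, so a link nested one level deeper or sitting outside any `article` is not returned.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the markup below is invented sample data.
final class ArticleXPathDemo {

    // Returns the href of every <a> that is a direct child of an <article>.
    static List<String> articleHrefs(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "//article/a", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            hrefs.add(((Element) nodes.item(i)).getAttribute("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml =
                "<body>"
              // Outside any <article>: not selected.
              + "<a href='https://www.google.com/preferences'>Settings</a>"
              // Direct child of <article>: selected.
              + "<article><a href='https://example.com/story-1'>Story 1</a></article>"
              // Nested inside a <div>: not selected by //article/a.
              + "<article><div><a href='https://example.com/nested'>Nested</a></div></article>"
              + "</body>";
        System.out.println(articleHrefs(xml)); // prints [https://example.com/story-1]
    }
}
```

If the article links on the real page turn out to be nested more deeply, `//article//a` would match them at any depth below `article`, at the cost of possibly picking up extra links inside each article element.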