Getting NPR headlines with Selenium

Time: 2017-11-24 16:40:00

Tags: python selenium selenium-webdriver webdriver

I'm trying to fetch and print the NPR headlines from https://www.npr.org/sections/thetwo-way/archive, but my code doesn't work. I'm using Python 3 and Selenium with ChromeDriver. This is what I have so far:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains


custom_path = "/Users/ashkij/Desktop/"
driver = webdriver.Chrome("/Users/ashkij/Desktop/chromedriver")


#Open a page that has a list of NPR headlines.
driver.get("https://www.npr.org/sections/thetwo-way/archive")
#After the first few articles, one has to scroll to the bottom of the page 

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

for i in range(1,10):
    #Get each article from an XPATH expression
    article_headline = driver.find_element_by_xpath("""//*[@id="infinitescroll"]/article[{}]/div[2]/h2/a""".format(i))
    print(article_headline.text)

On the very first article, I get this error:

selenium.common.exceptions.NoSuchElementException: Message: no such element:
Unable to locate element: {"method":"xpath","selector":"//*[@id="infinitescroll"]/article[1]/div[2]/h2/a"}

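However, I have confirmed that the above is the XPath expression for the given article, so I don't understand why Selenium says the XPath expression is invalid.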

1 Answer:

Answer 0 (score: 0)

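Try the following code, which selects the headline links by CSS class instead of by their position in the page: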

# Collect every link that is a direct child of an element with class "title".
headlines = driver.find_elements_by_css_selector(".title>a")

for headline in headlines:
    print(headline.text)
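
Note that the `find_elements_by_*` helpers used above were deprecated in Selenium 4 and removed in later 4.x releases. The sketch below shows one way the same lookup might be written against the newer `find_elements(by, value)` call, with a short polling loop in case the headlines have not rendered yet. The helper names `collect_headlines` and `print_npr_headlines`, the timeout values, and the assumption that `.title > a` still matches the page are all illustrative and not part of the original answer. Selenium's own `WebDriverWait` is the usual tool for waiting; the hand-written loop is used here only to keep the helper free of imports.

```python
import time


def collect_headlines(driver, selector=".title > a", timeout=10.0, poll=0.5):
    """Return the non-empty text of every element matching `selector`.

    Polls until at least one headline is found or `timeout` seconds pass,
    because the page may still be loading when the first lookup runs.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        elements = driver.find_elements("css selector", selector)
        texts = [element.text.strip() for element in elements]
        texts = [text for text in texts if text]
        if texts or time.monotonic() >= deadline:
            return texts
        time.sleep(poll)


def print_npr_headlines():
    """Hypothetical entry point: open the archive page and print headlines."""
    # Imported here so collect_headlines can be tried without a browser.
    from selenium import webdriver

    # Assumes a chromedriver binary that Selenium can find on its own.
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.npr.org/sections/thetwo-way/archive")
        for headline in collect_headlines(driver):
            print(headline)
    finally:
        driver.quit()
```

Calling `print_npr_headlines()` from a script would run the whole flow. If it prints nothing, the selector probably no longer matches the page and should be checked in the browser's developer tools.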

Hope this helps!