div.find_element_by_xpath(...).text returns no value

Time: 2017-05-02 03:26:55

Tags: python selenium web screen-scraping

I am trying to scrape the review content from https://www.tiaa.org/public/offer/products/life-insurance with the following code:
from selenium import webdriver

html_page = 'https://www.tiaa.org/public/offer/products/life-insurance'

driver = webdriver.Chrome()
# implicitly_wait() applies to every later find_element call, so set it before the first lookup
driver.implicitly_wait(20)
driver.get(html_page)
driver.find_element_by_xpath("""//*[@id="bv-hero-ALL-LIFE-INSURANCE"]/span[2]/span[2]""").click()
# A compound class selector needs a "." before each class name
reviews_list = driver.find_elements_by_css_selector('.bv-content-item.bv-content-top-review.bv-content-review')
author = ''
summary = ''
product_family = ''
gender = ''
occupation = ''
reason = ''

# switch_to.frame() replaces the deprecated switch_to_frame()
driver.switch_to.frame(0)
# Note: each XPath below starts with "//", so it searches the whole document rather than just `div`,
# and li[1] always selects the first review
for div in driver.find_elements_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol'):
    author = div.find_element_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[2]/div[1]/div/div[1]/div/div[1]/div/div/div/h3')
    print(author.text)
    summary = div.find_element_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[2]/div[1]/div/div[2]/div/div/div[1]/p')
    print(summary.text)
    product_family = div.find_element_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[2]/div[1]/div/div[2]/div/div/div[2]/div[3]/div/span/a')
    print(product_family.text)
    gender = div.find_element_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[1]/div/dl/dd[3]/ul/li[5]/span[2]')
    print(gender.text)
    occupation = div.find_element_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[1]/div/dl/dd[3]/ul/li[4]/span[2]')
    print(occupation.text)
    reason = div.find_element_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[1]/div/dl/dd[3]/ul/li[2]/span[2]')
    print(reason.text)

I have also tried .getText(), but with no luck. Any pointers would be appreciated.

1 answer:

Answer 0 (score: 2)

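The reason getText() and .text don't work is that the element you are trying to access is hidden (probably via CSS), and both only return the element's visible (rendered) innerText, which for a hidden element is empty. For each reviewer, the DOM contains two elements holding the author's name: one is hidden (the one you are accessing) and one is visible (the one you should use).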
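A minimal sketch of two ways around this, in Python to match the question: pick the copy for which is_displayed() is true and read its .text, or read the raw DOM text through get_attribute("textContent"), which ignores visibility. FakeElement is a hypothetical stand-in for a Selenium WebElement (it mimics only .text, is_displayed() and get_attribute()) so the sketch runs without a browser, and the author name is made up.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (only the members used below)."""

    def __init__(self, text_content, displayed):
        self._text_content = text_content
        self._displayed = displayed

    @property
    def text(self):
        # Like WebElement.text, report only rendered text: a hidden element yields ""
        return self._text_content if self._displayed else ""

    def is_displayed(self):
        return self._displayed

    def get_attribute(self, name):
        # textContent comes straight from the DOM, whether or not the element is visible
        if name == "textContent":
            return self._text_content
        return None


def first_visible_text(elements):
    """Return .text of the first displayed element, or None if every copy is hidden."""
    for el in elements:
        if el.is_displayed():
            return el.text
    return None


def raw_text(element):
    """Read textContent, which does not depend on visibility, and trim whitespace."""
    return (element.get_attribute("textContent") or "").strip()


# Two copies of the same (made-up) author name: one hidden, one visible
hidden_copy = FakeElement("  JaneDoe  ", displayed=False)
visible_copy = FakeElement("JaneDoe", displayed=True)

print(repr(hidden_copy.text))                           # ''
print(first_visible_text([hidden_copy, visible_copy]))  # JaneDoe
print(raw_text(hidden_copy))                            # JaneDoe
```

With a real driver you would pass the list returned by driver.find_elements_by_xpath(...) for an XPath that matches both copies of the name; that XPath is not given in the original answer. Filtering on is_displayed() keeps you on the element a user actually sees, while textContent is handy when only the hidden copy is reachable.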

An XPath for the author name that should work is:

//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[1]/div/div/div/div/h3
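The XPath above fixes li[1], so it only reaches the first review. A minimal sketch for collecting every author, assuming the Selenium 3 API used in the question (find_elements_by_xpath, later removed in Selenium 4) and that the driver has already switched into the frame as in the question: select each review list item once, then evaluate the answer's path relative to that item. REVIEW_ITEMS_XPATH, AUTHOR_REL_XPATH and review_authors are hypothetical names, and the page markup may well have changed since 2017.

```python
# List items of the review list, taken from the question's XPaths
REVIEW_ITEMS_XPATH = '//*[@id="BVRRContainer"]/div/div/div/div/ol/li'

# The answer's author path with the ".../ol/li[1]" prefix removed, so it is evaluated
# relative to each <li>. The leading "./" keeps the search inside the current element;
# a leading "//" would restart from the document root and always hit the first review.
AUTHOR_REL_XPATH = './div[1]/div/div/div/div/h3'


def review_authors(driver):
    """Return the visible author name of every review item on the current page.

    `driver` is assumed to be a Selenium 3 WebDriver already inside the review frame.
    """
    authors = []
    for item in driver.find_elements_by_xpath(REVIEW_ITEMS_XPATH):
        # Relative lookup: searches only within this review's <li>
        authors.append(item.find_element_by_xpath(AUTHOR_REL_XPATH).text)
    return authors
```

Called as review_authors(driver) right after driver.switch_to.frame(0), this should return one name per list item. If the list also contains items that are not reviews, find_element_by_xpath raises NoSuchElementException for them, after waiting out any implicit wait.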