Question

I am trying to use the following code to scrape the review content from https://www.tiaa.org/public/offer/products/life-insurance:

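from selenium import webdriver
# Page to scrape (the URL given above)
html_page = "https://www.tiaa.org/public/offer/products/life-insurance"
driver = webdriver.Chrome()
driver.get(html_page)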
driver.find_element_by_xpath("""//*[@id="bv-hero-ALL-LIFE-INSURANCE"]/span[2]/span[2]""").click()
driver.implicitly_wait(20)
reviews_list = driver.find_elements_by_css_selector('bv-content-item bv-content-top-review bv-content-review')
author = ''
summary = ''
product_family = ''
gender = ''
occupation = ''
reason = ''

driver.switch_to_frame(0)
for div in driver.find_elements_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol'):
    author = div.find_element_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[2]/div[1]/div/div[1]/div/div[1]/div/div/div/h3')
    print(author.text)
    summary = div.find_element_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[2]/div[1]/div/div[2]/div/div/div[1]/p')
    print(summary.text)
    product_family = div.find_element_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[2]/div[1]/div/div[2]/div/div/div[2]/div[3]/div/span/a')
    print(product_family.text)
    gender = div.find_element_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[1]/div/dl/dd[3]/ul/li[5]/span[2]')
    print(gender.text)
    occupation = div.find_element_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[1]/div/dl/dd[3]/ul/li[4]/span[2]')
    print(occupation.text)
    reason = div.find_element_by_xpath('//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[1]/div/dl/dd[3]/ul/li[2]/span[2]')
    print(reason.text)

I also tried .getText(), but no luck. Please advise.

Answer 1

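The reason getText() and text do not work is that the element you are trying to access is hidden (by CSS, I guess), and getText() only extracts the visible innerText. There are two elements in the DOM (for each user) that contain the author's name. Of the two, one is hidden (the one you are accessing) and one is visible (the one you should use).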
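If you ever do need the text of a hidden element, one possible workaround is to read the DOM's textContent instead of the rendered text. The helper below is only a sketch: element_text is a hypothetical name, and it assumes the object passed in behaves like a Selenium WebElement (is_displayed(), .text and get_attribute()).

```python
def element_text(element):
    """Return an element's text, falling back to textContent when it is hidden.

    `element` is assumed to behave like a Selenium WebElement, i.e. it offers
    is_displayed(), .text and get_attribute(). This helper is hypothetical
    and is not part of Selenium itself.
    """
    if element.is_displayed():
        # Visible element: .text returns the rendered text.
        return element.text
    # Hidden element: .text comes back empty, so read the raw DOM text.
    raw = element.get_attribute("textContent")
    return (raw or "").strip()


# Possible usage with the question's driver (needs a live browser session):
# author = driver.find_element_by_xpath(some_xpath)
# print(element_text(author))
```

Using the visible element, as described here, is still the simpler fix; the fallback only helps when no visible copy of the text exists.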

The XPath for the author name that should work is:

//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[1]/div/div/div/div/h3
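The XPath above only targets the first review (li[1]). As a rough sketch, and assuming every review item on the page has the same structure as the first one (not verified here), the index can be turned into a parameter. AUTHOR_XPATH and author_xpath are hypothetical names used for illustration.

```python
# The answer's XPath with the review position left as a placeholder.
AUTHOR_XPATH = (
    '//*[@id="BVRRContainer"]/div/div/div/div/ol'
    '/li[{index}]/div[1]/div/div/div/div/h3'
)


def author_xpath(index):
    """Build the author-name XPath for the review at a 1-based position."""
    if index < 1:
        # XPath positions are 1-based, so 0 or negative values match nothing.
        raise ValueError("XPath positions start at 1")
    return AUTHOR_XPATH.format(index=index)


# Possible usage with the question's driver (needs a live browser session):
# for i in range(1, 4):
#     print(driver.find_element_by_xpath(author_xpath(i)).text)
```

The same pattern could be applied to the other fields (summary, gender, occupation and so on) once a visible element has been located for each of them.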

