I am trying to scrape the review content from https://www.tiaa.org/public/offer/products/life-insurance with the following code:

from selenium import webdriver
from selenium.webdriver.common.by import By

html_page = "https://www.tiaa.org/public/offer/products/life-insurance"

driver = webdriver.Chrome()
driver.implicitly_wait(20)  # set before the first lookup so every find_element call waits
driver.get(html_page)
# Click the review summary widget to load the reviews
driver.find_element(By.XPATH, '//*[@id="bv-hero-ALL-LIFE-INSURANCE"]/span[2]/span[2]').click()
# Compound class selector: all three classes must be on the same element
reviews_list = driver.find_elements(By.CSS_SELECTOR, '.bv-content-item.bv-content-top-review.bv-content-review')
author = ''
summary = ''
product_family = ''
gender = ''
occupation = ''
reason = ''
driver.switch_to.frame(0)  # switch into the first iframe on the page
for div in driver.find_elements(By.XPATH, '//*[@id="BVRRContainer"]/div/div/div/div/ol'):
    author = div.find_element(By.XPATH, '//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[2]/div[1]/div/div[1]/div/div[1]/div/div/div/h3')
    print(author.text)
    summary = div.find_element(By.XPATH, '//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[2]/div[1]/div/div[2]/div/div/div[1]/p')
    print(summary.text)
    product_family = div.find_element(By.XPATH, '//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[2]/div[1]/div/div[2]/div/div/div[2]/div[3]/div/span/a')
    print(product_family.text)
    gender = div.find_element(By.XPATH, '//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[1]/div/dl/dd[3]/ul/li[5]/span[2]')
    print(gender.text)
    occupation = div.find_element(By.XPATH, '//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[1]/div/dl/dd[3]/ul/li[4]/span[2]')
    print(occupation.text)
    reason = div.find_element(By.XPATH, '//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[1]/div/dl/dd[3]/ul/li[2]/span[2]')
    print(reason.text)
I have also tried .getText(), but with no luck. Any pointers would be appreciated.
Answer (score: 2)
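The reason getText() and text don't work is that the element you are trying to access is hidden (via CSS, I guess), and getText() only extracts the visible innerText. For each user, there are two elements in the DOM that contain the author's name. Of the two, one is hidden (the one you are accessing) and one is visible (the one you should use).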
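As a minimal sketch of the two ways around this (use the visible copy, or read the hidden copy's raw DOM text), the helpers below work on Selenium `WebElement` objects. They rely only on the standard `.text`, `is_displayed()` and `get_attribute("textContent")` members; the function names are hypothetical, and the `textContent` fallback is an alternative not mentioned in the answer, so whether it yields the author name on this particular page is an assumption.

```python
def element_text(element):
    """Return an element's text, even when the element is hidden by CSS.

    Selenium's .text returns only rendered (visible) text, so it is ''
    for hidden elements; in that case fall back to the DOM textContent.
    """
    visible = element.text
    if visible and visible.strip():
        return visible.strip()
    # Hidden element: read the raw DOM text instead of the rendered text.
    raw = element.get_attribute("textContent")
    return (raw or "").strip()


def first_visible_text(elements):
    """Return the text of the first displayed element in `elements`, or ''."""
    for element in elements:
        if element.is_displayed():
            return element.text.strip()
    return ""
```

For example, `first_visible_text(driver.find_elements(By.XPATH, xpath))` would pick the visible copy when `xpath` happens to match both the hidden and the visible author element, while `element_text(author)` keeps the question's hidden-element XPath working.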
The XPath for the author name that should work is:
//*[@id="BVRRContainer"]/div/div/div/div/ol/li[1]/div[1]/div/div/div/div/h3
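The XPath above still hard-codes `li[1]`, so it only reaches the first review. One possible way to collect every author, sketched under the assumption that the page still has this layout, is to find all review `<li>` items and then search inside each one with an XPath that starts with `.`. In Selenium, an XPath that starts with `//` searches the whole document even when `find_element` is called on an element, which is also why calling it on `div` in the question's loop does not restrict the search to that element. The names `review_authors`, `REVIEW_ITEMS` and `AUTHOR_IN_REVIEW` are hypothetical, the driver is assumed to already be switched to the frame containing `BVRRContainer`, and the literal `"xpath"` is the string value of Selenium's `By.XPATH`, used here only so the sketch runs without importing Selenium (in real code `By.XPATH` is the idiomatic choice).

```python
# "xpath" is the string value of selenium.webdriver.common.by.By.XPATH;
# it is used directly so this sketch does not need Selenium installed.
XPATH = "xpath"

# Derived from the XPath above: each <li> is one review, and the visible
# author <h3> sits at div[1]/div/div/div/div/h3 inside it.
# The page layout may have changed, so treat both paths as assumptions.
REVIEW_ITEMS = '//*[@id="BVRRContainer"]/div/div/div/div/ol/li'
AUTHOR_IN_REVIEW = "./div[1]/div/div/div/div/h3"  # leading "." = relative to the <li>


def review_authors(driver):
    """Collect the visible author name of every review on the current page."""
    authors = []
    for review in driver.find_elements(XPATH, REVIEW_ITEMS):
        # Because the XPath starts with ".", this searches only inside `review`.
        author = review.find_element(XPATH, AUTHOR_IN_REVIEW)
        authors.append(author.text.strip())
    return authors
```

The same relative-XPath pattern should apply to the other fields (summary, gender, occupation, reason), as long as the visible rather than the hidden copy of each field is targeted.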