I am trying to scrape the list of Confirmed Participants from this site, but it returns an empty list/string.
Here is my code:
driver.find_element_by_xpath('//*[@id="main"]/div/section/section[8]/section/div[1]/div[1]/a').text
How can I fix this?
Answer 0 (score: 0)
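Credit for getting the text: How to get text of an element in Selenium WebDriver, without including child element text?
Credit for handling the lazy loading: How to get all the data from a webpage manipulating lazy-loading method?
Your XPath does not point to the correct HTML element. The participants are also lazy-loaded, so the page has to be scrolled to the bottom before they are all present.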
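To see why an attribute-based XPath and "own text only" extraction help, here is a small offline sketch using only the standard library. The markup in `SAMPLE` is made up for illustration (only the `data-title` and `class` values come from the answer's XPath), and `own_text` is a hypothetical helper, so treat this as an approximation of the idea rather than the real page's structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for illustration; the live page may differ.
SAMPLE = """
<main>
  <section data-title="SPONSORS">
    <div class="crm-people-round-plus-text">Acme Corp<span>Gold</span></div>
  </section>
  <section data-title="CONFIRMED PARTICIPANTS">
    <div class="crm-people-round-plus-text">Jane Doe<span>CEO</span></div>
    <div class="crm-people-round-plus-text">John Roe<span>CTO</span></div>
  </section>
</main>
"""


def own_text(element):
    """Return the text that sits directly inside `element`, skipping child elements."""
    parts = [element.text or ""]
    for child in element:
        # child.tail is text after the child that still belongs to `element`
        parts.append(child.tail or "")
    return "".join(parts).strip()


root = ET.fromstring(SAMPLE.strip())

# Select by attributes instead of by position (no section[8]/div[1]/...).
divs = root.findall(
    ".//section[@data-title='CONFIRMED PARTICIPANTS']"
    "//div[@class='crm-people-round-plus-text']"
)
names = [own_text(div) for div in divs]
print(names)
```

A positional path such as `section[8]/section/div[1]` breaks as soon as the page adds or reorders a section, while the attribute-based path keeps matching the intended block.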
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Selenium 3 style; Selenium 4 takes a Service object instead of executable_path.
driver = webdriver.Chrome(executable_path='D:/chromedriver.exe')
driver.get('https://www.griclub.org/event/real-estate/gri-proptech-esummit_2191.html')


def get_text_excluding_children(driver, element):
    # Return only the element's own text nodes (relies on the page loading jQuery).
    return driver.execute_script("""
        return jQuery(arguments[0]).contents().filter(function() {
            return this.nodeType == Node.TEXT_NODE;
        }).text();
    """, element)


def scroll_at_the_end_of_the_page_lazy_load():
    # Keep scrolling to the bottom until the page height stops growing.
    check_height = driver.execute_script("return document.body.scrollHeight;")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # give the lazy-loaded content time to arrive
        height = driver.execute_script("return document.body.scrollHeight;")
        if height == check_height:
            break
        check_height = height


# Scroll until all participants are on the page
scroll_at_the_end_of_the_page_lazy_load()

# Get all CONFIRMED PARTICIPANTS
all_confirmed_participants = driver.find_elements(
    By.XPATH,
    "//section[@data-title='CONFIRMED PARTICIPANTS']"
    "/descendant::div[@class='crm-people-round-plus-text']",
)
time.sleep(5)

for element in all_confirmed_participants:
    # print(element.get_attribute("innerHTML"))
    print(get_text_excluding_children(driver, element))
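One thing to watch with the scroll loop: `while True` never exits if the page keeps growing (infinite feeds do this). Below is a minimal sketch of the same "scroll until the height stops changing" idea with a cap on the number of rounds. `scroll_until_stable`, `max_rounds`, and `FakePage` are hypothetical names introduced here; `FakePage` only simulates a lazy-loading page so the loop can be tried without a browser.

```python
import time


def scroll_until_stable(get_height, scroll_to_bottom, pause=2.0, max_rounds=30):
    """Scroll until the height stops growing or max_rounds is hit.

    Returns the number of scrolls performed.
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_to_bottom()
        time.sleep(pause)  # wait for lazy-loaded content
        height = get_height()
        if height == last_height:
            return rounds  # nothing new was loaded
        last_height = height
    return max_rounds  # gave up: the page was still growing


class FakePage:
    """Hypothetical stand-in for a page that loads more content 3 times."""

    def __init__(self):
        self.height = 1000
        self.loads_left = 3

    def get_height(self):
        return self.height

    def scroll_to_bottom(self):
        if self.loads_left > 0:
            self.height += 500
            self.loads_left -= 1


page = FakePage()
rounds = scroll_until_stable(page.get_height, page.scroll_to_bottom, pause=0)
print(rounds, page.height)
```

With a real driver, the two callables would presumably wrap the same `driver.execute_script(...)` calls used in the answer: one returning `document.body.scrollHeight`, the other calling `window.scrollTo(0, document.body.scrollHeight)`.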