Question

I want to scrape the list of Confirmed Participants from this site, but it returns an empty list/string.

Here is my code:

driver.find_element_by_xpath('//*[@id="main"]/div/section/section[8]/section/div[1]/div[1]/a').text

How can I fix this?

Answer 1

Credit for the text extraction goes to: How to get text of an element in Selenium WebDriver, without including child element text?

The lazy-loading part comes from: How to get all the data from a webpage manipulating lazy-loading method?

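Your XPath does not point to the correct HTML tag. The participants are also lazy-loaded, so the page has to be scrolled to the bottom before they are all present in the DOM.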

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Selenium 4 style; on Selenium 3, pass executable_path='D:/chromedriver.exe' instead
driver = webdriver.Chrome(service=Service('D:/chromedriver.exe'))

driver.get('https://www.griclub.org/event/real-estate/gri-proptech-esummit_2191.html')


def get_text_excluding_children(driver, element):
    # Return only the element's own text nodes, skipping text inside child elements.
    # The script calls jQuery, so it only works on pages that load jQuery.
    return driver.execute_script("""
        return jQuery(arguments[0]).contents().filter(function() {
            return this.nodeType == Node.TEXT_NODE;
        }).text();
    """, element)


def scroll_at_the_end_of_the_page_lazy_load():
    # Keep scrolling to the bottom until the page height stops growing.
    check_height = driver.execute_script("return document.body.scrollHeight;")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # give the lazy-loaded content time to arrive
        height = driver.execute_script("return document.body.scrollHeight;")
        if height == check_height:
            break
        check_height = height


# Scroll until all participants are on the page
scroll_at_the_end_of_the_page_lazy_load()
time.sleep(5)  # extra pause so the last batch can finish rendering before we query

# Get all CONFIRMED PARTICIPANTS
all_confirmed_participants = driver.find_elements(By.XPATH, "//section[@data-title='CONFIRMED PARTICIPANTS']/descendant::div[@class='crm-people-round-plus-text']")
for element in all_confirmed_participants:
    # print(element.get_attribute("innerHTML"))
    print(get_text_excluding_children(driver, element))
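
One caveat with the scroll loop above: if the page keeps growing (for example an endless feed), `while True` never exits. A minimal sketch of a bounded variant follows. The name `scroll_until_stable` and its parameters (`get_height`, `scroll_to_bottom`, `pause`, `max_rounds`, `sleep`) are my own and not part of Selenium; the function only takes two callables, so it can be tried without a browser.

```python
import time


def scroll_until_stable(get_height, scroll_to_bottom, pause=2.0, max_rounds=30, sleep=time.sleep):
    """Scroll until the reported height stops changing or max_rounds is reached.

    get_height and scroll_to_bottom are caller-supplied callables (hypothetical
    names, not Selenium APIs). Returns the number of scrolls performed.
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_to_bottom()
        sleep(pause)  # give lazy-loaded content time to arrive
        new_height = get_height()
        if new_height == last_height:
            return rounds  # height unchanged: assume nothing more will load
        last_height = new_height
    return max_rounds  # gave up: the page was still growing


# Possible use with the driver from the answer (assumes `driver` already exists):
# scroll_until_stable(
#     get_height=lambda: driver.execute_script("return document.body.scrollHeight;"),
#     scroll_to_bottom=lambda: driver.execute_script(
#         "window.scrollTo(0, document.body.scrollHeight);"),
# )
```

The fixed pause is still a guess about how fast the site responds; the cap only guarantees that the script terminates.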
