Question

I am trying to scrape all the update notes from https://store.steampowered.com/newshub/app/1145360. I identified the update notes by the class "eventcalendar_CalendarRow_398u2" and wrote the following code:

updatenotes = soup.find_all("div", attrs={"class": "eventcalendar_CalendarRow_398u2"})
for updatenote in updatenotes:
    print(updatenote.text)

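But when I try to scrape, it does not return any results. I think this is due to the dynamic nature of the site. I am using Selenium to scroll all the way down before I start scraping, but it does not work. Can anyone help?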
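One way to check whether the rows are present in the HTML at all is to match on the class prefix instead of the full class name. The sketch below uses only the standard library and assumes the trailing "_398u2" is a generated suffix that may change between site builds; that is a guess, not something the page documents. The names RowTextParser and extract_rows are made up for illustration.

```python
from html.parser import HTMLParser

# Assumption: the part after the last underscore is a generated suffix.
ROW_CLASS_PREFIX = "eventcalendar_CalendarRow_"


class RowTextParser(HTMLParser):
    """Collect the text of every <div> whose class starts with a prefix."""

    def __init__(self, class_prefix):
        super().__init__()
        self.class_prefix = class_prefix
        self.rows = []      # text of each finished row
        self._depth = 0     # <div> nesting depth inside the current row
        self._chunks = []   # text fragments of the current row

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            # Nested <div> inside a row: track it so we find the right close.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if any(c.startswith(self.class_prefix) for c in classes):
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.rows.append(" ".join(self._chunks))

    def handle_data(self, data):
        if self._depth and data.strip():
            self._chunks.append(data.strip())


def extract_rows(html, class_prefix=ROW_CLASS_PREFIX):
    """Return the text of each matching row found in an HTML string."""
    parser = RowTextParser(class_prefix)
    parser.feed(html)
    parser.close()
    return parser.rows
```

If extract_rows returns an empty list for the HTML fetched without a browser but a non-empty one for Selenium's driver.page_source after scrolling, the rows are being rendered by JavaScript.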

Answer 1

Try the following:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# Assumes Chrome; the original post does not show how the driver is created
driver = webdriver.Chrome()
driver.get('https://store.steampowered.com/newshub/app/1145360')
scroll_pause_time = 1
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Rows loaded so far (find_elements_by_css_selector was removed in Selenium 4)
    updatenotes = driver.find_elements(By.CSS_SELECTOR, "div.eventcalendar_CalendarRow_398u2")
    print(len(updatenotes))
    for updatenote in updatenotes:
        print(updatenote.text)
    # Scroll down to the bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait for the page to load more content
    time.sleep(scroll_pause_time)

    # Calculate the new scroll height and compare it with the last one
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        # Heights are equal, so nothing new was loaded: leave the loop
        break
    last_height = new_height
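The loop above prints the rows loaded so far on every pass, so earlier rows show up repeatedly. A possible variation is to scroll first and collect once at the end. The helper below is only a sketch: the name scroll_until_stable and the max_rounds cap are assumptions added for illustration, and it relies on nothing but driver.execute_script.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scroll rounds performed. max_rounds is a
    safety cap (an assumption) so an endless feed cannot loop forever.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more rows
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded
        last_height = new_height
    return max_rounds
```

After calling scroll_until_stable(driver), a single driver.find_elements(By.CSS_SELECTOR, "div.eventcalendar_CalendarRow_398u2") call should return each loaded row once, provided the page keeps earlier rows in the DOM while scrolling.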
