I am trying to scrape all the update notes from https://store.steampowered.com/newshub/app/1145360. I identified the update notes by the class "eventcalendar_CalendarRow_398u2" and wrote the following code:
updatenotes = soup.find_all("div", attrs={"class": "eventcalendar_CalendarRow_398u2"})
for updatenote in updatenotes:
    ...  # loop body omitted in the original post
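But when I try to scrape, it returns no results, which I think is due to the dynamic nature of the site. I use Selenium to scroll all the way down before scraping, but that does not work. Can anyone help?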
Answer 0 (score: 0)
Try the following:
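import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# Assumption: Chrome is used here; the original answer did not show how the driver was created.
driver = webdriver.Chrome()
driver.get('https://store.steampowered.com/newshub/app/1145360')
scroll_pause_time = 1
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # find_elements_by_css_selector was removed in Selenium 4; use find_elements with By.CSS_SELECTOR
    updatenotes = driver.find_elements(By.CSS_SELECTOR, "div.eventcalendar_CalendarRow_398u2")
    print(len(updatenotes))
    # Note: rows loaded in earlier passes are printed again on every pass
    for updatenote in updatenotes:
        print(updatenote.text)
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(scroll_pause_time)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        # If the heights are the same, no new content was loaded, so exit the loop
        break
    last_height = new_height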
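
If it helps, the scroll loop above can be pulled out into a small helper so that scrolling and scraping are separate steps. This is only a sketch: the name scroll_until_stable and the max_rounds cap are my own additions, not part of the answer above. It relies only on the driver's execute_script method, so any Selenium WebDriver should work.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=50):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object with an `execute_script(script)` method, such as a
    Selenium WebDriver. `max_rounds` is a hypothetical safety cap so that an
    endlessly growing page cannot loop forever.
    Returns the number of scrolls that were performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Jump to the current bottom of the page to trigger lazy loading
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to fetch and render the next batch
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Height unchanged: nothing new was loaded, so stop scrolling
            return rounds
        last_height = new_height
    return max_rounds
```

After scroll_until_stable(driver) returns, you can collect the rows once instead of on every pass, which avoids printing the same rows repeatedly. The "398u2" suffix in the class name looks auto-generated and may change when the site is rebuilt (that is an assumption on my part), so a substring selector such as driver.find_elements(By.CSS_SELECTOR, "div[class*='eventcalendar_CalendarRow']") may be more robust than the exact class.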