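I've read all of these SO posts and the Selenium documentation, and I've tried "expected_conditions", but nothing works...
Here's what I'm trying to do: I'm building a scraper and decided to test it on an Amazon product detail page. That page has a div tag with the ID books-entity-teaser, which is populated by JS code when the tag becomes visible on the page...
However, when I execute my code, the tag is completely empty.
Can someone point out what I'm missing?
I tried waiting for the code to load and then getting the page source: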
WebDriverWait(self.browser, 10).until(expected_conditions.invisibility_of_element((By.ID, 'books-entity-teaser')))
Here's my Python code:
def scrape(self, url: str = 'https://www.amazon.com/dp/1408865270'):
    self.browser.get(url)
    page_state = self.browser.execute_script('return document.readyState;')
    scroll_pause_time = 0.5
    # Get scroll height
    last_height = self.browser.execute_script('return document.body.scrollHeight')
    while True:
        # Scroll down to bottom
        self.browser.execute_script('window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;')
        # Wait to load page
        time.sleep(scroll_pause_time)
        # Calculate new scroll height and compare with last scroll height
        new_height = self.browser.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break
        last_height = new_height
    page_source = self.browser.execute_script('return document.body.innerHTML')
    WebDriverWait(self.browser, 10).until(expected_conditions.presence_of_element_located((By.ID, 'books-entity-teaser')))
    return page_source
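One thing that stands out in the code above is that page_source is read before the wait runs, and presence_of_element_located only checks that the div exists, not that it has been filled in. Below is a minimal sketch of waiting for actual content instead. The helper names element_has_content and wait_for_teaser are hypothetical (not part of Selenium), and it assumes the teaser is filled into the main document rather than an iframe, which I have not verified for this page.

```python
def element_has_content(element_id):
    """Build a wait condition that is truthy once the element is non-empty.

    Hypothetical helper: WebDriverWait.until() accepts any callable that
    takes the driver and returns a truthy value when the wait should end.
    """
    script = (
        "var el = document.getElementById(arguments[0]);"
        "return el ? el.innerHTML : '';"
    )

    def _condition(driver):
        html = driver.execute_script(script, element_id)
        # A falsy return value makes WebDriverWait poll again.
        return html if html and html.strip() else False

    return _condition


def wait_for_teaser(browser, element_id="books-entity-teaser", timeout=10):
    """Block until the element has content, then return its inner HTML."""
    # Imported here so the condition above can be exercised without Selenium.
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(browser, timeout).until(element_has_content(element_id))
```

In scrape(), this would be called as wait_for_teaser(self.browser) after the scroll loop and before reading document.body.innerHTML, so the source is only captured once the div has content. If the wait times out, WebDriverWait raises TimeoutException, which would at least confirm that the content never arrives in this browser session.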