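This runs and scrapes links the way I want, except that Python doesn't seem to recognize the value of `scraped_pages`. When I run it in the terminal, the counter goes up by 1 on each loop, but the loop keeps going even after the number is higher than `page_nums`. When I set `page_nums` to an integer lower than 5, it runs and stops at 5, but then it crashes again. Apologies if I haven't phrased this question well.

All of the code above this point works; this is the problem code. All modules are imported correctly. It uses Selenium, and I'm not sure whether the explicit wait is working, because it crashes before reaching the `page_nums` value.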
```python
page_nums = raw_input("how many pages to scrape?: ")
urls_list = []
scraped_pages = 0
scraped_links = 0
while scraped_pages <= page_nums:
    for li in list_items:
        for a in li.find_all('a', href=True):
            url = a['href']
            if slicer(url,'http'):
                url1 = slicer(url,'http')
                urls_list.append(url1)
                scraped_links += 1
            elif slicer(url,'www'):
                url1 = slicer(url,'www')
                urls_list.append(url1)
                scraped_links += 1
            else:
                pass
    scraped_pages += 1
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "/html/body/div[5]/div[4]/div[9]/div[1]/div[3]/div/div[5]/div/span[1]/div/table/tbody/tr/td[12]")))
    driver.find_element_by_xpath("/html/body/div[5]/div[4]/div[9]/div[1]/div[3]/div/div[5]/div/span[1]/div/table/tbody/tr/td[12]").click()
print scraped_links
print urls_list
```
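A likely reason the loop ignores `page_nums`, separate from the click error the answer addresses: in Python 2, `raw_input` returns a string, and CPython 2 orders every number before every string, so `scraped_pages <= page_nums` stays true however large `scraped_pages` gets. Converting the input with `int()` fixes that, and using `<` instead of `<=` makes the loop run exactly `page_nums` times, since the counter starts at 0. The sketch below isolates only the counting logic in a hypothetical function, `pages_to_visit`, written so it also runs under Python 3 (where `raw_input` is called `input`, and comparing an int with a str raises `TypeError` instead of silently succeeding).

```python
def pages_to_visit(page_nums):
    """Return the 0-based page indices the scraping loop would process.

    page_nums may be the raw string typed by the user (what raw_input in
    Python 2 or input in Python 3 returns), so it is converted to int first.
    """
    limit = int(page_nums)  # "3" -> 3; comparing an int with the raw str was the bug
    visited = []
    scraped_pages = 0
    # '<' rather than '<=': starting from 0, '<=' would run limit + 1 times
    while scraped_pages < limit:
        visited.append(scraped_pages)  # in the real script: scrape, then click "next"
        scraped_pages += 1
    return visited


if __name__ == "__main__":
    print(pages_to_visit("3"))  # [0, 1, 2]
```

In the original script the equivalent change would be `page_nums = int(raw_input("how many pages to scrape?: "))` together with `while scraped_pages < page_nums:`.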
Here is part of the error that is returned:
```
1
2
Traceback (most recent call last):
  File "google page click 2.py", line 51, in <module>
    driver.find_element_by_xpath("/html/body/div[5]/div[4]/div[9]/div[1]/div[3]/div/div[5]/div/span[1]/div/table/tbody/tr/td[12]").click()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 75, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 454, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 201, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 181, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotVisibleException: Message: Element is not currently visible and so may not be interacted with
Stacktrace:
    at fxdriver.preconditions.visible (file:///tmp/tmpzSHEeb/extensions/fxdriver@googlecode.com/components/command-processor.js:9981)
    at DelayedCommand.prototype.checkPreconditions_ (file:///tmp/tmpzSHEeb/extensions/fxdriver@googlecode.com/components/command-processor.js:12517)
    at DelayedCommand.prototype.executeInternal_/h (file:///tmp/tmpzSHEeb/extensions/fxdriver@googlecode.com/components/command-processor.js:12534)
    at DelayedCommand.prototype.executeInternal_ (file:///tmp/tmpzSHEeb/extensions/fxdriver@googlecode.com/components/command-processor.js:12539)
    at DelayedCommand.prototype.execute/< (file:///tmp/tmpzSHEeb/extensions/fxdriver@googlecode.com/components/command-processor.js:12481)
```
Answer (score: 0)
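This happens because Selenium has loaded the page successfully, but the button you are trying to click is not in view. There are two possible cases:

1. The element is hidden for some reason (for example, with CSS). In that case you will not be able to click it.
2. The element is not visible because it sits in a part of the page you have to scroll to see. To get around this, you can use: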
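```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# until() returns the element once it is in the DOM (present, not necessarily visible)
element_to_click = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "/html/body/div[5]/div[4]/div[9]/div[1]/div[3]/div/div[5]/div/span[1]/div/table/tbody/tr/td[12]")))
# Run JavaScript that scrolls the element into view before Selenium clicks it
driver.execute_script('arguments[0].scrollIntoView();', element_to_click)
element_to_click.click()
```

The `execute_script` call runs a little JavaScript that scrolls the element into view before you ask Selenium to click it. If you are running this script over hundreds or thousands of pages, you may want to put a `time.sleep(1)` before `element_to_click.click()`, which gives the browser time to scroll the element into view before the click is attempted.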
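If a fixed `time.sleep(1)` turns out to be too short on some pages and wasted time on others, one alternative is to retry the click a few times with a short pause, and only when the visibility error actually occurs. The helper below, `retry_call`, is a hypothetical plain-Python sketch, not part of Selenium. In the script you would pass `element_to_click.click` as `func` and `ElementNotVisibleException` (from `selenium.common.exceptions`, the exception in the traceback above) as `retry_on`.

```python
import time


def retry_call(func, retry_on, attempts=3, delay=1.0):
    """Call func(); if it raises one of retry_on, wait `delay` seconds and retry.

    Makes at most `attempts` calls and re-raises the last error if all fail.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original exception
            time.sleep(delay)  # give the page time to scroll or render


if __name__ == "__main__":
    # Stand-in for element_to_click.click: fails twice, then succeeds.
    calls = []

    def flaky_click():
        calls.append(1)
        if len(calls) < 3:
            raise RuntimeError("element not visible yet")
        return "clicked"

    print(retry_call(flaky_click, RuntimeError, attempts=3, delay=0.01))
```

With Selenium itself, `WebDriverWait` also accepts an `ignored_exceptions` argument, which can serve a similar purpose inside an explicit wait.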
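By the way, it is worth reading up on XPath a little. Selecting elements by `id` or `class` name is much easier than writing out the full path from `/html`. For example,

```
//div[@id="some-id"]
```

selects the `div` element in the document that has `id="some-id"`.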
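To see why a relative, id-based expression holds up better than a long absolute path, here is a small sketch using Python's standard-library `xml.etree.ElementTree`, which understands a limited subset of XPath, run on two made-up versions of a page. ElementTree needs a leading `.` (`.//div[...]` rather than `//div[...]`); in Selenium you would pass the `//div[@id="some-id"]` form to `find_element_by_xpath`. The markup and the `page`, `layout_changed`, `by_position` and `by_id` names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Two versions of a made-up page: the second adds a banner div at the top,
# which shifts the position of every div after it.
page = """<html><body>
<div id="header">Header</div>
<div id="some-id"><a href="http://example.com/a">a</a></div>
</body></html>"""

layout_changed = """<html><body>
<div id="banner">Banner</div>
<div id="header">Header</div>
<div id="some-id"><a href="http://example.com/a">a</a></div>
</body></html>"""


def by_position(markup):
    """Absolute, position-based path: the second div directly under body."""
    return ET.fromstring(markup).find("./body/div[2]")


def by_id(markup):
    """Relative path keyed on the id attribute, wherever the div sits."""
    return ET.fromstring(markup).find(".//div[@id='some-id']")


for markup in (page, layout_changed):
    print(by_position(markup).get("id"), by_id(markup).get("id"))
# some-id some-id
# header some-id
```

The positional path silently picks the wrong element once the layout changes, while the id-based path still finds the intended `div`.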