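How do I get to the next page of this website?

Time: 2017-10-26 20:22:40

Tags: python python-2.7 selenium web-scraping

I am trying to get to the next page of a JavaScript-driven site, http://msds.walmartstores.com/. The element I have been trying to activate to go to the next page is: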

<a href="#" class="next" data-action="next">›</a> 

It is on this site: http://msds.walmartstores.com/

and can be found inside:

<div class="pagination" id="pagination" style="">
    <a href="#" class="first" data-action="first">«</a>
    <a href="#" class="previous" data-action="previous">‹</a>
    <input type="text">
    <a href="#" class="next" data-action="next">›</a>
    <a href="#" class="last" data-action="last">»</a>
</div>

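I am able to get all the JavaScript-rendered elements (the PDFs) that I want to scrape on this page. How do I get to the next page?

Problems I ran into / things I tried: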

1

Code:

driver.find_element_by_class_name("next").click()

Error:

Traceback (most recent call last):
  File "<stdin>", line 56, in <module>
  File "<stdin>", line 50, in main
  File "E:\Python27\lib\site-packages\selenium\webdriver\remote\webelement.py", line 77, in click
    self._execute(Command.CLICK_ELEMENT)
  File "E:\Python27\lib\site-packages\selenium\webdriver\remote\webelement.py", line 494, in _execute
    return self._parent.execute(command, params)
  File "E:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "E:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Element is not clickable at point (251, 2173)
  (Session info: chrome=61.0.3163.100)
  (Driver info: chromedriver=2.27.440174 (e97a722caafc2d3a8b807ee115bfb307f7d2cfd9),platform=Windows NT 6.1.7601 SP1 x86_64)

shell returned 1
Hit any key to close this window...

2

Code:

driver.find_element_by_class_name("next").submit()

Error:

Traceback (most recent call last):
  File "<stdin>", line 56, in <module>
  File "<stdin>", line 50, in main
  File "E:\Python27\lib\site-packages\selenium\webdriver\remote\webelement.py", line 88, in submit
    self._execute(Command.SUBMIT_ELEMENT)
  File "E:\Python27\lib\site-packages\selenium\webdriver\remote\webelement.py", line 494, in _execute
    return self._parent.execute(command, params)
  File "E:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "E:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Element was not in a form, so could not submit.
  (Session info: chrome=61.0.3163.100)
  (Driver info: chromedriver=2.27.440174 (e97a722caafc2d3a8b807ee115bfb307f7d2cfd9),platform=Windows NT 6.1.7601 SP1 x86_64)

shell returned 1
Hit any key to close this window...

Here is the code I use to open the page:

import time
from selenium import webdriver
import os

url = "http://msds.walmartstores.com/"
myfile = open("PDFLinks.txt", "w")  # output file for the PDF links
driver = webdriver.Chrome()
driver.get(url)

Thanks for your help.

3 answers:

Answer 0 (score: 0)

Try using XPath:

el= driver.find_element_by_xpath("//div[@class='pagination' and @id='pagination']")
el.find_element_by_xpath(".//a[@class='next']").click()

EDIT

I tried this HTML (based on yours):

<!DOCTYPE html>
<html>
<body>
<script>
function myFunction() {
    alert("Hello! I am an alert box!");
}
</script>

<div class="pagination" id="pagination" style="">
    <a href="#" class="first" data-action="first">«</a>
    <a href="#" class="previous" data-action="previous">‹</a>
    <input type="text">
    <a href="#" class="next" data-action="next" onclick="myFunction()">›</a>
    <a href="#" class="last" data-action="last">»</a>
</div>

</body>
</html>

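It works. You probably need to provide more information about your HTML.

EDIT 2

I looked at the page you are interested in. It is not clear from your code what you want to do, and your question does not explain the steps to perform.

I noticed that if I click the "Submit" button, a list of .pdf files appears. In that case, the element you are interested in is present. So I tried this: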

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.get("http://msds.walmartstores.com/")
driver.find_element_by_id("submitUPC").click()
wait = WebDriverWait(driver, 20)
el= wait.until(EC.presence_of_element_located((By.XPATH,"//div[@class='pagination' and @id='pagination']")))
el.find_element_by_xpath(".//a[@class='next']").click()
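
If the click above still fails with "Element is not clickable at point", a common cause is that the element lies outside the visible viewport or is covered by another element. The following is a minimal sketch, assuming that is what happens here (this was not verified against the site): scroll the link into view first, then click it. The helper name `scroll_into_view_and_click` is hypothetical; `execute_script` and `click` are the standard Selenium calls, and `scrollIntoView` is the standard DOM method.

```python
def scroll_into_view_and_click(driver, element):
    """Scroll `element` into the viewport, then issue a normal click.

    `driver` is a Selenium WebDriver and `element` a WebElement that has
    already been located.
    """
    # Element.scrollIntoView(true) aligns the top of the element with the
    # top of the visible area, so the click point is on screen.
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    element.click()
```

With the snippet above it could be called as `scroll_into_view_and_click(driver, el.find_element_by_xpath(".//a[@class='next']"))`.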

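Answer 1 (score: 0)

I use C#, but this should be close. For hard-to-reach elements I have had better luck with CSS selectors. I don't see an iframe or anything unusual, so this should work. It uses a combination of a CSS selector and JavaScript to locate and click your next button. You may need to make some adjustments to go from C# to Python, but it should solve your problem.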

By next = By.CssSelector("#pagination > a.next");
IWebElement nextButton = driver.FindElement(next);

IJavaScriptExecutor clickNextButton = driver as IJavaScriptExecutor;
clickNextButton.ExecuteScript("arguments[0].click();", nextButton);
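
A rough Python equivalent of the C# above might look like the sketch below. It is not tested against the site. The function name `click_next_with_js` is made up for illustration, and the locator strategy is passed as the plain string `"css selector"`, which is the value of Selenium's `By.CSS_SELECTOR`, so the block has no imports of its own.

```python
# "css selector" is the string value of selenium's By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def click_next_with_js(driver):
    """Locate the pagination 'next' link and click it through JavaScript."""
    next_button = driver.find_element(CSS_SELECTOR, "#pagination > a.next")
    # The click is dispatched by the page's own JavaScript engine, so it does
    # not depend on the element being visible at a given screen coordinate.
    driver.execute_script("arguments[0].click();", next_button)
    return next_button
```

After `driver.get(...)` and the search submission, calling `click_next_with_js(driver)` would play the role of the two C# statements.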

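Answer 2 (score: 0)

The solution I came up with:

Since everything I tried gave me errors, and trying to click the element gave me "unknown error: Element is not clickable at point (251, 2173)", I came up with a workaround.

At the bottom of the site there is a box where you can enter a page number. I used it to move to the next page and kept a counter in my program.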

import time
from selenium import webdriver
import os
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.keys import Keys


# driver, myfile, wait (a WebDriverWait), page (the page counter) and the
# helpers is_next_there() and go_through_page() are not shown in this excerpt.
while is_next_there(driver):

    go_through_page(driver, myfile, page)

    # Type the page number into the pagination box, press Enter,
    # and wait for the pagination block to be present
    el = wait.until(EC.presence_of_element_located((By.XPATH,"//div[@class='pagination' and @id='pagination']")))
    page_box = el.find_element_by_xpath(".//input")
    page_box.clear()
    page_box.send_keys(page)
    page_box.send_keys(Keys.ENTER)
    page += 1

With this solution I was able to get to the next page successfully.
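
The counter logic described above can be sketched as a self-contained helper. This is only an illustration under assumptions: `go_to_page` and `crawl_pages` are hypothetical names, `process_page` and `has_next_page` are callbacks the caller must supply (how the site signals its last page is not covered here), and the constants are the string values of Selenium's `By.CSS_SELECTOR` and `Keys.ENTER`. A real run would also need a wait for the new page to load after each jump.

```python
CSS_SELECTOR = "css selector"   # string value of selenium's By.CSS_SELECTOR
ENTER_KEY = u"\ue007"           # string value of selenium's Keys.ENTER


def go_to_page(driver, page_number):
    """Type `page_number` into the pagination text box and press Enter."""
    page_box = driver.find_element(CSS_SELECTOR, "#pagination input")
    page_box.clear()
    page_box.send_keys(str(page_number))
    page_box.send_keys(ENTER_KEY)


def crawl_pages(driver, process_page, has_next_page, first_page=1):
    """Process every page in order, moving on via the page-number box.

    process_page(driver, page) and has_next_page(driver) are hypothetical
    callbacks supplied by the caller. Returns the last page number processed.
    """
    page = first_page
    process_page(driver, page)
    while has_next_page(driver):
        page += 1                  # ask for the page after the current one
        go_to_page(driver, page)   # no load wait here; add one in real use
        process_page(driver, page)
    return page
```

Incrementing the counter before typing it means the box always receives the number of the page after the one just processed.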