Question

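I am trying to get the links to all accommodations in Cyprus from this website: http://www.zoover.nl/cyprus

So far I can retrieve the first 15, which are already displayed. So now I have to invoke a click on the "volgende" (next) link. However, I don't know how to do that, and in the page source I can't track down the function that gets called, so I can't use something like what was posted here: Issues with invoking "on click event" on the html page using beautiful soup in Python

I only need the step where the "click" happens, so that I can fetch the next 15 links, and so on.

Does anybody know how to help? Thanks!

EDIT:

My code now looks like this: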

def getZooverLinks(country):
    zooverWeb = "http://www.zoover.nl/"
    url = zooverWeb + country
    parsedZooverWeb = parseURL(url)
    driver = webdriver.Firefox()
    driver.get(url)

    button = driver.find_element_by_class_name("next")
    links = []
    for page in xrange(1,3):
        for item in parsedZooverWeb.find_all(attrs={'class': 'blue2'}):
            for link in item.find_all('a'):
                newLink = zooverWeb + link.get('href')
                links.append(newLink)
        button.click()

I get the following error:

selenium.common.exceptions.StaleElementReferenceException: Message: Element is no longer attached to the DOM
Stacktrace:
    at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:8956)
    at Utils.getElementAt (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:8546)
    at fxdriver.preconditions.visible (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:9585)
    at DelayedCommand.prototype.checkPreconditions_ (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12257)
    at DelayedCommand.prototype.executeInternal_/h (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12274)
    at DelayedCommand.prototype.executeInternal_ (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12279)
    at DelayedCommand.prototype.execute/< (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12221)

I'm confused :/

Answer 1

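As tempting as it may be to try to do this using Beautifulsoup's evaluateJavaScript method, in the end Beautifulsoup is a parser rather than an interactive web browsing client.

You should seriously consider solving this with selenium, as briefly described in this answer. Selenium has pretty good Python bindings.

You could use selenium to find the element and click it, then pass the page to Beautifulsoup and use your existing code to fetch the links.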
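The click-then-parse approach can be sketched roughly as follows. This is a minimal Python 3 sketch, not tested against the live site: the function names `extract_links` and `collect_links` are hypothetical, the `blue2` and `next` class names are taken from the question's code, and it assumes (as the StaleElementReferenceException in the question suggests) that the "next" element is replaced after each click, so it is looked up again on every page instead of being reused.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE = "http://www.zoover.nl/"


def extract_links(html, base=BASE):
    """Return absolute URLs of all <a> tags inside elements with class 'blue2'."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for item in soup.find_all(attrs={"class": "blue2"}):
        for anchor in item.find_all("a", href=True):
            links.append(urljoin(base, anchor["href"]))
    return links


def collect_links(driver, pages, wait_seconds=10):
    """Parse `pages` result pages, clicking the 'next' element in between."""
    # Imported here so extract_links can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    links = []
    for page in range(pages):
        # Hand the HTML the browser currently shows to Beautifulsoup.
        links.extend(extract_links(driver.page_source))
        if page == pages - 1:
            break
        # Re-locate the button on every page: a reference found earlier
        # goes stale once the page content is replaced.
        button = driver.find_element(By.CLASS_NAME, "next")
        button.click()
        # Assumption: the old button is detached when the new page arrives.
        WebDriverWait(driver, wait_seconds).until(EC.staleness_of(button))
    return links
```

With a running driver this would be called as `collect_links(driver, 3)` after `driver.get("http://www.zoover.nl/cyprus")`. Note that the question's code parses the page once, before any click, so it would append the same 15 links on every iteration; reading `driver.page_source` inside the loop avoids that.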

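Alternatively, you could use the Javascript that is listed in the onclick handler. I pulled this from the source: EntityQuery('Ns=pPopularityScore%7c1&No=30&props=15292&dims=530&As=&N=0+3+10500915');. The No parameter increments by 15 for each page, but the props parameter has me guessing. I'd recommend not getting into this, though, and just interacting with the website as a client, using selenium. That is also much more robust to changes on their side.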
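To illustrate how the No parameter changes from page to page, here is a small sketch that rewrites the query string quoted above. The helper name `query_for_page` is hypothetical, and treating No as a zero-based offset of 15 results per page is an assumption based on the observation above. The sketch only builds the argument string; where EntityQuery sends it, and what props means, is not known here.

```python
from urllib.parse import parse_qsl, urlencode

# Query string copied from the onclick handler quoted above.
TEMPLATE = "Ns=pPopularityScore%7c1&No=30&props=15292&dims=530&As=&N=0+3+10500915"


def query_for_page(template, page_index, page_size=15):
    """Return the query string with No set to page_index * page_size.

    Assumption: No is a zero-based result offset (page 0 -> No=0).
    """
    # keep_blank_values=True preserves the empty 'As=' parameter.
    params = parse_qsl(template, keep_blank_values=True)
    updated = [
        (key, str(page_index * page_size)) if key == "No" else (key, value)
        for key, value in params
    ]
    return urlencode(updated)


if __name__ == "__main__":
    for index in range(3):
        print(query_for_page(TEMPLATE, index))
```

All other parameters are passed through unchanged, which is exactly why this route is fragile: any change to props or N on the site's side would break it silently.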

Answer 2

I tried the following code and was able to load the next page. Hope this helps you too. Code:

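from selenium import webdriver
import os

# Raw string, so the backslashes in the Windows path are not read as escapes
chromedriver = r"C:\Users\pappuj\Downloads\chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
url = 'http://www.zoover.nl/cyprus'
driver.get(url)
# Find the "next" link by its class name and click it to load the next page
# (Selenium 4 spells this driver.find_element(By.CLASS_NAME, 'next'))
driver.find_element_by_class_name('next').click()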

Thanks
