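I am trying to navigate a website with Selenium, but when I try to load the next page I get this error: Access Denied. You don't have permission to access "http://blah.com/" on this server.
My code is as follows: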
import os
import time
from selenium import webdriver
os.environ['MOZ_HEADLESS'] = '1'
petshop_url = 'https://www.blah.com/Filtro=D37608&ordenacao=_maisvendidos&nid=202059'
browser = webdriver.Firefox(executable_path = './geckodriver')
browser.get(petshop_url)
next_button = browser.find_element_by_id('ctl00_Conteudo_ctl02_divBuscaResultadoInferior').find_element_by_class_name('next')
time.sleep(1)
next_button.click()
time.sleep(1)
html_source = browser.page_source
print(html_source)
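I have already tried clearing the cache and removing the proxy, as suggested here: Selenium Problem: Access Denied You don't have permission to access "site" on this server
I also added and removed a sleep, tried the same thing in Chrome, and removed the headless option, but nothing worked. Any idea what my mistake is?
This is the log from when the browser closes: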
1572780171083 Marionette TRACE [16] Received DOM event pageshow for https://www.blah.com/?Filtro=D37608&Ordenacao=_maisvendidos&paginaAtual=3&ComparacaoProdutos=&AdicionaListaCasamento=
1572780171086 Marionette DEBUG 0 <- [1,6,null,{"value":null}]
1572780171093 webdriver::server DEBUG <- 200 OK {"value":null}
1572780172095 webdriver::server DEBUG -> GET /session/1ea63780-133a-4649-ba1b-5732a2fed59c/source
1572780172098 Marionette DEBUG 0 -> [0,7,"WebDriver:GetPageSource",{}]
1572780172099 Marionette DEBUG 0 <- [1,7,null,{"value":"<html><head>\n<title>Access Denied</title>\n</head><body>\n<h1>Access Denied</h1>\n \nYou don't have perm ... ccess \"http://www.blah.com/?\" on this server.<p>\nReference #18.debc1002.1572780170.31119482\n\n\n</p></body></html>"}]
1572780172102 webdriver::server DEBUG <- 200 OK {"value":"<html><head>\n<title>Access Denied</title>\n</head><body>\n<h1>Access Denied</h1>\n \nYou don't have permission to access \"http://www.blah.com/?\" on this server.<p>\nReference #18.debc1002.1572780170.31119482\n\n\n</p></body></html>"}
1572780172103 webdriver::server DEBUG -> DELETE /session/1ea63780-133a-4649-ba1b-5732a2fed59c
1572780172106 Marionette DEBUG 0 -> [0,8,"Marionette:Quit",{"flags":["eForceQuit"]}]
1572780172106 Marionette INFO Stopped listening on port 56193
1572780172149 Marionette TRACE Received observer notification quit-application
1572780172164 Marionette DEBUG 0 <- [1,8,null,{"cause":"shutdown"}]
1572780172202 webdriver::server DEBUG Deleting session
1572780172221 Marionette DEBUG 0 -> [0,9,"Marionette:Quit",{"flags":["eForceQuit"]}]
1572780172222 Marionette DEBUG 0 <- [1,9,{"error":"invalid session id","message":"Tried to run command without establishing a connection","stacktrace":"WebDriver ... t@chrome://marionette/content/server.js:249:9\n_onJSONObjectReady/<@chrome://marionette/content/transport.js:501:20\n"},null]
1572780172222 Marionette DEBUG Closed connection 0
1572780176394 Marionette TRACE Received observer notification xpcom-will-shutdown
This is the HTML of the element I want to click. It is one item in an unordered list:
<a href="https://www.blah.com/?Filtro=D37608&ordenacao=_maisvendidos&nid=202059&paginaAtual=2" onclick="javascript:MontaUrlLista("/site/PaginaBuscaNew.aspx?Filtro=D37608&Ordenacao=_maisvendidos&paginaAtual=2","ctl00_Conteudo_ctl02_hdnComparacaoProdutos","ctl00_Conteudo_ctl02_hdnListaCasamento"); return false">Próxima</a>
Answer 0 (score: 0)
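If you can, please share the URL so that I can check.
My guess is that, as in most cases like this, the site needs a cookie or session that is generated on the home page.
Here are some tips:
Add a referer (whether this helps depends on the site)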
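from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
desired_capabilities = DesiredCapabilities.CHROME.copy()
# 'xxxxx.com' stands for the previous site URL
desired_capabilities['chrome.page.customHeaders.referrer'] = 'xxxxx.com'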
Try this code:
from selenium import webdriver

def get_dynamic_website_content(url, first_refer='https://www.google.com'):
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    # Pass the referer through the Chrome prefs dict (site-dependent)
    chrome_prefs = {'chrome.page.customHeaders.referrer': first_refer}
    options.add_experimental_option('prefs', chrome_prefs)
    # Use only one of the two constructors below
    wd = webdriver.Chrome(chrome_options=options)  # live
    # wd = webdriver.Chrome(executable_path="/chromedriver.exe", chrome_options=options)  # desktop env
    wd.get(url)
    elems = wd.find_elements_by_xpath("//a[@href]")
    # Read the hrefs while the browser is still open, then close it
    links = [elem.get_attribute("href") for elem in elems]
    wd.quit()
    for link in links:
        # Load recursively, passing the current URL as the referer.
        # The recursion is unbounded: add a depth limit or a visited set.
        get_dynamic_website_content(link, url)
    # Add custom return logic etc.
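Also inspect the network traffic on the home page and check how it loads the next page.
Check and experiment with sessions, cookies, custom headers and so on, making sure you add them to or remove them from Chrome as needed.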
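One way to act on the "check how it loads the next page" tip: the link in the question carries the page number in a `paginaAtual` query parameter, so the next-page URL can be built directly and passed to `browser.get()` instead of clicking. The sketch below is only an illustration; `with_page` is a hypothetical helper name, and whether the server accepts a direct request for that URL is an assumption that the thread does not confirm.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_page(url, page, param="paginaAtual"):
    """Return `url` with the page-number query parameter set to `page`."""
    parts = urlsplit(url)
    # Keep every other query parameter and drop any existing page parameter
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# The href of the "Próxima" link quoted in the question
href = "https://www.blah.com/?Filtro=D37608&ordenacao=_maisvendidos&nid=202059&paginaAtual=2"
page_3 = with_page(href, 3)
```

With a live session this would be used as `browser.get(with_page(href, 3))`. If the site still answers with Access Denied, the block is probably not tied to the click itself.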
Answer 1 (score: 0)
If you need to test the site without cookies:
browser.get(petshop_url)
browser.delete_all_cookies()
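To see whether clearing cookies changes anything, it can help to compare the cookies before and after. The following is a minimal sketch, not a definitive recipe: `cookie_snapshot` and `reload_without_cookies` are hypothetical helper names, and `driver` is assumed to be any Selenium WebDriver instance (it only needs `get_cookies()`, `delete_all_cookies()` and `refresh()`).

```python
def cookie_snapshot(driver):
    """Map cookie name to value for the page the driver is currently on."""
    return {cookie["name"]: cookie["value"] for cookie in driver.get_cookies()}


def reload_without_cookies(driver):
    """Delete all cookies, reload the page, and return (before, after) snapshots."""
    before = cookie_snapshot(driver)
    driver.delete_all_cookies()
    driver.refresh()
    # Anything listed here was set again by the site during the reload
    after = cookie_snapshot(driver)
    return before, after
```

Called as `before, after = reload_without_cookies(browser)` right after `browser.get(petshop_url)`, this shows which cookies the site sets again on reload; those are candidates for the cookie or session the server expects.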