Question

I'm using Python and Selenium to scrape Google search pages, and since last night I keep running into a MaxRetryError: [Errno 61] Connection refused error. I debugged my code and found that the error starts in this code block:

domain = pattern.search(website)
counter = 2

# keep running this until the url appears like normal
while domain is None:
    counter += 1
    # close chrome and try again
    print('link not found, closing chrome and restarting ...\nwaiting {} seconds...'.format(counter))
    chrome.quit()
    time.sleep(counter)
    # chrome = webdriver.Chrome()
    time.sleep(10)                              ### tried inserting a timer.sleep to delay request
    chrome.get('https://google.com')            ### error is right here. This is the second instance of chrome.get in this script
    target = chrome.find_element_by_name('q')
    target.send_keys(college)
    target.send_keys(Keys.RETURN)

    # parse the webpage
    soup = BeautifulSoup(chrome.page_source, 'html.parser')

    website = soup.find('cite', attrs={'class': 'iUh30'}).text
    print('tried to get URL, is this it? : {}\n'.format(website))
    pattern = re.compile(r'\w+\.(edu|com)')
    domain = pattern.search(website)

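I keep getting the following error:

raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='ADDRESS', port=PORT): Max retries exceeded with url: /session/92ca3da95353ca5972fb5c520b704be4/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x11100e4e0>: Failed to establish a new connection: [Errno 61] Connection refused',))

As you can see in the code block above, I inserted a time.sleep() call, but it doesn't seem to help at all. For context, this script is part of a function that is called repeatedly in a loop from another script. But again, I make sure to add a delay between each webdriver.get() call. As of now, my script fails on the first iteration of this loop.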

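I tried googling the problem, but the closest thing I found is this. It seems to be about the exact same error, and the top answer identifies the same method as the cause, but I don't really understand what its "Solution" and "Conclusion" sections are saying. I get that MaxRetryError is confusing, but what exactly is the solution?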

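It mentions a max_retries argument and Tracebacks, but I don't know what they mean in this context. Is there any way to catch this error (in the context of Selenium)? I've seen some threads on Stack Exchange that mention catching the error, but only in the context of urllib3. In my case, I need to catch the same error for the Selenium package.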

Thanks for any advice.

Answer 1

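My code still runs into the problem occasionally (which can be worked around by using a proxy), but I think I found the root cause. The loop expects the first pattern match to return a .edu or .com domain, but it does not anticipate .org. So when the first search result was a .org, the pattern never matched and my code kept looping indefinitely. Here is the source of the problem: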

website = soup.find('cite', attrs={'class': 'iUh30'}).text
print('tried to get URL, is this it? : {}\n'.format(website))
pattern = re.compile(r'\w+\.(edu|com)') # does not anticipate .org's

Now my code runs fine, although I still get the error when the code runs for too long (in which case the source of the problem is much clearer).
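For anyone hitting the same thing, here is a minimal sketch of the two changes implied above: a pattern that also accepts .org, and a retry loop that gives up after a fixed number of attempts instead of running forever. The names find_domain, fetch_cite_text and max_attempts are my own placeholders, not part of the original script, and the trailing \b is an extra assumption to avoid matching things like "www.com" inside "www.comet.io".

```python
import re

# Assumption: .org is added to the original (edu|com) alternation, and \b
# keeps the TLD from matching only the start of a longer word.
DOMAIN_PATTERN = re.compile(r'\w+\.(?:edu|com|org)\b')


def find_domain(fetch_cite_text, max_attempts=5):
    """Call fetch_cite_text() until its result contains a known domain.

    fetch_cite_text is a hypothetical callable that runs one search and
    returns the text of the first <cite> element (or None).
    Returns the matched domain, or None after max_attempts tries.
    """
    for _ in range(max_attempts):
        text = fetch_cite_text()
        match = DOMAIN_PATTERN.search(text or '')
        if match:
            return match.group(0)
    # Bounded loop: an unexpected TLD no longer causes an endless restart.
    return None
```

In the original script, fetch_cite_text would wrap the chrome.get / BeautifulSoup steps, so a result with an unexpected TLD ends the loop with None rather than restarting Chrome indefinitely.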

Answer 2

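You are quitting the Chrome driver too early. Once chrome.quit() has been called, the subsequent call to chrome.get('https://google.com') fails, and the automatic retries then end in a MaxRetryError.

Try removing the call to chrome.quit().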
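If you really do want to restart the browser between attempts, the alternative is to create a new driver after each quit() instead of reusing the old object. A rough sketch of that idea follows; get_with_fresh_driver, make_driver and retry_on are names I made up for illustration, and the code only assumes the driver object has get() and quit() methods.

```python
import time


def get_with_fresh_driver(make_driver, url, attempts=3, delay=0.0,
                          retry_on=(ConnectionError,)):
    """Open url with a driver from make_driver().

    On one of the retry_on errors, quit that driver and build a new one
    before the next attempt. Returns the driver that loaded the page; the
    caller is responsible for calling quit() on it when done.
    """
    last_error = None
    for _ in range(attempts):
        driver = make_driver()      # a new session for every attempt
        try:
            driver.get(url)
            return driver
        except retry_on as exc:
            last_error = exc
            driver.quit()           # this driver object is never used again
            time.sleep(delay)
    raise RuntimeError('could not load {}'.format(url)) from last_error
```

With Selenium you would presumably pass webdriver.Chrome as make_driver and include urllib3.exceptions.MaxRetryError in retry_on, since that is the exception shown in the traceback; I have not tested that combination against every Selenium version.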

Selenium Connection Refused
