I am using Python / Selenium to scrape Google search pages, and since last night I have kept running into a MaxRetryError: [Errno 61] Connection refused error. I debugged my code and found that the error starts in the code block below:
domain = pattern.search(website)
counter = 2

# keep running this until the url appears like normal
while domain is None:
    counter += 1
    # close chrome and try again
    print('link not found, closing chrome and restarting ...\nwaiting {} seconds...'.format(counter))
    chrome.quit()
    time.sleep(counter)
    # chrome = webdriver.Chrome()
    time.sleep(10)  ### tried inserting a timer.sleep to delay request
    chrome.get('https://google.com')  ### error is right here. This is the second instance of chrome.get in this script
    target = chrome.find_element_by_name('q')
    target.send_keys(college)
    target.send_keys(Keys.RETURN)
    # parse the webpage
    soup = BeautifulSoup(chrome.page_source, 'html.parser')
    website = soup.find('cite', attrs={'class': 'iUh30'}).text
    print('tried to get URL, is this it? : {}\n'.format(website))
    pattern = re.compile(r'\w+\.(edu|com)')
    domain = pattern.search(website)
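I keep getting the following error:
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='ADDRESS', port=PORT): Max retries exceeded with url: /session/92ca3da95353ca5972fb5c520b704be4/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x11100e4e0>: Failed to establish a new connection: [Errno 61] Connection refused',))
As you can see in the code block above, I put in a time.sleep(), but it does not seem to help at all. For context, this script is part of a function that another script calls repeatedly in a loop. Again, though, I make sure there is a delay between each call to webdriver.get(). So far, my script fails on the first iteration of this loop.
I tried Googling the issue, but the closest question I found was this one. It seems to describe the exact same error, and the top answer identifies the same method as the cause, but I really do not understand what its "Solution" and "Conclusion" sections are saying. I get that MaxRetryError is confusing, but what exactly is the solution?
It mentions the max_retries parameter and tracebacks, but I do not know what they mean in this context. Is there any way to fix this error in the context of Selenium? I have seen some threads on Stack Exchange that mention catching the error, but only in a different context. In my case, I need to catch the same error for the Selenium package.
Thanks for any advice.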
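Answer 0 (score: 0)
My code still runs into problems occasionally (which could probably be solved by using a proxy), but I think I found the root of the problem. The loop expects the first pattern match to return a .edu or .com, but it does not anticipate a .org. So whenever the first search result was a .org, my code would run indefinitely. This is the source of the problem: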
website = soup.find('cite', attrs={'class': 'iUh30'}).text
print('tried to get URL, is this it? : {}\n'.format(website))
pattern = re.compile(r'\w+\.(edu|com)') # does not anticipate .org's
My code now runs fine, although I do still get the error when it runs for too long (in which case the source of the problem is much clearer).
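As an illustration of the fix described in this answer, here is a minimal sketch of a pattern that also accepts .org. The helper name extract_domain and the exact list of top-level domains are assumptions made for the example, not taken from the original script; the trailing \b is added so that a name such as "community.net" is not mistaken for a .com match.

```python
import re

# Assumed list of accepted endings; extend it to whatever the search may return.
DOMAIN_PATTERN = re.compile(r'[\w-]+\.(?:edu|com|org)\b')


def extract_domain(cite_text):
    """Return the first 'name.tld' found in cite_text, or None if there is none."""
    match = DOMAIN_PATTERN.search(cite_text)
    return match.group(0) if match else None


print(extract_domain('https://www.mit.edu/about'))
print(extract_domain('www.redcross.org'))
print(extract_domain('www.community.net'))
```

Because the helper returns None instead of looping, the caller can decide how many times to retry rather than running indefinitely on an unexpected domain.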
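Answer 1 (score: 0)
You are quitting the Chrome driver too early. Once chrome.quit() has been called, the subsequent call to chrome.get('https://google.com') fails, and the automatic retries then produce the MaxRetryError.
Try removing the call to chrome.quit().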
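If restarting the browser between attempts is still wanted, one alternative is to create a brand-new driver for every attempt, so that get() is never called on a session that has already been quit. The sketch below is only an illustration under assumptions: get_page_source, make_driver, base_delay and retry_on are hypothetical names, and the driver is passed in as a factory so the retry logic does not depend on Selenium itself.

```python
import time


def get_page_source(make_driver, url, attempts=3, base_delay=1.0,
                    retry_on=(Exception,)):
    """Return the page source for url, using a fresh driver on every attempt."""
    last_error = None
    for attempt in range(1, attempts + 1):
        driver = make_driver()  # new session; a quit() driver is never reused
        try:
            driver.get(url)
            return driver.page_source
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(base_delay * attempt)  # wait a little longer each time
        finally:
            driver.quit()  # runs after success and after failure
    raise RuntimeError('gave up after {} attempts'.format(attempts)) from last_error
```

With Selenium, make_driver would typically be webdriver.Chrome, and retry_on could be narrowed to something like (WebDriverException, MaxRetryError) from selenium.common.exceptions and urllib3.exceptions. The attempts limit also bounds the loop, so a page that never loads ends in an error instead of an endless restart cycle.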