Hi everyone. My code basically takes a list of links and checks each page for certain tags; when a tag is found, it returns the link it was given. However, unless I set a timeout, mechanize sometimes gets stuck forever trying to open/read a page. Is there a way to reload/retry the page when that happens?
import mechanize
from mechanize import Browser
from bs4 import BeautifulSoup
import urllib2
import time
import os
from tqdm import tqdm
import socket
br = Browser()
with open("url.txt", 'r+') as f:
    lines = f.read().splitlines()
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
no_stock = []
for i in tqdm(lines):
    r = br.open(i, timeout=200)
    r = r.read()
    done = False
    tries = 3
    while tries and not done:
        try:
            soup = BeautifulSoup(r, 'html.parser')
            done = True  # exit the loop
        except:
            tries -= 1  # to exit when tries == 0
    if not done:
        print('Failed for {}'.format(i))
        continue  # skip this and continue with the next
    table = soup.find_all('div', {'class': "empty_result"})
    results = soup.find_all('strong', style='color: red;')
    if table or results:
        no_stock.append(i)
Update, here is the error traceback:
File "/usr/local/lib/python2.7/dist-packages/mechanize/_response.py", line 190, in read
self.__cache.write(self.wrapped.read())
File "/usr/lib/python2.7/socket.py", line 355, in read
data = self._sock.recv(rbufsize)
File "/usr/lib/python2.7/httplib.py", line 587, in read
return self._read_chunked(amt)
File "/usr/lib/python2.7/httplib.py", line 656, in _read_chunked
value.append(self._safe_read(chunk_left))
File "/usr/lib/python2.7/httplib.py", line 702, in _safe_read
chunk = self.fp.read(min(amt, MAXAMOUNT))
File "/usr/lib/python2.7/socket.py", line 384, in read
data = self._sock.recv(left)
socket.timeout: timed out
Any help is appreciated!
Answer 0 (score: 1)
Catch the socket.timeout exception and retry there. Note that, per your traceback, the timeout is raised while the response body is being read, not while parsing, so the open/read calls are what the try needs to wrap:

try:
    # first try
    r = br.open(i, timeout=200)
    r = r.read()
except socket.timeout:
    # try a second time
    r = br.open(i, timeout=200)
    r = r.read()
soup = BeautifulSoup(r, 'html.parser')
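If you retry in several places, the pattern can be factored into a small helper so it is written once. This is just a sketch; the retry function below is illustrative and not part of mechanize:

```python
import socket

def retry(func, tries=3, exceptions=(socket.timeout,)):
    """Call func() until it succeeds, at most `tries` times.

    Re-raises the last exception when every attempt fails.
    """
    for attempt in range(tries):
        try:
            return func()
        except exceptions:
            if attempt == tries - 1:  # out of retries, propagate the error
                raise
```

With it, the open/read step becomes r = retry(lambda: br.open(i, timeout=200).read()).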
You can even try several times, and if a link keeps failing, continue with the next one:
for i in tqdm(lines):
    done = False
    tries = 3
    while tries and not done:
        try:
            # open/read go inside the try, since that is where the timeout occurs
            r = br.open(i, timeout=200)
            r = r.read()
            soup = BeautifulSoup(r, 'html.parser')
            done = True  # exit the loop
        except:  # just catch any error
            tries -= 1  # to exit when tries == 0
    if not done:
        print('Failed for {}'.format(i))
        continue  # skip this and continue with the next
    table = soup.find_all('div', {'class': "empty_result"})
    results = soup.find_all('strong', style='color: red;')
    if table or results:
        no_stock.append(i)
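If the timeouts are transient (server load, flaky network), pausing between attempts often helps more than retrying immediately. A sketch of an exponential backoff schedule you could combine with the loop above; the delay values are arbitrary examples, and time is already imported in the question's script:

```python
import time

def backoff_delays(tries=3, base=1.0):
    """Yield exponentially growing delays in seconds: base, 2*base, 4*base, ..."""
    for attempt in range(tries):
        yield base * 2 ** attempt

# Inside the while loop, sleep before the next attempt, e.g.:
#     except socket.timeout:
#         tries -= 1
#         time.sleep(next(delays))
```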