Reloading a webpage when mechanize times out

Date: 2015-08-13 00:36:31

Tags: python, python-2.7, mechanize

Hi everyone. My code basically takes a list of links I give it and checks each page for certain tags; once a tag is found, it returns the link to me. However, unless I set a timeout, mechanize sometimes gets stuck forever trying to open/read a page. Is there a way to reload/retry the page when it times out?

import mechanize
from mechanize import Browser
from bs4 import BeautifulSoup
import urllib2
import time
import os
from tqdm import tqdm
import socket


br = Browser()

with open("url.txt", 'r+') as f:
    lines = f.read().splitlines()

br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

no_stock = []
for i in tqdm(lines):
    r = br.open(i, timeout=200)
    r = r.read()
    done = False
    tries = 3
    while tries and not done:
        try:
            soup = BeautifulSoup(r,'html.parser')
            done = True # exit the loop
        except:
            tries -= 1 # to exit when tries == 0
    if not done:
        print('Failed for {}'.format(i))
        continue # skip this and continue with the next
    table = soup.find_all('div', {'class' : "empty_result"})
    results = soup.find_all('strong', style = 'color: red;')
    if table or results:
        no_stock.append(i)

Update: the error I get:

  File "/usr/local/lib/python2.7/dist-packages/mechanize/_response.py", line 190, in read
    self.__cache.write(self.wrapped.read())
  File "/usr/lib/python2.7/socket.py", line 355, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.7/httplib.py", line 587, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.7/httplib.py", line 656, in _read_chunked
    value.append(self._safe_read(chunk_left))
  File "/usr/lib/python2.7/httplib.py", line 702, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/lib/python2.7/socket.py", line 384, in read
    data = self._sock.recv(left)
socket.timeout: timed out

Any help is appreciated!

1 Answer:

Answer 0 (score: 1)

Catch the socket.timeout exception and retry there. Note that, per your traceback, the timeout is raised while the response is being read (inside r.read()), so the br.open()/read() calls are what need to sit inside the try block:

try:
    # first try: the network I/O is what can time out
    r = br.open(i, timeout=200)
    html = r.read()
except socket.timeout:
    # try a second time
    r = br.open(i, timeout=200)
    html = r.read()
soup = BeautifulSoup(html, 'html.parser')
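
(Retrying only the BeautifulSoup call, as in your current code, would not help: the parse never touches the network, so it cannot raise socket.timeout.)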

You can even try several times, and if a URL keeps failing, skip it and continue with the next one:

for i in tqdm(lines):
    done = False
    tries = 3
    while tries and not done:
        try:
            # open() and read() go inside the try block,
            # since that is where socket.timeout is raised
            r = br.open(i, timeout=200)
            html = r.read()
            soup = BeautifulSoup(html, 'html.parser')
            done = True  # exit the loop
        except Exception:  # just catch any error
            tries -= 1  # to exit when tries == 0
    if not done:
        print('Failed for {}'.format(i))
        continue  # skip this URL and continue with the next
    table = soup.find_all('div', {'class': "empty_result"})
    results = soup.find_all('strong', style='color: red;')
    if table or results:
        no_stock.append(i)
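
If you use this pattern in more than one place, the retry can be factored into a small helper. Below is a minimal sketch; the name open_with_retry and the max_tries/delay parameters are my own invention, not part of mechanize, and the short sleep between attempts just gives a slow server a moment to recover:

import socket
import time

import mechanize


def open_with_retry(browser, url, timeout=200, max_tries=3, delay=5):
    """Open and read url, retrying on socket.timeout.

    Returns the page HTML, or None if every attempt timed out.
    """
    for _ in range(max_tries):
        try:
            response = browser.open(url, timeout=timeout)
            return response.read()
        except socket.timeout:
            time.sleep(delay)  # short pause before the next attempt
    return None


br = mechanize.Browser()
html = open_with_retry(br, 'http://example.com')
if html is None:
    print('Gave up after repeated timeouts')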