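http.client.RemoteDisconnected error when reading/parsing a list of URLs

Time: 2017-04-28 09:37:22

Tags: python python-3.x urlopen

I am working on a simple URL parser: the idea is to take a URL from a column, try to resolve it, and print out where it redirects to.

The basic functionality works, but every so often it raises an http.client.RemoteDisconnected exception and the program stops with the traceback below.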

Traceback (most recent call last):
  File "URLIFIER.py", line 43, in <module>
    row.append(urlparse(row[0]))
  File "URLIFIER.py", line 12, in urlparse
    conn = urllib.request.urlopen(urlColumnElem,timeout=8)
  File "//anaconda/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "//anaconda/lib/python3.5/urllib/request.py", line 466, in open
    response = self._open(req, data)
  File "//anaconda/lib/python3.5/urllib/request.py", line 484, in _open
    '_open', req)
  File "//anaconda/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "//anaconda/lib/python3.5/urllib/request.py", line 1282, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "//anaconda/lib/python3.5/urllib/request.py", line 1257, in do_open
    r = h.getresponse()
  File "//anaconda/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "//anaconda/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "//anaconda/lib/python3.5/http/client.py", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

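This happened after I had gone through about 4K URLs over roughly 40 minutes. Sometimes, if I rerun the script with the same input, it runs to completion without any problem. I have read that some websites try to block Python's urlopen to reduce network load, and that setting a user agent can help. Is the missing user agent what causes this problem?

The function that does most of the work is below: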
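For reference, a minimal sketch of how a user agent can be set with urllib.request. The helper name build_request and the User-Agent string are hypothetical, chosen only for illustration; whether a user agent prevents the disconnects is not established in this thread.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (compatible; url-checker)"):
    # build_request and the default User-Agent value are hypothetical names
    # used for illustration; they are not part of the original script.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Usage (this performs a real network request, so it is left commented out):
# conn = urllib.request.urlopen(build_request("http://example.com"), timeout=8)
# print(conn.geturl())
```

The Request object can be passed to urllib.request.urlopen in place of the plain URL string.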

import socket
import urllib.error
import urllib.request

def urlparse(urlColumnElem):
    try:
        #default timeout is 8 seconds.
        conn = urllib.request.urlopen(urlColumnElem,timeout=8)
        redirect=conn.geturl()
        #check redirect
        if(redirect == urlColumnElem):
            #print ("same: ")
            #print(redirect)
            return (redirect)
        else:
            #print("Not the same url ")
            return(redirect)
    #catch all the exceptions
    except urllib.error.HTTPError as e:
        return (e.code)
    except urllib.error.URLError as e:
        return ('URL_Error')
    except socket.timeout as e:
        return ("timeout")

1 answer:

Answer 0 (score: 0)

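Solved. It was actually quite simple: add one more except clause to the function,

except http.client.HTTPException as e:

RemoteDisconnected is a subclass of http.client.HTTPException (through BadStatusLine), and none of the existing clauses catch it, so this extra clause does. In Python 2 the equivalent would be

except httplib.HTTPException as e: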
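A minimal sketch of the function with the extra clause in place. The name resolve_url, the opener parameter (added here only so the function can be exercised without network access), and the "HTTP_Exception" return marker are assumptions for illustration, not part of the original script.

```python
import http.client
import socket
import urllib.error
import urllib.request


def resolve_url(url, timeout=8, opener=urllib.request.urlopen):
    """Return the final URL after redirects, or a marker describing the failure.

    resolve_url, opener and the "HTTP_Exception" marker are hypothetical;
    the other return values mirror the function in the question.
    """
    try:
        with opener(url, timeout=timeout) as conn:
            return conn.geturl()
    except urllib.error.HTTPError as e:
        # The server answered with an HTTP error status (404, 500, ...).
        return e.code
    except urllib.error.URLError:
        # The request could not be sent (DNS failure, refused connection, ...).
        return "URL_Error"
    except socket.timeout:
        # Reading the response took longer than `timeout` seconds.
        return "timeout"
    except http.client.HTTPException:
        # Includes RemoteDisconnected: the server closed the connection
        # without sending a response.
        return "HTTP_Exception"
```

With this clause a dropped connection becomes one more value in the output column instead of ending the run, so the affected URLs can be collected and retried later.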
