I'm currently learning how request limits work and I've run into a problem. I'm using Python 3.4 (PyCharm) with the urllib.request and BeautifulSoup libraries. I have a website I'm testing against, and I have this function:
import re
import urllib.request
from bs4 import BeautifulSoup

def GetAll(url):
    # Send the request with a custom User-Agent so the site doesn't reject urllib's default one
    req = urllib.request.Request(url, headers={'User-Agent': "Magic Browser"})
    html_page = urllib.request.urlopen(req)
    soup = BeautifulSoup(html_page, "html.parser")
    # Grab the text of the first <h1> whose 'sub' attribute matches "subheader"
    header = soup.find('h1', attrs={'sub': re.compile("subheader")}).string
    return header
It's just a simple function. I send it a bunch of pages (URLs) and loop over them like this:

index = 0
while index < len(pages):
    header = GetAll(pages[index])
    print(header)
    index += 1
Some of the forums have 10-20 pages, some have 100-300 pages, and each page lists 50 topics. But when I come back at the end of the day to check whether the list has finished printing, sometimes it has printed 100 headers, sometimes 1,000, sometimes as many as 5,000. I need it to print all of them; a quick bit of math puts it at roughly 30K topics, so it has to make about 30K requests, but I eventually end up with this:
Traceback (most recent call last):
File "C:\Users\Bar\AppData\Local\Programs\Python\Python34\lib\urllib\request.py", line 1183, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "C:\Users\Bar\AppData\Local\Programs\Python\Python34\lib\http\client.py", line 1137, in request
self._send_request(method, url, body, headers)
File "C:\Users\Bar\AppData\Local\Programs\Python\Python34\lib\http\client.py", line 1182, in _send_request
self.endheaders(body)
File "C:\Users\Bar\AppData\Local\Programs\Python\Python34\lib\http\client.py", line 1133, in endheaders
self._send_output(message_body)
File "C:\Users\Bar\AppData\Local\Programs\Python\Python34\lib\http\client.py", line 963, in _send_output
self.send(msg)
File "C:\Users\Bar\AppData\Local\Programs\Python\Python34\lib\http\client.py", line 898, in send
self.connect()
File "C:\Users\Bar\AppData\Local\Programs\Python\Python34\lib\http\client.py", line 871, in connect
self.timeout, self.source_address)
File "C:\Users\Bar\AppData\Local\Programs\Python\Python34\lib\socket.py", line 498, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "C:\Users\Bar\AppData\Local\Programs\Python\Python34\lib\socket.py", line 537, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11004] getaddrinfo failed
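For what it's worth, this is a minimal retry sketch I'm considering wrapping around GetAll, assuming the gaierror is a transient DNS/network hiccup rather than the site permanently cutting me off (the fetch_with_retry name and the retry/delay values are just my own placeholders):

import socket
import time
import urllib.error

def fetch_with_retry(url, retries=3, delay=5):
    # Retry a few times with a pause, in case the getaddrinfo failure is transient
    for attempt in range(retries):
        try:
            return GetAll(url)
        except (socket.gaierror, urllib.error.URLError):
            if attempt == retries - 1:
                raise
            time.sleep(delay)

I would then call header = fetch_with_retry(pages[index]) inside the loop instead of GetAll directly, but I don't know whether that addresses the real cause of the error.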