I get a lot of errors when I try to run this code:
import requests
#import bs4 --not sure if it's necessary
from bs4 import BeautifulSoup
core = 'http://wwww.lolnexus.com'
name = input('\nName: ')
region = input('\nRegion NA | EUW | EUNE | BR | TR | RU | LAN | LAS | OCE : ')
full = core + '/' + region + '/' + 'search?name=' + name + '&region=' + region
print (full)
r = requests.get(full)
source = r.text
soup = BeautifulSoup(source)
print (source)
input()
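I don't know what is going wrong. This is the start of an application I'm trying to write, and the error stops me from scraping the rest of the page.
The error I get: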
Name: Fred
Region NA | EUW | EUNE | BR | TR | RU | LAN | LAS | OCE : TR
http://wwww.lolnexus.com/TR/search?name=Fred&region=TR
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\requests-2.3.0-py3.4.egg\requests\packages
\urllib3\connectionpool.py", line 493, in urlopen
body=body, headers=headers)
File "C:\Python34\lib\site-packages\requests-2.3.0-py3.4.egg\requests\packages
\urllib3\connectionpool.py", line 291, in _make_request
conn.request(method, url, **httplib_request_kw)
File "C:\Python34\lib\http\client.py", line 1090, in request
self._send_request(method, url, body, headers)
File "C:\Python34\lib\http\client.py", line 1128, in _send_request
self.endheaders(body)
File "C:\Python34\lib\http\client.py", line 1086, in endheaders
self._send_output(message_body)
File "C:\Python34\lib\http\client.py", line 924, in _send_output
self.send(msg)
File "C:\Python34\lib\http\client.py", line 859, in send
self.connect()
File "C:\Python34\lib\site-packages\requests-2.3.0-py3.4.egg\requests\packages
\urllib3\connection.py", line 106, in connect
conn = self._new_conn()
File "C:\Python34\lib\site-packages\requests-2.3.0-py3.4.egg\requests\packages
\urllib3\connection.py", line 90, in _new_conn
(self.host, self.port), self.timeout, *extra_args)
File "C:\Python34\lib\socket.py", line 491, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "C:\Python34\lib\socket.py", line 530, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11004] getaddrinfo failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\requests-2.3.0-py3.4.egg\requests\adapters
.py", line 344, in send
timeout=timeout
File "C:\Python34\lib\site-packages\requests-2.3.0-py3.4.egg\requests\packages
\urllib3\connectionpool.py", line 543, in urlopen
raise MaxRetryError(self, url, e)
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='www
w.lolnexus.com', port=80): Max retries exceeded with url: /TR/search?name=Fred&r
egion=TR (Caused by <class 'socket.gaierror'>: [Errno 11004] getaddrinfo failed)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\x\Desktop\webscraping.py", line 11, in <module>
r = requests.get(full)
File "C:\Python34\lib\site-packages\requests-2.3.0-py3.4.egg\requests\api.py",
line 55, in get
return request('get', url, **kwargs)
File "C:\Python34\lib\site-packages\requests-2.3.0-py3.4.egg\requests\api.py",
line 44, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python34\lib\site-packages\requests-2.3.0-py3.4.egg\requests\sessions
.py", line 461, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python34\lib\site-packages\requests-2.3.0-py3.4.egg\requests\sessions
.py", line 567, in send
r = adapter.send(request, **kwargs)
File "C:\Python34\lib\site-packages\requests-2.3.0-py3.4.egg\requests\adapters
.py", line 392, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='wwww.lolnexus.com'
, port=80): Max retries exceeded with url: /TR/search?name=Fred&region=TR (Cause
d by <class 'socket.gaierror'>: [Errno 11004] getaddrinfo failed)
What is wrong, and is it a good idea to use the Requests and BeautifulSoup libraries for web scraping?
Answer 0 (score: 3)
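You are trying to connect to a domain name with four w characters: http://wwww.lolnexus.com/TR/search?name=Fred&region=TR
That name does not exist, so the DNS lookup fails. That is what "getaddrinfo failed" at the bottom of the first traceback means.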
Correct the hostname:
core = 'http://www.lolnexus.com'
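
As a rough sketch of how the URL could be built and the hostname checked before making the request, something like the following might help. It uses only the standard library; the helper names build_search_url and host_resolves are made up for illustration and are not part of Requests or BeautifulSoup. The URL layout is simply copied from the question and is assumed to still match the site.

```python
import socket
from urllib.parse import urlencode, urlsplit

# Corrected hostname from the answer above (three w characters).
CORE = 'http://www.lolnexus.com'


def build_search_url(name, region, core=CORE):
    """Build the search URL; urlencode escapes the query values
    and joins them with '&', so nothing is concatenated by hand."""
    query = urlencode({'name': name, 'region': region})
    return '{}/{}/search?{}'.format(core, region, query)


def host_resolves(url):
    """Return True if the URL's hostname can be resolved.

    A failed lookup raises socket.gaierror, the same exception
    shown at the bottom of the first traceback in the question.
    """
    host = urlsplit(url).hostname
    try:
        socket.getaddrinfo(host, 80)
        return True
    except socket.gaierror:
        return False
```

Calling host_resolves(build_search_url(name, region)) before requests.get gives a quick way to tell a mistyped hostname apart from other connection problems. Letting urlencode assemble the query string also avoids typing the '&' separators by hand.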