This simple Python 3 script:
import urllib.request
host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
urllib.request.urlretrieve("http://scholar.google.com" + url, filename)
raises this exception:
Traceback (most recent call last):
File "C:/Users/ricardo/Desktop/Google-Scholar/BibTex/test2.py", line 8, in <module>
urllib.request.urlretrieve("http://scholar.google.com" + url, filename)
File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
return _urlopener.retrieve(url, filename, reporthook, data)
File "C:\Python32\lib\urllib\request.py", line 1569, in retrieve
fp = self.open(url, data)
File "C:\Python32\lib\urllib\request.py", line 1541, in open
raise IOError('socket error', msg).with_traceback(sys.exc_info()[2])
File "C:\Python32\lib\urllib\request.py", line 1537, in open
return getattr(self, name)(url)
File "C:\Python32\lib\urllib\request.py", line 1715, in open_http
return self._open_generic_http(http.client.HTTPConnection, url, data)
File "C:\Python32\lib\urllib\request.py", line 1695, in _open_generic_http
http_conn.request("GET", selector, headers=headers)
File "C:\Python32\lib\http\client.py", line 967, in request
self._send_request(method, url, body, headers)
File "C:\Python32\lib\http\client.py", line 1005, in _send_request
self.endheaders(body)
File "C:\Python32\lib\http\client.py", line 963, in endheaders
self._send_output(message_body)
File "C:\Python32\lib\http\client.py", line 808, in _send_output
self.send(msg)
File "C:\Python32\lib\http\client.py", line 746, in send
self.connect()
File "C:\Python32\lib\http\client.py", line 724, in connect
self.timeout, self.source_address)
File "C:\Python32\lib\socket.py", line 386, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 11004] getaddrinfo failed
I can open the URL that the print statement produces.
What is causing this? I tried changing http:// to http:/// (three slashes), but the same exception was raised.
Answer 0 (score: 2)
Here is your problem:
urllib.request.urlretrieve("http://scholar.google.com" + url, filename)
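You are adding the http://scholar.google.com part twice (url already starts with http://scholar.google.com). As a result, urllib thinks you are asking for a page on the host scholar.google.comhttp, and needless to say, that domain does not exist. That is exactly what the getaddrinfo failed error is telling you.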
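To see how the doubled prefix gets interpreted, you can parse the concatenated string with the standard library's urllib.parse.urlsplit. This is only an illustrative sketch: urlretrieve in Python 3.2 does its own internal splitting, and the shortened query string below is a hypothetical stand-in for the real one.

```python
from urllib.parse import urlsplit

# Shortened, hypothetical stand-in for the URL built in the question.
url = "http://scholar.google.com/scholar.bib?q=example"

# The same concatenation the failing urlretrieve call performs.
doubled = "http://scholar.google.com" + url

parts = urlsplit(doubled)
# Everything between the first "//" and the next "/" is treated as the host.
print(parts.hostname)  # scholar.google.comhttp
```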
Just request url on its own.
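A minimal sketch of the corrected script, assuming the same host and file name as in the question. The helper names build_url and fetch_citation are hypothetical and only there to make the single concatenation explicit.

```python
import urllib.request

HOST = "scholar.google.com"


def build_url(link, host=HOST):
    """Join scheme, host, and path exactly once."""
    return "http://" + host + link


def fetch_citation(link, filename):
    """Download the resource at host + link into filename."""
    url = build_url(link)
    print(url)  # the same value that is passed to urlretrieve below
    urllib.request.urlretrieve(url, filename)  # no second prefix added here
    return url
```

Calling fetch_citation(link, "cite0.bib") with the link string from the question would then request exactly the URL that print shows.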
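A handy tip for spotting this kind of thing faster in future: when you add print statements for debugging, make sure you print the actual values being used in the command you are debugging. If your print statement had also concatenated the base URL, you would have found this in about two seconds.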
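One way to follow that tip is to route the call through a small wrapper, so the value that is printed is by construction the value that is requested. The name retrieve_logged and its retrieve parameter are hypothetical and not part of urllib.

```python
import urllib.request


def retrieve_logged(url, filename, retrieve=urllib.request.urlretrieve):
    """Print the exact URL about to be requested, then request it."""
    # repr() makes stray whitespace or a doubled prefix easy to spot.
    print("requesting:", repr(url))
    return retrieve(url, filename)
```

With the original bug, retrieve_logged("http://scholar.google.com" + url, filename) would have printed the doubled http://scholar.google.comhttp://... string right before the failure.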