Question

这个简单的 Python 3 脚本：

import urllib.request

host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
urllib.request.urlretrieve("http://scholar.google.com" + url, filename)

提出了这个例外：

Traceback (most recent call last):
  File "C:/Users/ricardo/Desktop/Google-Scholar/BibTex/test2.py", line 8, in <module>
    urllib.request.urlretrieve("http://scholar.google.com" + url, filename)
  File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python32\lib\urllib\request.py", line 1569, in retrieve
    fp = self.open(url, data)
  File "C:\Python32\lib\urllib\request.py", line 1541, in open
    raise IOError('socket error', msg).with_traceback(sys.exc_info()[2])
  File "C:\Python32\lib\urllib\request.py", line 1537, in open
    return getattr(self, name)(url)
  File "C:\Python32\lib\urllib\request.py", line 1715, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "C:\Python32\lib\urllib\request.py", line 1695, in _open_generic_http
    http_conn.request("GET", selector, headers=headers)
  File "C:\Python32\lib\http\client.py", line 967, in request
    self._send_request(method, url, body, headers)
  File "C:\Python32\lib\http\client.py", line 1005, in _send_request
    self.endheaders(body)
  File "C:\Python32\lib\http\client.py", line 963, in endheaders
    self._send_output(message_body)
  File "C:\Python32\lib\http\client.py", line 808, in _send_output
    self.send(msg)
  File "C:\Python32\lib\http\client.py", line 746, in send
    self.connect()
  File "C:\Python32\lib\http\client.py", line 724, in connect
    self.timeout, self.source_address)
  File "C:\Python32\lib\socket.py", line 386, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 11004] getaddrinfo failed

我可以打开print语句产生的网址：

http://scholar.google.com/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0

造成这种情况的原因是什么？我尝试将http://更改为http:///（三个斜杠），但引发了同样的异常。

Answer 1

这是你的问题：

urllib.request.urlretrieve("http://scholar.google.com" + url, filename)

您要添加http://scholar.google.com部分两次（url 已经启动http://scholar.google.com）。因此，urillib认为您要求scholar.google.comhttp上的网页 - 不用说，此域名不存在。这正是你的错误所说的。

明显请求url。

将来更快地发现这种事情的方便提示：在为调试添加print语句时，请确保在正在调试的命令中打印正在使用的实际值。 / em>如果您的print语句也连接了基本网址，您将在大约两秒钟内找到此信息。

如何修复此IOError：[Errno socket错误] [Errno 11004]？

1 个答案: