尽管使用quote_plus仍无法从具有非ASCII字符的URL读取

时间:2019-09-03 19:24:44

标签: python python-3.x unicode urllib urllib2

请注意,注释中列出的链接适用于Python 2.7,但是此问题与Python 3.7有关。

我正在使用Python 3.7和Django。我想从字符串中包含特殊字符的URL中读取内容,但是尝试使用传统方式时会出错...

>>> url = "https://www.supergaming.com/f/gaming/article/pvmqe/was_browsing_the_steam_app_reviews_and_ಠ_ಠ/"
...
>>> html = urllib2.urlopen(req, 5000).read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1240, in _send_request
    self.putrequest(method, url, **skips)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1107, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u0ca0' in position 69: ordinal not in range(128)

所以我尝试了这里推荐的解决方案-How to convert a url string to safe characters with python?,但是我仍然无法读取URL

>>> urllib.parse.quote_plus(url)
'https%3A%2F%2Fwww.supergaming.com%2Ff%2Fgaming%2Farticle%2Fpvmqe%2Fwas_browsing_the_steam_app_reviews_and_%E0%B2%A0_%E0%B2%A0%2F'
>>> req = urllib2.Request(urllib.parse.quote_plus(url))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 328, in __init__
    self.full_url = url
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 354, in full_url
    self._parse()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 383, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: 'https%3A%2F%2Fwww.supergaming.com%2Fr%2Fgaming%2Farticle%2Fpvmqe%2Fwas_browsing_the_steam_app_reviews_and_%E0%B2%A0_%E0%B2%A0%2F'

从URL读取包含特殊字符的正确方法是什么?

0 个答案:

没有答案