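When I pass a non-English string to the YouTube API library's search, it works only for the initial search. If I then call list_next(), it raises a UnicodeEncodeError.
With plain ASCII strings, everything works.
Any suggestions on what I should do?
Here is a simplified version of what I'm doing: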
# -*- coding: utf-8 -*-
import apiclient.discovery

def test(query):
    youtube = apiclient.discovery.build('youtube', 'v3', developerKey='xxx')
    ys = youtube.search()
    req = ys.list(
        q=query.encode('utf-8'),
        type='video',
        part='id,snippet',
        maxResults=50
    )
    while req:
        res = req.execute()
        for i in res['items']:
            print(i['id']['videoId'])
        req = ys.list_next(req, res)

test(u'한글')
test(u'日本語')
test(u'\uD55C\uAE00')
test(u'\u65E5\u672C\u8A9E')
Error message:
Traceback (most recent call last):
  File "E:\prj\scripts\yt\search.py", line 316, in _search
    req = ys.list_next(req, res)
  File "D:\Apps\Python\lib\site-packages\googleapiclient\discovery.py", line 966, in methodNext
    parsed[4] = urlencode(newq)
  File "D:\Apps\Python\lib\urllib.py", line 1343, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)
Versions:
Edit: I have posted a workaround below.
Answer 0 (score: 0)
In case anyone else is interested, here is a workaround that worked for me:
googleapiclient/discovery.py:
(old) q = parse_qsl(parsed[4])
(new) q = parse_qsl(parsed[4].encode('ascii'))
Explanation
In discovery.py, list_next() parses and unescapes the previous URL, then builds a new URL from it:
pageToken = previous_response['nextPageToken']
parsed = list(urlparse(request.uri))
q = parse_qsl(parsed[4])
# Find and remove old 'pageToken' value from URI
newq = [(key, value) for (key, value) in q if key != 'pageToken']
newq.append(('pageToken', pageToken))
parsed[4] = urlencode(newq)
uri = urlunparse(parsed)
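The problem seems to be that when parse_qsl unescapes parsed[4] and parsed[4] is a unicode object, it returns the UTF-8 encoded bytes wrapped in a unicode object. urlencode then calls str() on that value, which fails for anything outside ASCII: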
>>> q = urlparse.parse_qsl(u'q=%ED%95%9C%EA%B8%80')
>>> q
[(u'q', u'\xed\x95\x9c\xea\xb8\x80')]
>>> urllib.urlencode(q)
UnicodeEncodeError
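For readers without a Python 2 interpreter, the same effect can be imitated on Python 3. This is only an approximation: it assumes that decoding the percent-escapes as Latin-1 mirrors what Python 2's parse_qsl does with a unicode input, and the explicit encode('ascii') stands in for the str(v) call inside Python 2's urlencode.

```python
from urllib.parse import parse_qsl

# Assumption: Latin-1 decoding imitates Python 2, where each UTF-8 byte
# of the escaped value ends up as its own unicode character.
pairs = parse_qsl('q=%ED%95%9C%EA%B8%80', encoding='latin-1')
key, value = pairs[0]

# Stand-in for str(v) in Python 2's urlencode: the value is not ASCII.
try:
    value.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True

# The original text is still recoverable from the one-char-per-byte string.
recovered = value.encode('latin-1').decode('utf-8')
```

The value holds six characters, one per UTF-8 byte of the two-character query, which is why the ASCII conversion cannot succeed.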
If parse_qsl is given a plain ASCII byte string instead, it returns an ordinary UTF-8 encoded byte string, which urlencode accepts:
>>> q = urlparse.parse_qsl(u'q=%ED%95%9C%EA%B8%80'.encode('ascii'))
>>> q
[('q', '\xed\x95\x9c\xea\xb8\x80')]
>>> urllib.urlencode(q)
'q=%ED%95%9C%EA%B8%80'
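If you would rather not edit the installed library, the same URL rebuild can be sketched as a standalone function. This is a rough illustration, not the library's code: next_page_uri is a hypothetical helper name, the example URL is made up for the demonstration, and the Python 2 branch is based on the behaviour described above rather than tested against every version.

```python
try:  # Python 3
    from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
except ImportError:  # Python 2
    from urllib import urlencode
    from urlparse import parse_qsl, urlparse, urlunparse


def next_page_uri(uri, page_token):
    """Return uri with its pageToken query parameter replaced (hypothetical helper)."""
    if not isinstance(uri, str):
        # Python 2 only: turn a unicode URI into a byte string so that
        # parse_qsl returns byte strings, which urlencode can handle.
        uri = uri.encode('ascii')
    parsed = list(urlparse(uri))
    # Drop any old pageToken, then append the new one.
    query = [(k, v) for (k, v) in parse_qsl(parsed[4]) if k != 'pageToken']
    query.append(('pageToken', page_token))
    parsed[4] = urlencode(query)
    return urlunparse(parsed)


# Made-up example URL with an escaped non-ASCII query value.
example = next_page_uri(
    'https://www.googleapis.com/youtube/v3/search'
    '?q=%ED%95%9C%EA%B8%80&pageToken=OLD&part=id',
    'NEW',
)
```

The escaped query value survives the round trip unchanged, and only the pageToken parameter is swapped.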