Question

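When I pass a non-English string as the search query to the YouTube API client library, it only works for the initial search. If I call list_next(), it throws a UnicodeEncodeError.

When I use a plain ASCII string, everything works fine.

Any suggestions on what I should do?

Here is a simplified version of my code: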

# -*- coding: utf-8 -*-
import apiclient.discovery

def test(query):
    youtube = apiclient.discovery.build('youtube', 'v3', developerKey='xxx')
    ys = youtube.search()
    req = ys.list(
        q=query.encode('utf-8'),
        type='video',
        part='id,snippet',
        maxResults=50
    )
    while (req):
        res = req.execute()
        for i in res['items']:
            print(i['id']['videoId'])
        req = ys.list_next(req, res)

test(u'한글')
test(u'日本語')
test(u'\uD55C\uAE00')
test(u'\u65E5\u672C\u8A9E')

Error message:

Traceback (most recent call last):
  File "E:\prj\scripts\yt\search.py", line 316, in _search
    req = ys.list_next(req, res)
  File "D:\Apps\Python\lib\site-packages\googleapiclient\discovery.py", line 966, in methodNext
    parsed[4] = urlencode(newq)
  File "D:\Apps\Python\lib\urllib.py", line 1343, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)

Versions:

google-api-python-client (1.6.2)
Python 2.7.13 (Win32)

Edit: I posted a workaround below.

Answer 1

In case anyone else is interested, here is a workaround that worked for me:

googleapiclient/discovery.py:
(old) q = parse_qsl(parsed[4])
(new) q = parse_qsl(parsed[4].encode('ascii'))

Explanation

In discovery.py, list_next() parses and unescapes the previous request URL, then builds a new URL from it:

pageToken = previous_response['nextPageToken']
parsed = list(urlparse(request.uri))
q = parse_qsl(parsed[4])

# Find and remove old 'pageToken' value from URI
newq = [(key, value) for (key, value) in q if key != 'pageToken']
newq.append(('pageToken', pageToken))
parsed[4] = urlencode(newq)
uri = urlunparse(parsed)
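For comparison, here is a minimal standalone sketch of the same page-token replacement written for Python 3, where urllib.parse decodes percent-escapes as UTF-8 by default, so the round trip works for non-ASCII queries. The helper name replace_page_token and the example URL are hypothetical; this is not the library's actual code.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def replace_page_token(uri, page_token):
    """Return `uri` with any existing pageToken replaced by `page_token`.

    Hypothetical helper that mirrors the steps list_next() performs.
    """
    parsed = list(urlparse(uri))
    # parsed[4] is the query string; on Python 3, parse_qsl unescapes it as UTF-8 text.
    q = parse_qsl(parsed[4])
    # Drop the old pageToken (if any) and append the new one.
    newq = [(key, value) for (key, value) in q if key != 'pageToken']
    newq.append(('pageToken', page_token))
    # urlencode re-escapes the text values using UTF-8 percent-escapes.
    parsed[4] = urlencode(newq)
    return urlunparse(parsed)


if __name__ == '__main__':
    old = 'https://www.googleapis.com/youtube/v3/search?q=%ED%95%9C%EA%B8%80&pageToken=OLD'
    print(replace_page_token(old, 'NEW'))
```

The Korean query survives unchanged as q=%ED%95%9C%EA%B8%80, which is exactly the step that fails under Python 2.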

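The problem seems to be that when parse_qsl unescapes parsed[4], which is a unicode string, it returns the UTF-8-encoded bytes inside a unicode value (one code point per byte). urlencode doesn't like that: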

>>> q = urlparse.parse_qsl(u'q=%ED%95%9C%EA%B8%80')
>>> q
[(u'q', u'\xed\x95\x9c\xea\xb8\x80')]
>>> urllib.urlencode(q)
UnicodeEncodeError
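The mangled value can be reproduced under Python 3, assuming that Python 2's unquote decodes the unescaped bytes of a unicode input as Latin-1 (which matches the output shown). Python 3's parse_qsl accepts an encoding argument, so passing encoding='latin-1' yields the same "UTF-8 bytes stored as text" value. This is only an illustration of the mechanism, not Python 2's code path.

```python
from urllib.parse import parse_qsl

query = 'q=%ED%95%9C%EA%B8%80'

# Decoding the percent-escapes as UTF-8 gives the intended text (two Hangul syllables).
good = parse_qsl(query)
print(ascii(good))  # [('q', '\ud55c\uae00')]

# Decoding the same escapes as Latin-1 maps each byte to one code point,
# matching the Python 2 value in the session above.
mangled = parse_qsl(query, encoding='latin-1')
print(ascii(mangled))  # [('q', '\xed\x95\x9c\xea\xb8\x80')]

# The mangled value is just the UTF-8 byte sequence reinterpreted as text.
assert mangled[0][1] == good[0][1].encode('utf-8').decode('latin-1')
```

Every code point in the mangled value is above U+007F, so the ASCII conversion that Python 2's urlencode performs with str(v) cannot succeed.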

If parse_qsl is given a plain ASCII byte string instead, it returns ordinary UTF-8-encoded byte strings, which urlencode accepts:

>>> q = urlparse.parse_qsl(u'q=%ED%95%9C%EA%B8%80'.encode('ascii'))
>>> q
[('q', '\xed\x95\x9c\xea\xb8\x80')]
>>> urllib.urlencode(q)
'q=%ED%95%9C%EA%B8%80'
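The patch itself is specific to Python 2's urllib/urlparse. The sketch below illustrates the same idea under Python 3, assuming the mangled Latin-1 value from before: recover the raw bytes and let urlencode escape them byte-for-byte. It is an analogy, not a drop-in change to discovery.py.

```python
from urllib.parse import parse_qsl, urlencode

query = 'q=%ED%95%9C%EA%B8%80'

# Text whose code points are really UTF-8 bytes (the Python 2 symptom).
mangled = parse_qsl(query, encoding='latin-1')

# Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
# so encoding with it recovers the original byte string unchanged.
as_bytes = [(k, v.encode('latin-1')) for k, v in mangled]
print(as_bytes)  # [('q', b'\xed\x95\x9c\xea\xb8\x80')]

# urlencode percent-escapes bytes values byte-for-byte,
# so the original query string comes back exactly.
print(urlencode(as_bytes))  # q=%ED%95%9C%EA%B8%80
```

On Python 3, passing the mangled text straight to urlencode does not raise; it silently re-encodes each code point as UTF-8 and produces a different, corrupted query, which is why recovering the bytes first matters.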

YouTube API search list_next() throws UnicodeEncodeError
