YouTube API search: list_next() throws UnicodeEncodeError

Time: 2017-03-25 18:28:54

Tags: python python-2.7 youtube-api

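When I pass a non-English string to the YouTube API library's search, it only works for the initial search. If I call list_next(), it throws a UnicodeEncodeError.

When I use plain ASCII strings, everything works fine.

Any suggestions on what I should do?


Here is a simplified version of what I am doing: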

# -*- coding: utf-8 -*-
import apiclient.discovery

def test(query):
    youtube = apiclient.discovery.build('youtube', 'v3', developerKey='xxx')
    ys = youtube.search()
    req = ys.list(
        q=query.encode('utf-8'),
        type='video',
        part='id,snippet',
        maxResults=50
    )
    while (req):
        res = req.execute()
        for i in res['items']:
            print(i['id']['videoId'])
        req = ys.list_next(req, res)

test(u'한글')
test(u'日本語')
test(u'\uD55C\uAE00')
test(u'\u65E5\u672C\u8A9E')


Error message:

Traceback (most recent call last):
  File "E:\prj\scripts\yt\search.py", line 316, in _search
    req = ys.list_next(req, res)
  File "D:\Apps\Python\lib\site-packages\googleapiclient\discovery.py", line 966, in methodNext
    parsed[4] = urlencode(newq)
  File "D:\Apps\Python\lib\urllib.py", line 1343, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)


Versions:

  • google-api-python-client(1.6.2)
  • Python 2.7.13(Win32)


Edit: I have posted a workaround below.

1 answer:

Answer 0: (score: 0)

In case anyone else is interested, here is a workaround that worked for me:

googleapiclient/discovery.py:
(old) q = parse_qsl(parsed[4])
(new) q = parse_qsl(parsed[4].encode('ascii'))
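
If editing the installed library is not an option, one possible alternative is to skip list_next() and page manually by passing the nextPageToken from each response back into list() as pageToken. The sketch below is untested against the live API and rests on the assumption that a fresh list() call behaves like the initial search, which does work with non-ASCII queries. The name iter_video_ids is made up for illustration, and search_resource is expected to be the object returned by youtube.search().

```python
def iter_video_ids(search_resource, q, max_results=50):
    """Hypothetical helper: yield video ids page by page without list_next().

    search_resource is assumed to be what youtube.search() returns;
    q is whatever value would normally be passed as list(q=...).
    """
    page_token = None
    while True:
        params = {
            'q': q,
            'type': 'video',
            'part': 'id,snippet',
            'maxResults': max_results,
        }
        # Only send pageToken once a previous response has supplied one.
        if page_token:
            params['pageToken'] = page_token
        res = search_resource.list(**params).execute()
        for item in res.get('items', []):
            yield item['id']['videoId']
        page_token = res.get('nextPageToken')
        if not page_token:
            break
```

With the code from the question, this would be called as iter_video_ids(youtube.search(), query.encode('utf-8')).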


Explanation

In discovery.py, list_next() parses and unquotes the previous URL, then builds a new URL from it:

pageToken = previous_response['nextPageToken']
parsed = list(urlparse(request.uri))
q = parse_qsl(parsed[4])

# Find and remove old 'pageToken' value from URI
newq = [(key, value) for (key, value) in q if key != 'pageToken']
newq.append(('pageToken', pageToken))
parsed[4] = urlencode(newq)
uri = urlunparse(parsed)


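The problem seems to be that when parse_qsl unescapes parsed[4] and parsed[4] is a unicode object, it returns the UTF-8 encoded bytes as unicode values, one character per byte. urlencode calls str() on each value (see the traceback), and that fails for these: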

q = urlparse.parse_qsl(u'q=%ED%95%9C%EA%B8%80')
[(u'q', u'\xed\x95\x9c\xea\xb8\x80')]
urllib.urlencode(q)
UnicodeEncodeError
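
To make the failure easier to see outside the library, here is a small sketch that imitates it by hand. It assumes, as far as I can tell from the Python 2.7 source, that unquoting a unicode string decodes the raw bytes as latin-1, and it uses .encode('ascii') as a stand-in for what str() does to a unicode value on Python 2. The variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
query = u'\uD55C\uAE00'  # the Korean query from the question

# The UTF-8 bytes that end up percent-encoded in the request URI.
utf8_bytes = query.encode('utf-8')

# Assumed behaviour of unquoting a unicode URI on Python 2.7:
# every byte becomes one unicode character (latin-1 decoding).
as_unicode = utf8_bytes.decode('latin-1')
print(repr(as_unicode))

try:
    # Stand-in for Python 2's str(value), which uses the ascii codec.
    as_unicode.encode('ascii')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# The original text is still recoverable by reversing the latin-1 step.
round_trip = as_unicode.encode('latin-1').decode('utf-8')
```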


If parse_qsl is given a plain ASCII byte string instead, it returns ordinary UTF-8 encoded byte strings (str), which urlencode accepts:

q = urlparse.parse_qsl(u'q=%ED%95%9C%EA%B8%80'.encode('ascii'))
[('q', '\xed\x95\x9c\xea\xb8\x80')]
urllib.urlencode(q)
'q=%ED%95%9C%EA%B8%80'
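
Putting the pieces together, the URL rebuilding step with the workaround applied can be sketched as a standalone function. This is not the library's code, only an imitation of the excerpt above. next_page_uri and PY2 are hypothetical names, and the import fallback is there so the same sketch also runs on Python 3, where the encode step is skipped.

```python
# -*- coding: utf-8 -*-
try:  # Python 3
    from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode
    PY2 = False
except ImportError:  # Python 2
    from urlparse import urlparse, urlunparse, parse_qsl
    from urllib import urlencode
    PY2 = True


def next_page_uri(uri, page_token):
    """Hypothetical helper: return uri with its pageToken replaced."""
    # The workaround: on Python 2, hand parse_qsl a byte string so the
    # unquoted values come back as UTF-8 encoded str, not unicode.
    if PY2 and isinstance(uri, unicode):
        uri = uri.encode('ascii')
    parsed = list(urlparse(uri))
    q = parse_qsl(parsed[4])

    # Drop any old pageToken and append the new one, as list_next() does.
    newq = [(key, value) for (key, value) in q if key != 'pageToken']
    newq.append(('pageToken', page_token))
    parsed[4] = urlencode(newq)
    return urlunparse(parsed)


uri = u'https://www.googleapis.com/youtube/v3/search?q=%ED%95%9C%EA%B8%80&pageToken=OLD'
print(next_page_uri(uri, 'NEW'))
```

Encoding the whole URI rather than only parsed[4] is a simplification for the sketch; the one-line change to discovery.py shown earlier touches only the query string.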