UTF-8 issue in Python

Time: 2013-09-17 04:30:10

Tags: python utf-8

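Well, I'm no fan of UTF-8 in Python; I can't seem to figure out how to fix this one. As you can see, I've already tried base64-encoding the value, but Python still seems to be trying to convert it from UTF-8 to ASCII...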

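More generally, I'm trying to use urllib2 to POST form data that contains UTF-8 characters. I suspect this is essentially the same problem as How to send utf-8 content in a urllib2 request?, although that question has no working answer. I tried to work around it by base64-encoding the value so that only a byte string gets sent.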

Traceback (most recent call last):
  File "load.py", line 165, in <module>
    main()
  File "load.py", line 17, in main
    beers()
  File "load.py", line 157, in beers
    resp = send_post("http://localhost:9000/beers", beer)
  File "load.py", line 64, in send_post
    connection.request ('POST', req.get_selector(), *encode_multipart_data (data, files))
  File "load.py", line 49, in encode_multipart_data
    lines.extend (encode_field (name))
  File "load.py", line 34, in encode_field
    '', base64.b64encode(u"%s" % data[field_name]))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/base64.py", line 53, in b64encode
    encoded = binascii.b2a_base64(s)[:-1]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128)

Code:

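# Python 2.7 imports used by the code below
import base64
import httplib
import mimetypes
import os
import random
import string
import urllib2


def random_string (length):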
    return ''.join (random.choice (string.ascii_letters) for ii in range (length + 1))


def encode_multipart_data (data, files):
    boundary = random_string (30)

    def get_content_type (filename):
      return mimetypes.guess_type (filename)[0] or 'application/octet-stream'

    def encode_field (field_name):
      return ('--' + boundary,
              'Content-Disposition: form-data; name="%s"' % field_name,
              'Content-Transfer-Encoding: base64',
              '', base64.b64encode(u"%s" % data[field_name]))

    def encode_file (field_name):
      filename = files [field_name]
      file_size = os.stat(filename).st_size
      file_data = open(filename, 'rb').read()
      file_b64 = base64.b64encode(file_data)
      return ('--' + boundary,
              'Content-Disposition: form-data; name="%s"; filename="%s"' % (field_name, filename),
              'Content-Type: %s' % get_content_type(filename),
              'Content-Transfer-Encoding: base64',
              '', file_b64)

    lines = []
    for name in data:
      lines.extend (encode_field (name))
    for name in files:
      lines.extend (encode_file (name))
    lines.extend (('--%s--' % boundary, ''))
    body = '\r\n'.join (lines)

    headers = {'content-type': 'multipart/form-data; boundary=' + boundary,
               'content-length': str(len(body))}

    return body, headers


def send_post (url, data, files={}):
    req = urllib2.Request (url)
    connection = httplib.HTTPConnection (req.get_host())
    connection.request ('POST', req.get_selector(), *encode_multipart_data (data, files))
    return connection.getresponse()

The JSON for the beer object (this is what gets passed as data to encode_multipart_data) is:

1 answer:

Answer (score: 3)

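You can't base64-encode Unicode, only byte strings. In Python 2.7, passing a Unicode string to a function that expects a byte string triggers an implicit conversion to a byte string using the ascii codec, which produces the error you're seeing: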

>>> base64.b64encode(u'America\u2019s')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\base64.py", line 53, in b64encode
    encoded = binascii.b2a_base64(s)[:-1]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128)
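The implicit conversion behaves like an explicit .encode('ascii') call, so the same failure can be reproduced without base64 at all. A small sketch (it should run unchanged on Python 2.7 and Python 3) that records where the ascii codec gives up; position 7 and U+2019 (RIGHT SINGLE QUOTATION MARK) match the traceback:

```python
text = u'America\u2019s'

bad_index = None
bad_char = None
try:
    # Roughly what Python 2.7 does implicitly when a byte string is required.
    text.encode('ascii')
except UnicodeEncodeError as exc:
    bad_index = exc.start             # position of the first rejected character
    bad_char = exc.object[exc.start]  # exc.object is the string being encoded

print(bad_index)           # 7
print(hex(ord(bad_char)))  # 0x2019
```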

Encode it to a byte string with a valid encoding (such as UTF-8) first:

>>> base64.b64encode(u'America\u2019s'.encode('utf8'))
'QW1lcmljYeKAmXM='
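Applied to the question's code, the fix is one line in encode_field: base64.b64encode((u"%s" % data[field_name]).encode('utf8')). Below is a minimal Python 3 sketch of that field encoder (in Python 3, passing a str to b64encode raises TypeError instead of attempting an implicit ascii conversion, so the explicit encode is required there too). The standalone signature and the fixed boundary string are assumptions made so the example runs on its own; they are not part of the original code.

```python
import base64


def encode_field(boundary, field_name, value):
    """Build the lines for one base64-encoded multipart/form-data text field.

    Hypothetical standalone variant of the question's encode_field closure:
    boundary and value are passed in rather than read from the enclosing scope.
    """
    # Coerce to text (as u"%s" % data[field_name] does in the question), then
    # encode to UTF-8 bytes: base64 operates on bytes, not on Unicode text.
    raw = (u"%s" % value).encode('utf-8')
    # b64encode returns bytes; its output alphabet is pure ASCII, so decoding
    # with 'ascii' is always safe and lets it be joined with the header lines.
    b64_text = base64.b64encode(raw).decode('ascii')
    return ('--' + boundary,
            'Content-Disposition: form-data; name="%s"' % field_name,
            'Content-Transfer-Encoding: base64',
            '',
            b64_text)


# Example with a fixed, made-up boundary (the question uses random_string(30)).
lines = list(encode_field('XXXXXXXXXX', 'name', u'America\u2019s'))
lines.extend(('--XXXXXXXXXX--', ''))
body = '\r\n'.join(lines).encode('ascii')  # every line is ASCII now
print(lines[4])  # QW1lcmljYeKAmXM=

# The receiver reverses both steps: base64-decode, then UTF-8-decode.
decoded = base64.b64decode(lines[4]).decode('utf-8')
print(decoded == u'America\u2019s')  # True
```

The server has to undo both steps (base64-decode, then UTF-8-decode). Whether the server at http://localhost:9000 actually honours Content-Transfer-Encoding: base64 is an assumption: RFC 7578 deprecates that header for multipart/form-data, so sending the UTF-8 bytes directly as the part body may be the more portable option.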