UnicodeEncodeError: 'ascii' codec can't encode

Time: 2016-11-21 20:53:47

Tags: python encoding utf-8

I have the following data container, which is constantly being updated:

data = []
for val, track_id in zip(values, list(track_ids)):
    # keep only values below the threshold
    if val < threshold:
        # structure data as a dictionary
        pre_data = {"artist": sp.track(track_id)['artists'][0]['name'], "track": sp.track(track_id)['name'], "feature": filter_name, "value": val}
        data.append(pre_data)
# write to file
with open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w') as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)

But I get a lot of errors like this:

    json.dump(data,f, ensure_ascii=False, indent=4, sort_keys=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 190, in dump
    fp.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

Is there a way to get rid of this kind of encoding problem once and for all?

I was told to do this:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

But many people do not recommend it.

I am using Python 2.7.10.

Any clues?

3 answers:

Answer 0 (score: 1)

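When you write a unicode string to a file opened in text mode, Python encodes the string for you. In Python 2 the default encoding is ascii, which produces the error you see; there are a lot of characters that cannot be encoded as ASCII.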
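A minimal sketch of that failure without json or files, assuming only the character from the traceback (U+2019, the right single quotation mark). The sample string and variable names are illustrative, not taken from the question:

```python
# Sketch: reproduce the encoding failure on a single string.
text = u"don\u2019t"  # contains U+2019, which has no ASCII representation

try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # Same exception type as in the traceback above.
    ascii_ok = False

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = text.encode("utf-8")
```

Here `ascii_ok` ends up `False`, while `utf8_bytes` holds the three-byte UTF-8 sequence for the quotation mark between the ASCII letters.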

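The solution is to open the file with a different encoding. In Python 2 you have to use the codecs module; in Python 3 you can add the encoding= parameter directly to open. utf-8 is a popular choice because it can handle every Unicode character, and for JSON it is the standard; see https://en.wikipedia.org/wiki/JSON#Data_portability_issues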

import codecs
with codecs.open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
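For comparison, a sketch of the Python 3 form mentioned above, where the built-in open accepts encoding= directly. The temporary path and the sample data are placeholders for illustration, not the asker's real values:

```python
import json
import os
import tempfile

# Placeholder data and path, for illustration only.
data = [{"artist": u"Bj\u00f6rk", "track": u"don\u2019t", "value": 0.25}]
path = os.path.join(tempfile.mkdtemp(), "example.json")

# Python 3: the built-in open() takes the encoding directly.
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)

# Read it back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
```

Passing the encoding explicitly on both the write and the read avoids depending on the platform's default encoding.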

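Answer 1 (score: 1)

Your object contains unicode strings, and Python 2.x's unicode support can be a bit spotty. First, let's build a short example that demonstrates the problem: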

>>> obj = {"artist":u"Björk"}
>>> import json
>>> with open('deleteme', 'w') as f:
...     json.dump(obj, f, ensure_ascii=False)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 190, in dump
    fp.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 3: ordinal not in range(128)

From the json.dump help text:

If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
instance consisting of ASCII characters only.  If ``ensure_ascii`` is
``False``, some chunks written to ``fp`` may be ``unicode`` instances.
This usually happens because the input contains unicode strings or the
``encoding`` parameter is used. Unless ``fp.write()`` explicitly
understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
cause an error.

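Ah! There is the solution. Either use the default ensure_ascii=True and get ASCII-escaped unicode characters, or use the codecs module to open the file with the encoding you want. This works: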

>>> import codecs
>>> with codecs.open('deleteme', 'w', encoding='utf-8') as f:
...     json.dump(obj, f, ensure_ascii=False)
... 
>>> 
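The first option, keeping the default ensure_ascii=True, can be sketched like this. The output contains only ASCII characters, so a file opened with a plain open(..., 'w') accepts it, and json.loads restores the original text. The object is the same one used in the demonstration above; the other names are illustrative:

```python
import json

obj = {"artist": u"Bj\u00f6rk"}

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(obj)

# Every character of the serialized output is plain ASCII...
is_ascii = all(ord(ch) < 128 for ch in escaped)

# ...and parsing it restores the original unicode string.
restored = json.loads(escaped)
```

The trade-off is readability: the file stores `Bj\u00f6rk` rather than `Björk`, although any JSON parser decodes it back to the same string.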

Answer 2 (score: 0)

Why not encode the specific strings instead? Try calling the .encode('utf-8') method on the string that raises the exception.
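One possible reading of this suggestion, sketched under the assumption that the whole serialized document is encoded once rather than each string separately: serialize to text, call .encode('utf-8') on the result, and write the bytes to a file opened in binary mode. This is a variant for illustration and may not be exactly what the answerer had in mind; the path and data are placeholders:

```python
import json
import os
import tempfile

# Placeholder data and path, for illustration only.
data = [{"artist": u"Bj\u00f6rk", "value": 0.25}]
path = os.path.join(tempfile.mkdtemp(), "example.json")

# Serialize to text first, keeping non-ASCII characters unescaped.
text = json.dumps(data, ensure_ascii=False, indent=4, sort_keys=True)

# Encode explicitly, so no implicit ASCII encoding can happen on write.
payload = text.encode("utf-8")

# Binary mode: the file receives exactly the bytes we produced.
with open(path, "wb") as f:
    f.write(payload)

# Decode with the same encoding when reading back.
with open(path, "rb") as f:
    loaded = json.loads(f.read().decode("utf-8"))
```

Encoding in one place, right before the write, keeps the rest of the program working with unicode strings only.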