Question

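I have a JSON object containing UTF-8 characters. When I try to print the object to the console (on Windows 8.1), it throws this error: UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 3706: character maps to <undefined>, because the console cannot display some UTF-8 characters. I checked this answer, but none of the solutions worked, because the JSON object can't be encoded and decoded. How can I fix the encoding problem with the JSON?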

import json
import urllib.parse
import urllib.request
import zlib

def getTweets(self, company):
    # query parameters
    baseUrl = 'https://api.twitter.com/1.1/search/tweets.json'
    values = {'q' : company, 'result_type' : 'recent', 'count' : 100}
    params = urllib.parse.urlencode(values)
    url = baseUrl + '?' + params
    # request headers
    authorization = 'Bearer %s' % self.bearer
    acceptEncoding = 'gzip'
    headers = {'User-Agent' : self.userAgent, 'Authorization' : authorization, 'Accept-Encoding' : acceptEncoding}
    req = urllib.request.Request(url, None, headers)
    response = urllib.request.urlopen(req)
    rawData = response.read()
    # 16 + MAX_WBITS tells zlib to expect a gzip header
    decompressedData = zlib.decompress(rawData, 16+zlib.MAX_WBITS)
    decompressedData = decompressedData.decode('utf-8')
    #print(decompressedData)
    jsonData = json.loads(decompressedData)
    print(jsonData)

Answer 1

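You say your console doesn't support UTF-8, so you need to use a different encoding. I'll try to explain how encoding, decoding and printing work together to produce your exception. With decode(encoding) you convert a byte string into a unique Unicode representation. You have to specify the encoding because, without it, a byte could map to almost any character. You need to know the encoding of the data you get from the website, although it is usually UTF-8.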
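To see why the encoding has to be specified, here is a small sketch (not part of the original answer) that decodes the same three bytes two ways. The bytes are the UTF-8 form of U+2026, the character from the error message; cp1252 is just an arbitrary wrong guess chosen for illustration. ascii() is used so the output itself is safe to print on any console.

```python
# The UTF-8 encoding of U+2026 HORIZONTAL ELLIPSIS, as it might arrive over HTTP.
raw = b"\xe2\x80\xa6"

# Decoding with the correct encoding yields the single intended character.
correct = raw.decode("utf-8")

# Decoding the same bytes with another encoding maps each byte through the
# cp1252 table instead, producing three unrelated characters ("mojibake").
wrong = raw.decode("cp1252")

print(ascii(correct))  # prints '\u2026'
print(ascii(wrong))    # prints '\xe2\u20ac\xa6'
```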

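Do this decoding as the first step whenever text enters your application from outside. From then on you hold a single Unicode representation, so you don't need to remember the encoding of every piece of text inside your application.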
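A minimal sketch of "decode once at the boundary", modeled on the question's getTweets. The function name decode_response_body is hypothetical, and the HTTP body is simulated with gzip.compress rather than fetched from Twitter.

```python
import gzip
import json
import zlib


def decode_response_body(raw: bytes, encoding: str = "utf-8"):
    """Turn a gzip-compressed HTTP body into Python objects.

    The bytes -> str conversion happens exactly once, here at the boundary;
    everything after this point works with str only.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header, as in the question.
    data = zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    text = data.decode(encoding)  # bytes -> str
    return json.loads(text)


# Simulated response body; the payload is made up for illustration.
body = gzip.compress(json.dumps({"text": "Hello\u2026"}).encode("utf-8"))
tweet = decode_response_body(body)
print(ascii(tweet["text"]))  # prints 'Hello\u2026'
```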

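When you print Unicode with print, it assumes the standard (default) encoding, but you can specify a different standard encoding. The error means that print tried to apply the standard encoding to the Unicode text and failed, because that encoding cannot turn characters outside its defined range into a byte representation.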
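The failure can be reproduced without a console at all. This sketch assumes the console uses code page 437, the OEM code page of US-English Windows (yours may be cp850 or another); try_encode is a hypothetical helper.

```python
def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# cp437 has no byte for U+2026, so this fails just like print() did.
result = try_encode("Loading\u2026", "cp437")
print(result)

# UTF-8 can represent every character, so the same text encodes fine.
print(try_encode("Loading\u2026", "utf-8"))
```

The "position" in the message is the index of the offending character in the str being encoded, which is why the question's error points at position 3706 of the printed JSON.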

You can look up the standard encoding with:

import sys
print(sys.stdout.encoding)

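When you hand text from your application to another application, or when you want to store it, you need to encode it into a byte representation. So when you pass a Unicode string to the console, it has to be converted into a byte representation in the encoding the console expects. For the console, I assume it expects the bytes from your application to be in the standard encoding.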
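A sketch of that outgoing conversion. An io.TextIOWrapper over an in-memory buffer stands in for the console here (an assumption for illustration; the real sys.stdout is a similar text layer over a byte stream), and cp437 is again an assumed console encoding.

```python
import io

# The byte sink plays the role of the console; the text layer on top of it
# encodes every str written to it, just as sys.stdout does for print().
sink = io.BytesIO()
console = io.TextIOWrapper(sink, encoding="cp437", errors="replace", newline="\n")

print("Loading\u2026 done", file=console)
console.flush()

written = sink.getvalue()
print(written)  # prints b'Loading? done\n'
```

newline="\n" just keeps the bytes identical on every platform; errors="replace" is what turns the unencodable ellipsis into "?" instead of raising.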

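So, to print a Unicode string, you can call encode on it to convert it into a byte representation the console can handle. For example, you can convert it into an ASCII byte representation and replace characters outside the ASCII range with question marks: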

# bytes to unicode (str)
decompressedData_unicode = decompressedData.decode('utf-8')
# unicode to ASCII bytes; characters outside ASCII become '?'
decompressedData_bytes = decompressedData_unicode.encode('ascii', 'replace')
# hope that the console's standard encoding is compatible with ASCII
print(decompressedData_bytes.decode('ascii'))
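'replace' is only one of the error handlers str.encode accepts. This sketch (an addition, not from the original answer) compares the standard ones on a string containing the ellipsis from the error:

```python
text = "Loading\u2026"

# Each error handler decides what happens to characters the target
# encoding cannot represent; "strict" (the default) raises instead.
results = {
    handler: text.encode("ascii", handler)
    for handler in ("replace", "ignore", "backslashreplace", "xmlcharrefreplace")
}

for handler, encoded in results.items():
    # Decode back to str so print() shows plain text, not a b'...' repr.
    print(handler, encoded.decode("ascii"))
```

In Python 3, printing a bytes object directly shows its b'...' repr, so decode the ASCII result back to str before printing it.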

If the console allows other encodings, you can set one of them as the standard encoding and print the Unicode string directly, or do:

import sys
decompressedData_bytes = decompressedData_unicode.encode(sys.stdout.encoding, 'replace')
print(decompressedData_bytes.decode(sys.stdout.encoding))

and hope that the standard encoding can represent every Unicode character in decompressedData_unicode.
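Since the data here is JSON, two further options may help, sketched under the assumption of Python 3.7 or later: serialize with json.dumps, whose default ensure_ascii=True escapes every non-ASCII character as \uXXXX, or change the standard encoding of sys.stdout with reconfigure(). The helper names json_for_console and use_utf8_stdout are hypothetical.

```python
import json
import sys


def json_for_console(obj) -> str:
    """Serialize obj as pure-ASCII JSON text that any console can print."""
    # ensure_ascii=True (the json.dumps default) escapes every non-ASCII
    # character as a backslash-u sequence, e.g. the ellipsis becomes \u2026.
    return json.dumps(obj, ensure_ascii=True, indent=2)


def use_utf8_stdout() -> None:
    """Switch stdout to UTF-8, replacing anything that still can't be written."""
    # TextIOWrapper.reconfigure() exists in Python 3.7+; skip it if stdout has
    # been replaced by an object that does not support it.
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    data = {"text": "Loading\u2026"}  # stands in for jsonData from the question
    # Option 1: print pure-ASCII JSON; works whatever the console encoding is.
    print(json_for_console(data))
    # Option 2: change the standard encoding, then print the str as-is.
    use_utf8_stdout()
    print(data["text"])
```

Option 1 changes what is printed (escapes instead of the real characters) but never fails; option 2 prints the real characters, but whether they display correctly still depends on the console.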

UnicodeEncodeError with JSON data
