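UnicodeEncodeError after a request despite encoding as UTF-8

Time: 2017-08-01 19:44:36

Tags: python encoding utf-8 python-requests

I have read many threads on this topic, but none of them helped me solve the problem. I have a dataset of text containing many different characters, so I encode the text before making a POST request with the Requests library on Python 2.7.13.

My code is as follows: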

# -*- coding: utf-8 -*-
# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')
import json
import requests
text = """So happy to be together on your birthday! ❤ Thankful for real life. ❤ A post shared by Jessica Chastain (@jessicachastain) on Nov 13, 2016 at 5:22am PST"""
textX = json.dumps({'text': text.encode('utf-8')})
r = requests.post('http://####', data=textX,
                      headers={'Content-Type': 'application/json; charset=UTF-8'})
print(r.text)

The data is sent as JSON. No matter where I try to encode the text as UTF-8, I still get the following error from the request.

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2764' in
position 42: Body ('❤') is not valid Latin-1. Use body.encode('utf-8')
if you want to send it encoded in UTF-8.

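Edit: the syntax error has been fixed, but it was not the cause of the problem.

1 answer:

Answer 0: (score: 0)

By default, json.dumps generates an ASCII-only string, which eliminates the encoding problem. Your error is not using a Unicode string. Make sure the source file is saved in the encoding it declares (#coding=utf8):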

# coding=utf8
import json
text = u"""So happy to be together on your birthday! ❤ Thankful for real life. ❤ A post shared by Jessica Chastain (@jessicachastain) on Nov 13, 2016 at 5:22am PST"""
textX = json.dumps({u'text': text})

Output:

'{"text": "So happy to be together on your birthday! \\u2764 Thankful for real life. \\u2764 A post shared by Jessica Chastain (@jessicachastain) on Nov 13, 2016 at 5:22am PST"}'
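To make the two ways out of the error concrete, here is a minimal sketch, not taken from the original answer. It uses a shortened sample string and the illustrative names ascii_body and utf8_body, which are assumptions. It shows that the default ASCII-only output can pass through a Latin-1 encoding step untouched, and that the alternative suggested by the error message (encoding the body to UTF-8 yourself) also round-trips.

```python
# -*- coding: utf-8 -*-
import json

# Unicode text. The u prefix matters on Python 2 and is harmless on Python 3.3+.
# The sample string is shortened here for illustration.
text = u"So happy to be together on your birthday! \u2764 Thankful for real life."

# Option 1: the default ensure_ascii=True escapes non-ASCII characters as \uXXXX,
# so the body contains only ASCII and survives a Latin-1 encoding step.
ascii_body = json.dumps({u"text": text})
ascii_body.encode("latin-1")  # does not raise

# Option 2: keep the raw character in the JSON and encode the whole body
# to UTF-8 bytes explicitly, as the error message suggests.
utf8_body = json.dumps({u"text": text}, ensure_ascii=False).encode("utf-8")

# Both bodies decode back to the same text.
assert json.loads(ascii_body)[u"text"] == text
assert json.loads(utf8_body.decode("utf-8"))[u"text"] == text
```

Either body should be usable as the data= argument of requests.post together with the Content-Type header from the question. With option 2 the body is already bytes, so no further encoding of the body is needed before it is sent.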