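I have read many threads on this topic, but none of them helped me solve this problem. I have a dataset of texts containing many different characters, so I encode the text before making a POST request with the Requests library on Python 2.7.13.
My code is as follows: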
# -*- coding: utf-8 -*-
# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')
import json
import requests
text = """So happy to be together on your birthday! ❤ Thankful for real life. ❤ A post shared by Jessica Chastain (@jessicachastain) on Nov 13, 2016 at 5:22am PST"""
textX = json.dumps({'text': text.encode('utf-8')})
r = requests.post('http://####', data=textX,
                  headers={'Content-Type': 'application/json; charset=UTF-8'})
print(r.text)
The data is sent as JSON. No matter where I try to encode the text as UTF-8, I still get the following error from Requests:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2764' in
position 42: Body ('❤') is not valid Latin-1. Use body.encode('utf-8')
if you want to send it encoded in UTF-8.
Edit: the syntax error has been fixed, but it was not the cause of the problem.
Answer 0 (score: 0)
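By default, json.dumps generates an ASCII-only string, which eliminates encoding problems. The mistake is not using a Unicode string (note the u prefix on the literals below). Make sure the source file is saved in the declared encoding (#coding=utf8):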
# coding=utf8
import json
text = u"""So happy to be together on your birthday! ❤ Thankful for real life. ❤ A post shared by Jessica Chastain (@jessicachastain) on Nov 13, 2016 at 5:22am PST"""
textX = json.dumps({u'text': text})
Output:
'{"text": "So happy to be together on your birthday! \\u2764 Thankful for real life. \\u2764 A post shared by Jessica Chastain (@jessicachastain) on Nov 13, 2016 at 5:22am PST"}'
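To make the point above concrete, here is a minimal sketch (not taken from the original post) that shows why Latin-1 rejects the heart character and two ways to build a body that avoids the problem. The shortened sample text and the variable names latin1_ok, ascii_body and utf8_body are assumptions chosen for illustration; no request is actually sent.

```python
# -*- coding: utf-8 -*-
import json

# Illustrative sample text; \u2764 is the heart character from the question.
text = u"birthday! \u2764 Thankful for real life."

# Latin-1 has no code point for U+2764, so encoding the raw text fails.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Option 1: the default ensure_ascii=True escapes non-ASCII characters
# as \uXXXX, so the resulting body is plain ASCII.
ascii_body = json.dumps({u"text": text})

# Option 2: keep the literal character in the JSON, then encode the
# whole body to UTF-8 bytes explicitly, as the error message suggests.
utf8_body = json.dumps({u"text": text}, ensure_ascii=False).encode("utf-8")

print(latin1_ok)
print(ascii_body)
print(repr(utf8_body))
```

Either ascii_body or utf8_body could then be passed as data= to requests.post together with the Content-Type header from the question. Both decode to the same JSON document on the receiving side, so the choice is mostly about whether the payload should be readable on the wire.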