I'm trying to convert a CSV file to a JSON file. In the process, I get a Unicode error when I try to write to the JSON file:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u06ec' in position 933: ordinal not in range(128)
My code:
import csv
import json
import codecs

csvfile = codecs.open('my.csv', 'r', encoding='utf-8', errors='ignore')
jsonfile = codecs.open('my.json', "w", encoding='utf-8', errors='ignore')
fieldnames = ("Title","Date","Text","Country","Page","Week")
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
    row['Text'] = row['Text'].encode('ascii', errors='ignore')  # error occurs on this line
    json.dump(row, jsonfile)
    jsonfile.write('\n')
A sample row:
{'Country': 'UK', 'Title': '12345', 'Text': " hi there hi john i currently ", 'Week': 'week2', 'Page': 'homepage', 'Date': '1/3/16'}
Answer 0 (score: 3)
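JSON handles Unicode natively. Just remove the .encode("ascii", ...) part.
Also, you don't need to set an encoding on the file object you use for the JSON output, because json.dump already serializes Unicode correctly: by default (ensure_ascii=True) it writes non-ASCII characters as \uXXXX escapes, so the output is pure ASCII.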
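For comparison, here is a minimal Python 3 sketch of the same conversion with the .encode("ascii", ...) line removed. It assumes Python 3, where the csv module reads text (str) directly from a file opened with encoding='utf-8' and newline=''; the function name csv_to_json_lines is hypothetical, and the one-object-per-line output mirrors the question's json.dump plus '\n' loop.

```python
import csv
import json

FIELDNAMES = ("Title", "Date", "Text", "Country", "Page", "Week")


def csv_to_json_lines(csv_path, json_path):
    """Hypothetical helper: write each CSV row as one JSON object per line."""
    # newline="" is what the csv docs recommend for files passed to csv readers.
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(json_path, "w", encoding="utf-8") as dst:
        reader = csv.DictReader(src, fieldnames=FIELDNAMES)
        for row in reader:
            # No .encode('ascii', ...): json.dump handles non-ASCII text itself.
            # With the default ensure_ascii=True it writes escapes like \u06ec,
            # so the output file ends up ASCII-only.
            json.dump(row, dst)
            dst.write("\n")
```

Because the escaped output is plain ASCII, the encoding passed for the JSON file only starts to matter if you call json.dump(..., ensure_ascii=False) to keep the characters unescaped.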
Answer 1 (score: 0)
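I edited my code to read the CSV file in binary mode ('rb'). That then gave me a different problem with invalid bytes, which I solved by converting the text string to unicode.
Here is the working code: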
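import csv
import codecs
import json

csvfile = open('my.csv', 'rb')
jsonfile = codecs.open('my.json', "w")
fieldnames = ("Title","Date","Text","Country","Page","Week")
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
    print row
    # decode with the default (ASCII) codec; undecodable bytes become U+FFFD
    row['Text'] = unicode(row['Text'], errors='replace')
    json.dump(row, jsonfile)
    jsonfile.write('\n')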
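In Python 3 the same "replace invalid bytes" idea can be applied when opening the file instead of per field. The sketch below is an assumption-labeled equivalent, not the poster's code: the function name csv_to_json_lines_lenient is hypothetical, and errors="replace" here affects every column, whereas the Python 2 answer only converts Text.

```python
import csv
import json

FIELDNAMES = ("Title", "Date", "Text", "Country", "Page", "Week")


def csv_to_json_lines_lenient(csv_path, json_path):
    """Hypothetical helper: like a plain CSV-to-JSON loop, but tolerant of bad bytes."""
    # errors="replace" turns each undecodable byte sequence into U+FFFD,
    # similar to unicode(..., errors='replace') in the Python 2 answer.
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as src, \
         open(json_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src, fieldnames=FIELDNAMES):
            json.dump(row, dst)
            dst.write("\n")
```

Note that decoding as UTF-8 keeps valid non-ASCII characters such as \u06ec, while the Python 2 unicode(row['Text'], errors='replace') call decodes with the default ASCII codec and so replaces every non-ASCII byte.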