Question

I am trying to convert a CSV file to a JSON file. During that process, when I try to write to the JSON file, I get a Unicode error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u06ec' in position 933: ordinal not in range(128)

My code:

import csv
import json
import codecs


csvfile = codecs.open('my.csv', 'r', encoding='utf-8', errors='ignore')
jsonfile = codecs.open('my.json',"w", encoding='utf-8',errors='ignore')

fieldnames = ("Title","Date","Text","Country","Page","Week")
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
    row['Text'] = row['Text'].encode('ascii',errors='ignore')  # the error occurs on this line

    json.dump(row, jsonfile)
    jsonfile.write('\n')

A sample row:

{'Country': 'UK', 'Title': '12345', 'Text': "  hi there  hi john i currently ", 'Week': 'week2', 'Page': 'homepage', 'Date': '1/3/16'}

Answer 1

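Don't convert to ASCII.

JSON handles unicode natively. Just remove the .encode("ascii", ...) part.

Also, you don't need to set an encoding on the file object used for the JSON output: by default json.dump escapes non-ASCII characters as \uXXXX sequences, so what it writes is already plain ASCII.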
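To illustrate the point about JSON and unicode, here is a minimal sketch. It assumes Python 3 (the code in this thread is Python 2), and the row contents are made up for illustration; only the character U+06EC comes from the traceback in the question.

```python
import json

# Hypothetical row; the Text value contains U+06EC, the character from the traceback.
row = {"Title": "12345", "Text": "hi there \u06ec", "Country": "UK"}

# With the default ensure_ascii=True, non-ASCII characters are written
# as \uXXXX escapes, so the serialized line contains only ASCII.
line = json.dumps(row)
assert all(ord(ch) < 128 for ch in line)

# Parsing the JSON again restores the original character.
assert json.loads(line)["Text"] == row["Text"]
```

Nothing is lost by skipping the .encode('ascii', errors='ignore') step: the character survives the round trip instead of being silently dropped.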

Answer 2

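I edited my code to read the CSV file as binary. It then gave me a different problem with invalid bytes, which I solved by converting the text string to unicode:

Here is the working code: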

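import csv
import json
import codecs

# Python 2: the csv module reads byte strings, so open the file in binary mode.
csvfile = open('my.csv', 'rb')
jsonfile = codecs.open('my.json', "w")

fieldnames = ("Title", "Date", "Text", "Country", "Page", "Week")
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
    print row
    # Decode the byte string with the default codec; undecodable bytes become U+FFFD.
    row['Text'] = unicode(row['Text'], errors='replace')
    json.dump(row, jsonfile)  # written out as in the question's loop
    jsonfile.write('\n')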
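For readers on Python 3, the same conversion might look roughly like the sketch below. This is not the answerer's code: the helper name csv_to_json_lines and its parameters are hypothetical, and it assumes the CSV is meant to be UTF-8, with any invalid bytes replaced rather than raising an error.

```python
import csv
import json

FIELDNAMES = ("Title", "Date", "Text", "Country", "Page", "Week")


def csv_to_json_lines(csv_path, json_path, fieldnames=FIELDNAMES):
    """Write one JSON object per CSV row (hypothetical helper, Python 3)."""
    # newline="" is what the csv module expects for its input file;
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as src, \
            open(json_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src, fieldnames):
            # Default ensure_ascii=True escapes non-ASCII text as \uXXXX.
            json.dump(row, dst)
            dst.write("\n")
```

In Python 3 the csv module works on text rather than bytes, so the decoding happens once when the file is opened and no per-field unicode() call is needed.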
