How to write a dictionary containing non-ASCII characters to a JSON file in Python

Date: 2017-09-14 19:53:22

Tags: python json dictionary unicode

This is my first time using Python, so apologies if the question is obvious.

I have a dictionary that I want to write to a JSON file. To do that, I did the following:

result = {}
result["c2i"] = c2i # c2i is a dictionary
result["data"] = data # data is a list that stores integers
with io.open("json_test.json", 'w',encoding='utf-8') as outfile:
    json.dump(result, outfile)

Unfortunately, when I do this I get the following error:

Traceback (most recent call last):
  File "dataprocess.py", line 140, in <module>
    json.dump(result, outfile)
  File "/home/anaconda2/lib/python2.7/json/__init__.py", line 190, in dump
    fp.write(chunk)
TypeError: write() argument 1 must be unicode, not str

The contents of the i2c dictionary are as follows:

>>> i2c
{49: '&', 50: '|',  56: '^', 57: '=', 58: '<', 59: '*', 60: '\xc2', 61: '\xa3', 62: '$', 63: '\xc3', 64: '\xa2', 65: '\xe2', 66: '\x82', 67: '\xac', 68: '\xef', 69: '\xbc', 70: '\xa6', 71: '\xaf', 72: '\xb7', 73: '>', 74: '+', 75: '\xab', 76: '\x97', 77: '~', 78: '\xad', 79: '\x98', 80: '\x86', 81: '\xb3', 82: ']', 83: '\x84', 84: '\x83', 85: '\xf0', 86: '\x9f', 87: '\x87', 88: '\xb1', 89: '\xb4', 90: '\xc4', 91: '\xb0', 92: '\xb6', 93: '[', 94: '\\', 95: '\xf3', 96: '\xbe', 97: '\x8d', 98: '\x81', 99: '\xe3', 100: '\xbb', 101: '\x8b', 102: '\xc5', 103: '\x93', 104: '\x85', 105: '\xe4', 106: '\xbd', 107: '\xa0', 108: '\xe5', 109: '\xe7', 110: '\xae', 111: '\xe9', 112: '\x9a', 113: '\x94', 114: '\xe6', 115: '\x88', 116: '\x91', 117: '\xa5', 118: '\xe8', 119: '\xb2', 120: '}', 121: '\xe0', 122: '\xb8', 123: '\xa7', 124: ':broken_heart:', 125: ':loudly_crying_face:', 126: ':black_rightwards_arrow:', 127: ':white_left_pointing_backhand_index:', 128: ':dizzy_face:', 129: ':cloud:', 130: ':white_right_pointing_backhand_index:', 131: ':heavy_black_heart:', 132: ':smiling_face_with_smiling_eyes:', 133: ':sparkling_heart:', 134: ':smiling_cat_face_with_heart-shaped_eyes:', 135: ':oncoming_bus:', 136: ':man_with_turban:', 137: ':confused_face:', 138: ':cross_mark:', 139: ':smiling_face_with_open_mouth_and_tightly-closed_eyes:', 140: ':party_popper:', 141: ':open_hands_sign:', 142: ':earth_globe_asia-australia:', 143: ':sleepy_face:', 144: ':pensive_face:', 145: ':weary_face:', 146: ':smiling_face_with_sunglasses:', 147: ':droplet:', 148: ':persevering_face:', 149: ':crown:', 150: ':sleeping_face:', 151: ':musical_score:', 152: ':teacup_without_handle:', 153: ':hot_beverage:', 154: ':awe|boy:', 155: ':cocktail_glass:', 156: ':worried_face:', 157: ':thought_balloon:', 158: ':cat_face:', 159: ':personal_computer:', 160: ':splashing_sweat_symbol:', 161: ':electric_plug:', 162: ':kiss_mark:', 163: ':trophy:', 164: ':airplane:', 165: ':face_with_no_good_gesture:', 166: 
':princess:', 167: ':disappointed_face:', 168: ':pouting_face:', 169: ':sparkles:', 170: ':high_voltage_sign:', 171: ':bomb:', 172: ':purple_heart:', 173: ':christmas_tree:', 174: ':black_heart_suit:', 175: ':speak-no-evil_monkey:', 176: ':woman_with_bunny_ears:', 177: ':person_bowing_deeply:', 178: ':smiling_face_with_halo:', 179: ':smiling_face_with_heart-shaped_eyes:', 180: ':beating_heart:', 181: ':unamused_face:', 182: ':ok_hand_sign:', 183: ':smiling_face_with_open_mouth:', 184: ':see-no-evil_monkey:', 185: ':face_without_mouth:', 186: ':musical_note:', 187: ':hocho:', 188: ':violin:', 189: ':smiling_face_with_open_mouth_and_cold_sweat:', 190: ':basketball_and_hoop:', 191: ':person_raising_both_hands_in_celebration:', 192: ':books:', 193: ':pistol:', 194: ':happy_person_raising_one_hand:', 195: ':thumbs_up_sign:', 196: ':heart_with_arrow:', 197: ':thumbs_down_sign:', 198: ':grinning_face_with_smiling_eyes:', 199: ':weary_cat_face:', 200: ':snowflake:', 201: ':multiple_musical_notes:', 202: ':frog_face:', 203: ':umbrella_with_rain_drops:', 204: ':runner:', 205: ':winking_face:', 206: ':fire_engine:', 207: ':face_with_medical_mask:', 208: ':green_heart:', 209: ':face_with_ok_gesture:', 210: ':camera:', 211: ':french_fries:', 212: ':tropical_drink:', 213: ':smiling_face_with_open_mouth_and_smiling_eyes:', 214: ':astonished_face:', 215: ':hundred_points_symbol:', 216: ':palm_tree:', 217: ':face_with_open_mouth_and_cold_sweat:', 218: ':clinking_beer_mugs:', 219: ':dash_symbol:', 220: ':flag_for_faroe_islands:', 221: ':face_with_stuck-out_tongue:', 222: ':pedestrian:', 223: ':face_throwing_a_kiss:', 224: ':raised_hand:', 225: ':confounded_face:', 226: ':dog_face:', 227: ':police_car:', 228: ':bath:', 229: ':face_screaming_in_fear:', 230: ':bust_in_silhouette:', 231: ':baseball:', 232: ':ambulance:', 233: ':squared_sos:', 234: ':wine_glass:', 235: ':imagined...re:', 236: ':face_with_tears_of_joy:', 237: ':dancer:', 238: ':clapping_hands_sign:', 239: 
':heavy_large_circle:', 240: ':face_with_stuck-out_tongue_and_winking_eye:', 241: ':hatching_chick:', 242: ':open_book:', 243: ':white_smiling_face:', 244: ':fisted_hand_sign:', 245: ':tired_face:', 246: ':face_with_stuck-out_tongue_and_tightly-closed_eyes:', 247: ':snowman_without_snow:', 248: ':information_desk_person:', 249: ':two_women_holding_hands:', 250: ':two_hearts:', 251: ':angry_face:', 252: ':headphone:', 253: ':white_heavy_check_mark:', 254: ':wrapped_present:', 255: ':floppy_disk:', 256: ':soon_with_rightwards_arrow_above:', 257: ':white_frowning_face:', 258: ':grinning_face:', 259: ':black_sun_with_rays:', 260: ':crying_face:', 261: ':aubergine:', 262: ':face_savouring_delicious_food:', 263: ':victory_hand:', 264: ':flag_for_united_kingdom:', 265: ':flushed_face:', 266: ':mouse:', 267: ':rocket:', 268: ':person_with_folded_hands:', 269: ':father_christmas:', 270: ':face_with_look_of_triumph:', 271: ':nail_polish:', 272: ':skull:', 273: ':fork_and_knife:', 274: ':expressionless_face:', 275: ':growing_heart:', 276: ':microphone:', 277: ':fire:', 278: ':sleeping_symbol:', 279: ':money_bag:', 280: ':grimacing_face:', 281: ':flexed_biceps:', 282: ':smirking_face:', 283: ':pile_of_poo:', 284: ':slice_of_pizza:', 285: ':neutral_face:'}

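I believe that entries like '\xc2' are causing the problem, but I could not find a way to fix it. I will later read these JSON files from other programming languages.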

Edit: I am using Python 2.7

EDIT 2

As suggested in one of the answers, I went with the second option:

result = {}
result[u"i2c"] = {k:v.decode('iso-8859-1') for k, v in i2c.items()}
result["data"] = encoded_data
with io.open("deneme_jstonout.json", 'w') as outfile:
    json.dump(result, outfile)

However, in this case I get the following error:

Traceback (most recent call last):
  File "dataprocess.py", line 137, in <module>
    result[u"i2c"] = {k:v.decode('iso-8859-1') for k, v in i2c.items()}
  File "dataprocess.py", line 137, in <dictcomp>
    result[u"i2c"] = {k:v.decode('iso-8859-1') for k, v in i2c.items()}
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc2' in position 0: ordinal not in range(128)

EDIT 3

I also get an error when I try to convert every value in the dictionary to unicode:

>>> for i in i2c:
...     i2c[i] = unicode(i2c[i])
... 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

1 Answer:

Answer 0: (score: 1)

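The problem is that JSON is Unicode, so the json module needs string data to be of the unicode type, because it is not willing to guess the encoding of your strings for you. This is a Python 2 problem; in Python 3 all strings are already unicode. You have 3 options: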

1) Use unicode literals

i2c = {
    49: u'&', 
    50: u'|',  
    56: u'^', 
    57: u'=', 
    58: u'<',
    ...
    285: u':neutral_face:',
}

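If you are importing the data from another source (an API, a database, a text file), the best practice is to always decode the data to unicode as it enters your application and to encode it as it leaves your application.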
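The decode-on-input, encode-on-output practice could be sketched roughly as below. The helper names `load_text` and `save_json` and the sample bytes are hypothetical, and the sketch assumes the incoming bytes are UTF-8, which may not hold for your data.

```python
import io
import json
import os
import tempfile


def load_text(raw_bytes, encoding="utf-8"):
    """Decode bytes to text at the point where data enters the program."""
    return raw_bytes.decode(encoding)


def save_json(obj, path):
    """Serialize obj to text, then encode to UTF-8 only when writing."""
    payload = json.dumps(obj, ensure_ascii=False)
    # On Python 2, json.dumps may return a byte string for all-ASCII input.
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8")
    with io.open(path, "w", encoding="utf-8") as outfile:
        outfile.write(payload)


# Hypothetical input: the UTF-8 bytes for the pound sign.
incoming = b"\xc2\xa3"
text = load_text(incoming)

path = os.path.join(tempfile.mkdtemp(), "json_test.json")
save_json({"i2c": {"60": text}}, path)

# Read the file back to confirm the text survived the round trip.
with io.open(path, encoding="utf-8") as infile:
    round_trip = json.load(infile)
```

Inside the program everything is text; bytes exist only at the two boundaries, so there is exactly one place to get the codec right on the way in and one on the way out.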

2) Convert the string data to unicode

result[u"i2c"] = {k:v.decode('iso-8859-1') for k, v in i2c.items()}
result[u"data"] = data

Your sample does not look like UTF-8, so I guessed Latin1, but you have to know the real codec, because it might not be Latin1 (and anything decodes as Latin1).
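The remark that anything decodes as Latin1 can be checked with a small sketch. The byte pair below is chosen for illustration: `\xc2` and `\xa3` appear as adjacent values in the question's dictionary, but whether the original data was UTF-8 is an assumption, not something the question confirms.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(bytearray(range(256)))

# Latin1 (ISO-8859-1) maps each byte to the code point of the same value,
# so decoding never raises, even when the guess is wrong.
as_latin1 = all_bytes.decode("iso-8859-1")

# The same two bytes read under two different codecs.
pair = b"\xc2\xa3"
latin1_guess = pair.decode("iso-8859-1")  # two characters
utf8_guess = pair.decode("utf-8")         # one character, the pound sign

# A lone lead byte is not valid UTF-8, so this codec does report the problem.
try:
    b"\xc2".decode("utf-8")
    lone_byte_ok = True
except UnicodeDecodeError:
    lone_byte_ok = False
```

A successful Latin1 decode therefore says nothing about whether Latin1 was the right choice; a wrong guess shows up later as mojibake rather than as an exception.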

3) Use Python 3

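Python 3 makes everything about unicode more explicit. Any io operation gives you either unicode strings or bytes, so there is no ambiguity. Your program either works or it does not; in Python 2 the program appears to work, but blows up in flames from time to time.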
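For comparison, a minimal Python 3 sketch of the original task might look like this. The dictionary is a hypothetical two-entry stand-in for the question's `i2c`, and `ensure_ascii=False` is an optional choice that writes non-ASCII characters as themselves instead of `\uXXXX` escapes.

```python
import json
import os
import tempfile

# Hypothetical subset of the question's dictionary, already text in Python 3.
i2c = {60: "\u00a3", 285: ":neutral_face:"}
result = {"i2c": i2c, "data": [60, 285]}

path = os.path.join(tempfile.mkdtemp(), "json_test.json")

# The file is opened in text mode with an explicit encoding,
# and json.dump writes text to it.
with open(path, "w", encoding="utf-8") as outfile:
    json.dump(result, outfile, ensure_ascii=False)

with open(path, encoding="utf-8") as infile:
    loaded = json.load(infile)
```

Note that JSON object keys are always strings, so the integer keys come back as `"60"` and `"285"` after loading.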