(Python) Incorrect string values (CP1251 to UTF-8)

Asked: 2017-04-09 18:21:59

Tags: python json cp1251

I have a problem with a .json file that contains Cyrillic symbols. How can I convert CP1251 to UTF-8? (temp_data.decode('utf-8') has no effect, and neither does ensure_ascii=False in .dumps.)

import json

def load_data(filepath):
    with open(filepath, 'r') as f:
        temp_data = json.load(f)
    return temp_data


def pretty_print_json(d):
    out_json = json.dumps(d, sort_keys=True, indent=4, separators = (',', ': '))
    print(out_json)

if __name__ == '__main__':
    print("Enter the path to .json file: ")
    in_path = input()
    print("There are pretty printed json format: ")
    pretty_print_json(load_data(in_path))

2 Answers:

Answer 0 (score: 0)

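You can pass ensure_ascii. If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is false, the result may be a unicode instance; this usually happens if the input contains unicode strings or the encoding parameter is used. On Python 3 the result is always a str, and the flag only controls whether non-ASCII characters are escaped.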
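To make the effect of the flag concrete, here is a minimal sketch assuming Python 3. The sample dictionary is made up for illustration and is not taken from the question's data.

```python
import json

# Hypothetical sample data containing Cyrillic text.
sample = {"key": "АБВ"}

# ensure_ascii=True is the default: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(sample)

# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(sample, ensure_ascii=False)

print(escaped)   # {"key": "\u0410\u0411\u0412"}
print(readable)  # {"key": "АБВ"}
```

Both strings are valid JSON and decode to the same data; only the textual representation differs.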

Change your code to:

out_json = json.dumps(d, sort_keys=True, indent=4, separators = (',', ': '), ensure_ascii=False)

And here is the complete code:

import json

def load_data(filepath):   
    with open(filepath, 'r') as f:
        temp_data = json.load(f)
    return temp_data 


def pretty_print_json(d):
    out_json = json.dumps(d, sort_keys=True, indent=4, separators = (',', ': '), ensure_ascii=False)
    print(out_json)

if __name__ == '__main__':
    print("Enter the path to .json file: ") 
    in_path = raw_input()  # Python 2; on Python 3 use input()
    print("There are pretty printed json format: ")
    pretty_print_json(load_data(in_path))

I tested this code with this JSON file.

You can see the result in asciinema.

Answer 1 (score: 0)

This works. If it does not work with your data, provide a sample of your data file and specify its encoding.

#coding:utf8
import json

datafile_encoding = 'cp1251'  # Any encoding that supports Cyrillic works.

# Create a test file with Cyrillic symbols.
with open('test.json','w',encoding=datafile_encoding) as f:
    D = {'key':'АБВГДЕЖЗИЙКЛМНОПРСТ', 'key2':'АБВГДЕЖЗИЙКЛМНОПРСТ'}
    json.dump(D,f,ensure_ascii=False)

# specify the encoding of the data file
def load_data(filepath):   
    with open(filepath, 'r', encoding=datafile_encoding) as f:
        temp_data = json.load(f)
    return temp_data 

# Use ensure_ascii=False
def pretty_print_json(d):
    out_json = json.dumps(d, sort_keys=True, ensure_ascii=False, indent=4, separators = (',', ': '))
    print(out_json)

if __name__ == '__main__':
    in_path = 'test.json'
    pretty_print_json(load_data(in_path))

Output:

{
    "key": "АБВГДЕЖЗИЙКЛМНОПРСТ",
    "key2": "АБВГДЕЖЗИЙКЛМНОПРСТ"
}
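
If the goal is to convert the file itself from CP1251 to UTF-8, as the question title asks, one possible approach is to read the text with the source encoding and write it back with the target one. The following is only a sketch assuming Python 3; the helper name convert_encoding and the file names are hypothetical and do not come from the answers above.

```python
import os
import tempfile


def convert_encoding(src_path, dst_path, src_encoding='cp1251', dst_encoding='utf-8'):
    """Re-encode a text file (hypothetical helper, for illustration only)."""
    # Decode the bytes on disk using the source encoding.
    with open(src_path, 'r', encoding=src_encoding) as src:
        text = src.read()
    # Encode the same text using the target encoding.
    with open(dst_path, 'w', encoding=dst_encoding) as dst:
        dst.write(text)


# Demo with throwaway files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, 'cp1251_sample.json')
dst_file = os.path.join(tmp_dir, 'utf8_sample.json')

# Create a small CP1251-encoded input file.
with open(src_file, 'w', encoding='cp1251') as f:
    f.write('{"key": "АБВ"}')

convert_encoding(src_file, dst_file)

# Read the result back as raw bytes to inspect the new encoding.
with open(dst_file, 'rb') as f:
    converted_bytes = f.read()

print(converted_bytes)
```

After the conversion, the file can be opened with encoding='utf-8' and passed to json.load as usual.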