Question

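After fetching a response full of Unicode text, I can print it and see that everything is formatted correctly, as shown below.

After that, I write the text to a txt file, stored in JSON format: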

import io
import itertools
import json
import re

def parse_contents(res_dict, file):
    content_payload = res_dict['parse']['wikitext']['*']
    sections_payload = res_dict['parse']['sections']
    db = {}

    def now_next_iter(iterable):
        # Yield (current, next) pairs from the iterable.
        a, b = itertools.tee(iterable)
        next(b, None)
        return itertools.izip(a, b)

    def remove_tags(text):
        # Strip anything that looks like an HTML/XML tag.
        return re.sub('<[^<]+?>', '', text)

    for cur, nxt in now_next_iter(sections_payload):
        if cur['toclevel'] == 2:
            head = cur['line']
            db[head] = {}
        elif cur['toclevel'] == 3:
            line = cur['line']
            ibo = cur['byteoffset']
            fbo = nxt['byteoffset']

            content = remove_tags(content_payload[ibo:fbo])
            print content  # ===============> THIS IS PRINTING OUT THE STUFF IN THE PICTURE ABOVE
            db[head][line] = content

    with io.open(file, 'w', encoding='utf-8') as json_db:
        json_db.write(json.dumps(db, sort_keys=True, indent=4,
                                 ensure_ascii=False, separators=(',', ': ')))

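This is how it ends up stored in the txt file: it loses its formatting, and I think that is because of the encoding or some json dump magic (which I don't understand right now). I thought this would be fine, because if I print 'sdfd\n' it prints a new line rather than '\n', but that is not what happens for me. It prints '\n' and does not respect any newlines.

This is the code I use to read that file: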
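A minimal sketch of what is probably going on, assuming the stored text contains real newline characters: raw newlines are not allowed inside a JSON string, so `json.dumps` writes each one as the two characters `\` and `n`. The sample dictionary `db` below is made up for illustration; it is not the data from the question.

```python
import json

# Hypothetical sample data: one value containing a real newline character.
db = {u"head": {u"line": u"first row\nsecond row"}}

# No indent here, so any newline in the output could only come from the data.
dumped = json.dumps(db, ensure_ascii=False)

# The serialized text holds a backslash followed by 'n', not a newline.
has_real_newline = u"\n" in dumped
has_escaped_newline = u"\\n" in dumped

# Decoding the JSON turns the escape back into a real newline.
restored = json.loads(dumped)[u"head"][u"line"]
```

Printing `dumped` therefore shows a literal `\n`, while printing `restored` shows two lines.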

f = open("note.txt", 'r')
a = f.read()
type(a)
# <type 'str'>
b = a.decode('utf-8')
type(b)
# <type 'unicode'>
print b

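The formatting is completely gone, and the type is unicode. I tried io.open and codecs; I don't get unicode ... What am I doing wrong? I just want my newlines :( I even set sys.stdout to codecs.getwriter('utf-8').

Edit 1: using json.loads and reading the file...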
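A hedged sketch of the round trip that Edit 1 points toward: write the dictionary as JSON, read the file back as text, then parse it with `json.loads` instead of printing the raw text. The sample data and the temporary file path are assumptions made for illustration, and the code is written so that it should run on both Python 2.7 and Python 3.

```python
import io
import json
import os
import tempfile

# Hypothetical sample data standing in for the parsed wiki sections.
db = {u"head": {u"line": u"first line\nsecond line caf\u00e9"}}

# Temporary location used only for this example.
path = os.path.join(tempfile.mkdtemp(), "note.txt")

# Write as in the question: UTF-8 text, non-ASCII characters kept readable.
with io.open(path, "w", encoding="utf-8") as json_db:
    json_db.write(json.dumps(db, sort_keys=True, indent=4,
                             ensure_ascii=False, separators=(",", ": ")))

# The raw file text still contains the escaped form of the newline.
with io.open(path, "r", encoding="utf-8") as f:
    raw = f.read()

# Parsing the text as JSON restores the real newline characters.
loaded = json.loads(raw)
content = loaded[u"head"][u"line"]
print(content)
```

Decoding the bytes as UTF-8 only affects the character encoding; the `\n` escape is part of the JSON syntax and is only undone by a JSON parser.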

python unicode read/write not respecting newline characters

0 answers: