UnicodeDecodeError with Python regex

Time: 2016-05-29 13:43:22

Tags: python unicode

I am trying to replace all tabs with spaces so that I can put my comma-separated text on a single line in another file. Right now my code looks like this:

from __future__ import print_function
import re
import ast

f = open('sample_test.txt', 'r')
g = open('sample_test1.txt', 'w')

for line in f:
        c = re.sub(r'\R', r' ', line.rstrip())
        print (c, file = g)
f.close()

The problem is that I get this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 1944: character maps to <undefined>

1 answer:

Answer 0 (score: 0)

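Open the files with the utf-8 encoding explicitly; without it, open() falls back to the platform's default code page, which is where the 'charmap' error comes from. If you only want to replace tabs, you don't need a regex either: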

import io

with io.open('sample_test.txt', encoding="utf-8") as f, io.open('sample_test1.txt', 'w', encoding="utf-8") as g: 
    for line in f:
        g.write(line.replace("\t"," "))
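
To check the approach end to end, here is a minimal sketch that wraps the same loop in a helper and runs it on a throwaway file. The name tabs_to_spaces, the temporary directory, and the sample text are assumptions made for illustration, and the sketch assumes your input really is UTF-8. The sample contains a Cyrillic letter whose UTF-8 encoding includes the byte 0x98, the same byte named in your traceback.

```python
import io
import os
import tempfile


def tabs_to_spaces(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, replacing every tab with one space.

    Hypothetical helper: the encoding default assumes the input is UTF-8.
    """
    with io.open(src_path, encoding=encoding) as src, \
            io.open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            dst.write(line.replace("\t", " "))


# Build a small sample file in a temporary directory (illustrative data only).
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "sample_test.txt")
dst_path = os.path.join(tmp_dir, "sample_test1.txt")

# U+0418 is encoded in UTF-8 as the two bytes D0 98.
with io.open(src_path, "w", encoding="utf-8") as f:
    f.write(u"a,\tb,\u0418\n")

tabs_to_spaces(src_path, dst_path)

# Read the result back with the same explicit encoding.
with io.open(dst_path, encoding="utf-8") as f:
    result = f.read()
```

If the file turns out not to be UTF-8, pass the encoding it was actually written in as the third argument instead of relying on the default.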