Python writes the wrong string to a file

Time: 2018-11-02 12:15:58

Tags: python file utf-8

I am trying to read an XML file, replace some text, and then overwrite the file.

Input file:

<text>A&Z</text>

Code:

with open(file, 'rb') as f:
    newText = f.read().decode('utf-8', 'ignore')
    newText = newText.replace("&", "and")
with open(relative_file_path+'/'+fileP, "wb") as f:
    print('nt', newText)
    f.write(newText.encode('utf-8'))

Printed nt:

nt <�t�e�x�t�>�A�and�Z�<�/�t�e�x�t�>�

When nt is printed, there is a NULL character after every character except inside the inserted "and".


Output file:

<text>A湡dZ</text>

I use decode('utf-8', 'ignore') because my XML contains invalid leading characters and I still need to be able to read the file.
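
The symptoms described above (a NULL after every character, and "an" turning into a single CJK character in the output) are consistent with the input file being UTF-16 rather than UTF-8. The question does not confirm the file's encoding, so the sketch below simply assumes UTF-16 LE and reproduces the effect in memory, without touching any file:

```python
# Assumption: the real input file is UTF-16 LE. The question does not state this.
original = "<text>A&Z</text>"
raw = original.encode("utf-16-le")       # each ASCII character becomes 2 bytes: char + 0x00

# Decoding those bytes as UTF-8 keeps every 0x00 byte as a NUL character.
decoded = raw.decode("utf-8", "ignore")
print(repr(decoded))

# The replacement inserts "and" with no NULs between its letters.
replaced = decoded.replace("&", "and")
written = replaced.encode("utf-8")

# A viewer that still treats the bytes as UTF-16 LE pairs them up differently:
# the bytes of "a" and "n" are read together as one code unit, U+6E61.
seen = written.decode("utf-16-le")
print(seen)
```

If this assumption holds, the "invalid leading characters" would be the UTF-16 byte order mark, and decoding with the matching codec would avoid the NULs in the first place.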


Solved

Thanks to everyone for the help.

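def stripped(stripstring):
    # Map code points 0-31 to None so that str.translate() deletes them.
    # This removes the NUL characters, but also tabs, newlines and carriage returns.
    mpa = dict.fromkeys(range(32))
    return stripstring.translate(mpa)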

with open(relative_file_path+'/'+fileP, mode='rb') as f:
    newText = f.read().decode('utf-8-sig', 'ignore')
    newText = stripped(newText)
    newText = newText.replace("&", "and")

# Write UTF-8 explicitly instead of relying on the platform default encoding.
with open(relative_file_path+'/'+fileP, "w", encoding='utf-8') as f:
    f.write(newText)
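
The fix above works by deleting the NUL characters after a UTF-8 decode. An alternative, sketched here under the unconfirmed assumption that the file starts with a UTF-16 byte order mark, is to pick the codec from the BOM so that no NULs appear at all. The helper names `read_text_guessing_bom` and `replace_ampersands` are hypothetical and only for illustration:

```python
import codecs

def read_text_guessing_bom(path):
    """Hypothetical helper: decode as UTF-16 if a UTF-16 BOM is present, else as UTF-8."""
    with open(path, "rb") as f:
        raw = f.read()
    # Note: a UTF-32 LE BOM also starts with the UTF-16 LE BOM bytes; not handled here.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")       # the utf-16 codec consumes the BOM
    return raw.decode("utf-8-sig")        # strips a UTF-8 BOM if there is one

def replace_ampersands(src_path, dst_path):
    """Hypothetical helper: read, replace "&" with "and", write back as UTF-8."""
    text = read_text_guessing_bom(src_path).replace("&", "and")
    # newline="" writes line endings exactly as they were read.
    with open(dst_path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

Unlike stripping code points 0-31, this keeps tabs and line breaks intact. If the XML has a declaration that names UTF-16 as its encoding, the declaration would also need updating after the file is rewritten as UTF-8.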

0 answers:

No answers yet