Strange special characters appearing

Time: 2014-04-16 08:24:55

Tags: python special-characters

https://docs.google.com/file/d/0B1sEqo7wNB1-TlNEeXh6QldLT2c/edit

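I am trying to write a program that removes the special characters from the txt file linked above.

I already have a remover like this: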

import codecs
import sys

# Characters to strip from the input
chars = [u'\u001A', u'\u001C', u'\u001D', u'\u001E', u'\u0085']

input_file = sys.argv[1]
output_file = sys.argv[2]

ifile = codecs.open(input_file, encoding='utf-8', mode="rb")
ofile = codecs.open(output_file, encoding='utf-8', mode="wb")

for line in ifile:
    for ch in chars:
        if ch in line:
            line = line.replace(ch, '')
    ofile.write(line)

ifile.close()
ofile.close()

But it fails to remove those characters from the txt file. Instead, it crashes. What should I do?

1 answer:

Answer 0 (score: 0)

I would try:

import codecs
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]

# Read raw bytes so that decoding happens inside the try block
ifile = open(input_file, mode="rb")
ofile = codecs.open(output_file, encoding='utf-8', mode="wb")

for line in ifile:
    try:
        ofile.write(line.decode('utf-8'))
    except UnicodeDecodeError:
        # A line that is not valid UTF-8 is skipped as a whole
        pass

ifile.close()
ofile.close()
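
As a further sketch along the same lines, and not the answerer's own code: if the crash is a UnicodeDecodeError caused by bytes that are not valid UTF-8 (an assumption, since no traceback is shown), the file can be decoded with errors='ignore' and the characters from the question filtered out afterwards. The names remove_chars and clean_file are hypothetical helpers introduced only for this illustration.

```python
import io

# The characters listed in the question, collected in one string.
CHARS_TO_REMOVE = u'\u001a\u001c\u001d\u001e\u0085'


def remove_chars(text, chars=CHARS_TO_REMOVE):
    """Return text with every character found in chars dropped."""
    return u''.join(ch for ch in text if ch not in chars)


def clean_file(input_path, output_path, encoding='utf-8'):
    """Copy input_path to output_path without the unwanted characters."""
    # errors='ignore' drops bytes that cannot be decoded instead of raising.
    # newline='' leaves the original line endings untouched.
    with io.open(input_path, encoding=encoding,
                 errors='ignore', newline='') as ifile:
        with io.open(output_path, 'w', encoding=encoding,
                     newline='') as ofile:
            for line in ifile:
                ofile.write(remove_chars(line))
```

Calling clean_file(sys.argv[1], sys.argv[2]) would give the same command-line behaviour as the original script. Note that errors='ignore' discards data silently; errors='replace' is an alternative that leaves a visible U+FFFD marker where bytes could not be decoded.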

A small tip: to make the code more pythonic, take a look at using the with statement.
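
For illustration, a minimal sketch of the with statement applied to the same open/close pattern; copy_text is a hypothetical helper, not part of the original code. The files are closed when the block exits, even if an exception is raised inside it, so the explicit close() calls are no longer needed.

```python
import io


def copy_text(input_path, output_path, encoding='utf-8'):
    """Copy a text file line by line; return True if both files got closed."""
    with io.open(input_path, encoding=encoding) as ifile:
        with io.open(output_path, 'w', encoding=encoding) as ofile:
            for line in ifile:
                ofile.write(line)
    # Both file objects are already closed at this point.
    return ifile.closed and ofile.closed
```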