https://docs.google.com/file/d/0B1sEqo7wNB1-TlNEeXh6QldLT2c/edit
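I am trying to write a program that removes the special characters from the text file linked above.
I already have a remover like this: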
import codecs
import sys

# Control characters to strip from the text
chars = [u'\u001A', u'\u001C', u'\u001D', u'\u001E', u'\u0085']
input_file = sys.argv[1]
output_file = sys.argv[2]
ifile = codecs.open(input_file, encoding='utf-8', mode="rb")
ofile = codecs.open(output_file, encoding='utf-8', mode="wb")
for line in ifile:
    for ch in chars:
        if ch in line:
            line = line.replace(ch, '')
    # Write the line once, after all the characters have been removed
    ofile.write(line)
ifile.close()
ofile.close()
But it fails to remove those characters from the file. Instead, it crashes. What should I do?
Answer 0 (score: 0)
I would try:
import codecs
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]
# errors="ignore" skips bytes that are not valid UTF-8 instead of raising
# UnicodeDecodeError while the file is being read.
ifile = codecs.open(input_file, encoding='utf-8', mode="rb", errors="ignore")
ofile = codecs.open(output_file, encoding='utf-8', mode="wb")
for line in ifile:
    ofile.write(line)
ifile.close()
ofile.close()
A small tip: to make the code more Pythonic, take a look at the with statement.
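A minimal sketch of that tip, combining the with statement with the character list from the question. The helper name strip_chars and the use of io.open are my own choices, not something from the original post, and I am assuming the crash comes from bytes that are not valid UTF-8, which errors="ignore" silently drops. If the file is actually in another encoding, the encoding argument would need to change instead.

```python
import io

# Character list taken from the question.
CHARS = [u'\u001A', u'\u001C', u'\u001D', u'\u001E', u'\u0085']


def strip_chars(input_path, output_path, chars=CHARS):
    """Copy input_path to output_path, dropping every character in chars.

    Hypothetical helper for illustration only.
    """
    # Assumption: the input is meant to be UTF-8; errors="ignore" drops
    # any bytes that cannot be decoded instead of raising an exception.
    # newline="" leaves the original line endings untouched.
    with io.open(input_path, encoding="utf-8", errors="ignore", newline="") as ifile, \
            io.open(output_path, "w", encoding="utf-8", newline="") as ofile:
        for line in ifile:
            for ch in chars:
                line = line.replace(ch, u"")
            ofile.write(line)
    # Both files are closed here, even if an exception was raised above.
```

From a script it could be called as strip_chars(sys.argv[1], sys.argv[2]) after importing sys. The with block replaces the explicit close() calls, so the files are released even when the loop fails partway through.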