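I have a file containing a list of Unicode characters (the result of a failed copy and paste), and after every 16 characters there is also a hexadecimal code, e.g.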
Ս Վ Տ 0550 Ր Ց Ւ Փ Ք Օ Ֆ ՙ ՚ ՛ ՜ ՝ ՞ ՟ 0560 ՠ ա բ գ
with 0550 and 0560 in the middle. I want to write a program that removes these numbers, but when I try to read the file, it raises an error:
Traceback (most recent call last):
  File "C:\Users\Millicent\Desktop\a.py", line 1, in <module>
    open('characters.txt').read()
  File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 392: character maps to <undefined>
My current code is:
with open('character.txt','r') as file:
    chars = file.read().split()
def isdigit(string):
    try:
        int(string, 16)
        return True
    except:
        return False
chars = list(filter(lambda s: len(s) != 4 and isdigit(s), chars))
with open('characters.txt','w') as file:
    file.write(''.join(chars))
Can someone tell me how to get Python to accept special characters?
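
For reference, here is a minimal sketch of what I am trying to end up with. It assumes the file is actually saved as UTF-8, which I have not verified. The traceback shows Python falling back to cp1252, where byte 0x81 is undefined, so the sketch passes an explicit `encoding` to `open`. The names `strip_hex_codes` and `clean_file` are made up for illustration. Also, the filter in my code above seems to keep only the hex-like tokens that are not four characters long, which is the opposite of what I want, so the sketch uses the reverse condition.

```python
import re

# Matches a token made of exactly four hexadecimal digits, e.g. "0550".
HEX_CODE = re.compile(r'[0-9A-Fa-f]{4}')


def strip_hex_codes(text):
    """Split on whitespace and drop tokens that are exactly four hex digits."""
    tokens = text.split()
    return ''.join(t for t in tokens if not HEX_CODE.fullmatch(t))


def clean_file(path, encoding='utf-8'):
    """Rewrite the file at `path` without the four-digit hex codes.

    Assumption: the file is encoded as `encoding` (UTF-8 by default).
    Without an explicit encoding, open() uses the platform default,
    which is cp1252 on the Windows setup shown in the traceback.
    """
    with open(path, 'r', encoding=encoding) as f:
        cleaned = strip_hex_codes(f.read())
    with open(path, 'w', encoding=encoding) as f:
        f.write(cleaned)
    return cleaned
```

Calling `clean_file('characters.txt')` would then overwrite the file in place, so it is probably worth trying it on a copy first. Like my original code, it joins the remaining characters without separators.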