如何过滤Unicode字符

时间:2017-05-15 15:42:49

标签: python python-3.x unicode file-manipulation

我有一个包含Unicode字符列表的文件(由于复制粘贴失败),每16个字符也有十六进制代码,例如。

Ս Վ Տ 0550 Ր Ց Ւ Փ Ք Օ Ֆ ՗ ՘ ՙ ՚ ՛ ՜ ՝ ՞ ՟ 0560 ՠ ա բ գ

中间有05500560。我想创建一个删除这些数字的程序,但是当我尝试读取该文件时,它会引发错误:

Traceback (most recent call last):
  File "C:\Users\Millicent\Desktop\a.py", line 1, in <module>
    open('characters.txt').read()
  File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 392: character maps to <undefined>

我目前的代码是

with open('character.txt','r') as file:
    chars = file.read().split()

def isdigit(string):
    try:
        int(string, 16)
        return True
    except:
        return False

chars = list(filter(lambda s: len(s) != 4 and isdigit(s), chars))

with open('characters.txt','w') as file:
    file.write(''.join(chars))

有人能告诉我如何让Python接受特殊字符吗?

0 个答案:

没有答案