I am getting this error: UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 266-266: Non-BMP character not supported in Tk
I am parsing data, and some emoji characters are ending up in my array. I have:
data = "this variable contains some emoji'sツ"
I want:
data = "this variable contains some emoji's"
How can I remove these characters from data, or otherwise handle this situation, in Python 3?
Answer 0 (score: 3)
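If the goal is simply to remove every character above '\uFFFF' (that is, everything outside the Basic Multilingual Plane), the straightforward approach is: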
data = "this variable contains some emoji'sツ"
data = ''.join(c for c in data if c <= '\uFFFF')
Your string may be in decomposed form, so you may need to normalize it to composed form first, so that the non-BMP characters can be recognized:
import unicodedata
data = ''.join(c for c in unicodedata.normalize('NFC', data) if c <= '\uFFFF')
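The question also asks about handling the characters rather than deleting them. As a rough sketch (the helper names `strip_non_bmp` and `replace_non_bmp` are made up for illustration and are not part of any library), the same comparison against '\uFFFF' can either drop a character or substitute a placeholder such as U+FFFD. Note that ツ is U+30C4, which lies inside the BMP, so this filter keeps it; only code points above U+FFFF are affected.

```python
def strip_non_bmp(text):
    """Drop every code point above U+FFFF (hypothetical helper)."""
    return ''.join(c for c in text if c <= '\uFFFF')


def replace_non_bmp(text, placeholder='\uFFFD'):
    """Swap every code point above U+FFFF for a placeholder (hypothetical helper)."""
    return ''.join(c if c <= '\uFFFF' else placeholder for c in text)


# U+1F600 is outside the BMP; U+30C4 (the katakana character) is inside it.
sample = "emoji's \U0001F600 and \u30C4"

# ascii() keeps the demo printable on consoles that cannot render these characters.
print(ascii(strip_non_bmp(sample)))
print(ascii(replace_non_bmp(sample)))
```

Replacing instead of deleting keeps the string length and character positions stable, which may matter if other code indexes into the text.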
Answer 1 (score: -1)
>>> import string
>>> printable = set(string.printable)
>>> ''.join(filter(lambda x: x in printable, data))  # filter() returns an iterator in Python 3
"this variable contains some emoji's"
For BMP characters, see: removing emojis from a string in Python
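One caveat with the string.printable approach in this answer: string.printable contains only ASCII characters, so the filter removes all non-ASCII text, not just emoji. A small sketch of that behavior (the function name `keep_ascii_printable` is hypothetical):

```python
import string

# string.printable holds ASCII digits, letters, punctuation, and whitespace only.
PRINTABLE = set(string.printable)


def keep_ascii_printable(text):
    """Keep only ASCII printable characters (hypothetical helper)."""
    return ''.join(c for c in text if c in PRINTABLE)


# Both the accented letter and the emoji are dropped.
print(keep_ascii_printable("caf\u00e9 \U0001F600 ok"))
```

If accented letters or non-Latin scripts need to survive, filtering on the code point range is probably the safer choice.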