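I'm running into a problem in Python when a string uses decomposed Unicode (a base letter plus a combining mark) instead of precomposed Unicode. Here is code that reproduces it: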
# encoding=utf8
a = ["Địa"]
b = ["Địa"]
print(a) # ['\xc4\x90i\xcc\xa3a']
print(b) # ['\xc4\x90\xe1\xbb\x8ba']
print("Địa" in a) # False
print("Địa" in b) # True
How can I convert/normalize them to the same representation?
Answer 0 (score: 1)
You can use unicodedata.normalize() to bring every string to the same normalization form (NFC here) before comparing:
# encoding=utf8
import unicodedata
a = ["Địa"]
b = ["Địa"]
print("Địa" in [unicodedata.normalize('NFC', i) for i in a])
print("Địa" in [unicodedata.normalize('NFC', i) for i in b])
This outputs:
True
True
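The same idea can be sketched without relying on how an editor saved the literals. The version below assumes Python 3 (where str is Unicode text; on Python 2 the strings would have to be unicode objects) and spells out both forms with explicit escapes. The helper name contains_normalized is hypothetical, introduced only for this illustration. It normalizes the search term as well as the list items, so the result does not depend on which form the search term happens to be in.

```python
import unicodedata

# Explicit escapes, so the two forms cannot be silently changed by an editor.
decomposed = "\u0110i\u0323a"   # Đ, i, COMBINING DOT BELOW (U+0323), a
precomposed = "\u0110\u1ecba"   # Đ, ị (U+1ECB), a


def contains_normalized(needle, haystack, form="NFC"):
    """Return True if needle equals any item of haystack after normalizing both.

    Hypothetical helper for illustration; not part of unicodedata.
    """
    target = unicodedata.normalize(form, needle)
    return any(unicodedata.normalize(form, item) == target for item in haystack)


print(decomposed == precomposed)                        # False
print(len(decomposed), len(precomposed))                # 4 3
print(contains_normalized(precomposed, [decomposed]))   # True
print(contains_normalized(decomposed, [precomposed]))   # True
```

NFC composes "i" followed by U+0323 into the single code point U+1ECB, which is why both strings compare equal after normalization. NFD would work just as well for comparison, as long as both sides use the same form.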