Question

I got this code working when I used a list of ASCII letters and an ASCII string, but I can't get it to work with the non-ASCII characters below.

# -*- coding: utf-8 -*-
asa = ["ā","ē","ī","ō","ū","ǖ","Ā","Ē","Ī","Ō","Ū","Ǖ",
"á","é","í","ó","ú","ǘ","Á","É","Í","Ó","Ú","Ǘ",
"ǎ","ě","ǐ","ǒ","ǔ","ǚ","Ǎ","Ě","Ǐ","Ǒ","Ǔ","Ǚ",
"à","è","ì","ò","ù","ǜ","À","È","Ì","Ò","Ù","Ǜ"]
[x.decode('utf-8') for x in asa]
print list(set(asa) & set("ō"))

Answer 1

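You need to put your character inside a list. Strings are iterable, and your Unicode character is encoded as a 2-byte string, so set("ō") iterates over it and Python treats "ō" as the two separate elements \xc5 and \x8d: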

>>> list("ō")
['\xc5', '\x8d']
>>> print list(set(asa) & set(["ō"]))
['\xc5\x8d']
>>> print list(set(asa) & set(["ō"]))[0]
ō
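
To make the difference between set("ō") and set(["ō"]) explicit, here is a minimal sketch. It assumes the strings are UTF-8 byte strings, as in the question; the names raw, asa_bytes, split_up and wrapped are made up for illustration, and the list is shortened to three characters.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only: all variable names here are hypothetical.
raw = u"ō".encode("utf-8")      # the UTF-8 byte string for "ō" (2 bytes)
asa_bytes = [c.encode("utf-8") for c in (u"ā", u"ō", u"ū")]

split_up = set(raw)             # iterates the byte string: two separate elements
wrapped = set([raw])            # one element: the whole byte string

no_match = set(asa_bytes) & split_up   # single bytes never equal a 2-byte string
match = set(asa_bytes) & wrapped       # the whole string is found

print(len(split_up))
print(len(wrapped))
print(list(match) == [raw])
```

Wrapping the string in a list is what keeps the set from splitting it into its individual bytes.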

Answer 2

Your first set contains elements of the form "ō".decode('utf-8') (type unicode), which is equivalent to u"ō".

The second set contains byte strings such as "ō" (type str), so the elements do not compare equal and the intersection is empty.

Consider:

>>> 'a' == u'a'
True
>>> 'ō' == u'ō'
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
>>> list('ō')
['\xc5', '\x8d']
>>> list(u'ō')
[u'\u014d']
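
One way to act on this is to convert both sides to the same type before intersecting. The following is a rough sketch, not the asker's code: to_text is a hypothetical helper, the list is shortened, and it assumes every byte string is UTF-8 encoded.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only: to_text and the variable names are hypothetical.

def to_text(value, encoding="utf-8"):
    """Return text, decoding byte strings (assumed to be UTF-8) if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# A mixed list: two byte strings and one text string.
asa = [u"ā".encode("utf-8"), u"ō".encode("utf-8"), u"ū"]
needle = u"ō".encode("utf-8")

asa_text = set(to_text(x) for x in asa)     # everything is text now
common = asa_text & set([to_text(needle)])  # wrap needle in a list, then intersect

print(len(common))
```

Once every element is the same type, the comparison no longer depends on implicit conversion between byte strings and unicode.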

Python intersection of a UTF-8 list and a UTF-8 string
