使用locale / collat​​ion按键对字典排序

时间:2014-03-05 16:31:17

标签: python python-2.7

以下代码忽略了语言环境,Égypt最后会出现什么问题?

dict = {"United States": "United States", "Spain" : "Spain", "England": "England", "Égypt": "Égypt"}

import locale

# using your default locale (user settings)
locale.setlocale(locale.LC_ALL,"fr_FR")
print OrderedDict(sorted(dict.items(), key=lambda t: t[0], cmp=locale.strcoll))

这是输出:

OrderedDict([('England', 'England'), ('Spain', 'Spain'), ('United States', 'United States'), ('\xc3\x89gypt', '\xc3\x89gypt')])

2 个答案:

答案 0 :(得分:1)

考虑以下内容......

import unicodedata
from collections import OrderedDict
dict = {"United States": "United States", "Spain" : "Spain", "England": "England", "Égypt": "Égypt"}

import locale

# using your default locale (user settings)
locale.setlocale(locale.LC_ALL,"fr_FR")

print OrderedDict(sorted(dict.items(),cmp= lambda a,b: locale.strcoll(unicodedata.normalize('NFD', unicode(a)[0]).encode('ASCII', 'ignore'),
                                                                       unicodedata.normalize('NFD', unicode(b)[0]).encode('ASCII', 'ignore'))))

答案 1 :(得分:0)

这是一个解决方法。

使用unicode规范化形式规范分解http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms

# utf-8 <-> unicode is left as exercise to the reader
egypt = unicodedata.normalize("NFD", egypt)

sorted(['Egypt', 'E\xcc\x81gypt', 'US'])
['Egypt', 'E\xcc\x81gypt', 'US']

这实际上并未考虑区域设置。

除此之外,请尝试更新的Python(是的,我知道)或来自Martijn链接问题和相应答案的ICU库。