Sorting strings with accented characters in Python

Asked: 2010-12-10 12:56:20

Tags: python sorting collation diacritics

  

Possible duplicate:
  Python not sorting unicode properly. Strcoll doesn't help.

I am trying to sort some words alphabetically. This is how I do it:

#!/opt/local/bin/python2.7
# -*- coding: utf-8 -*-

import locale

# Make sure the locale is in french
locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
print "locale: " + str(locale.getlocale())

# The words are in alphabetical order
words = ["liche", "lichée", "lichen", "lichénoïde", "licher", "lichoter"]

for word in sorted(words, cmp=locale.strcoll):
    print word.decode("string-escape")

I expected the words to be printed in the order in which they are defined, but this is what I get:

locale: ('fr_FR', 'UTF8')
liche
lichen
licher
lichoter
lichée
lichénoïde

The é character is treated as if it were greater than z.
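For reference, the output above is the same order that plain code-point comparison produces, since é (U+00E9) has a higher code point than every ASCII letter. The following is a minimal sketch in Python 3 syntax (the question itself uses Python 2.7), assuming the strings are in precomposed (NFC) form. The names `nfc_words` and `by_code_point` are illustrative only.

```python
import unicodedata

words = ["liche", "lichée", "lichen", "lichénoïde", "licher", "lichoter"]

# Assumption: normalize to NFC so that "é" is the single code point U+00E9.
nfc_words = [unicodedata.normalize("NFC", w) for w in words]

# Default string comparison in Python 3 is by code point, with no locale involved.
by_code_point = sorted(nfc_words)

for word in by_code_point:
    print(word)

# "é" (233) compares greater than "z" (122), so accented words land at the end.
print(ord("\u00e9") > ord("z"))
```

If the locale's collation tables are missing or not applied, `locale.strcoll` can end up behaving like this plain comparison, which would explain the result shown.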

I seem to be misunderstanding how locale.strcoll compares strings. What comparator function should I use to get alphabetical ordering?

2 answers:

Answer 0: (score: 2)

I ended up stripping the diacritics and comparing the stripped versions of the strings, so that I would not have to add a dependency on PyICU.
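The answer does not show its code, so the following is only a sketch of one way this approach could look, written in Python 3 syntax and using the standard `unicodedata` module. The helper name `strip_diacritics` and the tie-breaking on the original string are assumptions for illustration, not the answerer's actual implementation.

```python
import unicodedata


def strip_diacritics(text):
    """Return text with combining marks (accents, diaereses) removed."""
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks, so drop those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["liche", "lichée", "lichen", "lichénoïde", "licher", "lichoter"]

# Sort on the stripped form; fall back to the original string to break ties
# between words that differ only by accents (an assumption, not real collation).
sorted_words = sorted(words, key=lambda w: (strip_diacritics(w), w))

for word in sorted_words:
    print(word)
```

This only approximates alphabetical order: it treats accented and unaccented letters as equal, so it does not reproduce the full French collation rules that a library such as PyICU would apply.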

Answer 1: (score: 1)