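Python and Turkish capitalization

Date: 2013-10-31 09:49:20

Tags: python locale turkish

I haven't found a good explanation of how to handle this problem on Windows, so I'm asking here.

Turkish has two letters, ı (I) and i (İ), that Python handles incorrectly.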

>>> [char for char in 'Mayıs']
['M', 'a', 'y', 'i', 's']

>>> 'ı'.upper().lower()
'i'

This is how it should behave if the locale were applied correctly:

>>> [char for char in 'Mayıs']
['M', 'a', 'y', 'ı', 's']

>>> 'ı'.upper().lower()
'ı'

>>> 'i'.upper()
'İ'

>>> 'ı'.upper()
'I'

I tried locale.setlocale(locale.LC_ALL, 'Turkish_Turkey.1254') and even 'ı'.encode('cp857'), but neither helped.
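A likely reason the setlocale() attempt has no effect, at least in Python 3: str.upper() and str.lower() apply Unicode's default, locale-independent case mappings and never consult the C locale. The minimal sketch below (assuming Python 3 and a UTF-8 console) reproduces the behaviour from the question; the try/except is only there because the Windows locale name may not exist on other systems.

```python
import locale

# Python 3's str.upper()/str.lower() use Unicode's default case mappings,
# so changing the process locale does not change their results.
try:
    locale.setlocale(locale.LC_ALL, "Turkish_Turkey.1254")  # name taken from the question
except locale.Error:
    pass  # locale not available on this machine; the results are identical anyway

print("i".upper())          # I   (Turkish expects İ)
print("ı".upper().lower())  # i   (Turkish expects ı)
print(ascii("İ".lower()))   # 'i\u0307'  (i followed by COMBINING DOT ABOVE)
```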

How can I make Python handle these two letters correctly?

3 answers:

Answer 0 (score: 7)

You should use PyICU:

>>> from icu import UnicodeString, Locale
>>> tr = Locale("TR")
>>> # Python 3: str(); on Python 2 use unicode() instead
>>> s = UnicodeString("i")
>>> print(str(s.toUpper(tr)))
İ
>>> s = UnicodeString("I")
>>> print(str(s.toLower(tr)))
ı

Answer 1 (score: 2)

You can define your own hard-coded functions to handle the Turkish characters.

def tr_upper(text):
    # Apply the Turkish-specific mappings first (plain str.replace, no regex needed)
    text = text.replace("i", "İ")
    text = text.replace("ı", "I")
    text = text.replace("ç", "Ç")
    text = text.replace("ş", "Ş")
    text = text.replace("ü", "Ü")
    text = text.replace("ğ", "Ğ")
    return text.upper()  # for the rest use the default upper()


def tr_lower(text):
    text = text.replace("İ", "i")
    text = text.replace("I", "ı")
    text = text.replace("Ç", "ç")
    text = text.replace("Ş", "ş")
    text = text.replace("Ü", "ü")
    text = text.replace("Ğ", "ğ")
    return text.lower()  # for the rest use the default lower()

Regular upper():

>>> print("ulvido".upper())
ULVIDO

Our custom upper:

>>> print(tr_upper("ulvido"))
ULVİDO
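Only the dotted/dotless i pair actually differs from Python's default case mapping; ç, ş, ü and ğ already convert correctly with str.upper() and str.lower(). A more compact sketch of the same idea, assuming Python 3, builds the mapping once with str.maketrans() and applies it with str.translate(). The table names _TR_TO_UPPER and _TR_TO_LOWER are illustrative, not from the answer.

```python
# Hypothetical translation tables: only i/ı and İ/I need Turkish-specific handling.
_TR_TO_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TR_TO_LOWER = str.maketrans({"İ": "i", "I": "ı"})


def tr_upper(text: str) -> str:
    """Upper-case text, mapping i -> İ and ı -> I before the default upper()."""
    return text.translate(_TR_TO_UPPER).upper()


def tr_lower(text: str) -> str:
    """Lower-case text, mapping İ -> i and I -> ı before the default lower()."""
    return text.translate(_TR_TO_LOWER).lower()


print(tr_upper("ulvido"))  # ULVİDO
print(tr_lower("MAYIS"))   # mayıs
print(tr_upper("çiğdem"))  # ÇİĞDEM
```

Because translate() replaces every character in a single pass, the order of the mappings does not matter, unlike a chain of sequential replacements.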

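If you need to do many conversions, you can put these functions in a .py file: for example, save them as trtextstyle.py and import them into your project.

If trtextstyle.py is in the same directory as your script:

from trtextstyle import tr_upper, tr_lower

(Inside a package, use the relative form: from .trtextstyle import tr_upper, tr_lower.)

Hope this helps.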

Answer 2 (score: 0)

def tr_capitalize(text):
    words = []
    # split() with no argument skips empty strings, so repeated spaces
    # no longer make word[0] raise IndexError
    for word in text.split():
        first_letter = word[0]
        last_part = word[1:]

        # First letter: Turkish upper-case mappings
        first_letter = first_letter.replace("i", "İ")
        first_letter = first_letter.replace("ı", "I")
        first_letter = first_letter.replace("ç", "Ç")
        first_letter = first_letter.replace("ş", "Ş")
        first_letter = first_letter.replace("ü", "Ü")
        first_letter = first_letter.replace("ğ", "Ğ")

        # Rest of the word: Turkish lower-case mappings
        last_part = last_part.replace("İ", "i")
        last_part = last_part.replace("I", "ı")
        last_part = last_part.replace("Ç", "ç")
        last_part = last_part.replace("Ş", "ş")
        last_part = last_part.replace("Ü", "ü")
        last_part = last_part.replace("Ğ", "ğ")

        # capitalize() handles all remaining letters with the default rules
        words.append((first_letter + last_part).capitalize())

    return " ".join(words)