I couldn't find a good explanation of how to handle this problem on Windows, so I'm asking here.
Turkish has two letters, ı (uppercase I) and i (uppercase İ), that Python handles incorrectly.
>>> [char for char in 'Mayıs']
['M', 'a', 'y', 'i', 's']
>>> 'ı'.upper().lower()
'i'
With the correct locale, this is what should happen:
>>> [char for char in 'Mayıs']
['M', 'a', 'y', 'ı', 's']
>>> 'ı'.upper().lower()
'ı'
and
>>> 'i'.upper()
'İ'
>>> 'ı'.upper()
'I'
I tried locale.setlocale(locale.LC_ALL, 'Turkish_Turkey.1254') and even 'ı'.encode('cp857'), but neither helped.
How can I make Python handle these two letters correctly?
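For context, here is a minimal sketch of why the locale call has no effect, assuming Python 3: str.upper() and str.lower() apply Unicode's default, locale-independent case mappings, so the locale setting is never consulted.

```python
import unicodedata

# Python 3's str case methods use Unicode's default (locale-independent)
# mappings, so locale.setlocale() does not change any of these results.
print('i'.upper())  # dotted i is uppercased to plain I, not İ
print('ı'.upper())  # dotless ı is also uppercased to plain I

# Lowercasing İ (U+0130) yields two code points: 'i' + COMBINING DOT ABOVE.
lowered = 'İ'.lower()
print([unicodedata.name(ch) for ch in lowered])
```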
Answer 0: (score: 7)
You should use PyICU:
>>> from icu import UnicodeString, Locale
>>> tr = Locale("TR")
>>> s = UnicodeString("i")
>>> print(str(s.toUpper(tr)))
İ
>>> s = UnicodeString("I")
>>> print(str(s.toLower(tr)))
ı
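If you call this in many places, the PyICU calls can be wrapped in small helpers. The sketch below is only an illustration: the names turkish_upper and turkish_lower are hypothetical, and the pure-Python fallback is an assumption added for machines where PyICU is not installed.

```python
try:
    # PyICU is a third-party package and may not be installed.
    from icu import Locale, UnicodeString
except ImportError:
    Locale = UnicodeString = None


def turkish_upper(text):
    """Uppercase text using Turkish casing rules (hypothetical helper)."""
    if UnicodeString is not None:
        return str(UnicodeString(text).toUpper(Locale("tr")))
    # Fallback: map the two problem letters first, then use str.upper().
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text):
    """Lowercase text using Turkish casing rules (hypothetical helper)."""
    if UnicodeString is not None:
        return str(UnicodeString(text).toLower(Locale("tr")))
    # Fallback: map the two problem letters first, then use str.lower().
    return text.replace("İ", "i").replace("I", "ı").lower()


print(turkish_upper("ulvido"))
print(turkish_lower("DIŞ"))
```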
Answer 1: (score: 2)
You can define your own hard-coded functions for the Turkish character problem.
import re

def tr_upper(text):
    # Map the Turkish letters explicitly; str.upper() alone turns "i" into "I".
    text = re.sub(r"i", "İ", text)
    text = re.sub(r"ı", "I", text)
    text = re.sub(r"ç", "Ç", text)
    text = re.sub(r"ş", "Ş", text)
    text = re.sub(r"ü", "Ü", text)
    text = re.sub(r"ğ", "Ğ", text)
    text = text.upper()  # for the rest use default upper
    return text

def tr_lower(text):
    # Map the Turkish letters explicitly; str.lower() alone turns "I" into "i".
    text = re.sub(r"İ", "i", text)
    text = re.sub(r"I", "ı", text)
    text = re.sub(r"Ç", "ç", text)
    text = re.sub(r"Ş", "ş", text)
    text = re.sub(r"Ü", "ü", text)
    text = re.sub(r"Ğ", "ğ", text)
    text = text.lower()  # for the rest use default lower
    return text
Regular upper:
>>> print("ulvido".upper())
ULVIDO
Our custom upper:
>>> print(tr_upper("ulvido"))
ULVİDO
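The same idea can also be sketched with a translation table instead of repeated regex substitutions. This is an alternative, not the approach above: the names tr_upper_fast and tr_lower_fast are hypothetical, and the sketch assumes Python 3, where only the i/İ and ı/I pairs differ from the default case mapping.

```python
# Only these two letter pairs differ from Python's default case mapping;
# ç, ş, ü and ğ are already handled by str.upper() and str.lower().
_TR_UPPER = str.maketrans("iı", "İI")
_TR_LOWER = str.maketrans("İI", "iı")


def tr_upper_fast(text):
    """Uppercase with Turkish i handling (hypothetical helper)."""
    return text.translate(_TR_UPPER).upper()


def tr_lower_fast(text):
    """Lowercase with Turkish i handling (hypothetical helper)."""
    return text.translate(_TR_LOWER).lower()


print(tr_upper_fast("ulvido"))
print(tr_lower_fast("IŞIK"))
```

The table is applied before the default conversion, so each string is scanned twice regardless of how many letters are mapped.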
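If you need to do this conversion a lot, you can put the functions in a .py file. For example, save them as trtextstyle.py and import it into your project.
If trtextstyle.py is in the same directory as your script (inside a package, use the relative form from .trtextstyle instead):
from trtextstyle import tr_upper, tr_lower
Hope this helps.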
Answer 2: (score: 0)
import re

def tr_capitalize(param_word):
    word_list = param_word.split(sep=" ")
    new_word = ""
    for word in word_list:
        # Slicing avoids an IndexError on empty words (consecutive spaces).
        first_letter = word[:1]
        last_part = word[1:]
        # Uppercase the first letter with Turkish rules.
        first_letter = re.sub(r"i", "İ", first_letter)
        first_letter = re.sub(r"ı", "I", first_letter)
        first_letter = re.sub(r"ç", "Ç", first_letter)
        first_letter = re.sub(r"ş", "Ş", first_letter)
        first_letter = re.sub(r"ü", "Ü", first_letter)
        first_letter = re.sub(r"ğ", "Ğ", first_letter)
        # Lowercase the rest of the word with Turkish rules.
        last_part = re.sub(r"İ", "i", last_part)
        last_part = re.sub(r"I", "ı", last_part)
        last_part = re.sub(r"Ç", "ç", last_part)
        last_part = re.sub(r"Ş", "ş", last_part)
        last_part = re.sub(r"Ü", "ü", last_part)
        last_part = re.sub(r"Ğ", "ğ", last_part)
        rebuilt_word = first_letter + last_part
        rebuilt_word = rebuilt_word.capitalize()  # default rules for the rest
        new_word = new_word + " " + rebuilt_word
    new_word = new_word.strip()
    return new_word
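For comparison, here is a more compact sketch of the same per-word capitalization, assuming Python 3. The name tr_title is hypothetical, and it uses a translation table instead of the regex substitutions above.

```python
# Map only the letters whose default case mapping is wrong for Turkish.
_TR_UPPER = str.maketrans("iı", "İI")
_TR_LOWER = str.maketrans("İI", "iı")


def tr_title(text):
    """Capitalize each space-separated word with Turkish rules (hypothetical helper)."""
    words = []
    for word in text.split(" "):
        # Slicing keeps empty words (from repeated spaces) from raising.
        head = word[:1].translate(_TR_UPPER).upper()
        tail = word[1:].translate(_TR_LOWER).lower()
        words.append(head + tail)
    return " ".join(words)


print(tr_title("istanbul ILIK ırmak"))
```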