Question

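I have recently been reading about case folding and string comparison when ignoring case. I've read that the MSDN standard is to use InvariantCulture and definitely avoid toLowercase. However, from what I've read, casefold is like a more aggressive toLowercase. My question is: should I use casefold in Python, or is there a more Pythonic standard I should use instead? Also, does casefold pass the Turkey Test?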

Answer 1

1) In Python 3, casefold() should be used to implement caseless string matching.

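Since Python 3.0, strings are stored as Unicode. Section 3.13 of The Unicode Standard defines default caseless matching as follows:

A string X is a caseless match for a string Y if and only if:
toCasefold(X) = toCasefold(Y)

Python's casefold() implements Unicode's toCasefold(), so it should be used to implement caseless string matching. However, case folding alone is not enough to cover some corner cases and pass the Turkey Test (see point 3).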
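To see why casefold() rather than lower() is the right primitive, here is a minimal comparison using the German ß, whose full case folding is "ss". The helper name `default_caseless_match` is just for illustration.

```python
def default_caseless_match(x: str, y: str) -> bool:
    """Default caseless matching: toCasefold(X) == toCasefold(Y)."""
    return x.casefold() == y.casefold()


# lower() leaves "ß" unchanged, so the two spellings differ...
print("Straße".lower() == "STRASSE".lower())        # False
# ...while casefold() maps "ß" to "ss", so they match.
print(default_caseless_match("Straße", "STRASSE"))  # True
```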

2) As of Python 3.6, casefold() does not pass the Turkey Test.

For two characters, uppercase I and uppercase dotted I, the Unicode Standard defines two different case folding mappings.

Default (for non-Turkic languages):
I → i (U+0049 → U+0069)
İ → i̇ (U+0130 → U+0069 U+0307)

Alternative (for Turkic languages):
I → ı (U+0049 → U+0131)
İ → i (U+0130 → U+0069)
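A quick way to confirm which mapping Python applies is to print the code points that casefold() produces for the characters involved. The helper `code_points` is only for illustration. Dotless ı (U+0131) has no default folding entry, so it is left unchanged, and "ı" and "I" fold to different strings under the default mapping.

```python
def code_points(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# casefold() applies only the default (non-Turkic) mappings.
for ch in ("I", "\u0130", "i", "\u0131"):
    print(f"{code_points(ch)} -> {code_points(ch.casefold())}")

# Output:
# U+0049 -> U+0069
# U+0130 -> U+0069 U+0307
# U+0069 -> U+0069
# U+0131 -> U+0131
```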

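Python's casefold() can only apply the default mappings, so it fails the Turkey Test. For example, the Turkish words "LİMANI" and "limanı" are caseless equivalents, but "LİMANI".casefold() == "limanı".casefold() returns False. There is no option to enable the alternative mappings.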
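The failure can be reproduced by printing both folded strings with ascii(), which makes the combining character visible: İ becomes i followed by U+0307 COMBINING DOT ABOVE and the final I becomes a plain i, while the lowercase word keeps its dotless ı.

```python
upper = "LİMANI"
lower = "limanı"

# ascii() escapes non-ASCII code points so the difference is visible.
print(ascii(upper.casefold()))               # 'li\u0307mani'
print(ascii(lower.casefold()))               # 'liman\u0131'
print(upper.casefold() == lower.casefold())  # False
```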

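3) How to do caseless string matching in Python 3.

Section 3.13 of The Unicode Standard describes several caseless matching algorithms. Canonical caseless matching is probably suitable for most use cases, and it already accounts for the corner cases. We only need to add an option to switch between non-Turkic and Turkic case folding.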
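For reference, the canonical caseless match in Section 3.13 wraps toCasefold() in NFD normalization on both sides; the code below follows this formula, with casefold_() standing in for toCasefold():

$$\mathrm{NFD}(\mathrm{toCasefold}(\mathrm{NFD}(X))) = \mathrm{NFD}(\mathrm{toCasefold}(\mathrm{NFD}(Y)))$$

The inner NFD handles rare edge cases (the standard singles out U+0345 COMBINING GREEK YPOGEGRAMMENI and characters that decompose to it); the outer NFD is needed because case folding can produce text that is no longer in NFD.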

import unicodedata

def normalize_NFD(string):
    return unicodedata.normalize('NFD', string)

def casefold_(string, include_special_i=False):
    if include_special_i:
        # Recompose so that 'I' + U+0307 becomes U+0130 again before mapping.
        string = unicodedata.normalize('NFC', string)
        # Turkic mappings: I -> ı (dotless i), İ -> i.
        string = string.replace('\u0049', '\u0131')
        string = string.replace('\u0130', '\u0069')
    return string.casefold()

def casefold_NFD(string, include_special_i=False):
    # Canonical caseless key: NFD(toCasefold(NFD(string))).
    return normalize_NFD(casefold_(normalize_NFD(string), include_special_i))

def caseless_match(string1, string2, include_special_i=False):
    return casefold_NFD(string1, include_special_i) == casefold_NFD(string2, include_special_i)

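casefold_() is a wrapper around Python's casefold(). If its parameter include_special_i is set to True, it applies the Turkic mappings; if it is set to False, it uses the default mappings.

caseless_match() performs canonical caseless matching of string1 and string2. If the strings are Turkic words, the include_special_i parameter must be set to True.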
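The normalization steps are not just ceremony. A minimal illustration, using only the default folding: the same word written with precomposed accents and with decomposed accents fails a plain casefold() comparison but passes the canonical one. The helper name `canonical_fold` is hypothetical.

```python
import unicodedata


def canonical_fold(s: str) -> str:
    """NFD(casefold(NFD(s))): canonical caseless key with default folding."""
    nfd = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFD", nfd.casefold())


precomposed = "\u00c5ngstr\u00f6m"   # "Ångström" with precomposed Å and ö
decomposed = "A\u030angstro\u0308m"  # same word: base letters + combining marks

print(precomposed.casefold() == decomposed.casefold())            # False
print(canonical_fold(precomposed) == canonical_fold(decomposed))  # True
```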

Examples:

caseless_match('LİMANI', 'limanı', include_special_i=True)  # True

caseless_match('LİMANI', 'limanı')  # False

caseless_match('INTENSIVE', 'intensive', include_special_i=True)  # False

caseless_match('INTENSIVE', 'intensive')  # True
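As a variation on casefold_(), the two Turkic substitutions can also be done in a single pass with str.translate(). This is a swapped-in technique, not the answer's code, and the names `TURKIC_TABLE`, `caseless_key` and `caseless_match_translate` are hypothetical. Under these assumptions it reproduces the four results above.

```python
import unicodedata

# Turkic-specific folding (status "T" in CaseFolding.txt): I -> ı, İ -> i.
TURKIC_TABLE = str.maketrans({"\u0049": "\u0131", "\u0130": "\u0069"})


def _nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)


def caseless_key(s: str, turkic: bool = False) -> str:
    """Canonical caseless key NFD(fold(NFD(s))), optionally with Turkic mappings."""
    s = _nfd(s)
    if turkic:
        # Recompose first so that "I" + U+0307 becomes U+0130 and maps to plain "i".
        s = unicodedata.normalize("NFC", s).translate(TURKIC_TABLE)
    return _nfd(s.casefold())


def caseless_match_translate(a: str, b: str, turkic: bool = False) -> bool:
    return caseless_key(a, turkic) == caseless_key(b, turkic)


print(caseless_match_translate("LİMANI", "limanı", turkic=True))        # True
print(caseless_match_translate("LİMANI", "limanı"))                     # False
print(caseless_match_translate("INTENSIVE", "intensive", turkic=True))  # False
print(caseless_match_translate("INTENSIVE", "intensive"))               # True
```

Building the table once with str.maketrans() keeps the per-call work to a single translate() pass, which may matter when the same keys are computed many times.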

Should I use Python casefold?
