在一些文化中正则表达式和资本I

时间:2015-04-16 12:04:45

标签: c# regex

在某些文化中,资本“我”有什么问题?我发现在某些文化中无法在特殊条件下找到 - 如果你正在寻找带有标志RegexOptions.IgnoreCase的[a-z]。以下是示例代码:

var allCultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
var allLetters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
var allLettersCount = allLetters.Length;

foreach (var culture in allCultures)
{
    Thread.CurrentThread.CurrentCulture = culture;
    Thread.CurrentThread.CurrentUICulture = culture;

    var matched = string.Empty;
    foreach (var m in Regex.Matches(allLetters, "[A-Za-z0-9]", RegexOptions.IgnoreCase))
        matched += m;

    var count = matched.Length;
    if (count != allLettersCount)
        Console.WriteLine("Culture '{0}' - {1} missing; Matched: {2}", culture.Name, (allLettersCount - count).ToString(), matched);
}

输出是(注意每行中缺少资本I):

Culture 'az' - 1 missing; Matched:          ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Culture 'az-Cyrl' - 1 missing; Matched:     ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Culture 'az-Cyrl-AZ' - 1 missing; Matched:  ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Culture 'az-Latn' - 1 missing; Matched:     ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Culture 'az-Latn-AZ' - 1 missing; Matched:  ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Culture 'tr' - 1 missing; Matched:          ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Culture 'tr-TR' - 1 missing; Matched:       ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789

有趣的是,如果没有使用标志“IgnoreCase”那么它运行良好,并找到“我”。

1 个答案:

答案 0 :(得分:18)

答案在Wikipedia

  

无点和点状I形式的外壳与其他外壳不同   语言。这意味着期望的一个不区分大小写的匹配   一个英国人并不符合土耳其用户的期望。   "土耳其语I"经常被用作案例问题的一个例子   计算中的不敏感。

另一种解释可以在MSDN上找到:

enter image description here