C# regex: "Input string was not in a correct format" problem

Time: 2011-07-08 07:37:16

Tags: c# regex

I'm having a problem with the program below. It compiles, but when I run it I get "Input string was not in a correct format." Can anyone help?

        string path = @"C:/Documents and Settings/expn261/Desktop/CharacterTest/Output.xls";
        string strCharater = File.ReadAllText(path,UTF7Encoding.UTF7);

        strCharater = Regex.Replace(strCharater, "[èéèëêð]", "e");
        strCharater = Regex.Replace(strCharater, "[ÉÈËÊ]", "E");
        strCharater = Regex.Replace(strCharater, "[àâä]", "a");
        strCharater = Regex.Replace(strCharater, "[ÀÁÂÃÄÅ]", "A");
        strCharater = Regex.Replace(strCharater, "[àáâãäå]", "a");
        strCharater = Regex.Replace(strCharater, "[ÙÚÛÜ]", "U");
        strCharater = Regex.Replace(strCharater, "[ùúûüµ]", "u");
        strCharater = Regex.Replace(strCharater, "[òóôõöø]", "o");
        strCharater = Regex.Replace(strCharater, "[ÒÓÔÕÖØ]", "O");
        strCharater = Regex.Replace(strCharater, "[ìíîï]", "i");
        strCharater = Regex.Replace(strCharater, "[ÌÍÎÏ]", "I");
        strCharater = Regex.Replace(strCharater, "[š]", "s");
        strCharater = Regex.Replace(strCharater, "[Š]", "S");
        strCharater = Regex.Replace(strCharater, "[ñ]", "n");
        strCharater = Regex.Replace(strCharater, "[Ñ]", "N");
        strCharater = Regex.Replace(strCharater, "[ç]", "c");
        strCharater = Regex.Replace(strCharater, "[Ç]", "C");
        strCharater = Regex.Replace(strCharater, "[ÿ]", "y");
        strCharater = Regex.Replace(strCharater, "[Ÿ]", "Y");
        strCharater = Regex.Replace(strCharater, "[ž]", "z");
        strCharater = Regex.Replace(strCharater, "[Ž]", "Z");
        strCharater = Regex.Replace(strCharater, "[Ð]", "D");
        strCharater = Regex.Replace(strCharater, "[œ]", "oe");
        strCharater = Regex.Replace(strCharater, "[Œ]", "Oe");
        strCharater = Regex.Replace(strCharater, "[«»\u201C\u201D\u201E\u201F\u2033\u2036]", "\"");
        strCharater = Regex.Replace(strCharater, "[\u2026]", "...");

        string path2 = (@"C:/Documents and Settings/expn261/My Documents/CharacterReplaceTest.csv");
        StreamWriter sw = new StreamWriter(path2);
        sw.WriteLine(strCharater, UTF7Encoding.UTF7);

3 answers:

Answer 0 (score: 3)

This isn't widely known, but it works like a charm: it strips all diacritics.

// using System.Globalization;   // CharUnicodeInfo, UnicodeCategory
// using System.Text;            // NormalizationForm, StringBuilder
public static string RemoveDiacritics(string s) {
    s = s.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();

    for (int i = 0; i < s.Length; i++) {
        if (CharUnicodeInfo.GetUnicodeCategory(s[i]) != UnicodeCategory.NonSpacingMark) sb.Append(s[i]);
    }

    return sb.ToString();
}
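A minimal runnable sketch of the same approach, with the `using` directives it needs. One caveat worth knowing: `FormD` only splits characters that have a canonical decomposition, so letters from the question such as `ø`, `Ø`, `œ`, `Œ`, `Ð` (and `ð`) come through unchanged. The small fallback table below (`Extras`, a hypothetical name) is an assumption added to cover those, using the question's own replacements; the class name `DiacriticsDemo` is likewise illustrative.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class DiacriticsDemo
{
    // Hypothetical fallback table: these letters have no canonical
    // decomposition, so FormD cannot separate a mark from them.
    private static readonly Dictionary<char, string> Extras = new Dictionary<char, string>
    {
        ['ø'] = "o", ['Ø'] = "O",
        ['œ'] = "oe", ['Œ'] = "Oe",
        ['Ð'] = "D",
    };

    public static string RemoveDiacritics(string s)
    {
        // Split each accented letter into base letter + combining mark(s).
        string decomposed = s.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Drop the combining marks, keep everything else.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            if (Extras.TryGetValue(c, out string replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }

        // Recompose anything FormD split apart without adding marks.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

For example, `RemoveDiacritics("Crème brûlée")` yields `"Creme brulee"` and `RemoveDiacritics("Œuvre")` yields `"Oeuvre"`.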

Answer 1 (score: 2)

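It looks like what you are trying to do is translate the characters in the string. This is one of those cases where you may actually just want to write one big switch statement: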

var sb = new StringBuilder();
foreach (char c in strCharater) // could you choose a better name than strCharater?
{
    switch (c)
    {
       case 'è':
       case 'é':
          sb.Append('e');
          break;
       case 'ä':
       case 'à':
          sb.Append('a');
          break;
       // ...one case group per replacement in the question
       default:
          sb.Append(c);
          break;
    }
}
strCharater = sb.ToString();

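The benefit of this approach is that it doesn't create lots of (immutable) intermediate strings that have to be allocated and garbage-collected; each of the 26 Regex.Replace calls in the question can produce a new copy of the entire text. Also, the JIT should compile it into very fast code!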
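If the switch grows long, a lookup table keeps the same single pass over a `StringBuilder` and also handles the question's multi-character replacements (`œ` to `oe`, `…` to `...`). This is a sketch, not the answerer's code: the `Transliterator` class, `BuildMap` helper and the table contents (a subset copied from the question's mappings) are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.Text;

public static class Transliterator
{
    // Illustrative subset of the question's mappings; extend as needed.
    private static readonly Dictionary<char, string> Map = BuildMap();

    private static Dictionary<char, string> BuildMap()
    {
        var map = new Dictionary<char, string>();

        // Every character in 'chars' maps to the same replacement.
        void Add(string chars, string replacement)
        {
            foreach (char c in chars) map[c] = replacement;
        }

        Add("èéëêð", "e");
        Add("ÉÈËÊ", "E");
        Add("àáâãäå", "a");
        Add("ÀÁÂÃÄÅ", "A");
        Add("ùúûüµ", "u");
        Add("ÙÚÛÜ", "U");
        Add("òóôõöø", "o");
        Add("ÒÓÔÕÖØ", "O");
        Add("ìíîï", "i");
        Add("ÌÍÎÏ", "I");
        Add("ç", "c"); Add("Ç", "C");
        Add("ñ", "n"); Add("Ñ", "N");
        Add("œ", "oe"); Add("Œ", "Oe");
        Add("«»\u201C\u201D\u201E\u201F\u2033\u2036", "\"");
        Add("\u2026", "...");
        return map;
    }

    public static string Transliterate(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // One dictionary lookup per character; unmapped characters pass through.
            if (Map.TryGetValue(c, out string replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

The table trades a little of the switch's raw speed for keeping all mappings as data in one place.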

Answer 2 (score: 1)

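When an exception is thrown, the runtime records a stack trace: the chain of method calls that were active when the exception occurred, from the method that threw it back to the first call in the chain. Check which line the stack trace points to and focus on that line instead of the whole block. :)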
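Applied to the question, the stack trace would most likely point at the last line, `sw.WriteLine(strCharater, UTF7Encoding.UTF7)`. That call binds to the `TextWriter.WriteLine(string format, object arg0)` overload, so the file contents are parsed as a composite format string: any `{` or `}` in the text throws `FormatException` ("Input string was not in a correct format."), and the encoding is only a format argument, never used as an encoding. The sketch below reproduces this with a `StringWriter` and shows one way to write the text instead: pass the encoding to the `StreamWriter` constructor and dispose the writer so it is flushed. The `WriteDemo` names are hypothetical, and UTF-8 is an assumption (`Encoding.UTF7` is marked obsolete in .NET 5 and later).

```csharp
using System;
using System.IO;
using System.Text;

public static class WriteDemo
{
    // Reproduces the question's bug: the two-argument WriteLine treats
    // its first argument as a composite format string.
    public static bool BuggyWriteThrows(string text)
    {
        using (var sw = new StringWriter())
        {
            try
            {
                sw.WriteLine(text, Encoding.UTF8); // binds to WriteLine(string format, object arg0)
                return false;
            }
            catch (FormatException)
            {
                return true; // "Input string was not in a correct format."
            }
        }
    }

    // One way to write the text verbatim with an explicit encoding.
    public static void WriteText(string path, string text)
    {
        // The encoding belongs in the constructor; 'using' flushes and closes the file.
        using (var sw = new StreamWriter(path, false, Encoding.UTF8))
        {
            sw.Write(text); // single-argument overload: no format parsing
        }
    }
}
```

`File.WriteAllText(path, text, encoding)` does the same as `WriteText` in a single call.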