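C# UTF-8 character encoding output - variable differs from string literal

Time: 2013-07-26 08:54:05

Tags: c# .net encoding utf-8

This is a simple test case. I feel like I'm missing something basic, but any help would be appreciated!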

string data = @"Well done UK building industry, Olympics \u00a3377m under budget + boost";
foreach (Match m in Regex.Matches(data, @"\\u(\w*)\b"))
{
    Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
    string match = m.Value;
    // These should output the exact same thing however the first is a £ and the other is \u00a3377m
    Console.WriteLine("\u00a3377m" + "      " + match);
}

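3 answers:

Answer 0: (score: 0)

You forgot to escape the backslash in the string you print by hand, so the compiler resolves the escape sequence in "\u00a3377m" directly into the £ character.

The following works as required: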

// With the backslash escaped, both sides now print the literal text \u00a3377m
Console.WriteLine("\\u00a3377m" + "      " + match);

Another option is to use a verbatim string literal, prefixed with @:

Console.WriteLine(@"\u00a3377m" + "      " + match);
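
The difference between the three spellings can be checked with a small sketch. The class and method names below (LiteralDemo, Regular, Escaped, Verbatim) are made up for illustration and are not part of the question's code.

```csharp
using System;

public static class LiteralDemo
{
    // Regular literal: the compiler turns \u00a3 into the single character U+00A3.
    public static string Regular() => "\u00a3377m";

    // Escaped backslash: the text keeps a literal backslash followed by "u00a3377m".
    public static string Escaped() => "\\u00a3377m";

    // Verbatim literal: the backslash is not treated as an escape character.
    public static string Verbatim() => @"\u00a3377m";

    public static void Main()
    {
        Console.WriteLine(Regular().Length);          // 5  (pound sign, 3, 7, 7, m)
        Console.WriteLine(Escaped().Length);          // 10 (backslash, u, 0, 0, a, 3, 3, 7, 7, m)
        Console.WriteLine(Escaped() == Verbatim());   // True
    }
}
```

The escaped and verbatim forms yield the same ten characters, which is why either one matches what the regex pulls out of the verbatim data string.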

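Answer 1: (score: 0)

00A3 is the Unicode code point of the £ character. See http://unicode-table.com/en/#00A3

The \u escape consumes exactly four hexadecimal digits, so when you write "\u00a3377m" as a regular string literal, it becomes £377m.

Use a verbatim string literal instead: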

Console.WriteLine(@"\u00a3377m" + "      " + match);
  

"I completely forgot to add to the question that what I really want is the £ sign"

char c = '\u00a3';
string s = c.ToString(); // s will be £
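
As a rough illustration of the same idea, the sketch below builds the £ character in three equivalent ways. The class and method names are hypothetical; char.ConvertFromUtf32 is the standard .NET method for turning a code point into a string.

```csharp
using System;

public static class CodePointDemo
{
    // Character literal with a Unicode escape.
    public static string FromEscape() => '\u00a3'.ToString();

    // Cast from the numeric code point (valid for values that fit in one UTF-16 unit).
    public static string FromCast() => ((char)0x00A3).ToString();

    // Also handles code points above U+FFFF, which need a surrogate pair.
    public static string FromUtf32() => char.ConvertFromUtf32(0x00A3);

    public static void Main()
    {
        Console.WriteLine(FromEscape() == FromCast());   // True
        Console.WriteLine(FromCast() == FromUtf32());    // True
        Console.WriteLine((int)'\u00a3');                // 163, i.e. 0xA3
    }
}
```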

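Answer 2: (score: 0)

I appreciate the help, but I left out some key information, which is my fault.

I actually wanted the output to be "££" rather than "£\u00a3377m".

To do that, I ended up using the answer to "Replace unicode escape sequences in a string", which uses the following function: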

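// Requires: using System.Globalization; using System.Text.RegularExpressions;
// Matches a literal backslash, 'u', and exactly four hexadecimal digits.
private static readonly Regex _regex = new Regex(@"\\u(?<Value>[0-9a-fA-F]{4})", RegexOptions.Compiled);

// Replaces each \uXXXX escape in the input with the character it encodes.
public static string Decoder(string value)
{
    return _regex.Replace(
        value,
        m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString()
    );
}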

Then use it like this:

string data = @"Well done UK building industry, Olympics \u00a3377m under budget + boost";
foreach (Match m in Regex.Matches(data, @"\\u(\w*)\b"))
{
    Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
    string match = m.Value;
    //Decode the string so we no longer have \u values
    match = Decoder(match);
    // After decoding, both sides are the same string: the £ character followed by 377m
    Console.WriteLine("\u00a3377m" + "      " + match);
}
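
As a possible alternative to the hand-written regex, the built-in Regex.Unescape method also converts \uXXXX sequences. This is a swapped-in technique, not what the linked answer uses, and the wrapper name Decode below is hypothetical. Note that Regex.Unescape interprets other backslash escapes as well (such as \n and \t), so it may not suit input that contains unrelated backslashes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Hypothetical helper: delegates to the framework's Regex.Unescape.
    public static string Decode(string value) => Regex.Unescape(value);

    public static void Main()
    {
        string data = @"Well done UK building industry, Olympics \u00a3377m under budget + boost";

        // The \u00a3 escape becomes the pound sign; the rest of the text is unchanged.
        Console.WriteLine(Decode(data));
    }
}
```

Whether the £ sign then displays correctly still depends on the console's output encoding and font.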