Question

Suppose we have a string like the one below.

string s = "此检查项己被你忽略，请联系医生。\u2028内科";

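How can I remove Unicode characters such as \u2028 from the string?

I have tried the approaches from the following questions. Unfortunately, none of them worked. Please help, thanks.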

Unicode characters string

Convert a Unicode string to an escaped ASCII string

Replace unicode escape sequences in a string

Update

Why doesn't the following code work for me?

Update: I tried to display the string in the output. It is a line separator.

Answer 1

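As @spender pointed out in the comments above:

The basic premise of your question (removing Unicode) is broken, because all strings are stored in memory as Unicode. All characters are Unicode.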
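To make the point concrete: in the question's regular (non-verbatim) literal, the compiler has already turned \u2028 into a single LINE SEPARATOR character (U+2028), so there is no escape text left in the string. A minimal sketch, assuming the goal is simply to drop that character (or any line/paragraph separator); the names `SeparatorDemo` and `StripSeparators` are made up for illustration and do not come from the original post:

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class SeparatorDemo
{
    // Hypothetical helper: removes every character whose Unicode category is
    // LineSeparator (U+2028) or ParagraphSeparator (U+2029).
    public static string StripSeparators(string input)
    {
        return new string(input.Where(c =>
        {
            UnicodeCategory cat = char.GetUnicodeCategory(c);
            return cat != UnicodeCategory.LineSeparator
                && cat != UnicodeCategory.ParagraphSeparator;
        }).ToArray());
    }

    public static void Main()
    {
        // Regular literal: \u2028 is already one real character here.
        string s = "此检查项己被你忽略，请联系医生。\u2028内科";

        // Simplest option: replace that exact character.
        string viaReplace = s.Replace("\u2028", string.Empty);

        // More general option: filter by Unicode category.
        string viaCategory = StripSeparators(s);

        Console.WriteLine(viaReplace == viaCategory);      // True
        Console.WriteLine(s.Length - viaReplace.Length);   // 1
    }
}
```

`Replace` is enough when only U+2028 matters; the category filter is one possible generalisation if other separators may appear.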

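However, if you have a string that contains the unescaped text "\uXXXX" and you want to replace or remove it, you can use this regular expression pattern: @"\\u[0-9A-Fa-f]{4}"

Here's a complete example:

// Requires: using System.Diagnostics; using System.Text.RegularExpressions;
string noUnicode = "此检查项己被你忽略，请联系医生。内科";
// If you hard-code the string, you MUST add an `@` before the string, otherwise,
// the "u2028" will get escaped and converted to its corresponding Unicode character.
string s = @"此检查项己被你忽略，请联系医生。\u2028内科";
string ss = Regex.Replace(s, @"\\u[0-9A-Fa-f]{4}", string.Empty);
Debug.Print("s = " + s);
Debug.Print("ss = " + ss);
Debug.Print((ss == noUnicode).ToString());

Here's a fiddle to test it, and this is its output:

s = 此检查项己被你忽略，请联系医生。\u2028内科
ss = 此检查项己被你忽略，请联系医生。内科
True

Note: because the string is hard-coded, you must use @ here to prevent the substring "\u2028" from being converted to its corresponding Unicode character. On the other hand, if you get the original string from somewhere else (for example, by reading it from a text file), the substring "\u2028" is already stored as literal text, so there should be no problem and the code above should work as is.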
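The difference between the two literal forms can be checked directly. A small sketch (the class name `LiteralDemo` is illustrative, not from the original answer) comparing a regular literal with a verbatim one:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralDemo
{
    // Same pattern as in the answer: a backslash, 'u', then four hex digits.
    public const string Pattern = @"\\u[0-9A-Fa-f]{4}";

    public static void Main()
    {
        string regular = "A\u2028B";    // 3 chars: 'A', U+2028, 'B'
        string verbatim = @"A\u2028B";  // 8 chars: 'A', '\', 'u', '2', '0', '2', '8', 'B'

        Console.WriteLine(regular.Length);                    // 3
        Console.WriteLine(verbatim.Length);                   // 8

        // The regular literal contains no backslash, so the pattern cannot match it.
        Console.WriteLine(Regex.IsMatch(regular, Pattern));   // False

        // The verbatim literal contains the six-character escape text, which is removed.
        Console.WriteLine(Regex.Replace(verbatim, Pattern, string.Empty)); // AB
    }
}
```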

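So code that gets the string from an external source instead of a hard-coded literal would work exactly the same, with no need for the @ prefix.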
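As a sketch of the non-hard-coded case, the snippet below writes the escaped text to a temporary file and reads it back. The temporary file only stands in for whatever external source the string really comes from, and `FileDemo` and `RemoveEscapes` are illustrative names rather than anything from the original answer:

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class FileDemo
{
    // Hypothetical helper: strips literal "\uXXXX" escape text from a string.
    public static string RemoveEscapes(string text)
    {
        return Regex.Replace(text, @"\\u[0-9A-Fa-f]{4}", string.Empty);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Simulate an external source whose content holds the escape as plain text.
            File.WriteAllText(path, @"此检查项己被你忽略，请联系医生。\u2028内科");

            // Text read at run time is never processed as a C# escape sequence.
            string s = File.ReadAllText(path);
            string ss = RemoveEscapes(s);

            Console.WriteLine(ss);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```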
