Question

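In my program I process strings. These strings can be in any language (for example Japanese, Portuguese, Mandarin, English, and so on).

Sometimes these strings contain HTML special characters such as the trademark symbol (™), the registered symbol (®), the copyright symbol (©), and so on.

I then generate an Excel sheet containing these details. But when the strings contain these special characters, the Excel file is created yet cannot be opened, because it appears to be corrupted.
So what I did was encode the strings before writing them to Excel. But then every string that is not in English gets encoded. The image shows that the asset description, which is Japanese text, was also converted to encoded text.

For example, ゜祌づ りゅ氧廩, 駤びょ菣 鏥こ埣槎で is converted to encoded text, but I only want to encode the special characters.

So what I need is a way to determine whether a string contains that kind of special character. Since I am dealing with multiple languages, is there any way to identify whether a string contains HTML special characters?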

Answer 1

Try this using the Regex.IsMatch method:

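using System;
using System.Text.RegularExpressions;

string str = "*!#©™®";
// Matches any character that is not an ASCII letter, a digit, '_' or '.'.
// Note: letters from non-Latin scripts (e.g. Japanese) also match this pattern.
var regx = new Regex("[^a-zA-Z0-9_.]");
if (regx.IsMatch(str))
{
    Console.WriteLine("Special character(s) detected.");
}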


Answer 2

Try the Regex.Replace method:

// Requires: using System.Text.RegularExpressions;
//
// Remove letters and digits, then check whether any characters are left.
// Anything that remains is something like $, @, or ^.
//
// [\p{L}\p{Nd}]+ matches runs of letters and decimal digits in any language.
// "input" is the string being checked.
if (!string.IsNullOrWhiteSpace(Regex.Replace(input, @"([\p{L}\p{Nd}]+)", "")))
{
    // Do whatever with the string.
}
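
If the goal is to flag only symbols such as ™, ® and ©, rather than everything that is not a letter or digit, the Unicode symbol category can be matched directly. The following is a minimal sketch under that assumption; the class and method names (`SymbolDetector`, `ContainsSymbol`) are made up for illustration and do not come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SymbolDetector
{
    // \p{S} matches any character in a Unicode symbol category
    // (math, currency, modifier, or other symbol), whatever the script.
    private static readonly Regex SymbolPattern = new Regex(@"\p{S}");

    // Hypothetical helper: returns true if the input contains at least one symbol.
    public static bool ContainsSymbol(string input)
    {
        return SymbolPattern.IsMatch(input);
    }
}
```

Punctuation and whitespace are not in the symbol category, so they are not flagged here. Characters such as + or $ are symbols, however, so they would be reported too.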


Answer 3

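I think you could start by treating the string as an array of Char https://msdn.microsoft.com/en-us/library/system.char(v=vs.110).aspx and then examine each character in turn. In fact, on a second read of that manual page, why not use this: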

using System;

string s = "Sometime these strings may contain some HTML special characters like trademark symbol(™), registered symbol(®), Copyright symbol(©) and etc.゜祌づ りゅ氧廩, 駤びょ菣 鏥こ埣槎で";
Char[] ca = s.ToCharArray();
// Char.IsSymbol is true for characters in the Unicode symbol categories.
foreach (Char c in ca)
{
    if (Char.IsSymbol(c))
        Console.WriteLine("found symbol:{0} ", c);
}
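
Since the question asks to encode only the special characters, the same Char.IsSymbol check could drive the encoding step. This is a rough sketch, not a tested solution for the Excel problem: `SymbolEncoder` and `EncodeSymbols` are hypothetical names, and writing numeric character references (such as `&#169;`) is an assumption about the encoding the asker wants.

```csharp
using System;
using System.Text;

public static class SymbolEncoder
{
    // Hypothetical helper: replaces each symbol character with a numeric
    // character reference (for example © becomes &#169;) and copies every
    // other character, including non-Latin letters, unchanged.
    public static string EncodeSymbols(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (char.IsSymbol(c))
                sb.Append("&#").Append((int)c).Append(';');
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, `SymbolEncoder.EncodeSymbols("日本語™")` leaves the three letters alone and encodes only the trademark sign. Because the loop inspects one UTF-16 code unit at a time, characters outside the Basic Multilingual Plane (stored as surrogate pairs) are passed through as-is.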

Detecting special characters in text in C#

3 answers: