Question

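I'm bulk-uploading data from a .csv file, and I need to replace the non-ASCII character "ï¿½" with a normal space, " ".

The character "ï¿½" corresponds to "\uFFFD" in C/C++/Java, which appears to be called the REPLACEMENT CHARACTER. The official C# documentation lists other space-like characters as well, such as U+FEFF, U+205F, U+200B, U+180E, and U+202F.

I'm trying to do the replacement this way: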

public string Errors = "";

public void test()
{
    string textFromCsvCell = "";
    string validCharacters = "^[0-9A-Za-z().:%-/ ]+$";
    textFromCsvCell = "This is my text from csv file"; // All spaces aren't normal spaces " "
    // Replace the replacement character with a normal space
    string cleaned = textFromCsvCell.Replace("\uFFFD", " ");
    if (Regex.IsMatch(cleaned, validCharacters))
    {
        // All code for insert
    }
    else
    {
        Errors = cleaned;
        // print Errors
    }
}

The test method shows me this text:

"This is myï¿½texto from csv file"

I have tried several solutions:

Attempted solution 1: using Trim

 Regex.Replace(value.Trim(), @"[^\S\r\n]+", " ");

Attempted solution 2: using Replace

  System.Text.RegularExpressions.Regex.Replace(str,@"\s+"," ");

Attempted solution 3: using Trim

  String.Trim(new char[]{'\uFEFF','\u200B'});

Attempted solution 4: adding [\S\r\n] to validCharacters

  string validCharacters = @"^[\S\r\n0-9A-Za-z().:%-/ ]+$";

None of them worked.

Does anyone have an idea? How can I replace it? I'd really appreciate your help, thanks.

Sources:

http://www.fileformat.info/info/unicode/char/0fffd/index.htm

Trying to replace all white space with a single space

Strip Byte Order Mark from string in C#

C# Regex - Remove extra whitespaces but keep new lines

EDITED

This is the original string:

"SYSTEM OF MONITORING CONTINUES OF GLUCOSE"

In 0x... notation:

SYSTEM OF0xA0MONITORING CONTINUES OF GLUCOSE
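To see which character is really sitting in the string before choosing a replacement, it can help to dump the code of every character outside printable ASCII. The following is only a minimal sketch: the class `CodePointDump`, the helper `DescribeNonAscii`, and the sample string with U+00A0 after "OF" are assumptions made for illustration, based on the 0xA0 shown above.

```csharp
using System;
using System.Linq;

public static class CodePointDump
{
    // Hypothetical helper: reports each character outside printable ASCII
    // (below U+0020 or above U+007E) with its index and UTF-16 code unit.
    public static string[] DescribeNonAscii(string text)
    {
        return text
            .Select((ch, index) => new { ch, index })
            .Where(x => x.ch < '\u0020' || x.ch > '\u007E')
            .Select(x => $"index {x.index}: U+{(int)x.ch:X4}")
            .ToArray();
    }

    public static void Main()
    {
        // Assumed sample: a no-break space (U+00A0) between "OF" and "MONITORING".
        string value = "SYSTEM OF\u00A0MONITORING CONTINUES OF GLUCOSE";
        foreach (string line in DescribeNonAscii(value))
        {
            Console.WriteLine(line); // prints "index 9: U+00A0"
        }
    }
}
```

If the dump reports U+00A0 rather than U+FFFD, replacing "\uFFFD" will never match, which would explain why the attempts above had no effect.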

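Solution

Go to the Unicode code converter at http://r12a.github.io/apps/conversion/ to see how the character converts, then do the replacement accordingly.

In my case, I did a simple replace: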

 string value = "SYSTEM OF MONITORING CONTINUES OF GLUCOSE";
 // value contains a non-breaking space
 // value is "SYSTEM OFï¿½MONITORING CONTINUES OF GLUCOSE"
 string cleaned = "";
 string pattern = @"[^\u0000-\u007F]+";
 string replacement = " ";

 Regex rgx = new Regex(pattern);
 cleaned = rgx.Replace(value, replacement);

 if (Regex.IsMatch(cleaned, "^[0-9A-Za-z().:<>%-/ ]+$"))
 {
    // all code for insert
 }
 else
 {
    // error message
 }

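This expression covers all the possible whitespace characters: space, tab, form feed, line feed, carriage return, and the Unicode space characters: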

[ \f\n\r\t\v\u00a0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]

Reference: https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions
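The character class above comes from JavaScript documentation, but the same escapes are understood by .NET regular expressions, so it can be used to turn every kind of whitespace into a plain space. This is a rough sketch rather than a definitive implementation: the class `WhitespaceNormalizer` and the method `Normalize` are names invented here, and the run \u2000 through \u200a is written as a range for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceNormalizer
{
    // Verbatim string, so the regex engine (not the C# compiler) reads the escapes.
    // The trailing + collapses a run of whitespace into a single space.
    private static readonly Regex AnyWhitespace = new Regex(
        @"[ \f\n\r\t\v\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+");

    // Hypothetical helper: replaces each whitespace run with one normal space.
    public static string Normalize(string text)
    {
        return AnyWhitespace.Replace(text, " ");
    }

    public static void Main()
    {
        // Assumed sample containing a no-break space and two tabs.
        string value = "SYSTEM OF\u00A0MONITORING\t\tCONTINUES";
        Console.WriteLine(Normalize(value)); // prints "SYSTEM OF MONITORING CONTINUES"
    }
}
```

Note that this also folds line breaks into spaces, so it only suits values that should end up on a single line, such as one CSV cell.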

Answer 1

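Using String.Replace:

How about a simple String.Replace()?

I'm assuming that the only characters you want to remove are the ones you mention in the question, ï¿½, and that you want to replace them with a normal space.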

string text = "impï¿½ortant";
string cleaned = text.Replace('\u00ef', ' ')
        .Replace('\u00bf', ' ')
        .Replace('\u00bd', ' ');
// Returns 'imp   ortant'

Or using Regex.Replace:

string cleaned = Regex.Replace(text, "[\u00ef\u00bf\u00bd]", " ");
// Returns 'imp   ortant'

Try it: Dotnet Fiddle
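One possible reason this answer targets three separate characters: U+FFFD is encoded in UTF-8 as the bytes EF BF BD, and when those bytes are read as ISO-8859-1 they show up as ï, ¿ and ½. The sketch below only demonstrates that relationship; the class `MojibakeDemo` and its helper are hypothetical names, and whether the CSV text was actually mis-decoded this way is an assumption.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: encodes text as UTF-8, then decodes the bytes
    // as ISO-8859-1 (Latin-1), which maps each byte to one character.
    public static string Utf8BytesReadAsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    public static void Main()
    {
        string garbled = Utf8BytesReadAsLatin1("\uFFFD");
        // The three resulting characters are U+00EF, U+00BF and U+00BD.
        Console.WriteLine(garbled == "\u00EF\u00BF\u00BD"); // prints "True"
    }
}
```

If the string in memory holds a single U+FFFD (or U+00A0) instead of these three characters, the replacements in this answer will not match, so it is worth checking which form the data is really in.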

Answer 2

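Define a range of ASCII characters and replace anything that is not in that range.

We only want to find the non-ASCII characters, so we match on those and replace them.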

Regex.Replace("This is my te\uFFFDxt from csv file", @"[^\u0000-\u007F]+", " ")

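The pattern above matches any run of characters that is not (^) in the set [ ] covering the range \u0000-\u007F (the ASCII characters; everything beyond that range is non-ASCII) and replaces it with a space.

Result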

This is my te xt from csv file

You can adjust the \u0000-\u007F range as needed to widen the set of allowed characters.
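As an illustration of widening the range, the sketch below also keeps U+00C0 through U+00FF (mostly accented Latin letters) while still replacing everything else outside ASCII. The extra range, the class `AllowedRangeDemo`, and the method `Clean` are assumptions chosen for the example, not something the question asked for.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AllowedRangeDemo
{
    // Assumption for illustration: ASCII plus U+00C0-U+00FF are allowed;
    // any run of other characters becomes a single space.
    private static readonly Regex Disallowed =
        new Regex(@"[^\u0000-\u007F\u00C0-\u00FF]+");

    public static string Clean(string text)
    {
        return Disallowed.Replace(text, " ");
    }

    public static void Main()
    {
        // U+00E9 (é) is kept, U+FFFD is replaced by a space.
        Console.WriteLine(Clean("caf\u00E9 te\uFFFDxt")); // prints "café te xt"
    }
}
```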

Replace the Unicode character "ï¿½" with a space
