Question

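I have an ASP.NET C# page and I am trying to read a file that contains the character ’ and convert it to '. (From a slanted apostrophe to a straight apostrophe.)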

FileInfo fileinfo = new FileInfo(FileLocation);
string content = File.ReadAllText(fileinfo.FullName);

//strip out bad characters
content = content.Replace("’", "'");

This does not work; it turns the slanted apostrophes into ? marks.

Answer 1

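I suspect the problem is not the replacement but the reading of the file itself. When I tried this myself (using Word and copy-paste) I got the same result as you, but inspecting content showed that the .NET Framework already considered the character to be Unicode character 65533, i.e. the "WTF?" character, before the string replacement ran. You can verify this yourself by inspecting the relevant character in the Visual Studio debugger, which should show the character code: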

content[0]; // 65533 '�'

The reason the replacement does not work is simple: content does not contain the string you are searching for:

content.IndexOf("’"); // -1

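As for why the file is not read correctly: you are probably using the wrong encoding when reading it. (If no encoding is specified, the .NET Framework tries to determine the correct encoding for you, but there is no 100% reliable way to do this, so it can often get it wrong.) The exact encoding you need depends on the file itself, but in my case the encoding was Extended ASCII, so to read the file I only needed to specify the correct encoding: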

string content = File.ReadAllText(fileinfo.FullName, Encoding.GetEncoding("iso-8859-1"));

(See this question.)
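To make the failure mode concrete, here is a minimal sketch of how character 65533 can appear. It assumes the file stores the slanted apostrophe as the single byte 0x92 and that the bytes end up being decoded as UTF-8; the byte array and the names `ReplacementCharDemo` and `DecodeAsUtf8` are made up for illustration and stand in for the real file and read call.

```csharp
using System;
using System.Text;

static class ReplacementCharDemo
{
    // Hypothetical helper: decodes raw bytes as UTF-8, standing in for a read with the wrong encoding.
    public static string DecodeAsUtf8(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    static void Main()
    {
        // "It’s" as single-byte text: 0x92 is the slanted apostrophe (assumed file content).
        byte[] raw = { 0x49, 0x74, 0x92, 0x73 };

        string content = DecodeAsUtf8(raw);

        // A lone 0x92 is not valid UTF-8, so the decoder substitutes U+FFFD.
        Console.WriteLine((int)content[2]);            // 65533
        Console.WriteLine(content.IndexOf('\u2019'));  // -1, the slanted apostrophe is no longer there
    }
}
```

Once the decoder has substituted U+FFFD, the original byte is gone from the string, which is why no later Replace call can recover it.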

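You also need to make sure you specify the correct character in the replacement string. When using "odd" characters in code, you may find it more reliable to specify the character by its character code rather than as a string literal (which can cause problems if the encoding of the source file changes). For example, the following worked for me: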

content = content.Replace("\u0092", "'");
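The escape "\u0092" matches because ISO-8859-1 maps every byte to the code point with the same value, so byte 0x92 becomes U+0092. The sketch below shows this under the assumption that the file holds the apostrophe as byte 0x92; the in-memory byte array and the names `Latin1Demo` and `Fix` are illustrative stand-ins for the real file and code.

```csharp
using System;
using System.Text;

static class Latin1Demo
{
    // Hypothetical helper: decodes as ISO-8859-1, then replaces U+0092 with a straight apostrophe.
    public static string Fix(byte[] raw)
    {
        string content = Encoding.GetEncoding("iso-8859-1").GetString(raw);
        return content.Replace("\u0092", "'");
    }

    static void Main()
    {
        // "It’s" with the apostrophe stored as byte 0x92 (assumed file content).
        byte[] raw = { 0x49, 0x74, 0x92, 0x73 };

        Console.WriteLine(Fix(raw)); // It's
    }
}
```

Note that this replacement is tied to the ISO-8859-1 decoding: if the same bytes are decoded as Windows-1252, the character becomes U+2019 and "\u0092" no longer matches.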

Answer 2

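// This replaces smart single quotes with a straight single quote.
// Regex.Replace returns a new string, so the result must be assigned.
content = Regex.Replace(content, @"(\u2018|\u2019)", "'");

// However, the better approach seems to be to read the file with the proper encoding and leave the quotes alone.
// OpenRead is used here because FileInfo.Create() would create or overwrite (truncate) the file.
var sreader = new StreamReader(fileinfo.OpenRead(), Encoding.GetEncoding(1252));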

Answer 3

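I bet the file is encoded in Windows-1252. This is almost identical to ISO 8859-1. The difference is that Windows-1252 uses "displayable characters rather than control characters in the 0x80 to 0x9F range". (That is where the slanted apostrophe lives, at 0x92.)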

//Specify Windows-1252 here
string content = File.ReadAllText(fileinfo.FullName, Encoding.GetEncoding(1252));
//Your replace code will then work as is
content = content.Replace("’", "'");
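As a rough check of this claim, the sketch below decodes byte 0x92 as Windows-1252 and confirms that it becomes U+2019, the slanted apostrophe. It assumes a .NET Core or .NET 5+ runtime, where code page 1252 is only available after registering `CodePagesEncodingProvider`; on the classic .NET Framework that registration line should be unnecessary. The byte array and the names `Windows1252Demo` and `Fix` are illustrative only.

```csharp
using System;
using System.Text;

static class Windows1252Demo
{
    // Hypothetical helper: decodes as Windows-1252, then replaces U+2019 with a straight apostrophe.
    public static string Fix(byte[] raw)
    {
        // Needed on .NET Core / .NET 5+ so that code page 1252 can be resolved (assumed runtime).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string content = Encoding.GetEncoding(1252).GetString(raw);
        return content.Replace('\u2019', '\'');
    }

    static void Main()
    {
        // "It’s" with the apostrophe stored as byte 0x92 (assumed file content).
        byte[] raw = { 0x49, 0x74, 0x92, 0x73 };

        Console.WriteLine(Fix(raw)); // It's
    }
}
```

Writing the character as '\u2019' rather than as a literal keeps the example independent of the encoding of the source file.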

Answer 4

If you use String (capitalized) instead of string, it should be able to handle any Unicode you throw at it. Try that first and see if it works.

Reading a file with Unicode characters
