Question

I'm working in C#, and this is my code:

        Encoding encoding;
        StringBuilder output = new StringBuilder();

        //somePath is string
        using (StreamReader sr = new StreamReader(somePath))
        {
            string line;
            encoding = sr.CurrentEncoding;
            while ((line = sr.ReadLine()) != null)
            {
                //make some changes to line
                output.AppendLine(line);
            }
        }

        using (StreamWriter writer = new StreamWriter(someOtherPath, false))//encoding
        {
            writer.Write(output);
        }

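The file at somePath contains Norwegian characters such as å. In the file at someOtherPath, however, I get question marks in their place. I assumed it was an encoding problem, so I tried taking the input file's encoding and applying it to the output file. That made no difference. I also tried opening the output file in Google Chrome and switching through every available encoding, but the letters never matched those in the input file.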

Answer 1

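StreamReader can only make a guess, and only for some encodings. Ideally, you should find out which encoding the file is really in and then use that to read it. What created the file, and what lets you read it properly? Does that latter program expose which encoding it is using? (It might be using something like Windows-CP1252, for example.)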
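To illustrate the guessing described above, here is a minimal sketch. It assumes a file with no byte order mark, in which case a StreamReader constructed without an explicit encoding stays on its default, UTF-8. The class name, method name, and temp-file name are hypothetical, and the sample bytes are the word "blå" as ISO-8859-1 (an assumption made for the demo, not the asker's actual file).

```csharp
using System;
using System.IO;
using System.Text;

static class DetectionDemo
{
    // Reads the first line the way the question's code does: no explicit encoding.
    public static string ReadFirstLineWithDefaults(string path, out Encoding used)
    {
        using (StreamReader sr = new StreamReader(path))
        {
            string line = sr.ReadLine();
            // Query CurrentEncoding only after a read; detection happens on first read.
            used = sr.CurrentEncoding;
            return line;
        }
    }

    static void Main()
    {
        // Hypothetical sample file: "blå" + LF encoded as ISO-8859-1 (å is the single byte E5).
        string path = Path.Combine(Path.GetTempPath(), "latin1-sample.txt");
        File.WriteAllBytes(path, new byte[] { 0x62, 0x6C, 0xE5, 0x0A });

        Encoding used;
        string line = ReadFirstLineWithDefaults(path, out used);

        Console.WriteLine(used.WebName);            // the encoding the reader settled on
        Console.WriteLine(line.Contains("\uFFFD")); // True: å became the replacement character
    }
}
```

Once the character has been decoded to U+FFFD, the original letter is lost, so no choice of output encoding can bring it back.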

If you can, I'd personally suggest using UTF-8 as the output encoding, but that depends on whether you control whatever reads the output.

EDIT: Okay, now that I've seen the file, I can confirm it is not UTF-8. The word "direktør" is represented as these bytes:

64 69 72 65 6b 74 f8 72

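So the non-ASCII character is a single byte (F8), which is not a valid UTF-8 representation of that character. In UTF-8, ø (U+00F8) would be encoded as the two bytes C3 B8.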
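The byte-level argument above can be checked with a short sketch. It assumes a strict decoder, created with UTF8Encoding's throwOnInvalidBytes flag, as the validity test; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class Utf8Check
{
    // Returns true only if the bytes form a valid UTF-8 sequence.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // Second argument: throw on invalid bytes instead of substituting U+FFFD.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        // The bytes quoted above for "direktør".
        byte[] fromFile = { 0x64, 0x69, 0x72, 0x65, 0x6b, 0x74, 0xf8, 0x72 };
        Console.WriteLine(IsValidUtf8(fromFile)); // False

        // The same word encoded as UTF-8: ø becomes two bytes.
        byte[] reencoded = Encoding.UTF8.GetBytes("direkt\u00F8r");
        Console.WriteLine(BitConverter.ToString(reencoded)); // 64-69-72-65-6B-74-C3-B8-72
        Console.WriteLine(IsValidUtf8(reencoded)); // True
    }
}
```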

It could be ISO-Latin-1, although that isn't certain (several encodings would match these bytes). If it is, you can use:

Encoding encoding = Encoding.GetEncoding(28591);

using (TextReader reader = new StreamReader(filename, encoding))
{
    ...
}

(Alternatively, use File.ReadAllLines to make life simpler.)

You'll need to work out separately which output encoding you want.
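As a sketch of the File.ReadAllLines route with an explicit output encoding, assuming the input really is ISO-8859-1 and that UTF-8 without a byte order mark is the desired output (both are assumptions; the class name, method name, and temp-file names are hypothetical):

```csharp
using System;
using System.IO;
using System.Text;

static class Transcode
{
    // Reads every line as ISO-8859-1 and writes the lines back out as UTF-8.
    public static void Latin1ToUtf8(string inputPath, string outputPath)
    {
        Encoding latin1 = Encoding.GetEncoding(28591); // code page 28591 = ISO-8859-1
        string[] lines = File.ReadAllLines(inputPath, latin1);

        // UTF8Encoding(false) omits the byte order mark; Encoding.UTF8 would emit one.
        File.WriteAllLines(outputPath, lines, new UTF8Encoding(false));
    }

    static void Main()
    {
        // Hypothetical sample input: "direktør" + LF as ISO-8859-1 bytes.
        string input = Path.Combine(Path.GetTempPath(), "transcode-in.txt");
        string output = Path.Combine(Path.GetTempPath(), "transcode-out.txt");
        File.WriteAllBytes(input, new byte[] { 0x64, 0x69, 0x72, 0x65, 0x6b, 0x74, 0xf8, 0x72, 0x0A });

        Latin1ToUtf8(input, output);

        // Reading the result back as UTF-8 should recover the original word.
        string roundTrip = File.ReadAllText(output, Encoding.UTF8);
        Console.WriteLine(roundTrip == "direkt\u00F8r" + Environment.NewLine); // True
    }
}
```

Passing the encoding explicitly on both sides keeps the choice visible in the code rather than relying on defaults.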

EDIT: Here's a short but complete program, which I ran against the file you provided; it correctly converts the characters to UTF-8:

using System;
using System.IO;
using System.Text;

class Test
{
    static void Main()
    {
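        // Code page 28591 is ISO-8859-1 (Latin-1).
        Encoding encoding = Encoding.GetEncoding(28591);
        StringBuilder output = new StringBuilder();
        // Read the input with the explicit encoding instead of letting StreamReader guess.
        using (TextReader reader = new StreamReader("file.html", encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                output.AppendLine("Read line: " + line);
            }
        }
        // With no encoding argument, StreamWriter writes UTF-8.
        using (StreamWriter writer = new StreamWriter("output.html", false))
        {
            writer.Write(output);
        }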
    }
}

Answer 2

Try this to save the text:

using (StreamWriter writer = new StreamWriter(someOtherPath, false, Encoding.UTF8)) { ... }

C#: letters not displayed correctly in the output file
