Question

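I am trying to read all the text from a text file. It works for English but not for Spanish, French, and so on. I need to be able to read text files in any language. I am using File.ReadAllText(filepath, Encoding.UTF8). I have tried UTF-8, Default, etc., but the text is not read correctly and I get some unwanted characters. Please give me a solution to this problem.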

Answer 1

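Do you know which encoding the file uses? If not, you can try the solution mentioned here. When you try to detect an encoding programmatically you can only hope for the best, because there are many possibilities and the result can always surprise you. Below is the code I took from that link.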

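// Requires: using System.IO; using System.Text;

/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when no known BOM is found.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
    // Read the first four bytes, which is enough to hold any BOM checked below.
    // A file shorter than four bytes leaves the remaining entries at zero.
    var bom = new byte[4];
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        file.Read(bom, 0, 4);
    }

    // Analyze the BOM. UTF-32LE must be tested before UTF-16LE because both start with FF FE.
    if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7;
    if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; // UTF-32LE
    if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; // UTF-16LE
    if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; // UTF-16BE
    if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); // UTF-32BE

    // No BOM: fall back to ASCII, which decodes any byte above 0x7F as '?'.
    return Encoding.ASCII;
}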
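
BOM detection cannot help when the file has no BOM at all, which is often the case for Spanish or French text saved in a legacy single-byte code page. The following is only a minimal sketch of one possible workaround, not part of the linked solution: it tries a strict UTF-8 decode first and falls back to an encoding chosen by the caller if the bytes are not valid UTF-8. The class name TextFileReader, the method name ReadTextGuessingEncoding, and the legacyFallback parameter are all hypothetical, and the sketch assumes the file is either UTF-8 or the fallback encoding (it does not handle UTF-16 or UTF-32).

```csharp
using System;
using System.IO;
using System.Text;

public static class TextFileReader
{
    // Hypothetical helper: assumes the file is either UTF-8 (with or without
    // a BOM) or encoded in the caller-supplied legacy encoding.
    public static string ReadTextGuessingEncoding(string path, Encoding legacyFallback)
    {
        byte[] bytes = File.ReadAllBytes(path);

        // UTF-8 BOM present: decode as UTF-8 and skip the three BOM bytes.
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        {
            return Encoding.UTF8.GetString(bytes, 3, bytes.Length - 3);
        }

        try
        {
            // Strict decoder: throws on invalid byte sequences instead of
            // silently substituting U+FFFD replacement characters.
            var strictUtf8 = new UTF8Encoding(
                encoderShouldEmitUTF8Identifier: false,
                throwOnInvalidBytes: true);
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume the legacy encoding instead.
            return legacyFallback.GetString(bytes);
        }
    }
}
```

A call might look like TextFileReader.ReadTextGuessingEncoding(filepath, Encoding.GetEncoding("iso-8859-1")). Which fallback is right depends on where the files come from, so treat the choice of ISO-8859-1 here as an assumption rather than a recommendation.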

Answer 2

You can use this library, https://code.google.com/p/chardetsharp/, to get the file's encoding, and then convert the text to the encoding you need.
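
The conversion step could look roughly like the sketch below. It does not use the chardetsharp API; it assumes the source encoding has already been determined by whatever detector you use and is passed in as sourceEncoding. The class name EncodingConverter and the method name ConvertToUtf8 are hypothetical.

```csharp
using System.IO;
using System.Text;

public static class EncodingConverter
{
    // Hypothetical helper: re-encodes a text file from an already known
    // source encoding to UTF-8 with a BOM, so later reads can detect it.
    public static void ConvertToUtf8(string sourcePath, Encoding sourceEncoding, string targetPath)
    {
        // Decode the file using the detected (assumed correct) encoding.
        string text = File.ReadAllText(sourcePath, sourceEncoding);

        // Write it back as UTF-8; the 'true' flag asks for the BOM to be emitted.
        File.WriteAllText(targetPath, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
    }
}
```

Once a file has been converted this way, File.ReadAllText(targetPath, Encoding.UTF8) should return the accented characters intact.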

Reading text from a text file (all languages)
