Removing the byte order mark from File.ReadAllBytes (byte[])

Asked: 2008-11-13 20:12:23

Tags: c# byte-order-mark

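I have an HTTPHandler that reads in a set of CSS files, combines them, and then GZips them. However, some of the CSS files contain a byte order mark (due to a bug in the TFS 2005 auto merge), and in Firefox the BOM is read as part of the actual content, so it messes up my class names and so on. How can I strip out the BOM characters? Is there an easy way to do this without manually going through the byte array looking for "ï»¿"?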

5 answers:

Answer 0 (score: 8)

Expanding on Jon's comment with a sample:

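using System.Linq; // for Skip
var name = GetFileName();
var bytes = System.IO.File.ReadAllBytes(name);
// Strip only when the file actually starts with the UTF-8 BOM (EF BB BF).
if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
    System.IO.File.WriteAllBytes(name, bytes.Skip(3).ToArray());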
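
For the handler in the question, the bytes can also be cleaned in memory instead of rewriting the file on disk. A minimal sketch, assuming the files are UTF-8; `BomHelper` and `StripUtf8Bom` are hypothetical names used only for illustration:

```csharp
using System;
using System.Linq;
using System.Text;

public static class BomHelper
{
    // Hypothetical helper (not from the answers here): returns the content
    // without a leading UTF-8 BOM, or the original array if there is none.
    public static byte[] StripUtf8Bom(byte[] bytes)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
        bool hasBom = bytes.Length >= bom.Length
            && bytes.Take(bom.Length).SequenceEqual(bom);
        return hasBom ? bytes.Skip(bom.Length).ToArray() : bytes;
    }
}
```

It could be called as BomHelper.StripUtf8Bom(File.ReadAllBytes(path)) on each CSS file before the contents are combined and compressed.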

Answer 1 (score: 6)

Expanding on JaredPar's sample to recurse over subdirectories:

using System.Linq;
using System.IO;
namespace BomRemover
{
    /// <summary>
    /// Remove the UTF-8 BOM (EF BB BF) from all *.php files in the current directory and its subdirectories.
    /// </summary>
    class Program
    {
        private static void removeBoms(string filePattern, string directory)
        {
            foreach (string filename in Directory.GetFiles(directory, filePattern))
            {
                var bytes = System.IO.File.ReadAllBytes(filename);
                if(bytes.Length > 2 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
                {
                    System.IO.File.WriteAllBytes(filename, bytes.Skip(3).ToArray()); 
                }
            }
            foreach (string subDirectory in Directory.GetDirectories(directory))
            {
                removeBoms(filePattern, subDirectory);
            }
        }
        static void Main(string[] args)
        {
            string filePattern = "*.php";
            string startDirectory = Directory.GetCurrentDirectory();
            removeBoms(filePattern, startDirectory);            
        }       
    }
}

I needed this bit of C# code after discovering that a UTF-8 BOM corrupts files when you try to do a basic file download in PHP.

Answer 2 (score: 3)

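var text = File.ReadAllText(args.SourceFileName); // ReadAllText detects the BOM and leaves it out of the string
// UTF8Encoding(false) writes UTF-8 without emitting a BOM.
using (var streamWriter = new StreamWriter(args.DestFileName, args.Append, new UTF8Encoding(false)))
{
    streamWriter.Write(text);
}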

Answer 3 (score: 1)

Another way, assuming the UTF-8 content is plain ASCII.

File.WriteAllText(filename, File.ReadAllText(filename, Encoding.UTF8), Encoding.ASCII);
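
The ASCII assumption matters: Encoding.ASCII replaces any character outside the ASCII range with '?', so this approach is lossy for files that contain such characters. A small sketch of that behavior; `AsciiDemo` and `ThroughAscii` are hypothetical names for illustration:

```csharp
using System;
using System.Text;

public static class AsciiDemo
{
    // Round-trips a string through Encoding.ASCII to show what survives.
    public static string ThroughAscii(string text)
    {
        byte[] ascii = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(ascii);
    }
}
```

For example, AsciiDemo.ThroughAscii("caf\u00E9") gives "caf?", while a pure ASCII string such as ".header { color: red; }" comes back unchanged.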

Answer 4 (score: 0)

For larger files, use the following code; it is memory efficient!

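// detectEncodingFromByteOrderMarks: true lets the reader consume the BOM and pick the matching encoding.
using (StreamReader sr = new StreamReader(path: @"<Input_file_full_path_with_byte_order_mark>", 
                    detectEncodingFromByteOrderMarks: true))
// Note: UnicodeEncoding writes UTF-16 (little-endian here) without a BOM;
// pass new UTF8Encoding(false) instead to keep the output in UTF-8.
using (StreamWriter sw = new StreamWriter(path: @"<Output_file_without_byte_order_mark>", 
                    append: false, 
                    encoding: new UnicodeEncoding(bigEndian: false, byteOrderMark: false)))
{
    var lineNumber = 0;
    while (!sr.EndOfStream)
    {
        // Only one line is held in memory at a time.
        sw.WriteLine(sr.ReadLine());
        lineNumber += 1;
        // Progress output every 100,000 lines.
        if (lineNumber % 100000 == 0)
            Console.Write("\rLine# " + lineNumber.ToString("000000000000"));
    }
} // disposing the writer flushes and closes it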
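
The loop above decodes and re-encodes the text line by line. If only the BOM should go and every other byte should stay exactly as it is, a byte-level copy is another option. A minimal sketch, assuming UTF-8 input; `BomStream` and `CopyWithoutUtf8Bom` are names made up for illustration:

```csharp
using System;
using System.IO;

public static class BomStream
{
    // Hypothetical helper: copies input to output, dropping a leading
    // UTF-8 BOM (EF BB BF) if present. All other bytes are copied unchanged.
    public static void CopyWithoutUtf8Bom(Stream input, Stream output)
    {
        var head = new byte[3];
        int read = 0;
        // Read may return fewer bytes than requested, so loop until 3 or EOF.
        while (read < head.Length)
        {
            int n = input.Read(head, read, head.Length - read);
            if (n == 0) break;
            read += n;
        }

        bool hasBom = read == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
        if (!hasBom)
            output.Write(head, 0, read); // not a BOM: keep these bytes

        input.CopyTo(output); // streams the rest through a fixed-size buffer
    }
}
```

With files, the two streams could come from File.OpenRead(inputPath) and File.Create(outputPath), each wrapped in a using block; the input and output paths need to be different files.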