Question

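We know that the constructor of the UTF8Encoding class accepts an optional bool parameter that specifies whether the encoder should provide a byte order mark (BOM).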

However, encoding the same text with both settings produces identical output:

string text = "Hello, world!";
byte[] withBom = new UTF8Encoding(true).GetBytes(text);
byte[] withoutBom = new UTF8Encoding(false).GetBytes(text);

withBom and withoutBom have exactly the same contents; neither is even a single byte longer than the other.

Why is that? Why isn't the byte order mark added to withBom?

Answer 1

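The BOM argument of the constructor does not affect the result of GetBytes; it affects the result of GetPreamble. The caller has to prepend the preamble to the encoded bytes manually.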

byte[] bom = new UTF8Encoding(true).GetPreamble(); // 3 bytes
byte[] noBom = new UTF8Encoding(false).GetPreamble(); // 0 bytes
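
The flag does matter to code that asks the encoding for its preamble. As a rough illustration (not part of the original answer), StreamWriter writes the encoding's preamble at the start of a fresh stream, so the two encodings produce different output there. The sketch below assumes C# 9 or later for top-level statements, and WriteToMemory is a hypothetical helper introduced only for this example.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not from the original answers): writes text through a
// StreamWriter into memory and returns the raw bytes that were produced.
static byte[] WriteToMemory(string text, Encoding encoding)
{
    using var stream = new MemoryStream();
    using (var writer = new StreamWriter(stream, encoding))
    {
        writer.Write(text);
    } // disposing the writer flushes it and closes the stream
    return stream.ToArray(); // ToArray still works on a closed MemoryStream
}

string text = "Hello, world!";

byte[] withBom = WriteToMemory(text, new UTF8Encoding(true));
byte[] withoutBom = WriteToMemory(text, new UTF8Encoding(false));

Console.WriteLine(withBom.Length);    // 16: 3-byte preamble + 13 bytes of text
Console.WriteLine(withoutBom.Length); // 13: text only
Console.WriteLine(BitConverter.ToString(withBom, 0, 3)); // EF-BB-BF
```

GetBytes, by contrast, only ever encodes the characters it is given, which is why both arrays in the question are identical.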

Answer 2

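using System.Linq; // provides Concat and ToArray
UTF8Encoding enc = new UTF8Encoding(true);
// Prepend the BOM manually; text is the string from the question.
byte[] withBom = enc.GetPreamble().Concat(enc.GetBytes(text)).ToArray();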