List of BOM characters

Time: 2018-12-19 19:20:03

Tags: csv hex utf byte-order-mark

Is there a list of the possible BOM (byte order mark) byte sequences? So far I have come across:

\x00\x00\xfe\xff    UTF-32, big-endian
\xff\xfe\x00\x00    UTF-32, little-endian
\xfe\xff            UTF-16, big-endian
\xff\xfe            UTF-16, little-endian
\xef\xbb\xbf        UTF-8

Are there any others that I am missing?

1 answer:

Answer 0: (score: 2)

Short answer: no, you have covered them all.

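According to the Unicode specification, UTF-8, UTF-16 and UTF-32 are the 3 general encoding forms. The specification actually lists UTF-16, UTF-16LE and UTF-16BE as separate encoding schemes, and similarly lists UTF-32, UTF-32LE and UTF-32BE.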
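As a rough illustration, the five signatures from the question can be checked with a few lines of Python. This is only a sketch: `detect_bom` and the `BOMS` table are hypothetical names made up for this example, while the `codecs.BOM_*` constants come from the standard library. The order of the table matters, because the UTF-32 little-endian BOM starts with the same two bytes as the UTF-16 little-endian BOM.

```python
import codecs

# Longer signatures come first: the UTF-32 LE BOM (FF FE 00 00) begins
# with the UTF-16 LE BOM (FF FE), so it has to be tested before it.
# BOMS and detect_bom are illustrative names, not part of any library.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32, big-endian"),     # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32, little-endian"),  # FF FE 00 00
    (codecs.BOM_UTF8, "UTF-8"),                      # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16, big-endian"),     # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16, little-endian"),  # FF FE
]


def detect_bom(data: bytes):
    """Return (label, bom_length), or (None, 0) if no BOM is found."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label, len(bom)
    return None, 0
```

Note that the bytes FF FE 00 00 could in principle also be a UTF-16 little-endian BOM followed by U+0000, so a check like this is a heuristic and not a proof of the encoding.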

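It is important to know that if the character stream is explicitly encoded in one of the LE or BE schemes, a leading U+FEFF is not a byte order mark: it must be interpreted as U+FEFF ZERO WIDTH NO-BREAK SPACE, that is, as part of the text. In other words: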

UTF-16BE  initial FE FF is treated as U+FEFF
UTF-16LE  initial FF FE is treated as U+FEFF
UTF-32BE  initial 00 00 FE FF is treated as U+FEFF
UTF-32LE  initial FF FE 00 00 is treated as U+FEFF
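Python's built-in codecs happen to follow this distinction, which makes it easy to see in practice. The snippet below is a small sketch with made-up variable names: the generic `utf-16` and `utf-32` codecs consume the leading BOM, while the explicit `-le` codecs keep it as a U+FEFF character in the decoded text.

```python
# "A" preceded by the little-endian signature in each encoding.
data16 = b"\xff\xfe" + "A".encode("utf-16-le")
data32 = b"\xff\xfe\x00\x00" + "A".encode("utf-32-le")

# Generic codecs: the signature is read as a BOM and dropped.
generic16 = data16.decode("utf-16")
generic32 = data32.decode("utf-32")

# Explicit LE codecs: the same bytes decode to U+FEFF plus the text.
explicit16 = data16.decode("utf-16-le")
explicit32 = data32.decode("utf-32-le")

print(repr(generic16), repr(explicit16))
print(repr(generic32), repr(explicit32))
```

If the endianness is already known and the explicit codec is used, the leading U+FEFF has to be stripped by hand when it is not wanted.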

For more details, see http://www.unicode.org/versions/Unicode11.0.0/ch03.pdf#G2212