Why does Subversion give some of my UTF-8 text files the content type "application/octet-stream"?

Asked: 2014-06-09 12:05:44

Tags: svn encoding

I have some UTF-8 encoded text files (Japanese text) and added them to a Subversion repository.

To my surprise, one of them automatically got the property svn:mime-type set to application/octet-stream, while the others did not get any specific encoding information.

The files are valid UTF-8, and file reports "UTF-8 Unicode text, with CRLF line terminators" for all of them.

What is going on here? How does Subversion decide whether a file should be treated as binary?

1 answer:

Answer 0: (score: 3)

I found the explanation in a comment in svn_io_is_binary_data:

    /* Right now, this function is going to be really stupid. It's
       going to examine the block of data, and make sure that 15%
       of the bytes are such that their value is in the ranges 0x07-0x0D
       or 0x20-0x7F, and that none of those bytes is 0x00. If those
       criteria are not met, we're calling it binary.

       NOTE: Originally, I intended to target 85% of the bytes being in
       the specified ranges, but I flubbed the condition. At any rate,
       folks aren't complaining, so I'm not sure that it's worth
       adjusting this retroactively now. --cmpilato */

With Japanese text in UTF-8, most code points take three bytes, each of which is >= 0x80. None of those bytes falls in the ranges listed in the comment, so a block that is almost entirely Japanese stays below the 15% threshold and is classified as binary.
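To make the rule concrete, here is a minimal Python sketch of the heuristic as the quoted comment describes it. It is not Subversion's actual C code: the name looks_binary and the sample text are made up for illustration, and treating an empty block as text is an assumption.

```python
def looks_binary(block: bytes) -> bool:
    """Sketch of the rule in the quoted comment (hypothetical helper, not Subversion's code)."""
    if not block:
        return False  # assumption: an empty block counts as text
    if 0x00 in block:
        return True   # any NUL byte marks the block as binary
    # Count bytes in the "text-like" ranges named in the comment.
    text_like = sum(1 for b in block if 0x07 <= b <= 0x0D or 0x20 <= b <= 0x7F)
    # Binary when fewer than 15% of the bytes are text-like.
    return text_like * 100 < 15 * len(block)


# One line of Japanese text with a CRLF terminator, encoded as UTF-8.
japanese_line = ("日本語のテキスト" * 10 + "\r\n").encode("utf-8")
body = japanese_line[:-2]  # everything except the CRLF

print(all(b >= 0x80 for b in body))    # are all bytes of the Japanese text outside ASCII?
print(looks_binary(japanese_line * 4)) # verdict for a block of such lines
```

Only the CR and LF bytes of each line count as text-like here, which is why a file consisting of long Japanese lines ends up far below the threshold.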

The reason more of my files did not trigger this behaviour is that they start with a small preamble made of characters in the ASCII range.
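The effect of such a preamble can be checked with a small Python sketch that measures the share of text-like bytes at the start of a file. The helper text_like_percent, the sample data and the 1024-byte block size are assumptions for illustration. The quoted comment only says "the block of data", so the exact size Subversion inspects is not confirmed here.

```python
BLOCK_SIZE = 1024  # assumption: size of the leading block that gets examined


def text_like_percent(block: bytes) -> float:
    """Percentage of bytes in the ranges 0x07-0x0D or 0x20-0x7F (hypothetical helper)."""
    text_like = sum(1 for b in block if 0x07 <= b <= 0x0D or 0x20 <= b <= 0x7F)
    return 100.0 * text_like / len(block)


# Sample data: Japanese lines with CRLF endings, and a short ASCII preamble.
japanese = ("日本語のテキスト" * 10 + "\r\n").encode("utf-8") * 10
preamble = ("# ASCII header line\r\n" * 10).encode("ascii")

without_preamble = japanese[:BLOCK_SIZE]
with_preamble = (preamble + japanese)[:BLOCK_SIZE]

for name, block in [("without preamble", without_preamble),
                    ("with preamble", with_preamble)]:
    share = text_like_percent(block)
    verdict = "binary" if share < 15 else "text"
    print(f"{name}: {share:.1f}% text-like -> {verdict}")
```

Under these assumptions, a couple of hundred ASCII bytes at the top of the file are enough to lift the block above the 15% threshold, which matches the observation that only one file was flagged.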