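I have some UTF-8 encoded text files (Japanese text) and added them to a Subversion repository.
To my surprise, one of them automatically got the property svn:mime-type set to application/octet-stream, while the others did not get any specific encoding information.
The files are valid UTF-8, and file reports "UTF-8 Unicode text, with CRLF line terminators" for all of them.
What is happening here? How does Subversion decide whether a file should be considered binary?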
Answer 0: (score: 3)
I found the explanation in a comment in svn_io_is_binary_data:

/* Right now, this function is going to be really stupid. It's
   going to examine the block of data, and make sure that 15%
   of the bytes are such that their value is in the ranges 0x07-0x0D
   or 0x20-0x7F, and that none of those bytes is 0x00. If those
   criteria are not met, we're calling it binary.

   NOTE: Originally, I intended to target 85% of the bytes being in
   the specified ranges, but I flubbed the condition. At any rate,
   folks aren't complaining, so I'm not sure that it's worth
   adjusting this retroactively now. --cmpilato */
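With Japanese text in UTF-8, most code points use three bytes, each of which is >= 0x80. None of those bytes falls in the ranges 0x07-0x0D or 0x20-0x7F, so a file that consists almost entirely of Japanese characters has fewer than 15% of its bytes in the "text" ranges and is classified as binary.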
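
To make the byte-level claim concrete, here is a small Python check. The sample string is my own illustration, not taken from the files in question.

```python
# Illustrative sample only: three common kanji, each encoded as 3 bytes in UTF-8.
text = "日本語"
encoded = text.encode("utf-8")

# Show each byte in hex: e6 97 a5 e6 9c ac e8 aa 9e
print(" ".join(f"{b:02x}" for b in encoded))

# Every byte of a multi-byte UTF-8 sequence has its high bit set.
print(len(encoded))                        # 9
print(all(b >= 0x80 for b in encoded))     # True
```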
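The reason more of my files did not trigger this behavior is that they begin with a small preamble made of characters in the ASCII range, which is enough to get them past the 15% threshold.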
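
The rule from the quoted comment can be sketched in a few lines of Python. This is a rough approximation written from the comment alone, not the actual Subversion source. The function name `looks_binary`, the handling of empty input, and the sample strings are my own assumptions.

```python
def looks_binary(block: bytes) -> bool:
    """Approximate the rule in the quoted comment (not the real SVN code).

    The block counts as binary if it contains a 0x00 byte, or if fewer
    than 15% of its bytes fall in the ranges 0x07-0x0D or 0x20-0x7F.
    """
    if not block:
        return False  # assumption: empty data is not flagged as binary
    if 0 in block:
        return True
    text_like = sum(1 for b in block if 0x07 <= b <= 0x0D or 0x20 <= b <= 0x7F)
    return text_like * 100 < len(block) * 15


# Hypothetical sample data: Japanese lines with CRLF line endings.
japanese = "日本語のテキストです。\r\n" * 5
with_preamble = "# Notes for chapter 1, revision 2\r\n" + japanese

# Only the CR and LF bytes are "text-like": 10 of 175 bytes, about 6%.
print(looks_binary(japanese.encode("utf-8")))        # True

# The ASCII preamble raises the share to 45 of 210 bytes, about 21%.
print(looks_binary(with_preamble.encode("utf-8")))   # False
```

With a sketch like this, one short ASCII header line is enough to tip a small file from "binary" to "text", which matches what I saw across my files.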