Question

I am writing a bash script to automate the processing of some files. One subjob is to use iconv to re-encode the source files if their encoding is not to my liking. For this I use:

enc=$(file -b --mime-encoding "$file") # get the encoding
if [ "$enc" = "iso-8859-1" ] || [ "$enc" = "us-ascii" ] # no need to encode these
then
    unset enc
fi
cat "$file" | # conditional encoding below
( [[ "${enc}" ]] && iconv -f "$enc" -t iso-8859-1 || cat ) |
awk '{# code to process file further}' > "$newfile"

The problem is that I have a UTF-8 file that file misidentifies as ASCII. The first non-ASCII character is character #314206, which is on line #1028. Apparently file works from a sample of limited size: if I convert the file from fixed width to character delimited, the first non-ASCII character becomes char #80872, and file identifies the encoding correctly. So I guess the sample size lies somewhere between those two values.

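(tl;dr) Is there a way to instruct file to take a bigger sample or to read the whole source file, or is there some other bash-friendly way to figure out the encoding?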

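I played around with file but could not affect the result. The documentation for file did not help me any further, and googling "file command sample size" was not very promising.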

(If you are wondering about the conditional approach: there are some other tasks to deal with that are not shown in the code sample.)

Answer 1

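By default, file only analyzes the first 1048576 bytes of a file.

An option to control this limit was added in commit d04de269 and has been available in file since version 5.26 (2016-04-16). It is set through the -P option, with the parameter named bytes: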

-P, --parameter name=value
    Set various parameter limits.
        Name         Default    Explanation
        ...
        bytes        1048576    max number of bytes to read from file

So you only need to set the bytes limit to the size of your largest file, e.g. 100 MB:

$ file -P bytes=104857600 file
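
Instead of hard-coding a limit, the value could be derived from each file's own size. The following is only a minimal sketch under a few assumptions: detect_encoding is a hypothetical helper name that does not appear in the question or in file itself, and it requires file 5.26 or later for -P bytes. Newer releases of file appear to list a separate encoding parameter in the same -P table; if your man page shows it, that limit may need raising as well.

```shell
#!/usr/bin/env bash

# Hypothetical helper (an illustration, not part of the original answer):
# print the MIME encoding of a file while allowing `file` to read up to the
# whole file rather than only its default sample.
detect_encoding() {
    local path=$1 size

    # Size in bytes; the arithmetic expansion strips any padding wc may add.
    size=$(( $(wc -c < "$path") ))

    # Keep the limit at least 1 so an empty file does not yield bytes=0.
    (( size > 0 )) || size=1

    # -b drops the file name from the output; -P bytes=N raises the read limit.
    file -b -P bytes="$size" --mime-encoding -- "$path"
}
```

In the script from the question this would replace the direct call, for example enc=$(detect_encoding "$file").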

Bash and file command sample size
