Question

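I'm trying to convert a Perl script to a PowerShell script. I've run into trouble with the part where the script reads a log file and has to determine the file's encoding.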

Here is the Perl code:

    sub get_encoding {
        my $f = shift;
        my $fh;
        return "ASCII" if (!open ($fh,"<",$f));
        my $b = "";
        my $n = read ($fh,$b,2);
        close ($fh);
        return "UTF-16" if ($b eq "\x{ff}\x{fe}");
        return "ASCII";
    }

It is called like this:

get_encoding ($l->{file})

where $l->{file} is the path to the log file.

Can anyone explain what is going on, especially in this line:

return "UTF-16" if ($b eq "\x{ff}\x{fe}");

If anyone knows a good way to do this in PowerShell, any hints would be much appreciated.

吉斯利

Answer 1

The sub reads the first 2 bytes of the given file and checks them to decide whether to return the string "ASCII" or "UTF-16".

Here is a more detailed explanation:

If the file cannot be opened, for whatever reason, it returns "ASCII". (Strange, but that's what it does.)

return "ASCII" if (!open ($fh,"<",$f));

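If the file was opened as the filehandle $fh, read($fh, $b, 2) reads the first 2 (8-bit) bytes into the variable $b. The return value of read, i.e. the number of bytes actually read, is stored in $n, although $n is never used afterwards.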

my $b = "";
my $n = read ($fh,$b,2);
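As a small illustration of what read hands back, here is a Python sketch in which an in-memory io.BytesIO stream stands in for $fh (read_head is a hypothetical helper name, not part of the script). Asking for 2 bytes yields only what is available, so for an empty or one-byte file $n is less than 2 and the later comparison with FF FE cannot succeed.

```python
import io
from typing import Tuple


def read_head(data: bytes, count: int = 2) -> Tuple[bytes, int]:
    """Read up to `count` bytes and report how many were actually read,
    roughly like Perl's  my $n = read($fh, $b, 2);  (hypothetical helper)."""
    fh = io.BytesIO(data)  # stand-in for the opened log file
    buf = fh.read(count)   # returns fewer bytes if the stream is shorter
    return buf, len(buf)


print(read_head(b"\xff\xfeA\x00"))  # (b'\xff\xfe', 2)
print(read_head(b"\xff"))           # (b'\xff', 1)
print(read_head(b""))               # (b'', 0)
```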

The filehandle $fh is closed immediately after reading.

close ($fh);

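If the value of $b is exactly "\x{ff}\x{fe}", it returns "UTF-16", although "UTF-16LE" would be more accurate, since FF FE is the little-endian byte order mark. \x{..} denotes a single character given by its hexadecimal value, so "\x{ff}\x{fe}" contains two bytes, not 10 or 12.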

return "UTF-16" if ($b eq "\x{ff}\x{fe}");
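A quick check of the same point in Python (illustrative only): the escape produces one byte per hex pair, and those two bytes are exactly the UTF-16 little-endian BOM that Python's standard codecs module exposes as BOM_UTF16_LE.

```python
import codecs

marker = b"\xff\xfe"  # the same two bytes as Perl's "\x{ff}\x{fe}"

print(len(marker))                    # 2
print(marker == codecs.BOM_UTF16_LE)  # True: FF FE is the UTF-16 little-endian BOM
print(marker == codecs.BOM_UTF16_BE)  # False: the big-endian BOM is FE FF
```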

Finally, if $b is not equal to "\x{ff}\x{fe}", it returns "ASCII".

return "ASCII";
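Putting the steps together, here is a minimal Python sketch of the whole sub, offered only as a language-neutral restatement of the Perl logic (the name get_encoding mirrors the Perl sub; returning "ASCII" for an unopenable file deliberately copies the Perl behaviour):

```python
def get_encoding(path: str) -> str:
    """Mirror of the Perl get_encoding sub: look only at the first two bytes."""
    try:
        # "rb" opens in binary mode, so we get raw bytes with no decoding,
        # like Perl's open($fh, "<", $f) without an encoding layer.
        with open(path, "rb") as fh:
            head = fh.read(2)  # up to 2 bytes; fewer if the file is shorter
    except OSError:
        # The Perl code returns "ASCII" when the file cannot be opened.
        return "ASCII"
    # FF FE at the start of the file is the UTF-16 little-endian BOM.
    if head == b"\xff\xfe":
        return "UTF-16"
    return "ASCII"
```

Note that, like the original, anything without that exact BOM (including UTF-8 or big-endian UTF-16) is reported as "ASCII".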

Answer 2

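Adapted from http://franckrichard.blogspot.com/2010/08/powershell-get-encoding-file-type.html

    function Get-FileEncoding {
        [CmdletBinding()]
        Param (
            [Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)]
            [string]$Path
        )
        # Read only the first 4 bytes (Windows PowerShell syntax; in PowerShell 6+
        # replace "-Encoding Byte" with "-AsByteStream")
        [byte[]]$byte = Get-Content -Encoding Byte -ReadCount 4 -TotalCount 4 -Path $Path

        # BOMs: EF BB BF = UTF-8; FF FE 00 00 = UTF-32LE (test before the 2-byte FF FE);
        # FF FE = UTF-16LE (.NET "Unicode", the case the Perl code checks);
        # FE FF = UTF-16BE; 00 00 FE FF = UTF-32BE; 2B 2F 76 = UTF-7
        if ($byte[0] -eq 0xef -and $byte[1] -eq 0xbb -and $byte[2] -eq 0xbf)
        { Write-Output 'UTF8' }
        elseif ($byte[0] -eq 0xff -and $byte[1] -eq 0xfe -and $byte[2] -eq 0 -and $byte[3] -eq 0)
        { Write-Output 'UTF32' }
        elseif ($byte[0] -eq 0xff -and $byte[1] -eq 0xfe)
        { Write-Output 'Unicode' }
        elseif ($byte[0] -eq 0xfe -and $byte[1] -eq 0xff)
        { Write-Output 'BigEndianUnicode' }
        elseif ($byte[0] -eq 0 -and $byte[1] -eq 0 -and $byte[2] -eq 0xfe -and $byte[3] -eq 0xff)
        { Write-Output 'BigEndianUTF32' }
        elseif ($byte[0] -eq 0x2b -and $byte[1] -eq 0x2f -and $byte[2] -eq 0x76)
        { Write-Output 'UTF7' }
        else
        { Write-Output 'ASCII' }
    }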
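The same BOM-sniffing idea as a table-driven Python sketch, assuming only the BOMs listed in the function above matter. The labels simply echo the PowerShell encoding names, and sniff_encoding / get_file_encoding are hypothetical helper names. The 4-byte UTF-32LE BOM has to be tried before the 2-byte UTF-16LE one, because both start with FF FE.

```python
# Byte order marks, ordered so that FF FE 00 00 (UTF-32LE) is tried
# before FF FE (UTF-16LE); labels mirror the PowerShell names above.
BOMS = [
    (b"\xff\xfe\x00\x00", "UTF32"),
    (b"\x00\x00\xfe\xff", "BigEndianUTF32"),
    (b"\xef\xbb\xbf", "UTF8"),
    (b"\x2b\x2f\x76", "UTF7"),
    (b"\xff\xfe", "Unicode"),           # UTF-16 little-endian
    (b"\xfe\xff", "BigEndianUnicode"),  # UTF-16 big-endian
]


def sniff_encoding(head: bytes) -> str:
    """Classify the first bytes of a file by its BOM; no BOM means 'ASCII'."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return "ASCII"


def get_file_encoding(path: str) -> str:
    with open(path, "rb") as fh:
        return sniff_encoding(fh.read(4))  # 4 bytes covers the longest BOM
```

One caveat of any BOM sniffer: a UTF-16LE file whose first character is U+0000 also starts with FF FE 00 00 and will be classified as UTF-32LE.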

Answer 3

Earlier, the script reads two bytes from $f into $b:

my $n = read ($fh,$b,2);

The line in question tests whether those two bytes are exactly FF and FE.

FF FE is the byte order mark (BOM) of UTF-16 in little-endian byte order; see http://unicode.org/faq/utf_bom.html
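To illustrate what the BOM is for (a Python sketch, not taken from the linked FAQ): the same text encoded as UTF-16 little-endian and big-endian differs only in byte order, and a decoder that honours the BOM can read either one correctly.

```python
import codecs

text = "hi"
le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")  # BOM FF FE, then low byte first
be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")  # BOM FE FF, then high byte first

print(le.hex(" "))  # ff fe 68 00 69 00
print(be.hex(" "))  # fe ff 00 68 00 69
# The "utf-16" codec reads the BOM, picks the byte order, and drops the BOM.
print(le.decode("utf-16"), be.decode("utf-16"))  # hi hi
```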
