How to validate a plain text file in Perl

Date: 2015-04-13 02:21:25

Tags: perl

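I'm trying to use -T together with the -e and -f tests in an if condition to check whether a file exists, whether it is a plain file, and whether it is a plain text file:

print "$_ is readable text\n" if -e $_ && -f $_ && -T $_;

But this doesn't work as expected: some binary files are still listed.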

2 answers:

Answer 0 (score: 1)

-B and -T use a heuristic test

The command perldoc -f -T | sed -ne '/-T.*"-B".*work/,+3p' prints:

          The "-T" and "-B" switches work as follows.  The first block or
          so of the file is examined for odd characters such as strange
          control codes or characters with the high bit set.  If too many
          strange characters (>30%) are found, it's a "-B" file;

So if the 30% threshold does not suit your needs, you have to use another approach:

Use Test::PureASCII

  use Test::PureASCII tests => $how_many;
  file_is_pure_ascii($filename1, "only ASCII in $filename1");

Or build your own test routine:

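#!/usr/bin/perl
use strict;
use warnings;

# Return true if the first 4 KiB of the file contain only
# printable ASCII (space to ~), tab, CR and LF.
sub testfile {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return 0;
    my $n = sysread $fh, my $block, 4096;
    close $fh;
    return 0 unless defined $n;
    return $block =~ /\A[\r\n\t -~]*\z/ ? 1 : 0;
}

# Print the name of every plain, readable, ASCII-only file in /tmp.
my $dir = '/tmp';
opendir my $dh, $dir or die "Cannot open $dir: $!";
print "$_\n" for grep {
    my $path = "$dir/$_";
    -f $path && -r $path && testfile($path);
} readdir $dh;
closedir $dh;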

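The same idea, but also accepting UTF-8-encoded Latin-script letters:

#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_QUIET);

# Return true if the first 16 KiB decode as UTF-8 and contain only
# printable ASCII, tab, CR, LF and Latin-script letters.
sub testfile {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return 0;
    my $n = sysread $fh, my $bytes, 16384;
    close $fh;
    return 0 unless defined $n;
    # FB_QUIET stops at the first undecodable byte and leaves the rest
    # in $bytes; tolerate at most 3 leftover bytes, i.e. a multibyte
    # character cut off by the 16 KiB read.
    my $text = decode('UTF-8', $bytes, FB_QUIET);
    return 0 if length($bytes) > 3;
    return $text =~ /\A(?:[\r\n\t -~]|\p{Latin})*\z/ ? 1 : 0;
}

my $dir = '/tmp';
opendir my $dh, $dir or die "Cannot open $dir: $!";
print "$_\n" for grep {
    my $path = "$dir/$_";
    -f $path && -r $path && testfile($path);
} readdir $dh;
closedir $dh;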
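If only the fixed threshold is the problem, another option is to re-implement the ratio idea yourself. The following is a rough sketch, not Perl's actual -T implementation: the name `looks_like_text`, its parameters and its definition of an "odd" byte are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): a rough imitation of the
# quoted -T/-B heuristic with a tunable threshold. It reads up to
# $block_size bytes and returns 1 if at most $max_ratio of them are
# "odd" (control codes other than \t \n \f \r, DEL, or bytes with the
# high bit set). Any NUL byte makes the file binary.
sub looks_like_text {
    my ($path, $max_ratio, $block_size) = @_;
    $max_ratio  //= 0.30;    # mirrors the ">30%" rule quoted above
    $block_size //= 4096;
    open my $fh, '<:raw', $path or return 0;
    my $n = sysread $fh, my $buf, $block_size;
    close $fh;
    return 0 unless defined $n;
    return 1 if $n == 0;                   # empty file counts as text
    return 0 if index($buf, "\0") >= 0;    # NUL => binary
    my $odd = () = $buf =~ /[^\t\n\f\r\x20-\x7e]/g;
    return $odd / $n <= $max_ratio ? 1 : 0;
}

# Example: list the arguments that contain no odd bytes at all.
print "$_\n" for grep { looks_like_text($_, 0) } @ARGV;
```

`//=` (Perl 5.10+) is used so that an explicit ratio of 0 is kept instead of being replaced by the default. Note that in this sketch every byte of a non-ASCII UTF-8 character counts as odd, so non-English text needs a higher ratio or the UTF-8 routine instead.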

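Answer 1 (score: 0)

You can use

if (-f "filename") {
    # validate the content if required
    # do other things
}

If you want to validate the file's contents, you have to write that logic yourself before doing anything else with the file.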

For details, see http://perldoc.perl.org/functions/-X.html
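Following this answer's advice, here is a minimal sketch that runs the question's checks in order and leaves a slot for your own content validation. `why_not_text` and its `$validate` code ref are hypothetical names, not part of Perl; the sketch only relies on the built-in -e, -f, -r and -T file tests.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run the question's checks one at a time and
# return the reason for the first failure, or undef if all of them pass.
# $validate is an optional code ref holding your own content rules.
sub why_not_text {
    my ($path, $validate) = @_;
    return 'does not exist'          unless -e $path;
    return 'not a plain file'        unless -f $path;
    return 'not readable'            unless -r $path;
    return 'failed the -T heuristic' unless -T $path;
    return 'failed content check'    if $validate && !$validate->($path);
    return;    # undef in scalar context: every check passed
}

for my $path (@ARGV) {
    my $why = why_not_text($path);
    print defined($why) ? "$path: $why\n" : "$path is readable text\n";
}
```

The -T step is still the same heuristic as before, so any stricter rule (for example the ASCII-only testfile routine from answer 0) belongs in `$validate`.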