Question

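Scenario: I'm a Jr. C# developer, but recently (3 days ago) began learning Perl for batch files. I need to parse a text file, extract some key data, and then output that key data to a new text file. As always seems to be the case, there are bits and pieces around the web about how to "read" from a file, "write" to a file, store it "line by line" in an array, "filter" this and that, yadda yadda, but nothing that discusses the entire process of read, filter, write. Trying to splice together examples from the web was no good, because none of them seemed to work together as coherent code. Coming from C#, Perl's syntactic structure is confusing. I just need some advice on the process as a whole.

My goal is to parse a text file, single out by date all lines similar to the one below, and output only the first 8 digits of the second number group and 5 digits of the third number group to a new text file.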

11122 20100223454345 ....random text..... [keyword that identifies all the 
entries I need]... random text 0.0034543345

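I know a regex is probably the best option, and I have most of the expression written, but it doesn't work in Perl!

Question: Could someone show a simple (dummy) example of how to read a file, filter it (using a dummy regex), and then output the (dummy) results to a new file? I'm not concerned with the functional details, I can learn those; I just need the syntactic structure Perl uses. For example: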

 open(FH, '<', 'dummy1.txt')
 open(NFH, '>', 'dummy2.txt')

 @array; or $dumb;
 while(<FH>) 
 {
    filter each line [REGEX] and shove it into [@array or $dumb scalar]
 } 
 print(join(',', @array)) to dummy2.txt
 close FH;
 close NFH;

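Note: For various reasons I can't paste my source code here, sorry. Any help is appreciated.

Update: Answer:

Many thanks to everyone who provided insight on my question. After reading your responses and doing further research, I learned that there are many ways to accomplish the same task in Perl (which I'm not a fan of). In the end, this is how I solved my problem; for anyone with similar difficulties it is, IMO, the cleanest and most concise solution. Thanks again for all the help.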

  #======================================================================
  # 1. READ FILE:   inputFile.txt
  # 2. CREATE FILE: outputFile.txt
  # 3. WRITE TO:    outputFile.txt IF line matches REGEX constraints
  # 4. CLOSE FILES: outputFile.txt & inputFile.txt
  #======================================================================

  #1
  $readFile = 'C:/.../.../inputFile.txt';
  open(FH, '<', $readFile) or die("Could not read file ($!)");

  #2
  $writeFile = 'C:/.../.../outputFile.txt';
  open(NFH, '>', $writeFile) or die("Cannot write to file ($!)");

  #3
  @lines = <FH>;
  LINE: foreach $line (@lines)
  {
     if ($line =~ m/(201403\d\d).*KEYWORD.*time was (\d+\.\d+)/)
     {
        $date = $1;
        $elapsedtime = $2;
        print NFH "$date,$elapsedtime\n";
     }
  }

  #4
  close NFH;
  close FH;

Answer 1

while(<FH>)
{
  # variable $_ contains the current line

  if(m/regex_goes_here/) #by default, the regex match operator m// attempts to match the default $_ variable  
  {  
    #do actions  
  }  
}

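Also note that m/regex/ is the same as /regex/.

Reference:

To capture variables from a regex match, the linked page may help.

Edit

If you want a variable other than the default $_, as @Miller suggested, use while($line = <FH>) followed by if($line =~ m/regex_goes_here/)

=~ is the Binding Operator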
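As a minimal sketch of capturing with the binding operator: the sample line, the literal KEYWORD, and the helper name extract_fields below are assumptions made for illustration, modeled on the line format in the question. Parenthesized groups in the pattern are available afterwards as $1 and $2.

```perl
use strict;
use warnings;

# Hypothetical sample line, modeled on the format shown in the question.
my $line = '11122 20100223454345 some text KEYWORD more text 0.0034543345';

# Capture the first 8 digits of the second number group and the first
# 5 characters of the trailing decimal number. Returns an empty list
# when the line does not match.
sub extract_fields {
    my ($text) = @_;
    if ($text =~ /^\d+\s+(\d{8})\d*\s.*KEYWORD.*\s(\d\.\d{3})\d*\s*$/) {
        return ($1, $2);
    }
    return;
}

my ($date, $value) = extract_fields($line);
print "$date,$value\n" if defined $date;
```

The pattern is deliberately strict about the shape of the line; a looser pattern may be more appropriate for real data.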

Answer 2

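perlfaq5 - How do I change, delete, or insert a line in a file, or append to the beginning of a file? covers most of the different scenarios for working with files.

However, I will add to that: always start your scripts with use strict; and use warnings;, and because you are doing file processing, use autodie; will serve you well too.

With that in mind, a quick stub would be the following: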

use strict;
use warnings;
use autodie;

open my $infh, '<', 'dummy1.txt';
open my $outfh, '>', 'dummy2.txt';

while (my $line = <$infh>) {
    chomp $line; # Remove \n

    if ($line =~ /regex_goes_here/) {   # placeholder condition
        print $outfh "your new data\n"; # no comma after the filehandle
    }
}

Answer 3

use strict;
use warnings;
use autodie;
use feature qw(say);

use constant {
    INPUT_FILE  => "NAME_OF_INPUT_FILE",
    OUTPUT_FILE => "NAME_OF_OUTPUT_FILE",
    FILTER      => qr/regex_for_line_to_filter/,
};

open my $in_fh, "<", INPUT_FILE;
open my $out_fh, ">", OUTPUT_FILE;

while ( my $line = <$in_fh> ) {
    chomp $line;
    next unless $line =~ FILTER;
    $line =~ s/regular_expression/replacement/;
    say {$out_fh} $line;
}
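close $in_fh;
close $out_fh;

$in_fh is your input file handle and $out_fh is your output file handle. I basically open both and loop through the input. chomp removes the \n from the end. I always recommend doing this.

next goes to the next iteration of the loop unless the line matches FILTER, which is a regular expression matching the lines you want to keep. This is the same as: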

if ( $line !~ FILTER ) {
    next;
}

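Then I use the substitution command to take the parts of the line I want and shape them into the output I want. I could probably have expanded on that a bit: maybe use split to break the line into separate parts and use only the parts I want, then use substr to pull substrings out of the selected parts.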
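The split and substr idea can be sketched as follows. The field positions (second field and last field) and the helper name date_and_value are assumptions based on the sample line in the question, not something this answer specifies.

```perl
use strict;
use warnings;

# Split a line on whitespace, then take substrings of the fields we need.
# Returns an empty list if the line has fewer than three fields.
sub date_and_value {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return unless @fields >= 3;
    my $date  = substr($fields[1], 0, 8);   # first 8 digits of field 2
    my $value = substr($fields[-1], 0, 5);  # first 5 characters of last field
    return ($date, $value);
}

my ($date, $value) = date_and_value(
    '11122 20100223454345 random text KEYWORD random text 0.0034543345');
print "$date,$value\n";
```

Splitting on ' ' (a single space string) is the special case that skips leading whitespace and treats any run of whitespace as one separator.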

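The say command is like print, except that it automatically adds a newline at the end. This is how you write a line to a file.

Now, get Learning Perl and read it. If you know any programming, it shouldn't take you more than a week to get through the first half of the book. That should be enough to be able to write a program like this. More complex topics such as references and object orientation may take longer.

Online documentation can be found at http://perldoc.perl.org. There you can look up the use statements, which are called pragmas. Documentation on individual functions is available there as well.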

Answer 4

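One tip: don't explicitly open file handles for your input and output files. Instead, read from STDIN and write to STDOUT. Your program will be more flexible and easier to use, because you can treat it like a Unix filter.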

$ your_filter_program < your_input.txt > your_output.txt

Doing it this way actually makes your program easier to write too.

while (<>) { # <> reads from STDIN (or from files named on the command line)
  # transform your data (which is in $_) in some way
  ...
  print; # prints $_ to STDOUT
}
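A slightly fuller sketch of the same filter idea, written so it runs on its own: the helper name run_filter, the /KEYWORD/ pattern, and the in-memory sample text are all placeholders introduced for illustration.

```perl
use strict;
use warnings;

# Copy lines that match the placeholder pattern from one handle to another.
sub run_filter {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        next unless $line =~ /KEYWORD/;
        print {$out} $line;
    }
    return;
}

# Demonstration with in-memory handles so the sketch is self-contained.
my $input  = "keep KEYWORD 1\nskip this\nkeep KEYWORD 2\n";
my $output = '';
open my $in_fh,  '<', \$input  or die "Cannot open input: $!";
open my $out_fh, '>', \$output or die "Cannot open output: $!";
run_filter($in_fh, $out_fh);
close $in_fh;
close $out_fh;
print $output;
```

Calling run_filter(\*STDIN, \*STDOUT) instead would turn this into a command-line filter usable with shell redirection as shown above.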

You might find the first few chapters of Data Munging with Perl useful.

Answer 5

If I understood correctly, this one-liner will do the job:

perl -ane 'print substr($F[1],0,8),"\t",substr($F[-1],0,5),"\n" if /keyword/' in.txt

Assuming in.txt is:

11122 20100223454345 ....random text..... [keyword that identifies all the entries I need]... random text 0.0034543345
11122 30100223454345 ....random text..... [ that identifies all the entries I need]... random text 0.124543345
11122 40100223454345 ....random text..... [keyword that identifies all the entries I need]... random text 0.65487
11122 50100223454345 ....random text..... [ that identifies all the entries I need]... random text 0.6215

Output:

20100223    0.003
40100223    0.654
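For readers new to the command-line switches, here is a rough script-form sketch of what the one-liner does per line: -n wraps the code in a read loop, and -a splits each line on whitespace into @F. The helper name process_line and the shortened sample lines are assumptions for illustration.

```perl
use strict;
use warnings;

# Per-line logic of the one-liner: split into fields, skip lines that
# do not contain "keyword", and build the tab-separated output line.
sub process_line {
    my ($line) = @_;
    my @F = split ' ', $line;              # what -a does
    return unless $line =~ /keyword/;
    return substr($F[1], 0, 8) . "\t" . substr($F[-1], 0, 5) . "\n";
}

# Hypothetical sample lines, shortened from the in.txt shown above.
my @sample = (
    "11122 20100223454345 text [keyword here] text 0.0034543345\n",
    "11122 30100223454345 text [ here] text 0.124543345\n",
);
for my $line (@sample) {
    my $out = process_line($line);
    print $out if defined $out;
}
```

In a real script the for loop would be replaced by while (my $line = <>) { ... }, which is what -n supplies.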

Perl - How to Read, Filter & Output Results
