Parse a text file line by line, put the matching strings into an array, then write them to a CSV file

Posted: 2019-01-15 22:49:32

Tags: perl

The input is a text file like the one below. I refer to it as $rlseHistRepo.

Route:  TUCSON-AZ
Author: upham
Date:   2018-06-07 20:09:17 UTC
Release:0.0
Content:
        Full Release
Comment:
        Initial setup

*** Modified on Mon Jun 11 19:18:40 PDT 2018 by upham ***
QRC Acceptor: Admin
Log: http://universityofarizona/ECE101/rev0.0_060718_130854-4307-1528769914.qclog
Successful
Status: {Objects succeeded (1)} {}
--------------------------------------------------
Route:  YUMA-AZ
Author: upham
Date:   2018-06-07 20:09:18 UTC
Release:0.0
Content:
        Full Release
Comment:
        Initial setup

*** Modified on Tue Sep 25 15:40:02 PDT 2018 by upham ***
QRC Acceptor: Admin
Log: http://universityofarizona/ECE101/rev0.0_060718_130854-4307-1537915198.qclog
Successful 
Status: {Objects succeeded (33)} {}
--------------------------------------------------

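I want to write a Perl script that parses the input file above and writes it out to a CSV file, but I'm having trouble with hashes and arrays, and I don't know enough about handling data stored in arrays. The key is to find the lines that begin with Route:, Author:, Date:, Release:, Log:, Status:, Content:, and Comment:, grab the string that follows each keyword, and write it out to the CSV file.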

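Here is my starting script. I'm struggling to get the array printed correctly to the CSV file. I'd appreciate help fixing it, and an explanation of where and why the output is not printed in the correct order. Many thanks in advance.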

#!/usr/bin/perl

$rlseHistRepo   = $ARGV[0];

my %menu;
open(IN, "< $rlseHistRepo" ) || die "cannot read input file: $!\n";
open(OUTCSV , "> rlseLoggingRepo.csv" ) || die "cannot write output file: $!\n";
print OUTCSV "Site,Author,Release,Date,Version,Changes,Comment\n";
print OUTCSV ",,,,,,,\n";

while(<IN> ) {
    my $line = $_;
    chomp($line);
    if( $line =~ m/^Route:/) {
    my ($item, $rlsSite) = split(/\s+/, $line);
    $menu{$item} = $rlsSite;
    }
    if( $line =~ m/^Author:/) {
    my ($item, $rlsAuthor) = split(/\s+/, $line);
    $menu{$item} = $rlsAuthor;
    }

} 
close(IN);

foreach $item ( keys %menu ) {
    print OUTCSV "$menu{$item},,,,,\n";
    print "$rlsSite{$item},$rlsAuthor{$item},,,,\n";
} 

close(OUTCSV);

2 answers:

Answer 0 (score: 0)

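Since you haven't specified what the output should actually look like, I'm taking a stab in the dark, guessing from the input data and your regexes.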

For production-quality code, follow @Grinnz's advice and use Text::CSV instead.

#!/usr/bin/perl
use strict;
use warnings;

print "Entry,Site,Author,Release,Date,Version,Changes,Comment\n";

my @entries;
while(<DATA> ) {
    chomp;
    if (my($site) = /^Route:\s+(.+)$/) {
        # start of new entry
        push(@entries, {
            site => $site,
        });
    } elsif (my($author) = /^Author:\s+(.+)$/) {
        $entries[-1]->{author} = $author;
    }
}

foreach my $index (0..$#entries) {
    my $entry = $entries[$index];
    print "$index,$entry->{site},$entry->{author},,,,,\n";
}

__DATA__
Route:  TUCSON-AZ
Author: upham
Date:   2018-06-07 20:09:17 UTC
Release:0.0
Content:
        Full Release
Comment:
        Initial setup

*** Modified on Mon Jun 11 19:18:40 PDT 2018 by upham ***
QRC Acceptor: Admin
Log: http://universityofarizona/ECE101/rev0.0_060718_130854-4307-1528769914.qclog
Successful
Status: {Objects succeeded (1)} {}
--------------------------------------------------
Route:  YUMA-AZ
Author: upham
Date:   2018-06-07 20:09:18 UTC
Release:0.0
Content:
        Full Release
Comment:
        Initial setup

*** Modified on Tue Sep 25 15:40:02 PDT 2018 by upham ***
QRC Acceptor: Admin
Log: http://universityofarizona/ECE101/rev0.0_060718_130854-4307-1537915198.qclog
Successful 
Status: {Objects succeeded (33)} {}
--------------------------------------------------

Sample run:

$ perl dummy.pl
Entry,Site,Author,Release,Date,Version,Changes,Comment
0,TUCSON-AZ,upham,,,,,
1,YUMA-AZ,upham,,,,,

Edit: an alternative approach is to use the range (flip-flop) operator:

if (/^Route:/../^----------/) {
    # we are inside a log entry...
}

and then, inside that block, detect

  • keyword lines, with my($keyword, $data) = /^(\w+):\s*(.*)$/;
  • indented text lines, with my($line) = /^\s+(.+)$/;
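The range-operator approach can be sketched as follows. This is a minimal sketch, not the answerer's own code: it reads the sample input from an in-memory heredoc (an assumption made so the example runs on its own), starts a new hash ref at each Route: line, and attaches indented lines to the most recent keyword. The column order in @columns is also an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input embedded as a heredoc so the sketch is self-contained;
# in a real script you would loop over <> or a file handle instead.
my $input = <<'END';
Route:  TUCSON-AZ
Author: upham
Date:   2018-06-07 20:09:17 UTC
Release:0.0
Content:
        Full Release
Comment:
        Initial setup

*** Modified on Mon Jun 11 19:18:40 PDT 2018 by upham ***
QRC Acceptor: Admin
Log: http://universityofarizona/ECE101/rev0.0_060718_130854-4307-1528769914.qclog
Successful
Status: {Objects succeeded (1)} {}
--------------------------------------------------
Route:  YUMA-AZ
Author: upham
Date:   2018-06-07 20:09:18 UTC
Release:0.0
Content:
        Full Release
Comment:
        Initial setup

*** Modified on Tue Sep 25 15:40:02 PDT 2018 by upham ***
QRC Acceptor: Admin
Log: http://universityofarizona/ECE101/rev0.0_060718_130854-4307-1537915198.qclog
Successful 
Status: {Objects succeeded (33)} {}
--------------------------------------------------
END

open my $in, '<', \$input or die "cannot open in-memory input: $!";

my @entries;     # one hash ref per log entry
my $last_key;    # most recent keyword, for indented continuation lines

while (my $line = <$in>) {
    chomp $line;

    # True from a "Route:" line through the next dashed separator line,
    # i.e. for every line that belongs to one log entry.
    if ($line =~ /^Route:/ .. $line =~ /^----------/) {
        if (my ($keyword, $data) = $line =~ /^(\w+):\s*(.*)$/) {
            push @entries, {} if $keyword eq 'Route';   # Route: starts a new entry
            $entries[-1]{$keyword} = $data;
            $last_key = $keyword;
        }
        elsif (my ($text) = $line =~ /^\s+(.+)$/) {
            # Indented text is the value of the preceding keyword
            # (e.g. "Full Release" under Content:); multiple lines are joined with a space.
            my $entry = $entries[-1];
            $entry->{$last_key} .= ($entry->{$last_key} eq '' ? '' : ' ') . $text;
        }
        # Everything else ("*** Modified ...", "QRC Acceptor: ...",
        # "Successful", blank lines, the separator) is ignored.
    }
}

# Column order is an assumption; adjust it to the layout you need.
my @columns = qw(Route Author Date Release Log Status Content Comment);
print join(',', @columns), "\n";
for my $entry (@entries) {
    # Missing fields are printed as empty strings instead of undef
    print join(',', map { $entry->{$_} // '' } @columns), "\n";
}
```

Because every keyword goes into the same per-entry hash, adding a column only means adding its name to @columns.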

Answer 1 (score: 0)

Step 1: Add use strict and use warnings. use strict then reports errors about the undeclared variables.

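Step 2: Add my to declare $rlseHistRepo (and $item in the foreach loop). Also add my (%rlsSite, %rlsAuthor) to declare the two hashes used in the final loop. But that is strange, because you read values from those hashes without ever storing any data in them, which produces a few "uninitialized value" warnings. So I think we need to rethink this a bit.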
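To see why the question's output comes out wrong and in an unpredictable order, here is a small illustration (not part of the original answer) of what %menu actually ends up holding. Because the split keeps the literal label ("Route:" or "Author:") as the hash key, each new record overwrites the previous one, and keys returns whatever survives in no particular order. The @lines array is a hypothetical stand-in for the input file.

```perl
use strict;
use warnings;

# Keyword lines from the two sample records (stand-in for reading the file)
my @lines = ('Route:  TUCSON-AZ', 'Author: upham',
             'Route:  YUMA-AZ',   'Author: upham');

my %menu;
for my $line (@lines) {
    # Same split as in the question: $item is the literal "Route:" or "Author:"
    my ($item, $value) = split /\s+/, $line;
    $menu{$item} = $value;   # the second record overwrites the first
}

# Only two keys survive; sort them because keys() has no defined order.
for my $item (sort keys %menu) {
    print "$item => $menu{$item}\n";
}
```

Building one hash per record, as the next step does, avoids both problems.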

The idea is to build up a hash for each record. When the record ends (when we reach the line of dashes), we print that record. Something like this:

use feature 'say';   # say needs this (or "use v5.10;")
my @keys = qw[Route Author Date Release Log
              Status Content Comment];

my %record;

while(<IN> ) {
  chomp;
  if (/-----/) {
    say OUTCSV join ',', @record{@keys};
    %record = ();
  }

  # ignore lines without a ':'
  next unless /:/;
  # ignore the '***' lines
  next if /\*\*\*/;

  # Split on the first colon only (limit 2), so "Log: http://..." keeps its URL intact
  my ($key, $value) = split /\s*:\s*/, $_, 2;
  # Some keys have their values on the next line
  if ($value !~ /\S/) {  
    chomp($value = <IN>);
    $value =~ s/^\s+//;
  } 
  $record{$key} = $value;
}

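Step 3: Tidy up by removing some unnecessary variables and turning the program into a Unix filter (reading from STDIN, or from the files named on the command line, and writing to STDOUT). That is actually easier to write and makes your program more flexible.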

#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';

my @keys = qw[Route Author Date Release Log
              Status Content Comment];

say join ',', @keys;   # header row in the same order as the data columns
say ",,,,,,,";

my %record;

while (<>) {
  chomp;

  if (/-----/) {
    say join ',', @record{@keys};
    %record = ();
  }

  # ignore lines without a ':'
  next unless /:/;
  # ignore the '***' lines
  next if /\*\*\*/;

  if (my ($key, $value) = split /\s*:\s*/, $_, 2) {
    # Some keys have their values on the next line
    if ($value !~ /\S/) {
      chomp($value = <>);
      $value =~ s/^\s+//;
    }
    $record{$key} = $value;
  }
}

As others have mentioned, in production code you'll want to use Text::CSV to produce the output.
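A minimal sketch of writing the same columns with Text::CSV follows. It assumes Text::CSV is installed from CPAN (it is not a core module). The record values are copied from the first sample entry, except the Comment, which is hypothetical text containing a comma and double quotes, added only to show the escaping that a plain join(',', ...) gets wrong.

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module, not part of core Perl

my @columns = qw(Route Author Date Release Log Status Content Comment);

# One parsed record, as the loops above would build it
my %record = (
    Route   => 'TUCSON-AZ',
    Author  => 'upham',
    Date    => '2018-06-07 20:09:17 UTC',
    Release => '0.0',
    Log     => 'http://universityofarizona/ECE101/rev0.0_060718_130854-4307-1528769914.qclog',
    Status  => '{Objects succeeded (1)} {}',
    Content => 'Full Release',
    Comment => 'Initial setup, "phase 1"',   # hypothetical: comma and quotes to show escaping
);

my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
    or die "cannot create Text::CSV object: " . Text::CSV->error_diag;

# Write into a string here; pass a real filehandle (e.g. \*STDOUT) in practice.
my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory output: $!";
$csv->print($fh, \@columns);                                # header row
$csv->print($fh, [ map { $record{$_} // '' } @columns ]);   # data row
close $fh;

print $out;
```

In the answers' loops you would create $csv once before the loop and replace the say join ',', ... line with a $csv->print call on the output handle; Text::CSV then wraps fields in quotes and doubles embedded quotes as needed.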