跳过数组中的一行,Perl

时间:2012-06-17 17:57:24

标签: arrays perl file csv compare

我对Perl比较陌生,我遇到过这个项目,我有点困难。 该项目的目的是比较两个csv文件,其中一个将包含: $ name,$ model,$ version 和另一个包含: $名2,$磁盘,存储$ 最后,RESULT文件将包含匹配的行,并将信息放在一起,如下所示: $ name,$ model,$ version,$ disk,$ storage。

我设法做到了这一点,但我的问题是当错过程序的其中一个元素中断时。当遇到文件中缺少元素的行时,它会在该行停止。我该如何解决这个问题?任何建议或方式,我怎么可能让它跳过那条线并继续?

这是我的代码:

open( TESTING, '>testing.csv' ); # Names will be printed to this during testing. only .net       ending names should appear
open( MISSING, '>Missing.csv' ); # Lines with missing name feilds will appear here.

#open (FILE,'C:\Users\hp-laptop\Desktop\file.txt');
#my (@array) =<FILE>;
my @hostname;    #stores names

#close FILE;
#***** TESTING TO SEE IF ANY OF THE LISTED ITEMS BEGIN WITH A COMMA AND DO NOT HAVE A   NAME.
#***** THESE OBJECTS ARE PLACED INTO THE MISSING ARRAY AND THEN PRINTED OUT IN A SEPERATE
#***** FILE.
#open (FILE,'C:\Users\hp-laptop\Desktop\file.txt');
#test
if ( open( FILE, "file.txt" ) ) {

}
else {
  die " Cannot open file 1!\n:$!";

}

$count = 0;
$x     = 0;
while (<FILE>) {

  ( $name, $model, $version ) = split(",");    #parsing

  #print $name;
  chomp( $name, $model, $version );

  if ( ( $name =~ /^\s*$/ )
      && ( $model   =~ /^\s*$/ )
      && ( $version =~ /^\s*$/ ) )    #if all of the fields  are blank ( just a blank space)
  {

    #do nothing at all
  }
  elsif ( $name =~ /^\s*$/ ) {   #if name is a blank
    $name =~ s/^\s*/missing/g;
    print MISSING "$name,$model,$version\n";

    #$hostname[$count]=$name;
    #$count++;
  }
  elsif ( $model =~ /^\s*$/ ) {   #if model is blank
    $model =~ s/^\s*/missing/g;
    print MISSING"$name,$model,$version\n";
  }
  elsif ( $version =~ /^\s*$/ ) {   #if version is blank
    $version =~ s/^\s*/missing/g;
    print MISSING "$name,$model,$version\n";
  }

  # Searches for .net to appear in field "$name" if match, it places it into hostname array.
  if ( $name =~ /.net/ ) {

    $hostname[$count] = $name;
    $count++;
  }

#searches for a comma in the name feild, puts that into an array and prints the line into the missing file.
#probably won't have to use this, as I've found a better method to test all of the    feilds ( $name,$model,$version)
#and put those into the missing file. Hopefully it works.
#foreach $line (@array)
#{
#if($line =~ /^\,+/)
#{
#$line =~s/^\,*/missing,/g;
#$missing[$x]=$line;
#$x++;
#}
#}

}
close FILE;

for my $hostname (@hostname) {
  print TESTING $hostname . "\n";
}

#for my $missing(@missing)
#{
# print MISSING $missing;
#}
if ( open( FILE2, "file2.txt" ) ) {    #Run this if the open succeeds

  #open outfile and print starting header
  open( RESULT, '>resultfile.csv' );
  print RESULT ("name,Model,version,Disk, storage\n");
}
else {
  die " Cannot open file 2!\n:$!";
}
$count = 0;
while ( $hostname[$count] ne "" ) {
  while (<FILE>) {
    ( $name, $model, $version ) = split(",");    #parsing

    #print $name,"\n";

    if ( $name eq $hostname[$count] )    # I think this is the problem area.
    {
      print $name, "\n", $hostname[$count], "\n";

      #print RESULT"$name,$model,$version,";
      #open (FILE2,'C:\Users\hp-laptop\Desktop\file2.txt');
      #test
      if ( open( FILE2, "file2.txt" ) ) {

      }
      else {
        die " Cannot open file 2!\n:$!";

      }

      while (<FILE2>) {
        chomp;
        ( $name2, $Dcount, $vname ) = split(",");    #parsing

        if ( $name eq $name2 ) {
          chomp($version);
          print RESULT"$name,$model,$version,$Dcount,$vname\n";

        }

      }

    }

    $count++;
  }

  #open (FILE,'C:\Users\hp-laptop\Desktop\file.txt');
  #test
  if ( open( FILE, "file.txt" ) ) {

  }
  else {
    die " Cannot open file 1!\n:$!";

  }

}

close FILE;
close RESULT;
close FILE2;

2 个答案:

答案 0 :(得分:2)

我认为你想要next,它可以让你立即完成当前的迭代并开始下一个迭代:

while (<FILE>) {
  ( $name, $model, $version ) = split(",");
  next unless( $name && $model && $version );
  ...;
  }

您使用的条件取决于您接受的值。在我的例子中,我假设所有值都需要为真。如果他们不需要是空字符串,也许你可以检查长度:

while (<FILE>) {
  ( $name, $model, $version ) = split(",");
  next unless( length($name) && length($model) && length($version) );
  ...;
  }

如果您知道如何验证每个字段,则可能有以下子例程:

while (<FILE>) {
  ( $name, $model, $version ) = split(",");
  next unless( length($name) && is_valid_model($model) && length($version) );
  ...;
  }

sub is_valid_model { ... }

现在你只需要决定如何将它整合到你正在做的事情中。

答案 1 :(得分:2)

首先应将use strictuse warnings添加到程序的顶部,并在首次使用时使用my声明所有变量。这将揭示许多其他难以发现的简单错误。

您还应该使用open和词法文件句柄的三参数,并且用于检查打开文件的例外的Perl惯用法是将or die添加到open调用。带有空块的if语句,用于成功路径浪费空间并变得不可读。 open来电应该是这样的

open my $fh, '>', 'myfile' or die "Unable to open file: $!";

最后,在处理CSV文件时使用Perl模块要安全得多,因为使用简单的split /,/会有很多陷阱。 Text::CSV模块已经为您完成了所有工作,并且可以在CPAN上使用。

问题在于,在读取到第一个文件的末尾之后,在第二个嵌套循环中再次从同一个句柄读取之前,不会回滚或重新打开它。这意味着将不再从该文件中读取数据,程序将表现为空。

通过相同的文件读取数百次只是为了配对相应的记录是一个糟糕的策略。如果文件大小合理,则应在内存中构建数据结构以保存信息。 Perl哈希是理想的,因为它允许您立即查找对应于给定名称的数据。

我已经编写了代码的修订版,用于演示这些要点。由于我没有样本数据,因此测试代码对我来说很尴尬,但如果您仍然遇到问题,请告诉我们。

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new;

my %data;

# Read the name, model and version from the first file. Write any records
# that don't have the full three fields to the "MISSING" file
#
open my $f1, '<', 'file.txt' or die qq(Cannot open file 1: $!);

open my $missing, '>', 'Missing.csv' 
    or die qq(Unable to open "MISSING" file for output: $!);
    # Lines with missing name fields will appear here.

while ( my $line = csv->getline($f1) ) {

  my $name = $line->[0];

  if (grep $_, @$line < 3) {
    $csv->print($missing, $line);
  }
  else {
    $data{$name} = $line if $name =~ /\.net$/i;
  }
}

close $missing;

# Put a list of .net names found into the testing file
#
open my $testing, '>', 'testing.csv'
    or die qq(Unable to open "TESTING" file for output: $!);
    # Names will be printed to this during testing. Only ".net" ending names should appear

print $testing "$_\n" for sort keys %data;

close $testing;

# Read the name, disk and storage from the second file and check that the line
# contains all three fields. Remove the name field from the start and append
# to the data record with the matching name if it exists.
#
open my $f2, '<', 'file2.txt' or die qq(Cannot open file 2: $!);

while ( my $line = $csv->getline($f2) ) {

  next unless grep $_, @$line >= 3;

  my $name = shift @$line;
  next unless $name =~ /\.net$/i;

  my $record = $data{$name};
  push @$record, @$line if $record;
}

# Print the completed hash. Send each record to the result output if it
# has the required five fields
#
open my $result, '>', 'resultfile.csv' or die qq(Cannot open results file: $!);

$csv->print($result, qw( name Model version Disk storage ));

for my $name (sort keys %data) {

  my $line = $data{$name};

  if (grep $_, @$line >= 5) {
    $csv->print($result, $data{$name});
  }
}