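I can't get my Perl script to work. I suspect the problem is in the while loop that reads the extract file line by line. Any help would be appreciated. There are two files:
A bad file containing a list of bad IDs (100 IDs):
2
3
An extract containing delimited data with the ID in field 1 (millions of lines):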
1|data|data|data
2|data|data|data
2|data|data|data
2|data|data|data
3|data|data|data
4|data|data|data
5|data|data|data
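I'm trying to remove every line from the large extract file whose ID matches a bad ID. An ID can match multiple lines. The extract is sorted.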
#use strict;
#use warnings;
$SourceFile = $ARGV[0];
$ToRemove = $ARGV[1];
$FieldNum = $ARGV[2];
$NewFile = $ARGV[3];
$LargeRecords = $ARGV[4];
open(INFILE, $SourceFile) or die "Can't open source file: $SourceFile \n";
open(REMOVE, $ToRemove) or die "Can't open toRemove file: $ToRemove \n";
open(OutGood, "> $NewFile") or die "Can't open good output file \n";
open(OutLarge, "> $LargeRecords") or die "Can't open Large Records output file \n";
#Read in the list of bad IDs into array
@array = <REMOVE>;
#Loop through each bad record
foreach (@array)
{
$badID = $_;
#read the extract line by line
while(<INFILE>)
{
#split the line into its fields
@fields = split /\|/, $_;
my $extractID = $fields[$FieldNum];
#print "Here's what we got: $badID and $extractID\n";
while($extractID == $badID)
{
#Write out bad large records
print OutLarge join '|', @fields;
#Get the next line in the extract file
@fields = split /\|/, <INFILE>;
my $extractID = $fields[$FieldNum];
$found = 1; #true
#print " We got a match!!";
#remove item after it has been found
my $input_remove = $badID;
@array = grep {!/$input_remove/} @array;
}
print OutGood join '|', @fields;
}
}
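The script above fails for three reasons. First, the inner `while(<INFILE>)` reads the whole extract while handling the first bad ID, so every later bad ID sees end-of-file. Second, `my $extractID` inside the inner loop declares a new variable, so the `$extractID` tested by `while($extractID == $badID)` never changes and that loop cannot exit once it matches. Third, `@array` is rewritten by `grep` while `foreach` is iterating over it. Because the extract is sorted, the idea of stepping through the bad IDs and the extract together can still work if both advance in a single pass. The following is a minimal sketch of that merge. It assumes numeric IDs, an extract sorted in ascending numeric order, and a 0-based field index as in the original `$fields[$FieldNum]`; the sub names `filter_sorted`, `read_ids`, and `main` are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Single pass over a sorted extract: lines whose ID is in @$bad_ids go to
# $removed, all other lines go to $good. $field is a 0-based column index.
sub filter_sorted {
    my ($in, $good, $removed, $bad_ids, $field) = @_;
    my @bad = sort { $a <=> $b } @$bad_ids;  # same order as the extract
    my $i = 0;                                # next bad ID that could still match
    while (my $line = <$in>) {
        my $id = (split /\|/, $line)[$field];
        # Bad IDs below the current ID can never match a later line, so skip them.
        $i++ while $i < @bad && $bad[$i] < $id;
        if ($i < @bad && $bad[$i] == $id) {
            print {$removed} $line;
        }
        else {
            print {$good} $line;
        }
    }
    return;
}

# Read one ID per line, ignoring surrounding whitespace and blank lines.
sub read_ids {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open '$file': $!";
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        push @ids, $line if length $line;
    }
    close $fh;
    return \@ids;
}

sub main {
    @_ == 5 or die "usage: $0 source bad_ids field new_file removed_file\n";
    my ($source, $to_remove, $field, $new_file, $removed_file) = @_;
    my $bad = read_ids($to_remove);
    open my $in,      '<', $source       or die "Can't open '$source': $!";
    open my $good,    '>', $new_file     or die "Can't open '$new_file': $!";
    open my $removed, '>', $removed_file or die "Can't open '$removed_file': $!";
    filter_sorted($in, $good, $removed, $bad, $field);
    close $good    or die "Can't close '$new_file': $!";
    close $removed or die "Can't close '$removed_file': $!";
    return;
}

# Run only when arguments are given, e.g. (file names are hypothetical):
#   perl filter.pl extract.txt bad.txt 0 good.txt removed.txt
main(@ARGV) if @ARGV;
```

Since both lists are walked in step, each extract line is compared with only the next few bad IDs, and only the bad IDs are held in memory. If the extract is not strictly sorted numerically by that field, the hash-based answers are the safer choice.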
Answer 0 (score: 2)
Try this:
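$ perl -F'\|' -nae 'BEGIN {while(<>){chomp; $bad{$_}++;last if eof;}} print unless $bad{$F[0]};' bad good
The BEGIN block reads the first file (bad) into %bad and stops at its end-of-file; the implicit -n loop then reads good and prints every line whose first field is not a bad ID. The pipe in -F is escaped because -F, like split, takes a regular expression.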
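Because split (and therefore -F) treats its separator as a regular expression, a bare | is an alternation of two empty patterns rather than a literal pipe, so it matches between every pair of characters. This small sketch contrasts the two forms on an illustrative sample line; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $line = "12|data|data";

# Unescaped: /|/ matches the empty string everywhere, so the line is
# split into single characters.
my @chars = split /|/, $line;

# Escaped: /\|/ matches a literal pipe, giving the intended fields.
my @fields = split /\|/, $line;

print "unescaped first element: $chars[0]\n";   # "1", not "12"
print "escaped first element:   $fields[0]\n";  # "12"
```

With single-digit IDs the unescaped form can appear to work, since the first character happens to be the whole ID.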
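Answer 1 (score: 1)
First of all, you're in luck: the number of bad IDs is small. That means you can read the list of bad IDs once and put them in a hash table without any memory-usage problems. Once they are in the hash, you simply read the large data file line by line and skip printing any line whose ID is bad.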
#!/usr/bin/env perl
use strict;
use warnings;
# hardwired for convenience
my $bad_id_file = 'bad.txt';
my $data_file = 'data.txt';
my $bad_ids = read_bad_ids($bad_id_file);
remove_data_with_bad_ids($data_file, $bad_ids);
sub remove_data_with_bad_ids {
my $file = shift;
my $bad = shift;
open my $in, '<', $file
or die "Cannot open '$file': $!";
while (my $line = <$in>) {
if (my ($id) = extract_id(\$line)) {
exists $bad->{ $id } or print $line;
}
}
close $in
or die "Cannot close '$file': $!";
return;
}
sub read_bad_ids {
my $file = shift;
open my $in, '<', $file
or die "Cannot open '$file': $!";
my %bad;
while (my $line = <$in>) {
if (my ($id) = extract_id(\$line)) {
$bad{ $id } = undef;
}
}
close $in
or die "Cannot close '$file': $!";
return \%bad;
}
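# Return the ID (the leading run of digits) of the referenced line, or an
# empty list if the line does not start with a digit.
sub extract_id {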
my $string_ref = shift;
if (my ($id) = ($$string_ref =~ m{\A ([0-9]+) }x)) {
return $id;
}
return;
}
Answer 2 (score: 1)
I'd use a hash, like this:
use warnings;
use strict;
my @bad = qw(2 3);
my %bad;
$bad{$_} = 1 foreach @bad;
my @file = qw (1|data|data|data 2|data|data|data 2|data|data|data 2|data|data|data 3|data|data|data 4|data|data|data 5|data|data|data);
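my %hash;
foreach (@file){
    my @split = split(/\|/);
    # collect every line for an ID; a plain assignment would keep only the last one
    push @{ $hash{$split[0]} }, $_;
}
# sort numerically so that, e.g., 10 comes after 9 rather than after 1
foreach my $id (sort { $a <=> $b } keys %hash){
    next if exists $bad{$id};
    print "$_\n" foreach @{ $hash{$id} };
}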
which gives:
1|data|data|data
4|data|data|data
5|data|data|data