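I wrote a Perl script that parses a file, scrubs it, and writes the result to a new file. It worked with the test data I started with, but now that I have all of the real data, it turns out the newly scrubbed file contains a lot of records I don't want (mainly because too many of the fields in those records are empty).
So I now need to check whether a specific field in a record is empty and, if it is, write the record to an "error" file instead of writing it to the clean data file. Below is my script (and before anyone brings it up, I don't have the Text::CSV module, and it isn't available to me).
Note: before I tried putting the IF/ELSE statement in, the code was working on the data I had before I received the real data with these problem records.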
#!/usr/bin/perl/
use strict;
use warnings;
use Data::Dumper;
use Time::Piece;
my $filename = 'uncleanData.csv';
open my $FH, $filename
    or die "Could not read from $filename <$!>, program halting.";
# Read the header line.
chomp(my $line = <$FH>);
my @fields = split(/,/, $line);
print Dumper(@fields), $/;
my @data;
# Read the lines one by one.
while($line = <$FH>) {
    chomp($line);
Below is the new IF statement I added, with my previously working script, unchanged, under the ELSE:
    # Check if the storeNbr field is empty. If so, write record to error file.
    if (!length $fields[28]) {
        open ( my $ERR_FH, '>', "errorFiles.csv" ) or die $!;
        print $ERR_FH join(',', @$_), $/ for @data;
        close $ERR_FH;
    }
    else
    {
        # Scrub data of characters that cause scripting problems down the line.
        $line =~ s/[\'\\]/ /g;
        # split the fields, concatenate fields 28-30, and add the
        # concatenated field to the beginning of each line in the file
        my @fields = split(/,/, $line);
        unshift @fields, join '_', @fields[28..30];
        # Format the DATE fields for MySQL
        $_ = join '-', (split /\//)[2,0,1] for @fields[10,14,24,26];
        # Scrub colons from the data
        $line =~ s/:/ /g;
        # If Spectro_Model is "UNKNOWN", change
        if($fields[22] eq "UNKNOWN"){
            $_ = 'UNKNOW' for $fields[22];
        }
        # If tran_date is blank, insert 0000-00-00
        if(!length $fields[10]){
            $_ = '0000-00-00' for $fields[10];
        }
        # If init_tran_date is blank, insert 0000-00-00
        if(!length $fields[14]){
            $_ = '0000-00-00' for $fields[14];
        }
        # If update_tran_date is blank, insert 0000-00-00
        if(!length $fields[24]){
            $_ = '0000-00-00' for $fields[24];
        }
        # If cancel_date is blank, insert 0000-00-00
        if(!length $fields[26]){
            $_ = '0000-00-00' for $fields[26];
        }
        # Format the PROD_NBR field by deleting any leading zeros before decimals.
        $fields[12] =~ s/^\s*0\././;
        # put the records back
        push @data, \@fields;
    }
}
close $FH;
print "Unsorted:\n", Dumper(@data); #, $/;
#Sort the clean files on Primary Key, initTranDate, updateTranDate, and updateTranTime
@data = sort {
    $a->[0] cmp $b->[0] ||
    $a->[14] cmp $b->[14] ||
    $a->[26] cmp $b->[26] ||
    $a->[27] cmp $b->[27]
} @data;
#open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/parsedMistints.csv';
open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/cleaned1502.csv';
print $OFH join(',', @$_), $/ for @data;
close $OFH;
exit;
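My guess is that my problem is where I put the closing brace } for the ELSE part of the statement. Here are some sample records from the file, the last of which is a "problem" record: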
650096571,1,1,used as store paint,14,IFC 8012NP,Standalone-9,3596,56,1/31/2015,80813,A97W01251,,1/16/2015,0.25,0.25,,SW,CUSTOM MATCH,TRUE,O,xts,,,,,,,1568,61006,1,FALSE
650368376,1,3,Tinted Wrong Color,16,IFC 8012NP,01DX8015206,,6,1/31/2015,160720,A87W01151,MATCH,1/31/2015,1,1,ENG,CUST,CUSTOM MATCH,TRUE,O,Ci52,,,,,,,1584,137252,1,FALSE
650175433,3,1,not tinted - e.w.,16,COROB MODULA HF,Standalone-7,,2,1/31/2015,95555,B20W02651,,1/29/2015,3,3,,COMP,CUSTOM MATCH,TRUE,P,xts,,,,,,,1627,68092,5,FALSE
650187016,2,1,checked out under cash ,,,,,,,,,,,,,,,,,,,,,,,,,,,,
When I run this script, it still processes the "bad records" and throws all kinds of "uninitialized value" warnings.
Answer 0 (score: 0)
Text::CSV is really useful. If you need that functionality, Text::ParseWords can substitute. But as long as you have no quoting to worry about, split will do just fine.
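To illustrate the difference, here is a minimal sketch comparing parse_line from the core Text::ParseWords module with a plain split. The sample record is hypothetical (the data in the question shows no quoted fields); it only demonstrates what happens when a quoted field contains a comma.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical record: the second field is quoted and contains a comma.
my $line = '650096571,"Tinted Wrong Color, returned",16';

# parse_line(delimiter, keep, line): with keep set to 0 the surrounding
# quotes are stripped from the returned fields.
my @quoted_aware = parse_line( ',', 0, $line );

# A plain split knows nothing about quotes and cuts the field in two.
my @naive = split /,/, $line;

printf "parse_line: %d fields, split: %d fields\n",
    scalar @quoted_aware, scalar @naive;
```

Note that parse_line also treats single quotes as quoting characters, so data containing stray apostrophes may need scrubbing first.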
You could do something like the following:
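#!/usr/bin/env perl
use strict;
use warnings;

# Open both output files once, before reading any input.
open ( my $normal_fh, '>', "output.txt" ) or die $!;
open ( my $err_fh, '>', "errors.txt" ) or die $!;

while ( <> ) {
    # storeNbr is index 28 of the raw record (the index the question tests).
    # The -1 limit keeps trailing empty fields, and // '' (Perl 5.10+)
    # avoids an "uninitialized value" warning on short lines.
    if ( ( ( split /,/, $_, -1 )[28] // '' ) =~ /\w/ ) {
        select $normal_fh;
    }
    else {
        select $err_fh;
    }
    # print writes $_ to whichever handle is currently selected.
    print;
}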
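
Applied to the script in the question, two details seem worth noting. The IF test there reads @fields from the header line, because the current line is only split inside the ELSE branch. The error file is also reopened with '>' inside the loop, which truncates it on every bad record. The following is a rough sketch of the same routing idea that addresses both, not a drop-in replacement. The helper name has_store_nbr and the two shortened sample records are made up for illustration, and index 28 is assumed to be storeNbr as in the question.

```perl
use strict;
use warnings;

# Assumption: storeNbr is index 28 of the raw record, as in the question.
my $STORE_NBR_IDX = 28;

# Hypothetical helper: true when the record has a non-blank storeNbr.
sub has_store_nbr {
    my ($line) = @_;
    # The -1 limit keeps trailing empty fields, which split drops by default.
    my @fields = split /,/, $line, -1;
    my $value  = $fields[$STORE_NBR_IDX];
    return defined $value && $value =~ /\S/;
}

# Two made-up stand-ins for real records: one with storeNbr at index 28,
# one whose trailing fields are all empty.
my @records = (
    join( ',', ('x') x 28, '1568', '61006', '1', 'FALSE' ),
    join( ',', '650187016', '2', '1', 'checked out under cash', ('') x 28 ),
);

# In-memory handles keep the sketch self-contained. Real code would open
# the clean and error files here, once, before the loop.
my ( $clean, $errors ) = ( '', '' );
open my $clean_fh, '>', \$clean  or die $!;
open my $err_fh,   '>', \$errors or die $!;

for my $line (@records) {
    # Test the current line, then pick the destination handle.
    my $fh = has_store_nbr($line) ? $clean_fh : $err_fh;
    print {$fh} "$line\n";
}

close $clean_fh;
close $err_fh;
```

In the original script, the scrubbing and date formatting would go where the clean record is written, and only clean records would be pushed onto @data for sorting.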