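OK - I'm going to post my entire script, since I get chastised when I don't, even though the last time I did that I was criticized for posting the whole thing. I only need to know whether the one line I originally asked about will work. The complete script (which worked fine until another department gave me data that was completely different from what we were originally told to expect) is at the end.
I'm parsing and scrubbing a CSV file so that it can be loaded into a MySQL table. It is loaded through someone else's batch Java program, and if any field is empty the batch stops with an error.
I've been told to just put in a blank space wherever there is an empty field in any record. Would something as simple as this work?
if ( ! length $fields[2] ) {
    $_ = ' ' for $fields[2];
}
Is there a way to check several fields at once? Or, better, a way to check all of the fields (this is after the record has been split)? It would be the last thing I do before writing the record back out to the CSV file.
Here's the whole script. Please don't tell me that what I'm doing in the already-working script isn't how you'd do it.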
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Time::Piece;

my $filename = 'mistints_1505_comma.csv';
#my $filename = 'test.csv';

# Open input file
open my $FH, '<', $filename
    or die "Could not read from $filename <$!>, program halting.";

# Open error handling file
open ( my $ERR_FH, '>', "errorFiles1505.csv" ) or die $!;

# Read the header line of the input file and print to screen.
chomp(my $line = <$FH>);
my @fields = split(/,/, $line);
print Dumper(@fields), $/;

my @data;

# Read the lines one by one.
while($line = <$FH>) {
    chomp($line);

    # Scrub data of characters that cause scripting problems down the line.
    $line =~ s/[\'\\]/ /g;

    # split the fields of each record
    my @fields = split(/,/, $line);

    # Check if the storeNbr field is empty. If so, write record to error file.
    if (!length $fields[28]) {
        chomp (@fields);
        my $str = join ',', @fields;
        print $ERR_FH "$str\n";
    }
    else
    {
        # Concatenate the first three fields and add to the beginning of each record
        unshift @fields, join '_', @fields[28..30];

        # Format the DATE fields for MySQL
        $_ = join '-', (split /\//)[2,0,1] for @fields[10,14,24,26];

        # Scrub colons from the data
        $line =~ s/:/ /g;

        # If Spectro_Model is "UNKNOWN", change
        if($fields[22] eq "UNKNOWN"){
            $_ = 'UNKNOW' for $fields[22];
        }

        # If tran_date is blank, insert 0000-00-00
        if(!length $fields[10]){
            $_ = '0000-00-00' for $fields[10];
        }

        # If init_tran_date is blank, insert 0000-00-00
        if(!length $fields[14]){
            $_ = '0000-00-00' for $fields[14];
        }

        # If update_tran_date is blank, insert 0000-00-00
        if(!length $fields[24]){
            $_ = '0000-00-00' for $fields[24];
        }

        # If cancel_date is blank, insert 0000-00-00
        if(!length $fields[26]){
            $_ = '0000-00-00' for $fields[26];
        }

        # Format the PROD_NBR field by deleting any leading zeros before decimals.
        $fields[12] =~ s/^\s*0\././;

        # put the records back
        push @data, \@fields;
    }
}

close $FH;
close $ERR_FH;

print "Unsorted:\n", Dumper(@data); #, $/;

#Sort the clean files on Primary Key, initTranDate, updateTranDate, and updateTranTime
@data = sort {
    $a->[0] cmp $b->[0] ||
    $a->[14] cmp $b->[14] ||
    $a->[26] cmp $b->[26] ||
    $a->[27] cmp $b->[27]
} @data;

#open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/parsedMistints.csv';
open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/cleaned1505.csv';
print $OFH join(',', @$_), $/ for @data;
close $OFH;

exit;
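Answer 0 (score: 1)
As far as I can tell, you have split a record on commas, and you want to change every field that is an empty string so that it contains a single space instead.
I would write this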
use strict;
use warnings 'all';
my $record = 'a,b,c,,e,,g,,i,,k,,m,n,o,p,q,r,s,t';
my @fields = map { $_ eq "" ? ' ' : $_ } split /,/, $record;
use Data::Dump;
dd \@fields;
[ "a", "b", "c", " ", "e", " ", "g", " ", "i", " ", "k", " ", "m" .. "t" ]
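One caveat is worth a quick sketch here: by default Perl's split discards trailing empty fields, so a record that ends in one or more commas comes back shorter than expected and its last columns are never padded. Passing a negative limit as the third argument keeps them. The sample record below is made up for illustration.

```perl
use strict;
use warnings 'all';

# Hypothetical record whose last two fields are empty
my $record = 'a,,c,,';

# Default split drops the trailing empty fields
my @short = split /,/, $record;

# A negative limit keeps them, so they can be padded as well
my @fields = map { $_ eq "" ? ' ' : $_ } split /,/, $record, -1;

print scalar(@short),  " fields without a limit\n";
print scalar(@fields), " fields with a limit of -1\n";
print join(',', @fields), "\n";
```

Whether this matters depends on the data: it only changes the result for records whose final columns are empty.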
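Or, if some fields need to be set to something different when they are empty, you can set up an array of defaults, which looks like this. Every element of @defaults is a single space, except for fields 10, 11 and 12 (indices 9, 10 and 11), which are set to 0000-00-00. The defaults are applied after the record has been split.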
use strict;
use warnings 'all';
my @defaults = (' ') x 20;
$defaults[$_] = '0000-00-00' for 9, 10, 11;
my $record = 'a,b,c,,e,,g,,i,,k,,m,n,o,p,q,r,s,t';
my @fields = split /,/, $record;
for my $i ( 0 .. $#fields ) {
    $fields[$i] = $defaults[$i] if $fields[$i] eq '';
}
use Data::Dump;
dd \@fields;
[ "a", "b", "c", " ", "e", " ", "g", " ", "i", "0000-00-00", "k", "0000-00-00", "m" .. "t" ]
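Having seen your full program, I would recommend something like this. If you had shown a sample of your input data, I could have used a hash to refer to the columns by name instead of by number, which would make it much more readable.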
#!/usr/bin/perl
use strict;
use warnings 'all';
use Data::Dumper;
use Time::Piece;

my $filename = 'mistints_1505_comma.csv';
#my $filename = 'test.csv';

open my $FH, '<', $filename
    or die "Could not read from $filename <$!>, program halting.";

open( my $ERR_FH, '>', "errorFiles1505.csv" ) or die $!;

chomp( my $line = <$FH> );
my @fields = split /,/, $line;
print Dumper( \@fields ), "\n";

my @data;

# Read the lines one by one.
while ( <$FH> ) {
    chomp;

    # Scrub data of characters that cause scripting problems down the line.
    tr/'\\/ /;

    my @fields = split /,/;

    # Check if the storeNbr field is empty. If so, write record to error file.
    if ( $fields[28] eq "" ) {
        my $str = join ',', @fields;
        print $ERR_FH "$str\n";
        next;
    }

    # Concatenate the first three fields and add to the beginning of each record
    unshift @fields, join '_', @fields[ 28 .. 30 ];

    # Format the DATE fields for MySQL
    $_ = join '-', ( split /\// )[ 2, 0, 1 ] for @fields[ 10, 14, 24, 26 ];

    # Scrub colons from the data. The record is already split at this point,
    # so this is applied to each field rather than to the original line.
    tr/://d for @fields;

    my $i = 0;
    for ( @fields ) {

        # If "Spectro_Model" is "UNKNOWN" then change to "UNKNOW"
        if ( $i == 22 ) {
            $_ = 'UNKNOW' if $_ eq 'UNKNOWN';
        }

        # If a date field is blank then insert 0000-00-00
        elsif ( grep { $i == $_ } 10, 14, 24, 26 ) {
            $_ = '0000-00-00' if $_ eq "";
        }

        # Format the PROD_NBR field by deleting any leading zeros before decimals.
        elsif ( $i == 12 ) {
            s/^\s*0\././;
        }

        # Change all remaining empty fields to a single space
        else {
            $_ = ' ' if $_ eq "";
        }

        ++$i;
    }

    push @data, \@fields;
}

close $FH;
close $ERR_FH;

print "Unsorted:\n", Dumper(@data); #, $/;

#Sort the clean files on Primary Key, initTranDate, updateTranDate, and updateTranTime
@data = sort {
    $a->[0] cmp $b->[0] or
    $a->[14] cmp $b->[14] or
    $a->[26] cmp $b->[26] or
    $a->[27] cmp $b->[27]
} @data;

#open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/parsedMistints.csv';
open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/cleaned1505.csv' or die $!;
print $OFH join(',', @$_), $/ for @data;
close $OFH;
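The idea of referring to columns by name rather than by number could look roughly like the sketch below. The header, the column names and the sample record are all hypothetical, since the real input was not shown, and the defined-or operator used here needs Perl 5.10 or later.

```perl
use strict;
use warnings 'all';

# Hypothetical header and record; the real file's column names are not shown
my $header = 'store_nbr,tran_date,spectro_model,prod_nbr';
my $line   = '0042,,UNKNOWN,';

my @names = split /,/, $header;

# Per-column defaults for empty values; any column not listed gets a space
my %default = ( tran_date => '0000-00-00' );

# Build a hash of column name => value, keeping trailing empty fields
my %rec;
@rec{@names} = split /,/, $line, -1;

for my $name (@names) {
    $rec{$name} = '' unless defined $rec{$name};    # record shorter than header
    $rec{$name} = $default{$name} // ' ' if $rec{$name} eq '';
}

$rec{spectro_model} = 'UNKNOW' if $rec{spectro_model} eq 'UNKNOWN';

# Write the fields back out in header order
print join(',', @rec{@names}), "\n";
```

With names in place, a rule such as the date default is tied to the column it describes and survives a change in column order.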
Answer 1 (score: 0)
Well, if you did this before splitting the line into @fields, you should be able to do something like
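# assuming a CSV line is in $_
# pad an empty field at the start of the line
s/^,/ ,/;
# pad empty fields in the middle; the lookahead leaves the second comma
# unconsumed, so runs such as ",,," are padded as well
s/,(?=,)/, /g;
# pad an empty field at the end
s/,$/, /;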
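A quick way to check this kind of padding is to run it on a line with several empty fields in a row. A plain s/,,/, ,/g cannot match overlapping pairs of commas, so it leaves every second empty field in a run untouched; the sketch below uses a lookahead instead. The pad_empty name is made up for this example.

```perl
use strict;
use warnings 'all';

# Pad empty fields in an unsplit CSV line with a single space
sub pad_empty {
    my ($line) = @_;
    $line =~ s/^,/ ,/;         # empty first field
    $line =~ s/,(?=,)/, /g;    # empty middle fields, including runs like ",,,"
    $line =~ s/,$/, /;         # empty last field
    return $line;
}

print pad_empty(',a,,,b,'), "\n";
```

Like any approach that works on the raw line, this assumes that no field contains a quoted comma.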
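Answer 2 (score: 0)
Don't try to roll your own CSV parsing code. Use Text::CSV or Text::CSV::Slurp.
With Text::CSV you could do something like
use Text::CSV;
my $csv = Text::CSV->new( { binary => 1 } ) or die Text::CSV->error_diag;
# $line is assumed to hold one record read from the input file
my $status = $csv->parse($line); # parse a CSV string into fields
# by default empty fields come back as empty strings, not undef, so test the length too
my @columns = map { defined $_ && length $_ ? $_ : " " } $csv->fields(); # get the parsed fields
Are you really sure you want to replace empty values with spaces? I'd say that if a field is undefined, it should be NULL in the db.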