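I have a true pipe-delimited file that I need to load into a database. The file has 35 fields, and therefore 34 pipes per record. One of the fields contains HTML code, and for some records that field includes multiple line breaks. Unfortunately, there is no pattern to where the line breaks occur.
The solution I came up with is to count the number of pipes in each line and, until that count reaches 34, remove the newline characters from that line. I'm not very well versed in Perl, but I think I'm close to achieving what I want. Any suggestions?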
#!/usr/local/bin/perl
use strict;
open (FILE, 'test.txt');
while (<FILE>) {
chomp;
my $line = $_;
#remove null characters that are included in file
$line =~ tr/\x00//;
#count number of pipes
my $count = ($line =~ tr/|//);
#each line should have 34 pipes
while ($count < 34) {
#remove new lines until line has 34 pipes
$line =~ tr/\r\n//;
$count = ($line =~ tr/|//);
print "$line\n";
}
}
Answer 0 (score: 1)
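while (!eof(FILE)) {
    # assemble a row of data: 35 pipe-separated fields, possibly over many lines
    my @fields = ();
    {
        # read the first 34 fields from FILE; each one ends at a pipe:
        local $/ = '|';
        for (1..34) {
            my $field = <FILE>;
            last unless defined $field; # input ended in the middle of a record
            chomp $field; # strips the trailing '|', because $/ is '|' here
            push @fields, $field;
        }
    } # $/ is set back to its original value ("\n") at the end of this block
    my $last = <FILE>; # read the last field, which ends with a newline
    $last = '' unless defined $last;
    chomp $last;
    push @fields, $last;
    tr/\r\n\x00//d for @fields; # delete embedded line breaks and null characters
    my $line = join '|', @fields;
    # ... now you can process $line, and you already have the @fields ...
}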
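As a usage sketch of the `$/` technique above, the following self-contained script wraps the record reader in a helper and runs it on an in-memory file. The helper name `read_record`, the sample data, and the use of 3 fields per record instead of 35 are all assumptions made for illustration; they are not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): read one record of
# $n_fields pipe-separated fields from $fh, even if it spans several lines.
sub read_record {
    my ($fh, $n_fields) = @_;
    my @fields;
    {
        local $/ = '|';                    # a "line" now ends at the next pipe
        for (1 .. $n_fields - 1) {
            my $field = <$fh>;
            return unless defined $field;  # ran out of data mid-record
            chomp $field;                  # removes the trailing '|'
            push @fields, $field;
        }
    }                                      # $/ is "\n" again from here on
    my $last = <$fh>;                      # last field ends at the real newline
    return unless defined $last;
    chomp $last;
    tr/\r\n\x00//d for @fields, $last;     # drop embedded line breaks and NULs
    return (@fields, $last);
}

# Made-up sample data: the second record is broken across three lines.
my $data = "a|b|c\nd|<p>x\n</p>\n|f\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my @records;
while (!eof($fh)) {
    my @fields = read_record($fh, 3) or last;
    push @records, join('|', @fields);
}
close($fh);

print "$_\n" for @records;
```

Note that this only works if the final field of each record never contains a line break, because the last field is read up to the next newline. If the HTML field could be the last one, this approach would need a different way to detect the end of a record.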
Answer 1 (score: 1)
I think this should work.
#!/usr/bin/perl
use strict;
use warnings;
open(FILE, '<', 'test.txt') or die "Cannot open test.txt: $!";
my $num_pipes = 0;
my $line_num = 0;
my $tmp = "";
while (<FILE>)
{
$line_num++;
chomp;
my $line = $_;
$line =~ tr/\x00\r//d; #remove null characters and carriage returns (tr only deletes with the /d modifier)
$num_pipes += ($line =~ tr/|//); #count number of pipes
if ($num_pipes == 34 && length($tmp))
{
$tmp .= $line;
print "$tmp\n";
# Reset values.
$tmp = "";
$num_pipes = 0;
}
elsif ($num_pipes == 34 && length($tmp) == 0)
{
print "$line\n";
$num_pipes = 0;
}
elsif ($num_pipes < 34)
{
$tmp .= $line;
}
elsif ($num_pipes > 34)
{
print STDERR "Error before line $line_num. Too many pipes ($num_pipes)\n";
$num_pipes = 0;
$tmp = "";
}
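}
# Report a trailing record that never reached 34 pipes.
print STDERR "Error at end of file. Incomplete record ($num_pipes pipes)\n" if length($tmp);
close(FILE);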
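The pipe-counting idea can also be packaged as a small function, which makes it easier to try on sample input. This is only a sketch under stated assumptions, not the answerer's code: the helper name `merge_broken_lines`, the return shape, and the 3-pipe sample data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: glue physical lines together until the buffer holds
# exactly $want pipes, then emit the buffer as one logical record.
sub merge_broken_lines {
    my ($want, @lines) = @_;
    my (@records, @errors);
    my $buffer   = '';
    my $line_num = 0;
    for my $line (@lines) {
        $line_num++;
        $line =~ tr/\r\n\x00//d;          # /d deletes; without it tr only counts
        $buffer .= $line;
        my $pipes = ($buffer =~ tr/|//);  # counts pipes, leaves $buffer unchanged
        if ($pipes == $want) {
            push @records, $buffer;
            $buffer = '';
        }
        elsif ($pipes > $want) {
            push @errors, "Too many pipes ($pipes) at line $line_num";
            $buffer = '';
        }
    }
    push @errors, "Incomplete record at end of input" if length $buffer;
    return (\@records, \@errors);
}

# Made-up sample: 4 fields (3 pipes) per record; the second record is split
# across two physical lines and the last line has one pipe too many.
my @input = ("a|b|c|d\n", "e|<p>x\r\n", "</p>|g|h\n", "1|2|3|4|5\n");
my ($records, $errors) = merge_broken_lines(3, @input);

print "$_\n" for @$records;
print STDERR "$_\n" for @$errors;
```

Like the loop above, this emits a record as soon as the expected number of pipes has been seen, so it assumes the line breaks never fall inside the last field.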