How can I remove line breaks until every line contains a specific number of a given character?

Posted: 2011-01-21 19:52:50

Tags: perl cygwin

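I have a pipe-delimited file that I need to load into a database. The file has 35 fields, so each record should contain 34 pipes. One of the fields holds HTML code, and for some records that field contains several line breaks. Unfortunately, there is no pattern to where the line breaks occur.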

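The solution I came up with is to count the pipes on each line and, until that count reaches 34, remove the newline character from the line so that the next line gets joined onto it. I'm not very proficient in Perl, but I think I'm close to doing what I want. Any suggestions?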
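A minimal sketch of the pipe-counting approach described above, written as a hypothetical subroutine join_records so it can be tried on in-memory data. It assumes that line fragments should be glued together with nothing in between (so the embedded newlines simply disappear) and that a record with too many pipes is an error; both are assumptions, not part of the question.

```perl
use strict;
use warnings;

# join_records is a hypothetical helper: it takes physical lines and the
# expected number of pipes per record (34 for 35 fields) and returns the
# logical records, gluing continuation lines together until the count is met.
sub join_records {
    my ($lines, $expected) = @_;
    my @records;
    my $buf   = '';
    my $pipes = 0;
    for my $line (@$lines) {
        (my $piece = $line) =~ s/\r?\n\z//;   # drop the line terminator
        $buf   .= $piece;
        $pipes += ($piece =~ tr/|//);         # tr with no /d only counts
        if ($pipes == $expected) {
            push @records, $buf;
            ($buf, $pipes) = ('', 0);
        }
        elsif ($pipes > $expected) {
            die "Too many pipes ($pipes) in record: $buf\n";
        }
    }
    die "Incomplete record at end of input\n" if length $buf;
    return @records;
}

# Example with 2 pipes per record (3 fields) instead of 34
my @lines = ("a|b|c\n", "d|<p>one\n", "two</p>|f\n");
print "$_\n" for join_records(\@lines, 2);
```

Note that counting pipes cannot tell when a record is finished if the multi-line field is the last one: the count reaches 34 on the first fragment of that field.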

#!/usr/local/bin/perl

use strict;
use warnings;

open(FILE, '<', 'test.txt') or die "Cannot open test.txt: $!";

while (<FILE>) {
    chomp;
    my $line = $_;
    #remove null characters that are included in file
    $line =~ tr/\x00//d;
    #count number of pipes
    my $count = ($line =~ tr/|//);
    #each line should have 34 pipes
    while ($count < 34) {
        #remove new lines until line has 34 pipes
        $line =~ tr/\r\n//;
        $count = ($line =~ tr/|//);
        print "$line\n";
    }
}
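One detail of tr/// matters in this code: with an empty replacement list and no /d modifier, tr only counts the matching characters and leaves the string unchanged. That is exactly why tr/|// works as a pipe counter, but it also means a tr intended to delete null characters or line breaks needs the /d modifier. A small demonstration (the sample string is made up):

```perl
use strict;
use warnings;

my $s = "a|b\x00|c\r\n";

# With an empty replacement list and no /d, tr/// only counts matches;
# the string itself is left unchanged.
my $pipes = ($s =~ tr/|//);
my $nulls = ($s =~ tr/\x00//);
print "pipes=$pipes nulls=$nulls length=", length($s), "\n";

# With /d, characters that have no replacement are deleted.
(my $clean = $s) =~ tr/\x00\r\n//d;
print "clean='$clean' length=", length($clean), "\n";
```

This prints pipes=2 nulls=1 length=8, then clean='a|b|c' length=5.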

2 answers:

Answer 0 (score: 1)

Work with $/, the input record separator, and read the file one pipe-terminated field at a time instead of one line at a time:
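open(FILE, '<', 'test.txt') or die "Cannot open test.txt: $!";

while (!eof(FILE)) {

    # assemble a row of data: 35 pipe separated fields, possibly over many lines
    my @fields = ();
    {
        # read 34 fields from FILE; each one ends with '|'
        local $/ = '|';
        for (1..34) {
            my $field = <FILE>;
            last unless defined $field;
            chomp $field;    # chomp strips the current $/, i.e. the trailing '|'
            push @fields, $field;
        }
    }   # $/ is set back to original value ("\n") at the end of this block

    my $last = <FILE>;    # read last field, which ends with newline
    if (defined $last) {
        chomp $last;
        push @fields, $last;
    }
    tr/\r\n//d for @fields;    # remove line breaks embedded in fields (e.g. the HTML one)
    my $line = join '|', @fields;
    # ... now you can process $line, and you already have the @fields ...
    print "$line\n";
}
close(FILE);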
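To see the $/ technique in isolation, here is a sketch that reads from an in-memory filehandle (open on a scalar reference) with 3 fields per record instead of 35. read_fields is a hypothetical helper name; like the answer, it assumes that only the non-final fields can contain line breaks, because the last field is read with the normal "\n" separator.

```perl
use strict;
use warnings;

# read_fields is a hypothetical helper: it reads one record of $n fields
# from $fh by temporarily setting $/ (the input record separator) to '|'
# for the first $n - 1 fields, then reading the last field with "\n".
# Returns an empty list at end of input.
sub read_fields {
    my ($fh, $n) = @_;
    my @fields;
    {
        local $/ = '|';
        for (1 .. $n - 1) {
            my $f = <$fh>;
            return unless defined $f;
            chomp $f;                  # chomp removes the current $/, i.e. '|'
            push @fields, $f;
        }
    }                                  # $/ is "\n" again after this block
    my $last = <$fh>;
    return unless defined $last;
    chomp $last;
    push @fields, $last;
    tr/\r\n//d for @fields;            # flatten line breaks inside fields
    return @fields;
}

# 3 fields per record; the middle field spans two physical lines
my $data = "a|b|c\nd|<p>one\ntwo</p>|f\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
while (my @f = read_fields($fh, 3)) {
    print join('|', @f), "\n";
}
close($fh);
```

This prints a|b|c and d|<p>onetwo</p>|f. If the HTML field could be the last field, reading it with "\n" would cut it at its first line break.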

Answer 1 (score: 1)

I think this should work:

#!/usr/bin/perl

use strict;
use warnings;

open(FILE, '<', 'test.txt') or die "Cannot open test.txt: $!";

my ($num_pipes, $line_num) = (0, 0);
my $tmp = "";
while (<FILE>)
{
    $line_num++;
    chomp;
    my $line = $_;
    $line =~ tr/\r\x00//d; #remove carriage returns (CRLF files on Cygwin) and null characters
    $num_pipes += ($line =~ tr/|//); #count number of pipes
    if ($num_pipes == 34 && length($tmp))
    {
            $tmp .= $line;
            print "$tmp\n";
            # Reset values.
            $tmp = "";
            $num_pipes = 0;
    }
    elsif ($num_pipes == 34 && length($tmp) == 0)
    {
            print "$line\n";
            $num_pipes = 0;
    }
    elsif ($num_pipes < 34)
    {
            $tmp .= $line;
    }
    elsif ($num_pipes > 34)
    {
            print STDERR "Error before line $line_num. Too many pipes ($num_pipes)\n";
            $num_pipes = 0;
            $tmp = "";
    }
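}
if (length($tmp) || $num_pipes)
{
    print STDERR "Error at end of file: incomplete record ($num_pipes pipes)\n";
}
close(FILE);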